Croissant also aims to make data easier to access and discover, as it enables datasets to be loaded into different AI platforms without lengthy reformatting. In doing so, Croissant hopes to spread best practice regardless of the platform used.
The new format extends schema.org, an existing machine-readable standard used by over 40 million datasets. This enables datasets to be found through industry-standard search engines such as Google Dataset Search and integrated into popular ML frameworks used by industry and academia, such as TensorFlow and PyTorch.
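As a rough illustration of what a machine-readable, schema.org-based dataset description looks like, the sketch below builds one in Python and serialises it as JSON-LD. It uses only plain schema.org vocabulary (`Dataset`, `DataDownload`); an actual Croissant file adds further ML-specific fields that are not shown here. The helper name `build_dataset_description` and every value (names, URLs, licence) are hypothetical placeholders, not taken from any real dataset.

```python
import json


def build_dataset_description(name, description, url, license_url):
    """Return a minimal schema.org Dataset description as a dict.

    Hypothetical helper for illustration only; Croissant-specific
    fields are deliberately left out.
    """
    return {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # One downloadable file that belongs to the dataset.
        "distribution": [
            {
                "@type": "DataDownload",
                "name": "data.csv",
                "contentUrl": url + "/data.csv",
                "encodingFormat": "text/csv",
            }
        ],
    }


# Placeholder values, assumed purely for the example.
description = build_dataset_description(
    name="example-dataset",
    description="A placeholder dataset used to illustrate the metadata.",
    url="https://example.org/datasets/example-dataset",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

# Serialise to JSON-LD text, the form a search engine or loader would read.
jsonld = json.dumps(description, indent=2)
print(jsonld)
```

Because the description is ordinary JSON-LD, any tool that understands schema.org can read the same file, which is what allows one description to serve both dataset search and data loading.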
The Croissant editor also allows practitioners to inspect, create, or modify Croissant descriptions for their datasets, helping to establish a standardised format across industries and teams. The format is also receiving support from major repositories of ML data, including Kaggle, OpenML and Hugging Face.