module documentation

Dataset importers.

Copyright 2017-2025, Voxel51, Inc.

Class BatchDatasetImporter Base interface for importers that load all of their samples in a single call to import_samples.
Class DatasetImporter Base interface for importing datasets stored on disk into FiftyOne.
Class FiftyOneDatasetImporter Importer for FiftyOne datasets stored on disk in serialized JSON format.
Class FiftyOneImageClassificationDatasetImporter Importer for image classification datasets stored on disk in a simple JSON format.
Class FiftyOneImageDetectionDatasetImporter Importer for image detection datasets stored on disk in a simple JSON format.
Class FiftyOneImageLabelsDatasetImporter Importer for labeled image datasets whose labels are stored in ETA ImageLabels format.
Class FiftyOneTemporalDetectionDatasetImporter Importer for temporal video detection datasets stored on disk in a simple JSON format.
Class FiftyOneVideoLabelsDatasetImporter Importer for labeled video datasets whose labels are stored in ETA VideoLabels format.
Class GenericSampleDatasetImporter Interface for importing datasets that contain arbitrary fiftyone.core.sample.Sample instances.
Class GroupDatasetImporter Interface for importing datasets that contain arbitrary grouped fiftyone.core.sample.Sample instances.
Class ImageClassificationDirectoryTreeImporter Importer for an image classification directory tree stored on disk.
Class ImageDirectoryImporter Importer for a directory of images stored on disk.
Class ImageSegmentationDirectoryImporter Importer for image segmentation datasets stored on disk.
Class ImportPathsMixin Mixin for DatasetImporter classes that provides convenience methods for parsing the data_path and labels_path parameters supported by many importers.
Class LabeledImageDatasetImporter Interface for importing datasets of labeled image samples.
Class LabeledVideoDatasetImporter Interface for importing datasets of labeled video samples.
Class LegacyFiftyOneDatasetImporter Legacy importer for FiftyOne datasets stored on disk in a serialized JSON format.
Class MediaDirectoryImporter Importer for a directory of media files stored on disk.
Class UnlabeledImageDatasetImporter Interface for importing datasets of unlabeled image samples.
Class UnlabeledMediaDatasetImporter Interface for importing datasets of unlabeled media samples.
Class UnlabeledVideoDatasetImporter Interface for importing datasets of unlabeled video samples.
Class VideoClassificationDirectoryTreeImporter Importer for a video classification directory tree stored on disk.
Class VideoDirectoryImporter Importer for a directory of videos stored on disk.
Function build_dataset_importer Builds the DatasetImporter instance for the given parameters.
Function import_samples Adds the samples from the given DatasetImporter to the dataset.
Function merge_samples Merges the samples from the given DatasetImporter into the dataset.
Function parse_dataset_info Parses the info returned by DatasetImporter.get_dataset_info and stores it on the relevant properties of the dataset.
Variable logger Undocumented
Function _build_parse_sample_fcn Undocumented
Function _generate_group_samples Undocumented
Function _get_rng Undocumented
Function _handle_legacy_formats Undocumented
Function _import_runs Undocumented
Function _import_saved_views Undocumented
Function _import_workspaces Undocumented
Function _load_labeled_dataset_index Undocumented
Function _parse_media_fields Undocumented
Function _parse_nested_media_field Undocumented
Function _set_created_at Undocumented
Function _to_list Undocumented
Function _update_no_overwrite Undocumented
def build_dataset_importer(dataset_type, strip_none=True, warn_unused=True, name=None, **kwargs):

Builds the DatasetImporter instance for the given parameters.

Parameters
dataset_type: the fiftyone.types.Dataset type
strip_none (True): whether to exclude None-valued items from kwargs
warn_unused (True): whether to issue warnings for any non-None unused parameters encountered
name (None): the name of the dataset being imported into, if known
**kwargs: keyword arguments to pass to the dataset importer's constructor via DatasetImporter(**kwargs)
Returns
a tuple of
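One way to picture the strip_none and warn_unused behavior is to split the keyword arguments into those the importer's constructor accepts and those it does not. This is a minimal sketch, assuming the constructor signature decides which kwargs are "used"; `ToyImporter` and `split_importer_kwargs` are hypothetical names, not part of FiftyOne, and the tuple returned here is only this sketch's own convention.

```python
import inspect
import warnings


class ToyImporter:
    """Hypothetical stand-in for a DatasetImporter subclass."""

    def __init__(self, dataset_dir=None, shuffle=False, seed=None):
        self.dataset_dir = dataset_dir
        self.shuffle = shuffle
        self.seed = seed


def split_importer_kwargs(importer_cls, strip_none=True, warn_unused=True, **kwargs):
    """Hypothetical helper: builds importer_cls from the kwargs its
    constructor accepts and returns (importer, unused_kwargs)."""
    if strip_none:
        # Drop None-valued items before matching them to the constructor
        kwargs = {k: v for k, v in kwargs.items() if v is not None}

    params = inspect.signature(importer_cls.__init__).parameters
    used = {k: v for k, v in kwargs.items() if k in params}
    unused = {k: v for k, v in kwargs.items() if k not in params}

    if warn_unused:
        # Only non-None unused parameters trigger a warning
        for key, value in unused.items():
            if value is not None:
                warnings.warn("Ignoring unsupported parameter '%s'" % key)

    return importer_cls(**used), unused


importer, unused = split_importer_kwargs(
    ToyImporter,
    warn_unused=False,
    dataset_dir="/tmp/data",
    seed=None,
    labels_path="x.json",
)
print(importer.dataset_dir, importer.seed, unused)
# /tmp/data None {'labels_path': 'x.json'}
```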
def import_samples(dataset, dataset_importer, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, progress=None):

Adds the samples from the given DatasetImporter to the dataset.

See the "custom-dataset-importer" guide in the FiftyOne documentation for more details about importing datasets in custom formats by defining your own DatasetImporter.

Parameters
dataset: a fiftyone.core.dataset.Dataset
dataset_importer: a DatasetImporter
label_field (None): controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a LabeledImageDatasetImporter or LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
expand_schema (True): whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True): whether to add dataset info from the importer (if any) to the dataset
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset
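The label_field and tags rules above can be summarized in two small helpers. This is a sketch under stated assumptions: plain values stand in for fiftyone.core.labels.Label instances, `resolve_label_fields` and `normalize_tags` are hypothetical names, and both the "_" separator used for string prefixes and the fallback for keys missing from a dict mapping are assumptions rather than documented behavior.

```python
def resolve_label_fields(labels, label_field=None):
    """Hypothetical helper: maps one sample's imported labels to field names.

    ``labels`` is either a single label or a dict of labels keyed by name.
    """
    if not isinstance(labels, dict):
        # Single label per sample: default field name is "ground_truth"
        return {label_field or "ground_truth": labels}

    if label_field is None:
        # Dict of labels: use the label keys directly as field names
        return dict(labels)

    if isinstance(label_field, dict):
        # Explicit mapping; unmapped keys keep their names (an assumption)
        return {label_field.get(k, k): v for k, v in labels.items()}

    # String prefix; joining with "_" is an assumption of this sketch
    return {label_field + "_" + k: v for k, v in labels.items()}


def normalize_tags(tags=None):
    """Hypothetical helper: accepts None, a single tag, or an iterable of tags."""
    if tags is None:
        return []
    if isinstance(tags, str):
        return [tags]
    return list(tags)


print(resolve_label_fields("cat"))
# {'ground_truth': 'cat'}
print(resolve_label_fields({"det": 1, "seg": 2}, label_field={"det": "boxes"}))
# {'boxes': 1, 'seg': 2}
print(normalize_tags("train"))
# ['train']
```

Treating a bare string as a single tag, rather than iterating over its characters, is what makes "a tag or iterable of tags" unambiguous.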
def merge_samples(dataset, dataset_importer, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, progress=None):

Merges the samples from the given DatasetImporter into the dataset.

See the "custom-dataset-importer" guide in the FiftyOne documentation for more details about importing datasets in custom formats by defining your own DatasetImporter.

By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.
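The key-based matching can be sketched with plain dicts standing in for fiftyone.core.sample.Sample instances. `match_samples` and `by_basename` are hypothetical helpers; the sketch only shows how a key_fcn takes precedence over key_field when deciding which samples are joined, not how FiftyOne performs the actual database merge.

```python
import os
from operator import itemgetter


def match_samples(existing, incoming, key_field="filepath", key_fcn=None):
    """Hypothetical helper: splits incoming samples into (matched, new).

    matched holds (existing_sample, incoming_sample) pairs.
    """
    if key_fcn is None:
        # Default: compare the value of key_field; key_fcn takes precedence
        key_fcn = itemgetter(key_field)

    index = {key_fcn(s): s for s in existing}
    matched, new = [], []
    for sample in incoming:
        key = key_fcn(sample)
        if key in index:
            matched.append((index[key], sample))
        else:
            new.append(sample)
    return matched, new


def by_basename(sample):
    # Dict-based analogue of: lambda sample: os.path.basename(sample.filepath)
    return os.path.basename(sample["filepath"])


existing = [{"filepath": "/data/a/img1.jpg"}, {"filepath": "/data/a/img2.jpg"}]
incoming = [{"filepath": "/mnt/b/img1.jpg"}, {"filepath": "/mnt/b/img3.jpg"}]

matched, new = match_samples(existing, incoming, key_fcn=by_basename)
print(len(matched), [s["filepath"] for s in new])
# 1 ['/mnt/b/img3.jpg']
```

In terms of the parameters below, insert_new=False would drop the samples in `new`, and skip_existing=True would leave the `matched` pairs untouched.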

The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
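The merge rules in the two paragraphs above (top-level overwrite, element-wise list merging, id-based label updates, None treated as missing) can be sketched on plain dicts. This is an illustration under assumptions, not FiftyOne's implementation: label lists are modeled as lists of dicts with an "id" key, `merge_list_elements` and `merge_sample_dicts` are hypothetical names, and skipping duplicate elements in ordinary lists is an assumption.

```python
def merge_list_elements(old, new, overwrite=True):
    """Hypothetical helper: merges two lists element-wise.

    Lists whose elements are all dicts with an "id" key model label lists
    (e.g. the detections in a fiftyone.core.labels.Detections field).
    """
    if all(isinstance(x, dict) and "id" in x for x in old + new):
        # Label list: labels with matching ids are updated, not duplicated
        position = {x["id"]: i for i, x in enumerate(old)}
        merged = list(old)
        for label in new:
            if label["id"] not in position:
                merged.append(label)
            elif overwrite:
                merged[position[label["id"]]] = label
        return merged

    # Ordinary list (e.g. tags): skipping duplicates is an assumption
    return old + [x for x in new if x not in old]


def merge_sample_dicts(existing, incoming, merge_lists=True, overwrite=True):
    """Hypothetical helper: merges incoming fields into a copy of existing."""
    result = dict(existing)
    for field, new_value in incoming.items():
        if new_value is None:
            # None-valued incoming fields are treated as missing
            continue

        old_value = result.get(field)
        if old_value is None:
            # Missing (or None) in the existing sample: simply add the value
            result[field] = new_value
        elif (
            merge_lists
            and isinstance(old_value, list)
            and isinstance(new_value, list)
        ):
            result[field] = merge_list_elements(old_value, new_value, overwrite)
        elif overwrite:
            result[field] = new_value
    return result


existing = {
    "filepath": "/a.jpg",
    "tags": ["train"],
    "score": 0.5,
    "dets": [{"id": "1", "label": "cat"}],
}
incoming = {
    "filepath": "/a.jpg",
    "tags": ["reviewed"],
    "score": None,
    "dets": [{"id": "1", "label": "dog"}, {"id": "2", "label": "bird"}],
}
merged = merge_sample_dicts(existing, incoming)
print(merged["tags"], merged["score"], [d["label"] for d in merged["dets"]])
# ['train', 'reviewed'] 0.5 ['dog', 'bird']
```

With overwrite=False the label with id "1" would keep its existing "cat" value, while the new "bird" label is still appended.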

This method can be configured in numerous ways, including:

  • Whether existing samples should be modified or skipped
  • Whether new samples should be added or omitted
  • Whether new fields can be added to the dataset schema
  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
  • Whether to merge only specific fields, or all but certain fields
  • Mapping input fields to different field names of this dataset
Parameters
dataset: a fiftyone.core.dataset.Dataset
dataset_importer: a DatasetImporter
label_field (None): controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a LabeledImageDatasetImporter or LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
key_field ("filepath"): the sample field to use to decide whether to join with an existing sample
key_fcn (None): a function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing (False): whether to skip existing samples (True) or merge them (False)
insert_new (True): whether to insert new samples (True) or skip them (False)
fields (None): an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields (None): an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists (True): whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite (True): whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True): whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True): whether to add dataset info from the importer (if any) to the dataset
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
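The fields and omit_fields filtering can be sketched on a single incoming sample represented as a dict. `select_merge_fields` is a hypothetical helper; applying omit_fields before any renaming by a fields dict is an assumption of this sketch, while keeping filepath for new samples follows the parameter descriptions above.

```python
def select_merge_fields(sample, fields=None, omit_fields=None, is_new=False):
    """Hypothetical helper: applies fields/omit_fields to an incoming sample dict."""
    filepath = sample["filepath"]

    if omit_fields is not None:
        # omit_fields names fields of the imported samples (applied first here)
        omit = {omit_fields} if isinstance(omit_fields, str) else set(omit_fields)
        sample = {k: v for k, v in sample.items() if k not in omit}

    if fields is None:
        out = dict(sample)
    elif isinstance(fields, dict):
        # Keep only the mapped fields, renamed to this dataset's field names
        out = {fields[k]: v for k, v in sample.items() if k in fields}
    else:
        keep = {fields} if isinstance(fields, str) else set(fields)
        out = {k: v for k, v in sample.items() if k in keep}

    if is_new:
        # filepath is required, so it is always kept when adding new samples
        out["filepath"] = filepath
    return out


incoming = {"filepath": "/a.jpg", "predictions": "p", "uniqueness": 0.3}
print(select_merge_fields(incoming, fields={"predictions": "preds_v2"}))
# {'preds_v2': 'p'}
print(select_merge_fields(incoming, omit_fields="uniqueness", is_new=True))
# {'filepath': '/a.jpg', 'predictions': 'p'}
```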
def parse_dataset_info(dataset, info, overwrite=True):

Parses the info returned by DatasetImporter.get_dataset_info and stores it on the relevant properties of the dataset.

Parameters
dataset: a fiftyone.core.dataset.Dataset
info: an info dict
overwrite (True): whether to overwrite existing dataset info fields
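The effect of the overwrite flag can be illustrated on a plain dict standing in for the dataset's info. This sketch does not model which dataset properties FiftyOne populates from particular info keys; `apply_info` is a hypothetical helper.

```python
def apply_info(current, info, overwrite=True):
    """Hypothetical helper: merges an importer's info dict into current info."""
    result = dict(current)
    for key, value in info.items():
        # Existing keys are replaced only when overwrite is True
        if overwrite or key not in result:
            result[key] = value
    return result


current = {"description": "v1"}
info = {"description": "v2", "classes": ["cat", "dog"]}
print(apply_info(current, info, overwrite=False))
# {'description': 'v1', 'classes': ['cat', 'dog']}
```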

logger

Undocumented

def _build_parse_sample_fcn(dataset, dataset_importer, label_field, tags, expand_schema, dynamic):

Undocumented

def _generate_group_samples(dataset_importer, parse_sample):

Undocumented

def _get_rng(seed):

Undocumented

def _handle_legacy_formats(dataset_importer):

Undocumented

def _import_runs(dataset, runs, results_dir, run_cls):

Undocumented

def _import_saved_views(dataset, views):

Undocumented

def _import_workspaces(dataset, workspaces):

Undocumented

def _load_labeled_dataset_index(dataset_dir):

Undocumented

def _parse_media_fields(sd, media_fields, rel_dir):

Undocumented

def _parse_nested_media_field(d, media_fields, rel_dir, field_name, key):

Undocumented

def _set_created_at(field_dict, created_at):

Undocumented

def _to_list(arg):

Undocumented

def _update_no_overwrite(d, dnew):

Undocumented