fiftyone.utils.data.importers

Dataset importers.

Class	`BatchDatasetImporter`	Base interface for importers that load all of their samples in a single call to `import_samples`.
Class	`DatasetImporter`	Base interface for importing datasets stored on disk into FiftyOne.
Class	`FiftyOneDatasetImporter`	Importer for FiftyOne datasets stored on disk in serialized JSON format.
Class	`FiftyOneImageClassificationDatasetImporter`	Importer for image classification datasets stored on disk in a simple JSON format.
Class	`FiftyOneImageDetectionDatasetImporter`	Importer for image detection datasets stored on disk in a simple JSON format.
Class	`FiftyOneImageLabelsDatasetImporter`	Importer for labeled image datasets whose labels are stored in ETA ImageLabels format.
Class	`FiftyOneTemporalDetectionDatasetImporter`	Importer for temporal video detection datasets stored on disk in a simple JSON format.
Class	`FiftyOneVideoLabelsDatasetImporter`	Importer for labeled video datasets whose labels are stored in ETA VideoLabels format.
Class	`GenericSampleDatasetImporter`	Interface for importing datasets that contain arbitrary `fiftyone.core.sample.Sample` instances.
Class	`GroupDatasetImporter`	Interface for importing datasets that contain arbitrary grouped `fiftyone.core.sample.Sample` instances.
Class	`ImageClassificationDirectoryTreeImporter`	Importer for an image classification directory tree stored on disk.
Class	`ImageDirectoryImporter`	Importer for a directory of images stored on disk.
Class	`ImageSegmentationDirectoryImporter`	Importer for image segmentation datasets stored on disk.
Class	`ImportPathsMixin`	Mixin for `DatasetImporter` classes that provides convenience methods for parsing the `data_path` and `labels_path` parameters supported by many importers.
Class	`LabeledImageDatasetImporter`	Interface for importing datasets of labeled image samples.
Class	`LabeledVideoDatasetImporter`	Interface for importing datasets of labeled video samples.
Class	`LegacyFiftyOneDatasetImporter`	Legacy importer for FiftyOne datasets stored on disk in a serialized JSON format.
Class	`MediaDirectoryImporter`	Importer for a directory of media files stored on disk.
Class	`UnlabeledImageDatasetImporter`	Interface for importing datasets of unlabeled image samples.
Class	`UnlabeledMediaDatasetImporter`	Interface for importing datasets of unlabeled media samples.
Class	`UnlabeledVideoDatasetImporter`	Interface for importing datasets of unlabeled video samples.
Class	`VideoClassificationDirectoryTreeImporter`	Importer for a video classification directory tree stored on disk.
Class	`VideoDirectoryImporter`	Importer for a directory of videos stored on disk.
Function	`build_dataset_importer`	Builds the `DatasetImporter` instance for the given parameters.
Function	`import_samples`	Adds the samples from the given `DatasetImporter` to the dataset.
Function	`merge_samples`	Merges the samples from the given `DatasetImporter` into the dataset.
Function	`parse_dataset_info`	Parses the info returned by `DatasetImporter.get_dataset_info` and stores it on the relevant properties of the dataset.
Variable	`logger`	Undocumented
Function	`_build_parse_sample_fcn`	Undocumented
Function	`_generate_group_samples`	Undocumented
Function	`_get_rng`	Undocumented
Function	`_handle_legacy_formats`	Undocumented
Function	`_import_runs`	Undocumented
Function	`_import_saved_views`	Undocumented
Function	`_import_workspaces`	Undocumented
Function	`_load_labeled_dataset_index`	Undocumented
Function	`_parse_media_fields`	Undocumented
Function	`_parse_nested_media_field`	Undocumented
Function	`_set_created_at`	Undocumented
Function	`_to_list`	Undocumented
Function	`_update_no_overwrite`	Undocumented

def build_dataset_importer(dataset_type, strip_none=True, warn_unused=True, name=None, **kwargs):

Builds the DatasetImporter instance for the given parameters.

Parameters
dataset_type	the `fiftyone.types.Dataset` type
strip_none:`True`	whether to exclude None-valued items from `kwargs`
warn_unused:`True`	whether to issue warnings for any non-None unused parameters encountered
name:`None`	the name of the dataset being imported into, if known
**kwargs	keyword arguments to pass to the dataset importer's constructor via `DatasetImporter(**kwargs)`
Returns
a tuple of (the `DatasetImporter` instance, a dict of unused keyword arguments)
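To make the `strip_none`, `warn_unused`, and "unused keyword arguments" behavior concrete, here is a minimal pure-Python sketch that does not require FiftyOne. It assumes, without confirmation from this page, that "unused" means "not accepted by the importer's constructor"; `ToyImporter`, `split_kwargs`, and the warning text are all hypothetical and are not part of FiftyOne's API.

```python
import inspect
import warnings


class ToyImporter:
    """Hypothetical stand-in for a DatasetImporter subclass."""

    def __init__(self, dataset_dir=None, shuffle=False, seed=None):
        self.dataset_dir = dataset_dir
        self.shuffle = shuffle
        self.seed = seed


def split_kwargs(cls, strip_none=True, warn_unused=True, **kwargs):
    """Hypothetical helper mirroring the documented parameters; not
    FiftyOne's implementation."""
    if strip_none:
        # Drop None-valued items so the constructor's own defaults apply
        kwargs = {k: v for k, v in kwargs.items() if v is not None}

    # Assumption: a kwarg is "used" if the constructor accepts it
    params = inspect.signature(cls).parameters
    used = {k: v for k, v in kwargs.items() if k in params}
    unused = {k: v for k, v in kwargs.items() if k not in params}

    if warn_unused:
        for key, value in unused.items():
            if value is not None:
                warnings.warn("Ignoring unsupported parameter '%s'" % key)

    return cls(**used), unused


importer, unused = split_kwargs(
    ToyImporter, dataset_dir="/tmp/images", seed=None, label_types="detections"
)
print(importer.dataset_dir)  # /tmp/images
print(unused)  # {'label_types': 'detections'}
```

Returning the unused kwargs alongside the importer lets a caller forward them elsewhere instead of silently dropping them.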

def import_samples(dataset, dataset_importer, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, generator=False, progress=None):

Adds the samples from the given DatasetImporter to the dataset.

See the FiftyOne guide on custom dataset importers (`custom-dataset-importer`) for more details about importing datasets in custom formats by defining your own `DatasetImporter`.

Parameters
dataset	a `fiftyone.core.dataset.Dataset`
dataset_importer	a `DatasetImporter`
label_field:`None`	controls the field(s) in which imported labels are stored. Only applicable if `dataset_importer` is a `LabeledImageDatasetImporter` or `LabeledVideoDatasetImporter`. If the importer produces a single `fiftyone.core.labels.Label` instance per sample/frame, this argument specifies the name of the field to use; the default is `"ground_truth"`. If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:`None`	an optional tag or iterable of tags to attach to each sample
expand_schema:`True`	whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:`False`	whether to declare dynamic attributes of embedded document fields that are encountered
add_info:`True`	whether to add dataset info from the importer (if any) to the dataset
generator:`False`	whether to yield ID batches as a generator as samples are added to the dataset
progress:`None`	whether to render a progress bar (True/False), use the default value `fiftyone.config.show_progress_bars` (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset, or, if `generator` is True, a generator that yields batches of those IDs
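The `label_field` rules above are easiest to see side by side. The following is a minimal pure-Python sketch, not FiftyOne's code: strings stand in for `fiftyone.core.labels.Label` instances, `resolve_label_fields` is a hypothetical name, and two details are assumptions rather than documented behavior: the prefix is joined to each key with an underscore (`sep`), and keys missing from a dict mapping keep their own names.

```python
def resolve_label_fields(labels, label_field=None, sep="_"):
    """Hypothetical sketch mapping imported labels to field names.

    `labels` is either a single label or a dict of labels per sample.
    """
    if not isinstance(labels, dict):
        # Single label per sample/frame: one field, "ground_truth" by default
        field = label_field if label_field is not None else "ground_truth"
        return {field: labels}

    if label_field is None:
        # Dict of labels: use the label keys directly as field names
        return dict(labels)

    if isinstance(label_field, dict):
        # Explicit mapping; unmapped keys keep their own name (assumption)
        return {label_field.get(k, k): v for k, v in labels.items()}

    # String prefix prepended to each label key (separator is an assumption)
    return {label_field + sep + k: v for k, v in labels.items()}


print(resolve_label_fields("cat"))  # {'ground_truth': 'cat'}
print(resolve_label_fields({"det": "boxes", "cls": "cat"}, "gt"))
# {'gt_det': 'boxes', 'gt_cls': 'cat'}
print(resolve_label_fields({"det": "boxes"}, {"det": "predictions"}))
# {'predictions': 'boxes'}
```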

def merge_samples(dataset, dataset_importer, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, progress=None):

Merges the samples from the given DatasetImporter into the dataset.

See the FiftyOne guide on custom dataset importers (`custom-dataset-importer`) for more details about importing datasets in custom formats by defining your own `DatasetImporter`.

By default, samples with the same absolute filepath are merged, but you can customize this behavior via the `key_field` and `key_fcn` parameters. For example, you could set `key_fcn = lambda sample: os.path.basename(sample.filepath)` to merge samples with the same base filename.
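The matching step that `key_fcn` controls can be sketched without FiftyOne. In the sketch below, `types.SimpleNamespace` objects stand in for `fiftyone.core.sample.Sample` instances so that the `key_fcn` from the example above works unchanged. `match_samples` is a hypothetical helper that only illustrates the idea of indexing existing samples by key; it is not how `merge_samples` is implemented. The default `key_field="filepath"` corresponds to keying on `sample.filepath` itself.

```python
import os
from types import SimpleNamespace


def match_samples(existing, imported, key_fcn):
    """Hypothetical sketch: pair imported samples with existing ones whose
    key matches; unmatched imported samples are candidates for insertion."""
    index = {key_fcn(sample): sample for sample in existing}
    matches, new = [], []
    for sample in imported:
        target = index.get(key_fcn(sample))
        if target is None:
            new.append(sample)
        else:
            matches.append((target, sample))
    return matches, new


# Same key function as in the text: merge on base filename
key_fcn = lambda sample: os.path.basename(sample.filepath)

existing = [
    SimpleNamespace(filepath="/data/v1/img1.jpg"),
    SimpleNamespace(filepath="/data/v1/img2.jpg"),
]
imported = [
    SimpleNamespace(filepath="/export/img1.jpg"),
    SimpleNamespace(filepath="/export/img3.jpg"),
]

matches, new = match_samples(existing, imported, key_fcn)
for old, incoming in matches:
    print(old.filepath, "<-", incoming.filepath)  # /data/v1/img1.jpg <- /export/img1.jpg
print([s.filepath for s in new])  # ['/export/img3.jpg']
```

With `insert_new=False`, the samples in `new` would be skipped; with `skip_existing=True`, the pairs in `matches` would be left untouched.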

The behavior of this method is highly customizable. By default, all top-level fields of the imported samples are merged in, overwriting any existing values for those fields. The exception is list fields (e.g., `tags`) and label list fields (e.g., `fiftyone.core.labels.Detections` fields), whose individual elements are merged instead. For label list fields, labels with the same `id` in both collections are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.
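The default merge rules above can be illustrated on plain dicts. This is a simplified, hypothetical sketch (`merge_fields` and `merge_list` are not FiftyOne functions). Each label is modeled as a dict with an `"id"` key inside a bare list rather than as a real `Detections` object, and de-duplicating plain elements such as tags is an assumption.

```python
def merge_list(current, incoming, overwrite=True):
    """Merge list elements; dicts with an "id" key stand in for labels."""
    result = list(current)
    for item in incoming:
        if isinstance(item, dict) and "id" in item:
            # Label-like element: match existing elements on "id"
            idx = next(
                (i for i, x in enumerate(result)
                 if isinstance(x, dict) and x.get("id") == item["id"]),
                None,
            )
            if idx is None:
                result.append(item)  # new label: add it
            elif overwrite:
                result[idx] = item  # same id: update instead of duplicating
        elif item not in result:
            # Plain element such as a tag: add it if not already present
            result.append(item)
    return result


def merge_fields(existing, imported, overwrite=True, merge_lists=True):
    """Hypothetical sketch: merge one imported sample dict into another."""
    merged = dict(existing)
    for field, value in imported.items():
        if value is None:
            # None-valued fields are treated as missing, so never merged
            continue
        current = merged.get(field)
        if merge_lists and isinstance(value, list) and isinstance(current, list):
            merged[field] = merge_list(current, value, overwrite=overwrite)
        elif current is None or overwrite:
            merged[field] = value
    return merged


existing = {
    "filepath": "/data/img1.jpg",
    "tags": ["train"],
    "quality": 0.5,
    "detections": [{"id": "a1", "label": "cat"}],
}
imported = {
    "filepath": "/data/img1.jpg",
    "tags": ["train", "reviewed"],
    "quality": None,
    "detections": [{"id": "a1", "label": "dog"}, {"id": "b2", "label": "car"}],
}

merged = merge_fields(existing, imported)
print(merged["tags"])  # ['train', 'reviewed']
print(merged["quality"])  # 0.5 (the imported None is ignored)
print([d["label"] for d in merged["detections"]])  # ['dog', 'car']
```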

This method can be configured in numerous ways, including:

Whether existing samples should be modified or skipped
Whether new samples should be added or omitted
Whether new fields can be added to the dataset schema
Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
Whether to merge only specific fields, or all but certain fields
Mapping input fields to different field names of the dataset

Parameters
dataset	a `fiftyone.core.dataset.Dataset`
dataset_importer	a `DatasetImporter`
label_field:`None`	controls the field(s) in which imported labels are stored. Only applicable if `dataset_importer` is a `LabeledImageDatasetImporter` or `LabeledVideoDatasetImporter`. If the importer produces a single `fiftyone.core.labels.Label` instance per sample/frame, this argument specifies the name of the field to use; the default is `"ground_truth"`. If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:`None`	an optional tag or iterable of tags to attach to each sample
key_field:"filepath"	the sample field to use to decide whether to join with an existing sample
key_fcn:`None`	a function that accepts a `fiftyone.core.sample.Sample` instance and computes a key to decide if two samples should be merged. If a `key_fcn` is provided, `key_field` is ignored
skip_existing:`False`	whether to skip existing samples (True) or merge them (False)
insert_new:`True`	whether to insert new samples (True) or skip them (False)
fields:`None`	an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from the imported samples when merging or adding samples. One exception is that `filepath` is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the imported samples to field names of the dataset
omit_fields:`None`	an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that `filepath` is always included when adding new samples, since the field is required
merge_lists:`True`	whether to merge the elements of list fields (e.g., `tags`) and label list fields (e.g., `fiftyone.core.labels.Detections` fields) rather than merging the entire top-level field like other field types. For label list fields, existing `fiftyone.core.labels.Label` elements are either replaced (when `overwrite` is True) or kept (when `overwrite` is False) when their `id` matches a label from the imported samples
overwrite:`True`	whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema:`True`	whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:`False`	whether to declare dynamic attributes of embedded document fields that are encountered
add_info:`True`	whether to add dataset info from the importer (if any) to the dataset
progress:`None`	whether to render a progress bar (True/False), use the default value `fiftyone.config.show_progress_bars` (None), or a progress callback function to invoke instead
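The `fields` and `omit_fields` parameters interact with the special handling of `filepath`. The sketch below shows one plausible reading of the parameter descriptions, applied to a plain dict. `select_fields` and its `is_new` flag are hypothetical, and applying both parameters in the same call is shown only for illustration; this page does not say whether `merge_samples` allows it.

```python
def select_fields(sample, fields=None, omit_fields=None, is_new=False):
    """Hypothetical sketch of `fields`/`omit_fields` filtering for one
    imported sample dict; not FiftyOne's implementation."""
    if isinstance(fields, str):
        fields = [fields]
    if isinstance(omit_fields, str):
        omit_fields = [omit_fields]
    omit = set(omit_fields or ())

    if isinstance(fields, dict):
        rename = fields  # keep only these fields, renamed for the dataset
    elif fields is not None:
        rename = {f: f for f in fields}  # keep only these fields
    else:
        rename = {k: k for k in sample}  # keep everything

    out = {
        rename[k]: v for k, v in sample.items() if k in rename and k not in omit
    }

    if is_new:
        # `filepath` is required, so new samples always keep it
        out["filepath"] = sample["filepath"]
    return out


sample = {
    "filepath": "/a.jpg",
    "ground_truth": "cat",
    "predictions": "dog",
    "uniqueness": 0.3,
}
print(select_fields(sample, fields={"predictions": "model_v2"}))
# {'model_v2': 'dog'}
print(select_fields(sample, omit_fields="uniqueness"))
# {'filepath': '/a.jpg', 'ground_truth': 'cat', 'predictions': 'dog'}
```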

def parse_dataset_info(dataset, info, overwrite=True):

Parses the info returned by DatasetImporter.get_dataset_info and stores it on the relevant properties of the dataset.

Parameters
dataset	a `fiftyone.core.dataset.Dataset`
info	an info dict
overwrite:`True`	whether to overwrite existing dataset info fields
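The `overwrite` flag decides whether importer-provided info replaces values the dataset already has. This minimal sketch shows that choice on plain dicts, assuming a "fill in only missing keys" rule when `overwrite=False`. `apply_info` is hypothetical. The real function also stores some keys on dedicated dataset properties, which this sketch does not model.

```python
def apply_info(dataset_info, info, overwrite=True):
    """Hypothetical sketch of the `overwrite` flag for info dicts."""
    if overwrite:
        # Importer values replace existing ones
        dataset_info.update(info)
    else:
        # Only fill in keys the dataset does not already have
        dataset_info.update(
            {k: v for k, v in info.items() if k not in dataset_info}
        )
    return dataset_info


current = {"author": "alice", "version": 1}
incoming = {"version": 2, "license": "CC-BY"}

print(apply_info(dict(current), incoming, overwrite=False))
# {'author': 'alice', 'version': 1, 'license': 'CC-BY'}
print(apply_info(dict(current), incoming))
# {'author': 'alice', 'version': 2, 'license': 'CC-BY'}
```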

logger

Undocumented

def _build_parse_sample_fcn(dataset, dataset_importer, label_field, tags, expand_schema, dynamic):

Undocumented

def _generate_group_samples(dataset_importer, parse_sample):

Undocumented

def _get_rng(seed):

Undocumented

def _handle_legacy_formats(dataset_importer):

Undocumented

def _import_runs(dataset, runs, results_dir, run_cls):

Undocumented

def _import_saved_views(dataset, views):

Undocumented

def _import_workspaces(dataset, workspaces):

Undocumented

def _load_labeled_dataset_index(dataset_dir):

Undocumented

def _parse_media_fields(sd, media_fields, rel_dir):

Undocumented

def _parse_nested_media_field(d, media_fields, rel_dir, field_name, key):

Undocumented

def _set_created_at(field_dict, created_at):

Undocumented

def _to_list(arg):

Undocumented

def _update_no_overwrite(d, dnew):

Undocumented