class fiftyone.core.dataset.Dataset

A FiftyOne dataset.

Datasets represent an ordered collection of fiftyone.core.sample.Sample instances that describe a particular type of raw media (e.g., images or videos) together with a user-defined set of fields.

FiftyOne datasets ingest and store the labels for all samples internally; raw media is stored on disk and the dataset provides paths to the data.

See :ref:`this page <using-datasets>` for an overview of working with FiftyOne datasets.

Parameters
name: the name of the dataset. By default, get_default_dataset_name() is used
persistent: whether the dataset should persist in the database after the session terminates
overwrite: whether to overwrite an existing dataset of the same name
Class Method from_archive Creates a Dataset from the contents of the given archive.
Class Method from_dict Loads a Dataset from a JSON dictionary generated by fiftyone.core.collections.SampleCollection.to_dict.
Class Method from_dir Creates a Dataset from the contents of the given directory.
Class Method from_images Creates a Dataset from the given images.
Class Method from_images_dir Creates a Dataset from the given directory of images.
Class Method from_images_patt Creates a Dataset from the given glob pattern of images.
Class Method from_importer Creates a Dataset by importing the samples in the given fiftyone.utils.data.importers.DatasetImporter.
Class Method from_json Loads a Dataset from JSON generated by fiftyone.core.collections.SampleCollection.write_json or fiftyone.core.collections.SampleCollection.to_json.
Class Method from_labeled_images Creates a Dataset from the given labeled images.
Class Method from_labeled_videos Creates a Dataset from the given labeled videos.
Class Method from_videos Creates a Dataset from the given videos.
Class Method from_videos_dir Creates a Dataset from the given directory of videos.
Class Method from_videos_patt Creates a Dataset from the given glob pattern of videos.
Method __copy__ Undocumented
Method __deepcopy__ Undocumented
Method __delitem__ Undocumented
Method __eq__ Undocumented
Method __getattribute__ Undocumented
Method __getitem__ Undocumented
Method __init__ Undocumented
Method __len__ Undocumented
Method add_archive Adds the contents of the given archive to the dataset.
Method add_collection Adds the contents of the given collection to the dataset.
Method add_dir Adds the contents of the given directory to the dataset.
Method add_dynamic_frame_fields Adds all dynamic frame fields to the dataset's schema.
Method add_dynamic_sample_fields Adds all dynamic sample fields to the dataset's schema.
Method add_frame_field Adds a new frame-level field or embedded field to the dataset, if necessary.
Method add_group_field Adds a group field to the dataset, if necessary.
Method add_group_slice Adds a group slice with the given media type to the dataset, if necessary.
Method add_images Adds the given images to the dataset.
Method add_images_dir Adds the given directory of images to the dataset.
Method add_images_patt Adds the given glob pattern of images to the dataset.
Method add_importer Adds the samples from the given fiftyone.utils.data.importers.DatasetImporter to the dataset.
Method add_labeled_images Adds the given labeled images to the dataset.
Method add_labeled_videos Adds the given labeled videos to the dataset.
Method add_sample Adds the given sample to the dataset.
Method add_sample_field Adds a new sample field or embedded field to the dataset, if necessary.
Method add_samples Adds the given samples to the dataset.
Method add_videos Adds the given videos to the dataset.
Method add_videos_dir Adds the given directory of videos to the dataset.
Method add_videos_patt Adds the given glob pattern of videos to the dataset.
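The add_* methods above append content to an existing dataset. A minimal sketch of add_samples(), again with hypothetical filepaths:

```python
import fiftyone as fo

dataset = fo.Dataset("add-example", overwrite=True)

# `add_samples()` batches the inserts and returns the IDs of the new samples
samples = [
    fo.Sample(filepath="/data/img1.jpg", tags=["train"]),
    fo.Sample(filepath="/data/img2.jpg", tags=["test"]),
]
sample_ids = dataset.add_samples(samples)

print(len(sample_ids))  # 2
print(len(dataset))  # 2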
Method app_config.setter Undocumented
Method check_summary_fields Returns a list of summary fields that may need to be updated.
Method classes.setter Undocumented
Method clear Removes all samples from the dataset.
Method clear_cache Clears the dataset's in-memory cache.
Method clear_frame_field Clears the values of the frame-level field from all samples in the dataset.
Method clear_frame_fields Clears the values of the frame-level fields from all samples in the dataset.
Method clear_frames Removes all frame labels from the dataset.
Method clear_sample_field Clears the values of the field from all samples in the dataset.
Method clear_sample_fields Clears the values of the fields from all samples in the dataset.
Method clone Creates a copy of the dataset.
Method clone_frame_field Clones the frame-level field into a new field.
Method clone_frame_fields Clones the frame-level fields into new fields.
Method clone_sample_field Clones the given sample field into a new field of the dataset.
Method clone_sample_fields Clones the given sample fields into new fields of the dataset.
Method create_summary_field Populates a sample-level field that records the unique values or numeric ranges that appear in the specified field on each sample in the dataset.
Method default_classes.setter Undocumented
Method default_group_slice.setter Undocumented
Method default_mask_targets.setter Undocumented
Method default_skeleton.setter Undocumented
Method delete Deletes the dataset.
Method delete_frame_field Deletes the frame-level field from all samples in the dataset.
Method delete_frame_fields Deletes the frame-level fields from all samples in the dataset.
Method delete_frames Deletes the given frame(s) from the dataset.
Method delete_group_slice Deletes all samples in the given group slice from the dataset.
Method delete_groups Deletes the given group(s) from the dataset.
Method delete_labels Deletes the specified labels from the dataset.
Method delete_sample_field Deletes the field from all samples in the dataset.
Method delete_sample_fields Deletes the fields from all samples in the dataset.
Method delete_samples Deletes the given sample(s) from the dataset.
Method delete_saved_view Deletes the saved view with the given name.
Method delete_saved_views Deletes all saved views from this dataset.
Method delete_summary_field Deletes the summary field from all samples in the dataset.
Method delete_summary_fields Deletes the summary fields from all samples in the dataset.
Method delete_workspace Deletes the saved workspace with the given name.
Method delete_workspaces Deletes all saved workspaces from this dataset.
Method description.setter Undocumented
Method ensure_frames Ensures that the video dataset contains frame instances for every frame of each sample's source video.
Method first Returns the first sample in the dataset.
Method get_field_schema Returns a schema dictionary describing the fields of the samples in the dataset.
Method get_frame_field_schema Returns a schema dictionary describing the fields of the frames of the samples in the dataset.
Method get_group Returns a dict containing the samples for the given group ID.
Method get_saved_view_info Loads the editable information about the saved view with the given name.
Method get_workspace_info Gets the information about the workspace with the given name.
Method group_slice.setter Undocumented
Method has_saved_view Whether this dataset has a saved view with the given name.
Method has_workspace Whether this dataset has a saved workspace with the given name.
Method head Returns a list of the first few samples in the dataset.
Method info.setter Undocumented
Method ingest_images Ingests the given iterable of images into the dataset.
Method ingest_labeled_images Ingests the given iterable of labeled image samples into the dataset.
Method ingest_labeled_videos Ingests the given iterable of labeled video samples into the dataset.
Method ingest_videos Ingests the given iterable of videos into the dataset.
Method iter_groups Returns an iterator over the groups in the dataset.
Method iter_samples Returns an iterator over the samples in the dataset.
Method last Returns the last sample in the dataset.
Method list_saved_views Lists the saved views on this dataset.
Method list_summary_fields Lists the summary fields on the dataset.
Method list_workspaces Lists the saved workspaces on this dataset.
Method load_saved_view Loads the saved view with the given name.
Method load_workspace Loads the saved workspace with the given name.
Method mask_targets.setter Undocumented
Method media_type.setter Undocumented
Method merge_archive Merges the contents of the given archive into the dataset.
Method merge_dir Merges the contents of the given directory into the dataset.
Method merge_importer Merges the samples from the given fiftyone.utils.data.importers.DatasetImporter into the dataset.
Method merge_sample Merges the fields of the given sample into this dataset.
Method merge_samples Merges the given samples into this dataset.
Method name.setter Undocumented
Method one Returns a single sample in this dataset matching the expression.
Method persistent.setter Undocumented
Method reload Reloads the dataset and any in-memory samples from the database.
Method remove_dynamic_frame_field Removes the dynamic embedded frame field from the dataset's schema.
Method remove_dynamic_frame_fields Removes the dynamic embedded frame fields from the dataset's schema.
Method remove_dynamic_sample_field Removes the dynamic embedded sample field from the dataset's schema.
Method remove_dynamic_sample_fields Removes the dynamic embedded sample fields from the dataset's schema.
Method rename_frame_field Renames the frame-level field to the given new name.
Method rename_frame_fields Renames the frame-level fields to the given new names.
Method rename_group_slice Renames the group slice with the given name.
Method rename_sample_field Renames the sample field to the given new name.
Method rename_sample_fields Renames the sample fields to the given new names.
Method save Saves the dataset to the database.
Method save_view Saves the given view into this dataset under the given name so it can be loaded later via load_saved_view.
Method save_workspace Saves a workspace into this dataset under the given name so it can be loaded later via load_workspace.
Method skeletons.setter Undocumented
Method stats Returns stats about the dataset on disk.
Method summary Returns a string summary of the dataset.
Method tags.setter Undocumented
Method tail Returns a list of the last few samples in the dataset.
Method update_saved_view_info Updates the editable information for the saved view with the given name.
Method update_summary_field Updates the summary field based on the current values of its source field.
Method update_workspace_info Updates the editable information for the workspace with the given name.
Method view Returns a fiftyone.core.view.DatasetView containing the entire dataset.
Class Variable __slots__ Undocumented
Instance Variable group_slice The current group slice of the dataset, or None if the dataset is not grouped.
Instance Variable media_type The media type of the dataset.
Property app_config A fiftyone.core.odm.dataset.DatasetAppConfig that customizes how this dataset is visualized in the :ref:`FiftyOne App <fiftyone-app>`.
Property classes A dict mapping field names to lists of class label strings for the corresponding fields of the dataset.
Property created_at The datetime that the dataset was created.
Property default_classes A list of class label strings for all fiftyone.core.labels.Label fields of this dataset that do not have customized classes defined in classes.
Property default_group_slice The default group slice of the dataset, or None if the dataset is not grouped.
Property default_mask_targets A dict defining a default mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks of all fiftyone.core.labels.Segmentation fields of this dataset that do not have customized mask targets defined in ...
Property default_skeleton A default fiftyone.core.odm.dataset.KeypointSkeleton defining the semantic labels and point connectivity for all fiftyone.core.labels.Keypoint fields of this dataset that do not have customized skeletons defined in ...
Property deleted Whether the dataset is deleted.
Property description A string description of the dataset.
Property group_field The group field of the dataset, or None if the dataset is not grouped.
Property group_media_types A dict mapping group slices to media types, or None if the dataset is not grouped.
Property group_slices The list of group slices of the dataset, or None if the dataset is not grouped.
Property has_saved_views Whether this dataset has any saved views.
Property has_workspaces Whether this dataset has any saved workspaces.
Property info A user-facing dictionary of information about the dataset.
Property last_loaded_at The datetime that the dataset was last loaded.
Property last_modified_at The datetime that the dataset was last modified.
Property mask_targets A dict mapping field names to mask target dicts, each of which defines a mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks in the corresponding field of the dataset.
Property name The name of the dataset.
Property persistent Whether the dataset persists in the database after a session is terminated.
Property skeletons A dict mapping field names to fiftyone.core.odm.dataset.KeypointSkeleton instances, each of which defines the semantic labels and point connectivity for the fiftyone.core.labels.Keypoint instances in the corresponding field of the dataset.
Property slug The slug of the dataset.
Property tags A list of tags on the dataset.
Property version The version of the fiftyone package for which the dataset is formatted.
Method _add_group_field Undocumented
Method _add_implied_frame_field Undocumented
Method _add_implied_sample_field Undocumented
Method _add_samples_batch Undocumented
Method _add_view_stage Returns a fiftyone.core.view.DatasetView containing the contents of the collection with the given fiftyone.core.stages.ViewStage appended to its aggregation pipeline.
Method _aggregate Runs the MongoDB aggregation pipeline on the collection and returns the result.
Method _apply_frame_field_schema Undocumented
Method _apply_sample_field_schema Undocumented
Method _attach_frames_pipeline A pipeline that attaches the frame documents for each document.
Method _attach_groups_pipeline A pipeline that attaches the requested group slice(s) for each document and stores them under groups.<slice> keys.
Method _bulk_write Undocumented
Method _clear Undocumented
Method _clear_frame_fields Undocumented
Method _clear_frames Undocumented
Method _clear_groups Undocumented
Method _clear_sample_fields Undocumented
Method _clone Undocumented
Method _clone_frame_fields Undocumented
Method _clone_sample_fields Undocumented
Method _delete Undocumented
Method _delete_frame_fields Undocumented
Method _delete_labels Undocumented
Method _delete_sample_fields Undocumented
Method _delete_saved_view Undocumented
Method _delete_summary_fields Undocumented
Method _delete_workspace Undocumented
Method _ensure_frames Undocumented
Method _ensure_label_field Undocumented
Method _estimated_count Undocumented
Method _expand_frame_schema Undocumented
Method _expand_group_schema Undocumented
Method _expand_schema Undocumented
Method _frame_collstats Undocumented
Method _frame_dict_to_doc Undocumented
Method _get_default_summary_field_name Undocumented
Method _get_frame_collection Undocumented
Method _get_sample_collection Undocumented
Method _get_saved_view_doc Undocumented
Method _get_summarized_fields_map Undocumented
Method _get_workspace_doc Undocumented
Method _group_select_pipeline A pipeline that selects only the given slice's documents from the pipeline.
Method _groups_only_pipeline A pipeline that looks up the requested group slices for each document and returns (only) the unwound group slices.
Method _init_frames Undocumented
Method _iter_groups Undocumented
Method _iter_samples Undocumented
Method _keep Undocumented
Method _keep_fields Undocumented
Method _keep_frames Undocumented
Method _load_saved_view_from_doc Undocumented
Method _make_dict Undocumented
Method _make_frame Undocumented
Method _make_sample Undocumented
Method _merge_doc Undocumented
Method _merge_frame_field_schema Undocumented
Method _merge_sample_field_schema Undocumented
Method _pipeline Returns the MongoDB aggregation pipeline for the collection.
Method _populate_summary_field Undocumented
Method _reload Undocumented
Method _reload_docs Undocumented
Method _remove_dynamic_frame_fields Undocumented
Method _remove_dynamic_sample_fields Undocumented
Method _rename_frame_fields Undocumented
Method _rename_sample_fields Undocumented
Method _sample_collstats Undocumented
Method _sample_dict_to_doc Undocumented
Method _save Undocumented
Method _save_field Undocumented
Method _serialize Undocumented
Method _set_media_type Undocumented
Method _unwind_frames_pipeline A pipeline that returns (only) the unwound frames documents.
Method _unwind_groups_pipeline A pipeline that returns (only) the unwound groups documents.
Method _update_last_loaded_at Undocumented
Method _update_metadata_field Undocumented
Method _upsert_samples Undocumented
Method _upsert_samples_batch Undocumented
Method _validate_samples Undocumented
Method _validate_saved_view_name Undocumented
Method _validate_workspace_name Undocumented
Instance Variable _annotation_cache Undocumented
Instance Variable _brain_cache Undocumented
Instance Variable _deleted Undocumented
Instance Variable _doc Undocumented
Instance Variable _evaluation_cache Undocumented
Instance Variable _frame_doc_cls Undocumented
Instance Variable _group_slice Undocumented
Instance Variable _run_cache Undocumented
Instance Variable _sample_doc_cls Undocumented
Property _dataset The fiftyone.core.dataset.Dataset that serves the samples in this collection.
Property _frame_collection Undocumented
Property _frame_collection_name Undocumented
Property _is_clips Whether this collection contains clips.
Property _is_dynamic_groups Whether this collection contains dynamic groups.
Property _is_frames Whether this collection contains frames of a video dataset.
Property _is_generated Whether this collection's contents are generated from another collection.
Property _is_patches Whether this collection contains patches.
Property _root_dataset The root fiftyone.core.dataset.Dataset from which this collection is derived.
Property _sample_collection Undocumented
Property _sample_collection_name Undocumented

Inherited from SampleCollection:

Class Method list_aggregations Returns a list of all available methods on this collection that apply fiftyone.core.aggregations.Aggregation operations to this collection.
Class Method list_view_stages Returns a list of all available methods on this collection that apply fiftyone.core.stages.ViewStage operations to this collection.
Method __add__ Undocumented
Method __bool__ Undocumented
Method __contains__ Undocumented
Method __iter__ Undocumented
Method __repr__ Undocumented
Method __str__ Undocumented
Method add_stage Applies the given fiftyone.core.stages.ViewStage to the collection.
Method aggregate Aggregates one or more fiftyone.core.aggregations.Aggregation instances.
Method annotate Exports the samples and optional label field(s) in this collection to the given annotation backend.
Method apply_model Applies the model to the samples in the collection.
Method bounds Computes the bounds of a numeric field of the collection.
Method compute_embeddings Computes embeddings for the samples in the collection using the given model.
Method compute_metadata Populates the metadata field of all samples in the collection.
Method compute_patch_embeddings Computes embeddings for the image patches defined by patches_field of the samples in the collection using the given model.
Method concat Concatenates the contents of the given SampleCollection to this collection.
Method count Counts the number of field values in the collection.
Method count_label_tags Counts the occurrences of all label tags in the specified label field(s) of this collection.
Method count_sample_tags Counts the occurrences of sample tags in this collection.
Method count_values Counts the occurrences of field values in the collection.
Method create_index Creates an index on the given field or with the given specification, if necessary.
Method delete_annotation_run Deletes the annotation run with the given key from this collection.
Method delete_annotation_runs Deletes all annotation runs from this collection.
Method delete_brain_run Deletes the brain method run with the given key from this collection.
Method delete_brain_runs Deletes all brain method runs from this collection.
Method delete_evaluation Deletes the evaluation results associated with the given evaluation key from this collection.
Method delete_evaluations Deletes all evaluation results from this collection.
Method delete_run Deletes the run with the given key from this collection.
Method delete_runs Deletes all runs from this collection.
Method distinct Computes the distinct values of a field in the collection.
Method draw_labels Renders annotated versions of the media in the collection with the specified label data overlaid to the given directory.
Method drop_index Drops the index for the given field or name, if necessary.
Method evaluate_classifications Evaluates the classification predictions in this collection with respect to the specified ground truth labels.
Method evaluate_detections Evaluates the specified predicted detections in this collection with respect to the specified ground truth detections.
Method evaluate_regressions Evaluates the regression predictions in this collection with respect to the specified ground truth values.
Method evaluate_segmentations Evaluates the specified semantic segmentation masks in this collection with respect to the specified ground truth masks.
Method exclude Excludes the samples with the given IDs from the collection.
Method exclude_by Excludes the samples with the given field values from the collection.
Method exclude_fields Excludes the fields with the given names from the samples in the collection.
Method exclude_frames Excludes the frames with the given IDs from the video collection.
Method exclude_groups Excludes the groups with the given IDs from the grouped collection.
Method exclude_labels Excludes the specified labels from the collection.
Method exists Returns a view containing the samples in the collection that have (or do not have) a non-None value for the given field or embedded field.
Method export Exports the samples in the collection to disk.
Method filter_field Filters the values of a field or embedded field of each sample in the collection.
Method filter_keypoints Filters the individual fiftyone.core.labels.Keypoint.points elements in the specified keypoints field of each sample in the collection.
Method filter_labels Filters the fiftyone.core.labels.Label field of each sample in the collection.
Method flatten Returns a flattened view that contains all samples in the dynamic grouped collection.
Method geo_near Sorts the samples in the collection by their proximity to a specified geolocation.
Method geo_within Filters the samples in this collection to only include samples whose geolocation is within a specified boundary.
Method get_annotation_info Returns information about the annotation run with the given key on this collection.
Method get_brain_info Returns information about the brain method run with the given key on this collection.
Method get_classes Gets the classes list for the given field, or None if no classes are available.
Method get_dynamic_field_schema Returns a schema dictionary describing the dynamic fields of the samples in the collection.
Method get_dynamic_frame_field_schema Returns a schema dictionary describing the dynamic fields of the frames in the collection.
Method get_evaluation_info Returns information about the evaluation with the given key on this collection.
Method get_field Returns the field instance of the provided path, or None if one does not exist.
Method get_index_information Returns a dictionary of information about the indexes on this collection.
Method get_mask_targets Gets the mask targets for the given field, or None if no mask targets are available.
Method get_run_info Returns information about the run with the given key on this collection.
Method get_skeleton Gets the keypoint skeleton for the given field, or None if no skeleton is available.
Method group_by Creates a view that groups the samples in the collection by a specified field or expression.
Method has_annotation_run Whether this collection has an annotation run with the given key.
Method has_brain_run Whether this collection has a brain method run with the given key.
Method has_classes Determines whether this collection has a classes list for the given field.
Method has_evaluation Whether this collection has an evaluation with the given key.
Method has_field Determines whether the collection has a field with the given name.
Method has_frame_field Determines whether the collection has a frame-level field with the given name.
Method has_mask_targets Determines whether this collection has mask targets for the given field.
Method has_run Whether this collection has a run with the given key.
Method has_sample_field Determines whether the collection has a sample field with the given name.
Method has_skeleton Determines whether this collection has a keypoint skeleton for the given field.
Method histogram_values Computes a histogram of the field values in the collection.
Method init_run Initializes a config instance for a new run.
Method init_run_results Initializes a results instance for the run with the given key.
Method limit Returns a view with at most the given number of samples.
Method limit_labels Limits the number of fiftyone.core.labels.Label instances in the specified labels list field of each sample in the collection.
Method list_annotation_runs Returns a list of annotation keys on this collection.
Method list_brain_runs Returns a list of brain keys on this collection.
Method list_evaluations Returns a list of evaluation keys on this collection.
Method list_indexes Returns the list of index names on this collection.
Method list_runs Returns a list of run keys on this collection.
Method list_schema Extracts the value type(s) in a specified list field across all samples in the collection.
Method load_annotation_results Loads the results for the annotation run with the given key on this collection.
Method load_annotation_view Loads the fiftyone.core.view.DatasetView on which the specified annotation run was performed on this collection.
Method load_annotations Downloads the labels from the given annotation run from the annotation backend and merges them into this collection.
Method load_brain_results Loads the results for the brain method run with the given key on this collection.
Method load_brain_view Loads the fiftyone.core.view.DatasetView on which the specified brain method run was performed on this collection.
Method load_evaluation_results Loads the results for the evaluation with the given key on this collection.
Method load_evaluation_view Loads the fiftyone.core.view.DatasetView on which the specified evaluation was performed on this collection.
Method load_run_results Loads the results for the run with the given key on this collection.
Method load_run_view Loads the fiftyone.core.view.DatasetView on which the specified run was performed on this collection.
Method make_unique_field_name Makes a unique field name with the given root name for the collection.
Method map_labels Maps the label values of a fiftyone.core.labels.Label field to new values for each sample in the collection.
Method match Filters the samples in the collection by the given filter.
Method match_frames Filters the frames in the video collection by the given filter.
Method match_labels Selects the samples from the collection that contain (or do not contain) at least one label that matches the specified criteria.
Method match_tags Returns a view containing the samples in the collection that have or don't have any/all of the given tag(s).
Method max Computes the maximum of a numeric field of the collection.
Method mean Computes the arithmetic mean of the field values of the collection.
Method merge_labels Merges the labels from the given input field into the given output field of the collection.
Method min Computes the minimum of a numeric field of the collection.
Method mongo Adds a view stage defined by a raw MongoDB aggregation pipeline.
Method quantiles Computes the quantile(s) of the field values of a collection.
Method register_run Registers a run under the given key on this collection.
Method rename_annotation_run Replaces the key for the given annotation run with a new key.
Method rename_brain_run Replaces the key for the given brain run with a new key.
Method rename_evaluation Replaces the key for the given evaluation with a new key.
Method rename_run Replaces the key for the given run with a new key.
Method save_context Returns a context that can be used to save samples from this collection according to a configurable batching strategy.
Method save_run_results Saves run results for the run with the given key.
Method schema Extracts the names and types of the attributes of a specified embedded document field across all samples in the collection.
Method select Selects the samples with the given IDs from the collection.
Method select_by Selects the samples with the given field values from the collection.
Method select_fields Selects only the fields with the given names from the samples in the collection. All other fields are excluded.
Method select_frames Selects the frames with the given IDs from the video collection.
Method select_group_slices Selects the samples in the group collection from the given slice(s).
Method select_groups Selects the groups with the given IDs from the grouped collection.
Method select_labels Selects only the specified labels from the collection.
Method set_field Sets a field or embedded field on each sample in a collection by evaluating the given expression.
Method set_label_values Sets the fields of the specified labels in the collection to the given values.
Method set_values Sets the field or embedded field on each sample or frame in the collection to the given values.
Method shuffle Randomly shuffles the samples in the collection.
Method skip Omits the given number of samples from the head of the collection.
Method sort_by Sorts the samples in the collection by the given field(s) or expression(s).
Method sort_by_similarity Sorts the collection by similarity to a specified query.
Method split_labels Splits the labels from the given input field into the given output field of the collection.
Method std Computes the standard deviation of the field values of the collection.
Method sum Computes the sum of the field values of the collection.
Method sync_last_modified_at Syncs the last_modified_at properties of the dataset.
Method tag_labels Adds the tag(s) to all labels in the specified label field(s) of this collection, if necessary.
Method tag_samples Adds the tag(s) to all samples in this collection, if necessary.
Method take Randomly samples the given number of samples from the collection.
Method to_clips Creates a view that contains one sample per clip defined by the given field or expression in the video collection.
Method to_dict Returns a JSON dictionary representation of the collection.
Method to_evaluation_patches Creates a view based on the results of the evaluation with the given key that contains one sample for each true positive, false positive, and false negative example in the collection, respectively.
Method to_frames Creates a view that contains one sample per frame in the video collection.
Method to_json Returns a JSON string representation of the collection.
Method to_patches Creates a view that contains one sample per object patch in the specified field of the collection.
Method to_trajectories Creates a view that contains one clip for each unique object trajectory defined by their (label, index) in a frame-level field of a video collection.
Method untag_labels Removes the tag from all labels in the specified label field(s) of this collection, if necessary.
Method untag_samples Removes the tag(s) from all samples in this collection, if necessary.
Method update_run_config Updates the run config for the run with the given key.
Method validate_field_type Validates that the collection has a field of the given type.
Method validate_fields_exist Validates that the collection has field(s) with the given name(s).
Method values Extracts the values of a field from all samples in the collection.
Method write_json Writes the collection to disk in JSON format.
Property has_annotation_runs Whether this collection has any annotation runs.
Property has_brain_runs Whether this collection has any brain runs.
Property has_evaluations Whether this collection has any evaluation results.
Property has_runs Whether this collection has any runs.
Async Method _async_aggregate Undocumented
Method _build_aggregation Undocumented
Method _build_batch_pipeline Undocumented
Method _build_big_pipeline Undocumented
Method _build_facets Undocumented
Method _contains_media_type Undocumented
Method _contains_videos Undocumented
Method _do_get_dynamic_field_schema Undocumented
Method _edit_label_tags Undocumented
Method _edit_sample_tags Undocumented
Method _expand_schema_from_values Undocumented
Method _get_db_fields_map Undocumented
Method _get_default_field Undocumented
Method _get_default_frame_fields Undocumented
Method _get_default_indexes Undocumented
Method _get_default_sample_fields Undocumented
Method _get_dynamic_field_schema Undocumented
Method _get_extremum Undocumented
Method _get_frame_label_field_schema Undocumented
Method _get_frames_bytes Computes the total size of the frame documents in the collection.
Method _get_geo_location_field Undocumented
Method _get_group_media_types Undocumented
Method _get_group_slices Undocumented
Method _get_label_attributes_schema Undocumented
Method _get_label_field_path Undocumented
Method _get_label_field_root Undocumented
Method _get_label_field_schema Undocumented
Method _get_label_field_type Undocumented
Method _get_label_fields Undocumented
Method _get_label_ids Undocumented
Method _get_media_fields Undocumented
Method _get_per_frame_bytes Returns a dictionary mapping frame IDs to document sizes (in bytes) for each frame in the video collection.
Method _get_per_sample_bytes Returns a dictionary mapping sample IDs to document sizes (in bytes) for each sample in the collection.
Method _get_per_sample_frames_bytes Returns a dictionary mapping sample IDs to total frame document sizes (in bytes) for each sample in the video collection.
Method _get_root_field_type Undocumented
Method _get_root_fields Undocumented
Method _get_samples_bytes Computes the total size of the sample documents in the collection.
Method _get_selected_labels Undocumented
Method _get_store Undocumented
Method _get_values_by_id Undocumented
Method _handle_db_field Undocumented
Method _handle_db_fields Undocumented
Method _handle_frame_field Undocumented
Method _handle_group_field Undocumented
Method _handle_id_fields Undocumented
Method _has_field Undocumented
Method _has_frame_fields Undocumented
Method _has_stores Undocumented
Method _is_default_field Undocumented
Method _is_frame_field Undocumented
Method _is_full_collection Undocumented
Method _is_group_field Undocumented
Method _is_label_field Undocumented
Method _is_read_only_field Undocumented
Method _list_stores Undocumented
Method _make_and_aggregate Undocumented
Method _make_set_field_pipeline Undocumented
Method _max Undocumented
Method _min Undocumented
Method _parse_aggregations Undocumented
Method _parse_big_result Undocumented
Method _parse_default_mask_targets Undocumented
Method _parse_default_skeleton Undocumented
Method _parse_faceted_result Undocumented
Method _parse_field Undocumented
Method _parse_field_name Undocumented
Method _parse_frame_labels_field Undocumented
Method _parse_label_field Undocumented
Method _parse_mask_targets Undocumented
Method _parse_media_field Undocumented
Method _parse_skeletons Undocumented
Method _process_aggregations Undocumented
Method _serialize_default_mask_targets Undocumented
Method _serialize_default_skeleton Undocumented
Method _serialize_field_schema Undocumented
Method _serialize_frame_field_schema Undocumented
Method _serialize_mask_targets Undocumented
Method _serialize_schema Undocumented
Method _serialize_skeletons Undocumented
Method _set_doc_values Undocumented
Method _set_frame_values Undocumented
Method _set_label_list_values Undocumented
Method _set_labels Undocumented
Method _set_list_values_by_id Undocumented
Method _set_sample_values Undocumented
Method _set_values Undocumented
Method _split_frame_fields Undocumented
Method _sync_dataset_last_modified_at Undocumented
Method _sync_samples_last_modified_at Undocumented
Method _tag_labels Undocumented
Method _to_fields_str Undocumented
Method _untag_labels Undocumented
Method _unwind_values Undocumented
Method _validate_root_field Undocumented
Constant _FRAMES_PREFIX Undocumented
Constant _GROUPS_PREFIX Undocumented
Property _element_str Undocumented
Property _elements_str Undocumented
@classmethod
def from_archive(cls, archive_path, dataset_type=None, data_path=None, labels_path=None, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, cleanup=True, progress=None, **kwargs): (source)

Creates a Dataset from the contents of the given archive.

If a directory with the same root name as archive_path exists, it is assumed that this directory contains the extracted contents of the archive, and thus the archive is not re-extracted.

See :ref:`this guide <loading-datasets-from-disk>` for example usages of this method and descriptions of the available dataset types.

Note

The following archive formats are explicitly supported:

.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz

If an archive not in the above list is found, extraction will be attempted via the patool package, which supports many formats but may require that additional system packages be installed.

Parameters
archive_path: the path to an archive of a dataset directory
dataset_type (None): the fiftyone.types.Dataset type of the dataset in archive_path
data_path (None): an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of archive_path in which the media lies
  • an absolute directory path in which the media lies. In this case, archive_path has no effect on the location of the data
  • a filename like "data.json" specifying the filename of a JSON manifest file in archive_path that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export
  • an absolute filepath to a JSON manifest file. In this case, archive_path has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

By default, it is assumed that the data can be located in the default location within archive_path for the dataset type

labels_path (None): an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:

  • a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in archive_path of the labels file(s)
  • an absolute directory or filepath containing the labels file(s). In this case, archive_path has no effect on the location of the labels

For labeled datasets, this parameter defaults to the location in archive_path of the labels for the default layout of the dataset type being imported

name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
label_field (None): controls the field(s) in which imported labels are stored. Only applicable if the dataset importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
cleanup (True): whether to delete the archive after extracting it
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs: optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
Returns
a Dataset
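The "extract only if needed" behavior described above (a directory with the archive's root name suppresses re-extraction) can be sketched with the standard library. This is an illustrative simplification, not FiftyOne's actual extraction code; the helper name is hypothetical.

```python
import os
import shutil
import tempfile
import zipfile

def ensure_extracted(archive_path):
    """Extract archive_path unless a directory with its root name exists."""
    root = archive_path
    # strip the archive suffix, checking compound extensions first
    for ext in (".tar.gz", ".tar.bz", ".zip", ".tgz", ".tbz", ".tar"):
        if root.endswith(ext):
            root = root[: -len(ext)]
            break
    if not os.path.isdir(root):
        shutil.unpack_archive(archive_path, root)
    return root

# demo: build a tiny zip, then "extract" it twice; the second call
# finds the existing directory and skips extraction
tmp = tempfile.mkdtemp()
staging = os.path.join(tmp, "staging")
os.makedirs(staging)
with open(os.path.join(staging, "sample.txt"), "w") as f:
    f.write("hello")
archive = os.path.join(tmp, "dataset.zip")
with zipfile.ZipFile(archive, "w") as z:
    z.write(os.path.join(staging, "sample.txt"), "sample.txt")

out = ensure_extracted(archive)    # extracts to .../dataset
again = ensure_extracted(archive)  # no-op: .../dataset already exists
```

Note that shutil.unpack_archive handles the common formats listed in the note above; exotic formats are where a tool like patool becomes necessary.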
@classmethod
def from_dict(cls, d, name=None, persistent=False, overwrite=False, rel_dir=None, frame_labels_dir=None, progress=None): (source)

Loads a Dataset from a JSON dictionary generated by fiftyone.core.collections.SampleCollection.to_dict.

The JSON dictionary can contain an export of any fiftyone.core.collections.SampleCollection, e.g., Dataset or fiftyone.core.view.DatasetView.

Parameters
d: a JSON dictionary
name (None): a name for the new dataset
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
rel_dir (None): a relative directory to prepend to the filepath of each sample if the filepath is not absolute (begins with a path separator). The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path
frame_labels_dir (None): a directory of per-sample JSON files containing the frame labels for video samples. If omitted, it is assumed that the frame labels are included directly in the provided JSON dict. Only applicable to datasets that contain videos
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_dir(cls, dataset_dir=None, dataset_type=None, data_path=None, labels_path=None, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None, **kwargs): (source)

Creates a Dataset from the contents of the given directory.

You can create datasets with this method via the following basic patterns:

  1. Provide dataset_dir and dataset_type to import the contents of a directory that is organized in the default layout for the dataset type as documented in :ref:`this guide <loading-datasets-from-disk>`
  2. Provide dataset_type along with data_path, labels_path, or other type-specific parameters to perform a customized import. This syntax provides the flexibility to, for example, perform labels-only imports or imports where the source media lies in a different location than the labels

In either workflow, the remaining parameters of this method can be provided to further configure the import.

See :ref:`this guide <loading-datasets-from-disk>` for example usages of this method and descriptions of the available dataset types.

Parameters
dataset_dir (None): the dataset directory. This can be omitted if you provide arguments such as data_path and labels_path
dataset_type (None): the fiftyone.types.Dataset type of the dataset
data_path (None): an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
  • an absolute directory path in which the media lies. In this case, dataset_dir has no effect on the location of the data
  • a filename like "data.json" specifying the filename of a JSON manifest file in dataset_dir that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export
  • an absolute filepath to a JSON manifest file. In this case, dataset_dir has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

By default, it is assumed that the data can be located in the default location within dataset_dir for the dataset type

labels_path (None): an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:

  • a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in dataset_dir of the labels file(s)
  • an absolute directory or filepath containing the labels file(s). In this case, dataset_dir has no effect on the location of the labels

For labeled datasets, this parameter defaults to the location in dataset_dir of the labels for the default layout of the dataset type being imported

name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
label_field (None): controls the field(s) in which imported labels are stored. Only applicable if the dataset importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs: optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
Returns
a Dataset
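The "data.json" manifest accepted by data_path maps UUIDs to media filepaths. A minimal stdlib sketch of writing and reading such a manifest (here the UUID is simply the file stem, which is an illustrative assumption):

```python
import json
import os
import tempfile

tmp = tempfile.mkdtemp()

# media files that the manifest will point to
filepaths = [os.path.join(tmp, name) for name in ("0001.jpg", "0002.jpg")]
for fp in filepaths:
    open(fp, "w").close()

# a manifest mapping UUIDs to absolute media filepaths, as described above
manifest = {os.path.splitext(os.path.basename(fp))[0]: fp for fp in filepaths}
manifest_path = os.path.join(tmp, "data.json")
with open(manifest_path, "w") as f:
    json.dump(manifest, f, indent=4)

# a consumer can then resolve each UUID back to its media file
with open(manifest_path) as f:
    loaded = json.load(f)
```

In a labels-only import, a manifest like this (or a dict of filenames to filepaths) is what lets the labels live in dataset_dir while the media stays wherever it already is.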
@classmethod
def from_images(cls, paths_or_samples, sample_parser=None, name=None, persistent=False, overwrite=False, tags=None, progress=None): (source)

Creates a Dataset from the given images.

This operation does not read the images.

See :ref:`this guide <custom-sample-parser>` for more details about providing a custom UnlabeledImageSampleParser to load image samples into FiftyOne.

Parameters
paths_or_samples: an iterable of data. If no sample_parser is provided, this must be an iterable of image paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None): a fiftyone.utils.data.parsers.UnlabeledImageSampleParser instance to use to parse the samples
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
tags (None): an optional tag or iterable of tags to attach to each sample
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_images_dir(cls, images_dir, name=None, persistent=False, overwrite=False, tags=None, recursive=True, progress=None): (source)

Creates a Dataset from the given directory of images.

This operation does not read the images.

Parameters
images_dir: a directory of images
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
tags (None): an optional tag or iterable of tags to attach to each sample
recursive (True): whether to recursively traverse subdirectories
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_images_patt(cls, images_patt, name=None, persistent=False, overwrite=False, tags=None, progress=None): (source)

Creates a Dataset from the given glob pattern of images.

This operation does not read the images.

Parameters
images_patt: a glob pattern of images like /path/to/images/*.jpg
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
tags (None): an optional tag or iterable of tags to attach to each sample
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
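Patterns like /path/to/images/*.jpg above are standard glob syntax. A quick stdlib sketch of what such a pattern matches:

```python
import glob
import os
import tempfile

tmp = tempfile.mkdtemp()
for name in ("a.jpg", "b.jpg", "notes.txt"):
    open(os.path.join(tmp, name), "w").close()

# only the .jpg files match; sorting makes the order deterministic,
# since glob itself does not guarantee one
images = sorted(glob.glob(os.path.join(tmp, "*.jpg")))
basenames = [os.path.basename(p) for p in images]
```

Each matched path becomes one sample's filepath; the image files themselves are not opened.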
@classmethod
def from_importer(cls, dataset_importer, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None): (source)

Creates a Dataset by importing the samples in the given fiftyone.utils.data.importers.DatasetImporter.

See :ref:`this guide <custom-dataset-importer>` for more details about providing a custom DatasetImporter to import datasets into FiftyOne.

Parameters
dataset_importer: a fiftyone.utils.data.importers.DatasetImporter
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
label_field (None): controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
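At its core, a DatasetImporter is an iterable that yields one sample's worth of information at a time. The plain-Python sketch below mimics that iteration pattern without depending on FiftyOne; the class name and the (filepath, metadata, label) triple are illustrative assumptions, not the real base-class contract documented in the guide linked above.

```python
class TinyImporter:
    """Iterates over (filepath, metadata, label) triples, importer-style."""

    def __init__(self, records):
        self._records = records
        self._iter = None

    def __iter__(self):
        self._iter = iter(self._records)
        return self

    def __next__(self):
        # raises StopIteration when the records are exhausted,
        # which is how consumers know the import is complete
        filepath, label = next(self._iter)
        metadata = None  # a real importer might compute media metadata here
        return filepath, metadata, label

records = [("/tmp/img1.jpg", "cat"), ("/tmp/img2.jpg", "dog")]
labels = [label for _, _, label in TinyImporter(records)]
```

from_importer drives an iteration like this, storing each yielded label under label_field.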
@classmethod
def from_json(cls, path_or_str, name=None, persistent=False, overwrite=False, rel_dir=None, frame_labels_dir=None, progress=None): (source)

Loads a Dataset from JSON generated by fiftyone.core.collections.SampleCollection.write_json or fiftyone.core.collections.SampleCollection.to_json.

The JSON file can contain an export of any fiftyone.core.collections.SampleCollection, e.g., Dataset or fiftyone.core.view.DatasetView.

Parameters
path_or_str: the path to a JSON file on disk or a JSON string
name (None): a name for the new dataset
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
rel_dir (None): a relative directory to prepend to the filepath of each sample if the filepath is not absolute (begins with a path separator). The path is converted to an absolute path (if necessary) via fiftyone.core.storage.normalize_path
frame_labels_dir (None): a directory of per-sample JSON files containing the frame labels for video samples. If omitted, it is assumed that the frame labels are included directly in the provided JSON. Only applicable to datasets that contain videos
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
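The rel_dir behavior (prepend a directory only when a sample's filepath is not absolute) can be sketched with os.path. fiftyone.core.storage.normalize_path presumably handles more cases, so treat this helper as a simplification with a hypothetical name:

```python
import os

def resolve_filepath(filepath, rel_dir=None):
    # absolute paths are kept as-is; relative paths get rel_dir prepended,
    # and the result is normalized to an absolute path either way
    if not os.path.isabs(filepath) and rel_dir is not None:
        filepath = os.path.join(rel_dir, filepath)
    return os.path.abspath(filepath)

abs_in = resolve_filepath("/data/images/001.jpg", rel_dir="/datasets/demo")
rel_in = resolve_filepath("images/001.jpg", rel_dir="/datasets/demo")
# on POSIX: abs_in is unchanged, while rel_in gains the rel_dir prefix
```

This is what lets an exported JSON store portable relative filepaths that are re-anchored at load time.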
@classmethod
def from_labeled_images(cls, samples, sample_parser, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None): (source)

Creates a Dataset from the given labeled images.

This operation will iterate over all provided samples, but the images will not be read.

See :ref:`this guide <custom-sample-parser>` for more details about providing a custom LabeledImageSampleParser to load labeled image samples into FiftyOne.

Parameters
samples: an iterable of data
sample_parser: a fiftyone.utils.data.parsers.LabeledImageSampleParser instance to use to parse the samples
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
label_field (None): controls the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_labeled_videos(cls, samples, sample_parser, name=None, persistent=False, overwrite=False, label_field=None, tags=None, dynamic=False, progress=None): (source)

Creates a Dataset from the given labeled videos.

This operation will iterate over all provided samples, but the videos will not be read/decoded/etc.

See :ref:`this guide <custom-sample-parser>` for more details about providing a custom LabeledVideoSampleParser to load labeled video samples into FiftyOne.

Parameters
samples: an iterable of data
sample_parser: a fiftyone.utils.data.parsers.LabeledVideoSampleParser instance to use to parse the samples
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
label_field (None): controls the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample/frame, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_videos(cls, paths_or_samples, sample_parser=None, name=None, persistent=False, overwrite=False, tags=None, progress=None): (source)

Creates a Dataset from the given videos.

This operation does not read/decode the videos.

See :ref:`this guide <custom-sample-parser>` for more details about providing a custom UnlabeledVideoSampleParser to load video samples into FiftyOne.

Parameters
paths_or_samples: an iterable of data. If no sample_parser is provided, this must be an iterable of video paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None): a fiftyone.utils.data.parsers.UnlabeledVideoSampleParser instance to use to parse the samples
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
tags (None): an optional tag or iterable of tags to attach to each sample
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_videos_dir(cls, videos_dir, name=None, persistent=False, overwrite=False, tags=None, recursive=True, progress=None): (source)

Creates a Dataset from the given directory of videos.

This operation does not read/decode the videos.

Parameters
videos_dir: a directory of videos
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
tags (None): an optional tag or iterable of tags to attach to each sample
recursive (True): whether to recursively traverse subdirectories
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
@classmethod
def from_videos_patt(cls, videos_patt, name=None, persistent=False, overwrite=False, tags=None, progress=None): (source)

Creates a Dataset from the given glob pattern of videos.

This operation does not read/decode the videos.

Parameters
videos_patt: a glob pattern of videos like /path/to/videos/*.mp4
name (None): a name for the dataset. By default, get_default_dataset_name is used
persistent (False): whether the dataset should persist in the database after the session terminates
overwrite (False): whether to overwrite an existing dataset of the same name
tags (None): an optional tag or iterable of tags to attach to each sample
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a Dataset
def __copy__(self): (source)

Undocumented

def __deepcopy__(self, memo): (source)

Undocumented

def __delitem__(self, samples_or_ids): (source)

Undocumented

def __eq__(self, other): (source)

Undocumented

def __getattribute__(self, name): (source)

Undocumented

def __getitem__(self, id_filepath_slice): (source)
def __init__(self, name=None, persistent=False, overwrite=False, _create=True, _virtual=False, **kwargs): (source)

Undocumented

def add_archive(self, archive_path, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, cleanup=True, progress=None, **kwargs): (source)

Adds the contents of the given archive to the dataset.

If a directory with the same root name as archive_path exists, it is assumed that this directory contains the extracted contents of the archive, and thus the archive is not re-extracted.

See :ref:`this guide <loading-datasets-from-disk>` for example usages of this method and descriptions of the available dataset types.

Note

The following archive formats are explicitly supported:

.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz

If an archive not in the above list is found, extraction will be attempted via the patool package, which supports many formats but may require that additional system packages be installed.

Parameters
archive_path: the path to an archive of a dataset directory
dataset_type (None): the fiftyone.types.Dataset type of the dataset in archive_path
data_path (None): an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of archive_path in which the media lies
  • an absolute directory path in which the media lies. In this case, archive_path has no effect on the location of the data
  • a filename like "data.json" specifying the filename of a JSON manifest file in archive_path that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export
  • an absolute filepath to a JSON manifest file. In this case, archive_path has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

By default, it is assumed that the data can be located in the default location within archive_path for the dataset type

labels_path (None): an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:

  • a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in archive_path of the labels file(s)
  • an absolute directory or filepath containing the labels file(s). In this case, archive_path has no effect on the location of the labels

For labeled datasets, this parameter defaults to the location in archive_path of the labels for the default layout of the dataset type being imported

label_field (None): controls the field(s) in which imported labels are stored. Only applicable if the dataset importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
expand_schema (True): whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True): whether to add dataset info from the importer (if any) to the dataset's info
cleanup (True): whether to delete the archive after extracting it
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs: optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
Returns
a list of IDs of the samples that were added to the dataset
def add_collection(self, sample_collection, include_info=True, overwrite_info=False, new_ids=False, progress=None): (source)

Adds the contents of the given collection to the dataset.

This method is a special case of Dataset.merge_samples that adds samples with new IDs to this dataset and omits any samples with existing IDs (the latter would only happen in rare cases).

Use Dataset.merge_samples if you have multiple datasets whose samples refer to the same source media.

Parameters
sample_collectiona fiftyone.core.collections.SampleCollection
include_info:Truewhether to merge dataset-level information such as info and classes
overwrite_info:Falsewhether to overwrite existing dataset-level information. Only applicable when include_info is True
new_ids:Falsewhether to generate new sample/frame/group IDs. By default, the IDs of the input collection are retained
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to this dataset
def add_dir(self, dataset_dir=None, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, progress=None, **kwargs): (source)

Adds the contents of the given directory to the dataset.

You can perform imports with this method via the following basic patterns:

  1. Provide dataset_dir and dataset_type to import the contents of a directory that is organized in the default layout for the dataset type as documented in :ref:`this guide <loading-datasets-from-disk>`
  2. Provide dataset_type along with data_path, labels_path, or other type-specific parameters to perform a customized import. This syntax provides the flexibility to, for example, perform labels-only imports or imports where the source media lies in a different location than the labels

In either workflow, the remaining parameters of this method can be provided to further configure the import.

See :ref:`this guide <loading-datasets-from-disk>` for example usages of this method and descriptions of the available dataset types.

Parameters
dataset_dir:Nonethe dataset directory. This can be omitted for certain dataset formats if you provide arguments such as data_path and labels_path
dataset_type:Nonethe fiftyone.types.Dataset type of the dataset
data_path:None

an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
  • an absolute directory path in which the media lies. In this case, the dataset_dir has no effect on the location of the data
  • a filename like "data.json" specifying the filename of a JSON manifest file in dataset_dir that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export
  • an absolute filepath to a JSON manifest file. In this case, dataset_dir has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

By default, it is assumed that the data can be located in the default location within dataset_dir for the dataset type

labels_path:None

an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:

  • a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in dataset_dir of the labels file(s)
  • an absolute directory or filepath containing the labels file(s). In this case, dataset_dir has no effect on the location of the labels

For labeled datasets, this parameter defaults to the location in dataset_dir of the labels for the default layout of the dataset type being imported

label_field:Nonecontrols the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:Nonean optional tag or iterable of tags to attach to each sample
expand_schema:Truewhether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
add_info:Truewhether to add dataset info from the importer (if any) to the dataset's info
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargsoptional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
Returns
a list of IDs of the samples that were added to the dataset
def add_dynamic_frame_fields(self, fields=None, recursive=True, add_mixed=False): (source)

Adds all dynamic frame fields to the dataset's schema.

Dynamic fields are embedded document fields with at least one non-None value that have not been declared on the dataset's schema.

Parameters
fields:Nonean optional field or iterable of fields for which to add dynamic fields. By default, all fields are considered
recursive:Truewhether to recursively inspect nested lists and embedded documents for dynamic fields
add_mixed:Falsewhether to declare fields that contain values of mixed types as generic fiftyone.core.fields.Field instances (True) or to skip such fields (False)
def add_dynamic_sample_fields(self, fields=None, recursive=True, add_mixed=False): (source)

Adds all dynamic sample fields to the dataset's schema.

Dynamic fields are embedded document fields with at least one non-None value that have not been declared on the dataset's schema.

Parameters
fields:Nonean optional field or iterable of fields for which to add dynamic fields. By default, all fields are considered
recursive:Truewhether to recursively inspect nested lists and embedded documents for dynamic fields
add_mixed:Falsewhether to declare fields that contain values of mixed types as generic fiftyone.core.fields.Field instances (True) or to skip such fields (False)
def add_frame_field(self, field_name, ftype, embedded_doc_type=None, subfield=None, fields=None, description=None, info=None, read_only=False, **kwargs): (source)

Adds a new frame-level field or embedded field to the dataset, if necessary.

Only applicable to datasets that contain videos.

Parameters
field_namethe field name or embedded.field.name
ftypethe field type to create. Must be a subclass of fiftyone.core.fields.Field
embedded_doc_type:Nonethe fiftyone.core.odm.BaseEmbeddedDocument type of the field. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
subfield:Nonethe fiftyone.core.fields.Field type of the contained field. Only applicable when ftype is fiftyone.core.fields.ListField or fiftyone.core.fields.DictField
fields:Nonea list of fiftyone.core.fields.Field instances defining embedded document attributes. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
description:Nonean optional description
info:Nonean optional info dict
read_only:Falsewhether the field should be read-only
**kwargsUndocumented
Raises
ValueErrorif a field of the same name already exists and it is not compliant with the specified values
def add_group_field(self, field_name, default=None, description=None, info=None, read_only=False): (source)

Adds a group field to the dataset, if necessary.

Parameters
field_namethe field name
default:Nonea default group slice for the field
description:Nonean optional description
info:Nonean optional info dict
read_only:Falsewhether the field should be read-only
Raises
ValueErrorif a group field with another name already exists
def add_group_slice(self, name, media_type): (source)

Adds a group slice with the given media type to the dataset, if necessary.

Parameters
namea group slice name
media_typethe media type of the slice
def add_images(self, paths_or_samples, sample_parser=None, tags=None, progress=None): (source)

Adds the given images to the dataset.

This operation does not read the images.

See :ref:`this guide <custom-sample-parser>` for more details about adding images to a dataset by defining your own UnlabeledImageSampleParser.

Parameters
paths_or_samplesan iterable of data. If no sample_parser is provided, this must be an iterable of image paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser:Nonea fiftyone.utils.data.parsers.UnlabeledImageSampleParser instance to use to parse the samples
tags:Nonean optional tag or iterable of tags to attach to each sample
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset
def add_images_dir(self, images_dir, tags=None, recursive=True, progress=None): (source)

Adds the given directory of images to the dataset.

See fiftyone.types.ImageDirectory for format details. In particular, note that files with non-image MIME types are omitted.

This operation does not read the images.

Parameters
images_dira directory of images
tags:Nonean optional tag or iterable of tags to attach to each sample
recursive:Truewhether to recursively traverse subdirectories
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def add_images_patt(self, images_patt, tags=None, progress=None): (source)

Adds the given glob pattern of images to the dataset.

This operation does not read the images.

Parameters
images_patta glob pattern of images like /path/to/images/*.jpg
tags:Nonean optional tag or iterable of tags to attach to each sample
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def add_importer(self, dataset_importer, label_field=None, tags=None, expand_schema=True, dynamic=False, add_info=True, progress=None): (source)

Adds the samples from the given fiftyone.utils.data.importers.DatasetImporter to the dataset.

See :ref:`this guide <custom-dataset-importer>` for more details about importing datasets in custom formats by defining your own DatasetImporter.

Parameters
dataset_importera fiftyone.utils.data.importers.DatasetImporter
label_field:Nonecontrols the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:Nonean optional tag or iterable of tags to attach to each sample
expand_schema:Truewhether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
add_info:Truewhether to add dataset info from the importer (if any) to the dataset's info
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset
def add_labeled_images(self, samples, sample_parser, label_field=None, tags=None, expand_schema=True, dynamic=False, progress=None): (source)

Adds the given labeled images to the dataset.

This operation will iterate over all provided samples, but the images will not be read (unless the sample parser requires it in order to compute image metadata).

See :ref:`this guide <custom-sample-parser>` for more details about adding labeled images to a dataset by defining your own LabeledImageSampleParser.

Parameters
samplesan iterable of data
sample_parsera fiftyone.utils.data.parsers.LabeledImageSampleParser instance to use to parse the samples
label_field:Nonecontrols the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:Nonean optional tag or iterable of tags to attach to each sample
expand_schema:Truewhether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset
def add_labeled_videos(self, samples, sample_parser, label_field=None, tags=None, expand_schema=True, dynamic=False, progress=None): (source)

Adds the given labeled videos to the dataset.

This operation will iterate over all provided samples, but the videos will not be read/decoded/etc.

See :ref:`this guide <custom-sample-parser>` for more details about adding labeled videos to a dataset by defining your own LabeledVideoSampleParser.

Parameters
samplesan iterable of data
sample_parsera fiftyone.utils.data.parsers.LabeledVideoSampleParser instance to use to parse the samples
label_field:"ground_truth"the name (or root name) of the frame field(s) to use for the labels
tags:Nonean optional tag or iterable of tags to attach to each sample
expand_schema:Truewhether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset
def add_sample(self, sample, expand_schema=True, dynamic=False, validate=True): (source)

Adds the given sample to the dataset.

If the sample instance does not belong to a dataset, it is updated in-place to reflect its membership in this dataset. If the sample instance belongs to another dataset, it is not modified.

Parameters
samplea fiftyone.core.sample.Sample
expand_schema:Truewhether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if the sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
validate:Truewhether to validate that the fields of the sample are compliant with the dataset schema before adding it
Returns
the ID of the sample in the dataset
def add_sample_field(self, field_name, ftype, embedded_doc_type=None, subfield=None, fields=None, description=None, info=None, read_only=False, **kwargs): (source)

Adds a new sample field or embedded field to the dataset, if necessary.

Parameters
field_namethe field name or embedded.field.name
ftypethe field type to create. Must be a subclass of fiftyone.core.fields.Field
embedded_doc_type:Nonethe fiftyone.core.odm.BaseEmbeddedDocument type of the field. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
subfield:Nonethe fiftyone.core.fields.Field type of the contained field. Only applicable when ftype is fiftyone.core.fields.ListField or fiftyone.core.fields.DictField
fields:Nonea list of fiftyone.core.fields.Field instances defining embedded document attributes. Only applicable when ftype is fiftyone.core.fields.EmbeddedDocumentField
description:Nonean optional description
info:Nonean optional info dict
read_only:Falsewhether the field should be read-only
**kwargsUndocumented
Raises
ValueErrorif a field of the same name already exists and it is not compliant with the specified values
def add_samples(self, samples, expand_schema=True, dynamic=False, validate=True, progress=None, num_samples=None): (source)

Adds the given samples to the dataset.

Any sample instances that do not belong to a dataset are updated in-place to reflect membership in this dataset. Any sample instances that belong to other datasets are not modified.

Parameters
samplesan iterable of fiftyone.core.sample.Sample instances or a fiftyone.core.collections.SampleCollection
expand_schema:Truewhether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
validate:Truewhether to validate that the fields of each sample are compliant with the dataset schema before adding it
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
num_samples:Nonethe number of samples in samples. If not provided, this is computed (if possible) via len(samples) if needed for progress tracking
Returns
a list of IDs of the samples in the dataset
def add_videos(self, paths_or_samples, sample_parser=None, tags=None, progress=None): (source)

Adds the given videos to the dataset.

This operation does not read the videos.

See :ref:`this guide <custom-sample-parser>` for more details about adding videos to a dataset by defining your own UnlabeledVideoSampleParser.

Parameters
paths_or_samplesan iterable of data. If no sample_parser is provided, this must be an iterable of video paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser:Nonea fiftyone.utils.data.parsers.UnlabeledVideoSampleParser instance to use to parse the samples
tags:Nonean optional tag or iterable of tags to attach to each sample
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples that were added to the dataset
def add_videos_dir(self, videos_dir, tags=None, recursive=True, progress=None): (source)

Adds the given directory of videos to the dataset.

See fiftyone.types.VideoDirectory for format details. In particular, note that files with non-video MIME types are omitted.

This operation does not read/decode the videos.

Parameters
videos_dira directory of videos
tags:Nonean optional tag or iterable of tags to attach to each sample
recursive:Truewhether to recursively traverse subdirectories
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def add_videos_patt(self, videos_patt, tags=None, progress=None): (source)

Adds the given glob pattern of videos to the dataset.

This operation does not read/decode the videos.

Parameters
videos_patta glob pattern of videos like /path/to/videos/*.mp4
tags:Nonean optional tag or iterable of tags to attach to each sample
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
@app_config.setter
def app_config(self, config): (source)

Undocumented

def check_summary_fields(self): (source)

Returns a list of summary fields that may need to be updated.

Summary fields may need to be updated whenever there have been modifications to the dataset's samples since the summaries were last generated.

Note that inclusion in this list is only a heuristic, as any sample modifications may not have affected the summary's source field.

Returns
list of summary field names
@classes.setter
def classes(self, classes): (source)

Undocumented

def clear(self): (source)

Removes all samples from the dataset.

If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.

def clear_cache(self): (source)

Clears the dataset's in-memory cache.

Dataset caches may contain sample/frame singletons and annotation/brain/evaluation/custom runs.

def clear_frame_field(self, field_name): (source)

Clears the values of the frame-level field from all samples in the dataset.

The field will remain in the dataset's frame schema, and all frames will have the value None for the field.

You can use dot notation (embedded.field.name) to clear embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_namethe field name or embedded.field.name
def clear_frame_fields(self, field_names): (source)

Clears the values of the frame-level fields from all samples in the dataset.

The fields will remain in the dataset's frame schema, and all frames will have the value None for each field.

You can use dot notation (embedded.field.name) to clear embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_namesthe field name or iterable of field names
def clear_frames(self): (source)

Removes all frame labels from the dataset.

If a reference to a frame exists in memory, the frame will be updated such that frame.in_dataset is False.

def clear_sample_field(self, field_name): (source)

Clears the values of the field from all samples in the dataset.

The field will remain in the dataset's schema, and all samples will have the value None for the field.

You can use dot notation (embedded.field.name) to clear embedded fields.

Parameters
field_namethe field name or embedded.field.name
def clear_sample_fields(self, field_names): (source)

Clears the values of the fields from all samples in the dataset.

The fields will remain in the dataset's schema, and all samples will have the value None for each field.

You can use dot notation (embedded.field.name) to clear embedded fields.

Parameters
field_namesthe field name or iterable of field names
def clone(self, name=None, persistent=False): (source)

Creates a copy of the dataset.

Dataset clones contain deep copies of all samples and dataset-level information in the source dataset. The source media files, however, are not copied.

Parameters
name:Nonea name for the cloned dataset. By default, get_default_dataset_name is used
persistent:Falsewhether the cloned dataset should be persistent
Returns
the new Dataset
def clone_frame_field(self, field_name, new_field_name): (source)

Clones the frame-level field into a new field.

You can use dot notation (embedded.field.name) to clone embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_namethe field name or embedded.field.name
new_field_namethe new field name or embedded.field.name
def clone_frame_fields(self, field_mapping): (source)

Clones the frame-level fields into new fields.

You can use dot notation (embedded.field.name) to clone embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_mappinga dict mapping field names to new field names into which to clone each field
def clone_sample_field(self, field_name, new_field_name): (source)

Clones the given sample field into a new field of the dataset.

You can use dot notation (embedded.field.name) to clone embedded fields.

Parameters
field_namethe field name or embedded.field.name
new_field_namethe new field name or embedded.field.name
def clone_sample_fields(self, field_mapping): (source)

Clones the given sample fields into new fields of the dataset.

You can use dot notation (embedded.field.name) to clone embedded fields.

Parameters
field_mappinga dict mapping field names to new field names into which to clone each field
def create_summary_field(self, path, field_name=None, sidebar_group=None, include_counts=False, group_by=None, read_only=True, create_index=True): (source)

Populates a sample-level field that records the unique values or numeric ranges that appear in the specified field on each sample in the dataset.

This method is particularly useful for summarizing frame-level fields of video datasets, in which case the sample-level field records the unique values or numeric ranges that appear in the specified frame-level field across all frames of that sample. This summary field can then be efficiently queried to retrieve samples that contain specific values of interest in at least one frame.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart-video")
dataset.set_field("frames.detections.detections.confidence", F.rand()).save()

# Generate a summary field for object labels
dataset.create_summary_field("frames.detections.detections.label")

# Generate a summary field for [min, max] confidences
dataset.create_summary_field("frames.detections.detections.confidence")

# Generate a summary field for object labels and counts
dataset.create_summary_field(
    "frames.detections.detections.label",
    field_name="frames_detections_label2",
    include_counts=True,
)

# Generate a summary field for per-label [min, max] confidences
dataset.create_summary_field(
    "frames.detections.detections.confidence",
    field_name="frames_detections_confidence2",
    group_by="label",
)

print(dataset.list_summary_fields())
Parameters
pathan input field path
field_name:Nonethe sample-level field in which to store the summary data. By default, a suitable name is derived from the given path
sidebar_group:Nonethe name of a :ref:`App sidebar group <app-sidebar-groups>` to which to add the summary field. By default, all summary fields are added to a "summaries" group. You can pass False to skip sidebar group modification
include_counts:Falsewhether to include per-value counts when summarizing categorical fields
group_by:Nonean optional attribute to group by when path is a numeric field to generate per-attribute [min, max] ranges. This may either be an absolute path or an attribute name that is interpreted relative to path
read_only:Truewhether to mark the summary field as read-only
create_index:Truewhether to create database index(es) for the summary field
Returns
the summary field name
@default_classes.setter
def default_classes(self, classes): (source)

Undocumented

@default_group_slice.setter
def default_group_slice(self, slice_name): (source)

Undocumented

@default_mask_targets.setter
def default_mask_targets(self, targets): (source)

Undocumented

@default_skeleton.setter
def default_skeleton(self, skeleton): (source)

Undocumented

def delete(self): (source)

Deletes the dataset.

Once deleted, only the name and deleted attributes of a dataset may be accessed.

If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.

def delete_frame_field(self, field_name, error_level=0): (source)

Deletes the frame-level field from all samples in the dataset.

You can use dot notation (embedded.field.name) to delete embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_namethe field name or embedded.field.name
error_level:0the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be deleted
- 1: log a warning if a top-level field cannot be deleted
- 2: ignore top-level fields that cannot be deleted
def delete_frame_fields(self, field_names, error_level=0): (source)

Deletes the frame-level fields from all samples in the dataset.

You can use dot notation (embedded.field.name) to delete embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_namesa field name or iterable of field names
error_level:0the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be deleted
- 1: log a warning if a top-level field cannot be deleted
- 2: ignore top-level fields that cannot be deleted
def delete_frames(self, frames_or_ids): (source)

Deletes the given frame(s) from the dataset.

If a reference to a frame exists in memory, the frame will be updated such that frame.in_dataset is False.

Parameters
frames_or_ids

the frame(s) to delete. Can be any of the following:

def delete_group_slice(self, name): (source)

Deletes all samples in the given group slice from the dataset.

Parameters
namea group slice name
def delete_groups(self, groups_or_ids): (source)

Deletes the given group(s) from the dataset.

If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.

Parameters
groups_or_ids

the group(s) to delete. Can be any of the following:

def delete_labels(self, labels=None, ids=None, tags=None, view=None, fields=None): (source)

Deletes the specified labels from the dataset.

You can specify the labels to delete via any of the following methods:

  • Provide the labels argument, which should contain a list of dicts in the format returned by fiftyone.core.session.Session.selected_labels
  • Provide the ids or tags arguments to specify the labels to delete via their IDs and/or tags
  • Provide the view argument to delete all of the labels in a view into this dataset. This syntax is useful if you have constructed a fiftyone.core.view.DatasetView defining the labels to delete

Additionally, you can specify the fields argument to restrict deletion to specific field(s), either for efficiency or to ensure that labels from other fields are not deleted if their contents are included in the other arguments.

Parameters
labels:Nonea list of dicts specifying the labels to delete in the format returned by fiftyone.core.session.Session.selected_labels
ids:Nonean ID or iterable of IDs of the labels to delete
tags:Nonea tag or iterable of tags of the labels to delete
view:Nonea fiftyone.core.view.DatasetView into this dataset containing the labels to delete
fields:Nonea field or iterable of fields from which to delete labels
def delete_sample_field(self, field_name, error_level=0): (source)

Deletes the field from all samples in the dataset.

You can use dot notation (embedded.field.name) to delete embedded fields.

Parameters
field_name: the field name or embedded.field.name
error_level (0): the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be deleted
- 1: log a warning if a top-level field cannot be deleted
- 2: ignore top-level fields that cannot be deleted
def delete_sample_fields(self, field_names, error_level=0): (source)

Deletes the fields from all samples in the dataset.

You can use dot notation (embedded.field.name) to delete embedded fields.

Parameters
field_names: the field name or iterable of field names
error_level (0): the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be deleted
- 1: log a warning if a top-level field cannot be deleted
- 2: ignore top-level fields that cannot be deleted
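Conceptually, dot notation resolves an embedded field path one level per dot, much like walking nested dictionary keys. The following is a plain-Python illustration of that resolution, not FiftyOne's internal implementation; the actual FiftyOne calls are shown as comments:

```python
def delete_nested(doc, path):
    """Delete the value at a dotted path from a nested dict."""
    *parents, leaf = path.split(".")
    for key in parents:
        doc = doc[key]  # descend one level per dot
    del doc[leaf]

sample = {"metadata": {"size_bytes": 1024, "mime_type": "image/png"}}
delete_nested(sample, "metadata.mime_type")
assert sample == {"metadata": {"size_bytes": 1024}}

# The FiftyOne equivalents (which require a dataset) would be, e.g.:
# dataset.delete_sample_field("metadata.mime_type")
# dataset.delete_sample_fields(["foo", "embedded.bar"], error_level=1)
```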
def delete_samples(self, samples_or_ids): (source)

Deletes the given sample(s) from the dataset.

If a reference to a sample exists in memory, the sample will be updated such that sample.in_dataset is False.

Parameters
samples_or_ids

the sample(s) to delete. Can be any of the following:

def delete_saved_view(self, name): (source)

Deletes the saved view with the given name.

Parameters
name: the name of a saved view
def delete_saved_views(self): (source)

Deletes all saved views from this dataset.

def delete_summary_field(self, field_name, error_level=0): (source)

Deletes the summary field from all samples in the dataset.

Parameters
field_name: the summary field
error_level (0): the error level to use. Valid values are:
- 0: raise an error if a summary field cannot be deleted
- 1: log a warning if a summary field cannot be deleted
- 2: ignore summary fields that cannot be deleted
def delete_summary_fields(self, field_names, error_level=0): (source)

Deletes the summary fields from all samples in the dataset.

Parameters
field_names: the summary field or iterable of summary fields
error_level (0): the error level to use. Valid values are:
- 0: raise an error if a summary field cannot be deleted
- 1: log a warning if a summary field cannot be deleted
- 2: ignore summary fields that cannot be deleted
def delete_workspace(self, name): (source)

Deletes the saved workspace with the given name.

Parameters
name: the name of a saved workspace
Raises
ValueError: if name is not a saved workspace
def delete_workspaces(self): (source)

Deletes all saved workspaces from this dataset.

def ensure_frames(self): (source)

Ensures that the video dataset contains frame instances for every frame of each sample's source video.

Empty frames will be inserted for missing frames, and already existing frames are left unchanged.

def first(self): (source)

Returns the first sample in the dataset.

Returns
a fiftyone.core.sample.Sample
def get_field_schema(self, ftype=None, embedded_doc_type=None, read_only=None, info_keys=None, created_after=None, include_private=False, flat=False, mode=None): (source)

Returns a schema dictionary describing the fields of the samples in the dataset.

Parameters
ftype (None): an optional field type or iterable of types to which to restrict the returned schema. Must be subclass(es) of fiftyone.core.fields.Field
embedded_doc_type (None): an optional embedded document type or iterable of types to which to restrict the returned schema. Must be subclass(es) of fiftyone.core.odm.BaseEmbeddedDocument
read_only (None): whether to restrict to (True) or exclude (False) read-only fields. By default, all fields are included
info_keys (None): an optional key or list of keys that must be in the field's info dict
created_after (None): an optional datetime specifying a minimum creation date
include_private (False): whether to include fields that start with _ in the returned schema
flat (False): whether to return a flattened schema where all embedded document fields are included as top-level keys
mode (None): whether to apply the above constraints before and/or after flattening the schema. Only applicable when flat is True. Supported values are ("before", "after", "both"). The default is "after"
Returns
a dict mapping field names to fiftyone.core.fields.Field instances
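The ftype and embedded_doc_type constraints are subclass checks against the field instances in the schema. The stand-in classes below are hypothetical (the real ones live in fiftyone.core.fields), but they illustrate the filtering semantics; with FiftyOne installed, the equivalent call would be, e.g., dataset.get_field_schema(ftype=fo.StringField):

```python
# Stand-in field types; FiftyOne's real ones are in fiftyone.core.fields
class Field: ...
class StringField(Field): ...
class IntField(Field): ...

schema = {
    "filepath": StringField(),
    "tags": StringField(),
    "uniqueness": IntField(),
}

def filter_schema(schema, ftype):
    """Restrict a schema dict to fields of the given type(s), as ftype does."""
    types = ftype if isinstance(ftype, tuple) else (ftype,)
    return {name: f for name, f in schema.items() if isinstance(f, types)}

string_fields = filter_schema(schema, StringField)
assert set(string_fields) == {"filepath", "tags"}
```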
def get_frame_field_schema(self, ftype=None, embedded_doc_type=None, read_only=None, info_keys=None, created_after=None, include_private=False, flat=False, mode=None): (source)

Returns a schema dictionary describing the fields of the frames of the samples in the dataset.

Only applicable for datasets that contain videos.

Parameters
ftype (None): an optional field type or iterable of types to which to restrict the returned schema. Must be subclass(es) of fiftyone.core.fields.Field
embedded_doc_type (None): an optional embedded document type or iterable of types to which to restrict the returned schema. Must be subclass(es) of fiftyone.core.odm.BaseEmbeddedDocument
read_only (None): whether to restrict to (True) or exclude (False) read-only fields. By default, all fields are included
info_keys (None): an optional key or list of keys that must be in the field's info dict
created_after (None): an optional datetime specifying a minimum creation date
include_private (False): whether to include fields that start with _ in the returned schema
flat (False): whether to return a flattened schema where all embedded document fields are included as top-level keys
mode (None): whether to apply the above constraints before and/or after flattening the schema. Only applicable when flat is True. Supported values are ("before", "after", "both"). The default is "after"
Returns
a dict mapping field names to fiftyone.core.fields.Field instances, or None if the dataset does not contain videos
def get_group(self, group_id, group_slices=None): (source)

Returns a dict containing the samples for the given group ID.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

group_id = dataset.take(1).first().group.id
group = dataset.get_group(group_id)

print(group.keys())
# ['left', 'right', 'pcd']
Parameters
group_id: a group ID
group_slices (None): an optional subset of group slices to load
Returns
a dict mapping group slice names to fiftyone.core.sample.Sample instances
Raises
KeyError: if the group ID is not found
def get_saved_view_info(self, name): (source)

Loads the editable information about the saved view with the given name.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

view = dataset.limit(10)
dataset.save_view("test", view)

print(dataset.get_saved_view_info("test"))
Parameters
name: the name of a saved view
Returns
a dict of editable info
def get_workspace_info(self, name): (source)

Gets the information about the workspace with the given name.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

workspace = fo.Space()
description = "A really cool (apparently empty?) workspace"
dataset.save_workspace("test", workspace, description=description)

print(dataset.get_workspace_info("test"))
Parameters
name: the name of a saved workspace
Returns
a dict of editable info
@group_slice.setter
def group_slice(self, slice_name): (source)

Undocumented

def has_saved_view(self, name): (source)

Whether this dataset has a saved view with the given name.

Parameters
name: a saved view name
Returns
True/False
def has_workspace(self, name): (source)

Whether this dataset has a saved workspace with the given name.

Parameters
name: a saved workspace name
Returns
True/False
def head(self, num_samples=3): (source)

Returns a list of the first few samples in the dataset.

If fewer than num_samples samples are in the dataset, only the available samples are returned.

Parameters
num_samples (3): the number of samples
Returns
a list of fiftyone.core.sample.Sample objects
@info.setter
def info(self, info): (source)

Undocumented

def ingest_images(self, paths_or_samples, sample_parser=None, tags=None, dataset_dir=None, image_format=None, progress=None): (source)

Ingests the given iterable of images into the dataset.

The images are read in-memory and written to dataset_dir.

See :ref:`this guide <custom-sample-parser>` for more details about ingesting images into a dataset by defining your own UnlabeledImageSampleParser.

Parameters
paths_or_samples: an iterable of data. If no sample_parser is provided, this must be an iterable of image paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None): a fiftyone.utils.data.parsers.UnlabeledImageSampleParser instance to use to parse the samples
tags (None): an optional tag or iterable of tags to attach to each sample
dataset_dir (None): the directory in which the images will be written. By default, get_default_dataset_dir is used
image_format (None): the image format to use to write the images to disk. By default, fiftyone.config.default_image_ext is used
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def ingest_labeled_images(self, samples, sample_parser, label_field=None, tags=None, expand_schema=True, dynamic=False, dataset_dir=None, image_format=None, progress=None): (source)

Ingests the given iterable of labeled image samples into the dataset.

The images are read in-memory and written to dataset_dir.

See :ref:`this guide <custom-sample-parser>` for more details about ingesting labeled images into a dataset by defining your own LabeledImageSampleParser.

Parameters
samples: an iterable of data
sample_parser: a fiftyone.utils.data.parsers.LabeledImageSampleParser instance to use to parse the samples
label_field (None): controls the field(s) in which imported labels are stored. If the parser produces a single fiftyone.core.labels.Label instance per sample, this argument specifies the name of the field to use; the default is "ground_truth". If the parser produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
expand_schema (True): whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if the sample's schema is not a subset of the dataset schema
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
dataset_dir (None): the directory in which the images will be written. By default, get_default_dataset_dir is used
image_format (None): the image format to use to write the images to disk. By default, fiftyone.config.default_image_ext is used
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def ingest_labeled_videos(self, samples, sample_parser, tags=None, expand_schema=True, dynamic=False, dataset_dir=None, progress=None): (source)

Ingests the given iterable of labeled video samples into the dataset.

The videos are copied to dataset_dir.

See :ref:`this guide <custom-sample-parser>` for more details about ingesting labeled videos into a dataset by defining your own LabeledVideoSampleParser.

Parameters
samples: an iterable of data
sample_parser: a fiftyone.utils.data.parsers.LabeledVideoSampleParser instance to use to parse the samples
tags (None): an optional tag or iterable of tags to attach to each sample
expand_schema (True): whether to dynamically add new sample fields encountered to the dataset schema. If False, an error is raised if the sample's schema is not a subset of the dataset schema
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
dataset_dir (None): the directory in which the videos will be written. By default, get_default_dataset_dir is used
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def ingest_videos(self, paths_or_samples, sample_parser=None, tags=None, dataset_dir=None, progress=None): (source)

Ingests the given iterable of videos into the dataset.

The videos are copied to dataset_dir.

See :ref:`this guide <custom-sample-parser>` for more details about ingesting videos into a dataset by defining your own UnlabeledVideoSampleParser.

Parameters
paths_or_samples: an iterable of data. If no sample_parser is provided, this must be an iterable of video paths. If a sample_parser is provided, this can be an arbitrary iterable whose elements can be parsed by the sample parser
sample_parser (None): a fiftyone.utils.data.parsers.UnlabeledVideoSampleParser instance to use to parse the samples
tags (None): an optional tag or iterable of tags to attach to each sample
dataset_dir (None): the directory in which the videos will be written. By default, get_default_dataset_dir is used
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
Returns
a list of IDs of the samples in the dataset
def iter_groups(self, group_slices=None, progress=False, autosave=False, batch_size=None, batching_strategy=None): (source)

Returns an iterator over the groups in the dataset.

Examples:

import random as r
import string as s

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

def make_label():
    return "".join(r.choice(s.ascii_letters) for i in range(10))

# No save context
for group in dataset.iter_groups(progress=True):
    for sample in group.values():
        sample["test"] = make_label()
        sample.save()

# Save using default batching strategy
for group in dataset.iter_groups(progress=True, autosave=True):
    for sample in group.values():
        sample["test"] = make_label()

# Save in batches of 10
for group in dataset.iter_groups(
    progress=True, autosave=True, batch_size=10
):
    for sample in group.values():
        sample["test"] = make_label()

# Save every 0.5 seconds
for group in dataset.iter_groups(
    progress=True, autosave=True, batch_size=0.5
):
    for sample in group.values():
        sample["test"] = make_label()
Parameters
group_slices (None): an optional subset of group slices to load
progress (False): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
autosave (False): whether to automatically save changes to samples emitted by this iterator
batch_size (None): the batch size to use when autosaving samples. If a batching_strategy is provided, this parameter configures the strategy as described below. If no batching_strategy is provided, this can either be an integer specifying the number of samples to save in a batch (in which case batching_strategy is implicitly set to "static") or a float number of seconds between batched saves (in which case batching_strategy is implicitly set to "latency")
batching_strategy (None):

the batching strategy to use for each save operation when autosaving samples. Supported values are:

  • "static": a fixed sample batch size for each save
  • "size": a target batch size, in bytes, for each save
  • "latency": a target latency, in seconds, between saves

By default, fo.config.default_batcher is used

Returns
an iterator that emits dicts mapping group slice names to fiftyone.core.sample.Sample instances, one per group
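When batching_strategy is omitted, the type of batch_size implicitly selects the strategy, per the parameter descriptions above. A plain-Python restatement of that defaulting rule (not FiftyOne's internal code):

```python
def infer_strategy(batch_size, batching_strategy=None):
    """Mirror the documented defaulting: int -> "static", float -> "latency"."""
    if batching_strategy is not None:
        return batching_strategy  # explicit strategy always wins
    if batch_size is None:
        return "default"  # falls back to fo.config.default_batcher
    return "static" if isinstance(batch_size, int) else "latency"

assert infer_strategy(10) == "static"     # integer batch size
assert infer_strategy(0.5) == "latency"   # float = seconds between saves
assert infer_strategy(10, "size") == "size"
```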
def iter_samples(self, progress=False, autosave=False, batch_size=None, batching_strategy=None): (source)

Returns an iterator over the samples in the dataset.

Examples:

import random as r
import string as s

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("cifar10", split="test")

def make_label():
    return "".join(r.choice(s.ascii_letters) for i in range(10))

# No save context
for sample in dataset.iter_samples(progress=True):
    sample.ground_truth.label = make_label()
    sample.save()

# Save using default batching strategy
for sample in dataset.iter_samples(progress=True, autosave=True):
    sample.ground_truth.label = make_label()

# Save in batches of 10
for sample in dataset.iter_samples(
    progress=True, autosave=True, batch_size=10
):
    sample.ground_truth.label = make_label()

# Save every 0.5 seconds
for sample in dataset.iter_samples(
    progress=True, autosave=True, batch_size=0.5
):
    sample.ground_truth.label = make_label()
Parameters
progress (False): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
autosave (False): whether to automatically save changes to samples emitted by this iterator
batch_size (None): the batch size to use when autosaving samples. If a batching_strategy is provided, this parameter configures the strategy as described below. If no batching_strategy is provided, this can either be an integer specifying the number of samples to save in a batch (in which case batching_strategy is implicitly set to "static") or a float number of seconds between batched saves (in which case batching_strategy is implicitly set to "latency")
batching_strategy (None):

the batching strategy to use for each save operation when autosaving samples. Supported values are:

  • "static": a fixed sample batch size for each save
  • "size": a target batch size, in bytes, for each save
  • "latency": a target latency, in seconds, between saves

By default, fo.config.default_batcher is used

Returns
an iterator over fiftyone.core.sample.Sample instances
def last(self): (source)

Returns the last sample in the dataset.

Returns
a fiftyone.core.sample.Sample
def list_saved_views(self, info=False): (source)

List saved views on this dataset.

Parameters
info (False): whether to return info dicts describing each saved view rather than just their names
Returns
a list of saved view names or info dicts
def list_summary_fields(self): (source)

Lists the summary fields on the dataset.

Use create_summary_field to create summary fields, and use delete_summary_field to delete them.

Returns
a list of summary field names
def list_workspaces(self, info=False): (source)

List saved workspaces on this dataset.

Parameters
info (False): whether to return info dicts describing each saved workspace rather than just their names
Returns
a list of saved workspace names or info dicts
def load_saved_view(self, name): (source)

Loads the saved view with the given name.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")
view = dataset.filter_labels("ground_truth", F("label") == "cat")

dataset.save_view("cats", view)

also_view = dataset.load_saved_view("cats")
assert view == also_view
Parameters
name: the name of a saved view
Returns
a fiftyone.core.view.DatasetView
def load_workspace(self, name): (source)

Loads the saved workspace with the given name.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

embeddings_panel = fo.Panel(
    type="Embeddings",
    state=dict(brainResult="img_viz", colorByField="metadata.size_bytes"),
)
workspace = fo.Space(children=[embeddings_panel])
workspace_name = "embeddings-workspace"
dataset.save_workspace(workspace_name, workspace)

# Some time later ... load the workspace
loaded_workspace = dataset.load_workspace(workspace_name)
assert workspace == loaded_workspace

# Launch app with the loaded workspace!
session = fo.launch_app(dataset, spaces=loaded_workspace)

# Or set via session later on
session.spaces = loaded_workspace
Parameters
name: the name of a saved workspace
Returns
a fiftyone.core.odm.workspace.Space
Raises
ValueError: if name is not a saved workspace
@mask_targets.setter
def mask_targets(self, targets): (source)

Undocumented

@media_type.setter
def media_type(self, media_type): (source)

Undocumented

def merge_archive(self, archive_path, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, cleanup=True, progress=None, **kwargs): (source)

Merges the contents of the given archive into the dataset.

Note

This method requires the ability to create unique indexes on the key_field of each collection.

See add_archive if you want to add samples without a uniqueness constraint.

If a directory with the same root name as archive_path exists, it is assumed that this directory contains the extracted contents of the archive, and thus the archive is not re-extracted.

See :ref:`this guide <loading-datasets-from-disk>` for example usages of this method and descriptions of the available dataset types.

Note

The following archive formats are explicitly supported:

.zip, .tar, .tar.gz, .tgz, .tar.bz, .tbz

If an archive not in the above list is found, extraction will be attempted via the patool package, which supports many formats but may require that additional system packages be installed.

By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.

The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether existing samples should be modified or skipped
  • Whether new samples should be added or omitted
  • Whether new fields can be added to the dataset schema
  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
  • Whether to merge only specific fields, or all but certain fields
  • Mapping input fields to different field names of this dataset
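The key_fcn hook receives each sample and returns its merge key. A stand-alone sketch of the base-filename example from above, using a simple namespace in place of a real fiftyone.core.sample.Sample; the merge_archive call is shown as a comment since it requires a dataset and an archive:

```python
import os
from types import SimpleNamespace

# Merge on base filename rather than absolute filepath
key_fcn = lambda sample: os.path.basename(sample.filepath)

# Stand-ins for two samples whose media live in different directories
a = SimpleNamespace(filepath="/data/v1/img001.jpg")
b = SimpleNamespace(filepath="/backup/v2/img001.jpg")

# Both resolve to the same key, so they would be merged as one sample
assert key_fcn(a) == key_fcn(b) == "img001.jpg"

# With a real dataset, the call would look like:
# dataset.merge_archive("/path/to/archive.zip", dataset_type=..., key_fcn=key_fcn)
```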
Parameters
archive_path: the path to an archive of a dataset directory
dataset_type (None): the fiftyone.types.Dataset type of the dataset in archive_path
data_path (None):

an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
  • an absolute directory path in which the media lies. In this case, the archive_path has no effect on the location of the data
  • a filename like "data.json" specifying the filename of a JSON manifest file in archive_path that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export
  • an absolute filepath to a JSON manifest file. In this case, archive_path has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

By default, it is assumed that the data can be located in the default location within archive_path for the dataset type

labels_path (None):

an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:

  • a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in archive_path of the labels file(s)
  • an absolute directory or filepath containing the labels file(s). In this case, archive_path has no effect on the location of the labels

For labeled datasets, this parameter defaults to the location in archive_path of the labels for the default layout of the dataset type being imported

label_field (None): controls the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags (None): an optional tag or iterable of tags to attach to each sample
key_field ("filepath"): the sample field to use to decide whether to join with an existing sample
key_fcn (None): a function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing (False): whether to skip existing samples (True) or merge them (False)
insert_new (True): whether to insert new samples (True) or skip them (False)
fields (None): an optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields (None): an optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists (True): whether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite (True): whether to overwrite (True) or skip (False) existing fields and label elements
expand_schema (True): whether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic (False): whether to declare dynamic attributes of embedded document fields that are encountered
add_info (True): whether to add dataset info from the importer (if any) to the dataset
cleanup (True): whether to delete the archive after extracting it
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs: optional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
def merge_dir(self, dataset_dir=None, dataset_type=None, data_path=None, labels_path=None, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, progress=None, **kwargs): (source)

Merges the contents of the given directory into the dataset.

Note

This method requires the ability to create unique indexes on the key_field of each collection.

See add_dir if you want to add samples without a uniqueness constraint.

You can perform imports with this method via the following basic patterns:

  1. Provide dataset_dir and dataset_type to import the contents of a directory that is organized in the default layout for the dataset type as documented in :ref:`this guide <loading-datasets-from-disk>`
  2. Provide dataset_type along with data_path, labels_path, or other type-specific parameters to perform a customized import. This syntax provides the flexibility to, for example, perform labels-only imports or imports where the source media lies in a different location than the labels

In either workflow, the remaining parameters of this method can be provided to further configure the import.

See :ref:`this guide <loading-datasets-from-disk>` for example usages of this method and descriptions of the available dataset types.

By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.

The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether existing samples should be modified or skipped
  • Whether new samples should be added or omitted
  • Whether new fields can be added to the dataset schema
  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
  • Whether to merge only specific fields, or all but certain fields
  • Mapping input fields to different field names of this dataset
Parameters
dataset_dir (None): the dataset directory. This can be omitted for certain dataset formats if you provide arguments such as data_path and labels_path
dataset_type (None): the fiftyone.types.Dataset type of the dataset
data_path (None):

an optional parameter that enables explicit control over the location of the media for certain dataset types. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of dataset_dir in which the media lies
  • an absolute directory path in which the media lies. In this case, the dataset_dir has no effect on the location of the data
  • a filename like "data.json" specifying the filename of a JSON manifest file in dataset_dir that maps UUIDs to media filepaths. Files of this format are generated when passing the export_media="manifest" option to fiftyone.core.collections.SampleCollection.export
  • an absolute filepath to a JSON manifest file. In this case, dataset_dir has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

By default, it is assumed that the data can be located in the default location within dataset_dir for the dataset type

labels_path (None):

an optional parameter that enables explicit control over the location of the labels. Only applicable when importing certain labeled dataset formats. Can be any of the following:

  • a type-specific folder name like "labels" or "labels/" or a filename like "labels.json" or "labels.xml" specifying the location in dataset_dir of the labels file(s)
  • an absolute directory or filepath containing the labels file(s). In this case, dataset_dir has no effect on the location of the labels

For labeled datasets, this parameter defaults to the location in dataset_dir of the labels for the default layout of the dataset type being imported

label_field:Nonecontrols the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:Nonean optional tag or iterable of tags to attach to each sample
key_field:"filepath"the sample field to use to decide whether to join with an existing sample
key_fcn:Nonea function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing:Falsewhether to skip existing samples (True) or merge them (False)
insert_new:Truewhether to insert new samples (True) or skip them (False)
fields:Nonean optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields:Nonean optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists:Truewhether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite:Truewhether to overwrite (True) or skip (False) existing fields and label elements
expand_schema:Truewhether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
add_info:Truewhether to add dataset info from the importer (if any) to the dataset
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargsoptional keyword arguments to pass to the constructor of the fiftyone.utils.data.importers.DatasetImporter for the specified dataset_type
def merge_importer(self, dataset_importer, label_field=None, tags=None, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, add_info=True, progress=None): (source)

Merges the samples from the given fiftyone.utils.data.importers.DatasetImporter into the dataset.

Note

This method requires the ability to create unique indexes on the key_field of each collection.

See add_importer if you want to add samples without a uniqueness constraint.

See :ref:`this guide <custom-dataset-importer>` for more details about importing datasets in custom formats by defining your own DatasetImporter.

By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.

The behavior of this method is highly customizable. By default, all top-level fields from the imported samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether existing samples should be modified or skipped
  • Whether new samples should be added or omitted
  • Whether new fields can be added to the dataset schema
  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
  • Whether to merge only specific fields, or all but certain fields
  • Mapping input fields to different field names of this dataset
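The keying behavior described above can be sketched in plain Python. This is a minimal illustration of the documented semantics, not FiftyOne's implementation; the `Sample` class below is a hypothetical stand-in for fiftyone.core.sample.Sample:

```python
import os

# Hypothetical stand-in for fiftyone.core.sample.Sample; only the
# `filepath` attribute matters for keying
class Sample:
    def __init__(self, filepath):
        self.filepath = filepath

def merge_key(sample, key_field="filepath", key_fcn=None):
    # When a key_fcn is provided, key_field is ignored
    if key_fcn is not None:
        return key_fcn(sample)

    return getattr(sample, key_field)

sample = Sample("/datasets/train/img001.jpg")

# Default: samples with the same absolute filepath are merged
print(merge_key(sample))  # /datasets/train/img001.jpg

# Custom: merge samples with the same base filename
key_fcn = lambda s: os.path.basename(s.filepath)
print(merge_key(sample, key_fcn=key_fcn))  # img001.jpg
```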
Parameters
dataset_importera fiftyone.utils.data.importers.DatasetImporter
label_field:Nonecontrols the field(s) in which imported labels are stored. Only applicable if dataset_importer is a fiftyone.utils.data.importers.LabeledImageDatasetImporter or fiftyone.utils.data.importers.LabeledVideoDatasetImporter. If the importer produces a single fiftyone.core.labels.Label instance per sample/frame, this argument specifies the name of the field to use; the default is "ground_truth". If the importer produces a dictionary of labels per sample, this argument can be either a string prefix to prepend to each label key or a dict mapping label keys to field names; the default in this case is to directly use the keys of the imported label dictionaries as field names
tags:Nonean optional tag or iterable of tags to attach to each sample
key_field:"filepath"the sample field to use to decide whether to join with an existing sample
key_fcn:Nonea function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing:Falsewhether to skip existing samples (True) or merge them (False)
insert_new:Truewhether to insert new samples (True) or skip them (False)
fields:Nonean optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields:Nonean optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from imported samples, if present. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists:Truewhether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite:Truewhether to overwrite (True) or skip (False) existing fields and label elements
expand_schema:Truewhether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered
add_info:Truewhether to add dataset info from the importer (if any) to the dataset
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
def merge_sample(self, sample, key_field='filepath', skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, validate=True, dynamic=False): (source)

Merges the fields of the given sample into this dataset.

By default, the sample is merged with an existing sample with the same absolute filepath, if one exists. Otherwise a new sample is inserted. You can customize this behavior via the key_field, skip_existing, and insert_new parameters.

The behavior of this method is highly customizable. By default, all top-level fields from the provided sample are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both samples are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether new fields can be added to the dataset schema
  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
  • Whether to merge only specific fields, or all but certain fields
  • Mapping input sample fields to different field names of this sample
Parameters
samplea fiftyone.core.sample.Sample
key_field:"filepath"the sample field to use to decide whether to join with an existing sample
skip_existing:Falsewhether to skip existing samples (True) or merge them (False)
insert_new:Truewhether to insert new samples (True) or skip them (False)
fields:Nonean optional field or iterable of fields to which to restrict the merge. May contain frame fields for video samples. This can also be a dict mapping field names of the input sample to field names of this dataset
omit_fields:Nonean optional field or iterable of fields to exclude from the merge. May contain frame fields for video samples
merge_lists:Truewhether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided sample
overwrite:Truewhether to overwrite (True) or skip (False) existing fields and label elements
expand_schema:Truewhether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if any fields are not in the dataset schema
validate:Truewhether to validate values for existing fields
dynamic:Falsewhether to declare dynamic embedded document fields
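The label list merging rule described above (matching elements by id, replacing when overwrite is True and keeping when overwrite is False) can be sketched with plain dicts as stand-ins for fiftyone.core.labels.Label elements; this is an illustration of the documented behavior, not FiftyOne's implementation:

```python
def merge_label_list(existing, incoming, overwrite=True):
    """Merge two lists of label dicts (each with an "id" key),
    matching elements by id."""
    merged = {label["id"]: label for label in existing}
    for label in incoming:
        if label["id"] in merged:
            # Matching id: replace when overwrite=True, keep otherwise
            if overwrite:
                merged[label["id"]] = label
        else:
            # New label: always appended rather than duplicated
            merged[label["id"]] = label

    return list(merged.values())

existing = [{"id": "a", "label": "cat"}]
incoming = [{"id": "a", "label": "dog"}, {"id": "b", "label": "dog"}]

print(merge_label_list(existing, incoming, overwrite=True))
# [{'id': 'a', 'label': 'dog'}, {'id': 'b', 'label': 'dog'}]

print(merge_label_list(existing, incoming, overwrite=False))
# [{'id': 'a', 'label': 'cat'}, {'id': 'b', 'label': 'dog'}]
```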
def merge_samples(self, samples, key_field='filepath', key_fcn=None, skip_existing=False, insert_new=True, fields=None, omit_fields=None, merge_lists=True, overwrite=True, expand_schema=True, dynamic=False, include_info=True, overwrite_info=False, progress=None, num_samples=None): (source)

Merges the given samples into this dataset.

Note

This method requires the ability to create unique indexes on the key_field of each collection.

See add_collection if you want to add samples from one collection to another dataset without a uniqueness constraint.

By default, samples with the same absolute filepath are merged, but you can customize this behavior via the key_field and key_fcn parameters. For example, you could set key_fcn = lambda sample: os.path.basename(sample.filepath) to merge samples with the same base filename.

The behavior of this method is highly customizable. By default, all top-level fields from the provided samples are merged in, overwriting any existing values for those fields, with the exception of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields), in which case the elements of the lists themselves are merged. In the case of label list fields, labels with the same id in both collections are updated rather than duplicated.

To avoid confusion between missing fields and fields whose value is None, None-valued fields are always treated as missing while merging.

This method can be configured in numerous ways, including:

  • Whether existing samples should be modified or skipped
  • Whether new samples should be added or omitted
  • Whether new fields can be added to the dataset schema
  • Whether list fields should be treated as ordinary fields and merged as a whole rather than merging their elements
  • Whether to merge only specific fields, or all but certain fields
  • Mapping input fields to different field names of this dataset
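The interplay of skip_existing and insert_new can be summarized as a small decision function. This sketch mirrors the documented semantics only; it is not the actual implementation:

```python
def merge_decision(key_exists, skip_existing=False, insert_new=True):
    """Return the action taken for one incoming sample, given whether
    an existing sample shares its merge key."""
    if key_exists:
        # Existing samples are merged unless skip_existing=True
        return "skip" if skip_existing else "merge"

    # New samples are inserted unless insert_new=False
    return "insert" if insert_new else "skip"

# Defaults: existing samples are merged, new samples are inserted
assert merge_decision(True) == "merge"
assert merge_decision(False) == "insert"

# skip_existing=True leaves matching samples untouched
assert merge_decision(True, skip_existing=True) == "skip"

# insert_new=False ignores samples with no match
assert merge_decision(False, insert_new=False) == "skip"
```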
Parameters
samplesa fiftyone.core.collections.SampleCollection or iterable of fiftyone.core.sample.Sample instances
key_field:"filepath"the sample field to use to decide whether to join with an existing sample
key_fcn:Nonea function that accepts a fiftyone.core.sample.Sample instance and computes a key to decide if two samples should be merged. If a key_fcn is provided, key_field is ignored
skip_existing:Falsewhether to skip existing samples (True) or merge them (False)
insert_new:Truewhether to insert new samples (True) or skip them (False)
fields:Nonean optional field or iterable of fields to which to restrict the merge. If provided, fields other than these are omitted from samples when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required. This can also be a dict mapping field names of the input collection to field names of this dataset
omit_fields:Nonean optional field or iterable of fields to exclude from the merge. If provided, these fields are omitted from samples, if present, when merging or adding samples. One exception is that filepath is always included when adding new samples, since the field is required
merge_lists:Truewhether to merge the elements of list fields (e.g., tags) and label list fields (e.g., fiftyone.core.labels.Detections fields) rather than merging the entire top-level field like other field types. For label list fields, existing fiftyone.core.labels.Label elements are either replaced (when overwrite is True) or kept (when overwrite is False) when their id matches a label from the provided samples
overwrite:Truewhether to overwrite (True) or skip (False) existing fields and label elements
expand_schema:Truewhether to dynamically add new fields encountered to the dataset schema. If False, an error is raised if a sample's schema is not a subset of the dataset schema
dynamic:Falsewhether to declare dynamic attributes of embedded document fields that are encountered. Only applicable when samples is not a fiftyone.core.collections.SampleCollection
include_info:Truewhether to merge dataset-level information such as info and classes. Only applicable when samples is a fiftyone.core.collections.SampleCollection
overwrite_info:Falsewhether to overwrite existing dataset-level information. Only applicable when samples is a fiftyone.core.collections.SampleCollection and include_info is True
progress:Nonewhether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
num_samples:Nonethe number of samples in samples. If not provided, this is computed (if possible) via len(samples) if needed for progress tracking
@name.setter
def name(self, name): (source)

Undocumented

def one(self, expr, exact=False): (source)

Returns a single sample in this dataset matching the expression.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")

#
# Get a sample by filepath
#

# A random filepath in the dataset
filepath = dataset.take(1).first().filepath

# Get sample by filepath
sample = dataset.one(F("filepath") == filepath)

#
# Dealing with multiple matches
#

# Get a sample whose image is JPEG
sample = dataset.one(F("filepath").ends_with(".jpg"))

# Raises an error since there are multiple JPEGs
dataset.one(F("filepath").ends_with(".jpg"), exact=True)
Parameters
expra fiftyone.core.expressions.ViewExpression or MongoDB expression that evaluates to True for the sample to match
exact:Falsewhether to raise an error if multiple samples match the expression
Returns
a fiftyone.core.sample.Sample
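The exact semantics can be sketched over a plain iterable. Note that the real method evaluates the expression against the dataset rather than in Python; this hypothetical helper only illustrates the documented matching behavior:

```python
def one(iterable, pred, exact=False):
    """Return a single element matching `pred`, raising if there is no
    match, or, when exact=True, if the match is not unique."""
    matches = (x for x in iterable if pred(x))
    try:
        first = next(matches)
    except StopIteration:
        raise ValueError("No samples match the expression")

    if exact and next(matches, None) is not None:
        raise ValueError("Expected one match, but found multiple")

    return first

filepaths = ["a.jpg", "b.jpg", "c.png"]

print(one(filepaths, lambda f: f.endswith(".png")))  # c.png

try:
    # Raises since multiple elements match and exact=True
    one(filepaths, lambda f: f.endswith(".jpg"), exact=True)
except ValueError as e:
    print(e)  # Expected one match, but found multiple
```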
@persistent.setter
def persistent(self, value): (source)

Undocumented

def reload(self): (source)

Reloads the dataset and any in-memory samples from the database.

def remove_dynamic_frame_field(self, field_name, error_level=0): (source)

Removes the dynamic embedded frame field from the dataset's schema.

The underlying data is not deleted from the frames.

Parameters
field_namethe embedded.field.name
error_level:0the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be removed
- 1: log a warning if a top-level field cannot be removed
- 2: ignore top-level fields that cannot be removed
def remove_dynamic_frame_fields(self, field_names, error_level=0): (source)

Removes the dynamic embedded frame fields from the dataset's schema.

The underlying data is not deleted from the frames.

Parameters
field_namesthe embedded.field.name or iterable of field names
error_level:0the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be removed
- 1: log a warning if a top-level field cannot be removed
- 2: ignore top-level fields that cannot be removed
def remove_dynamic_sample_field(self, field_name, error_level=0): (source)

Removes the dynamic embedded sample field from the dataset's schema.

The underlying data is not deleted from the samples.

Parameters
field_namethe embedded.field.name
error_level:0the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be removed
- 1: log a warning if a top-level field cannot be removed
- 2: ignore top-level fields that cannot be removed
def remove_dynamic_sample_fields(self, field_names, error_level=0): (source)

Removes the dynamic embedded sample fields from the dataset's schema.

The underlying data is not deleted from the samples.

Parameters
field_namesthe embedded.field.name or iterable of field names
error_level:0the error level to use. Valid values are:
- 0: raise an error if a top-level field cannot be removed
- 1: log a warning if a top-level field cannot be removed
- 2: ignore top-level fields that cannot be removed
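The error_level convention shared by these methods can be sketched as follows; this is an illustration of the documented levels, not FiftyOne's internal error handling:

```python
import logging

def handle_failure(field_name, error_level=0):
    """Apply the documented error_level convention to a field that
    cannot be removed."""
    msg = "Cannot remove top-level field '%s'" % field_name
    if error_level <= 0:
        raise ValueError(msg)    # 0: raise an error
    if error_level == 1:
        logging.warning(msg)     # 1: log a warning
    # 2: silently ignore the failure

handle_failure("tags", error_level=2)  # no effect

try:
    handle_failure("tags", error_level=0)
except ValueError as e:
    print(e)  # Cannot remove top-level field 'tags'
```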
def rename_frame_field(self, field_name, new_field_name): (source)

Renames the frame-level field to the given new name.

You can use dot notation (embedded.field.name) to rename embedded frame fields.

Only applicable to datasets that contain videos.

Parameters
field_namethe field name or embedded.field.name
new_field_namethe new field name or embedded.field.name
def rename_frame_fields(self, field_mapping): (source)

Renames the frame-level fields to the given new names.

You can use dot notation (embedded.field.name) to rename embedded frame fields.

Parameters
field_mappinga dict mapping field names to new field names
def rename_group_slice(self, name, new_name): (source)

Renames the group slice with the given name.

Parameters
namethe group slice name
new_namethe new group slice name
def rename_sample_field(self, field_name, new_field_name): (source)

Renames the sample field to the given new name.

You can use dot notation (embedded.field.name) to rename embedded fields.

Parameters
field_namethe field name or embedded.field.name
new_field_namethe new field name or embedded.field.name
def rename_sample_fields(self, field_mapping): (source)

Renames the sample fields to the given new names.

You can use dot notation (embedded.field.name) to rename embedded fields.

Parameters
field_mappinga dict mapping field names to new field names
def save(self): (source)

Saves the dataset to the database.

This only needs to be called when dataset-level information such as its Dataset.info is modified.

def save_view(self, name, view, description=None, color=None, overwrite=False): (source)

Saves the given view into this dataset under the given name so it can be loaded later via load_saved_view.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("quickstart")
view = dataset.filter_labels("ground_truth", F("label") == "cat")

dataset.save_view("cats", view)

also_view = dataset.load_saved_view("cats")
assert view == also_view
Parameters
namea name for the saved view
viewa fiftyone.core.view.DatasetView
description:Nonean optional string description
color:Nonean optional RGB hex string like '#FF6D04'
overwrite:Falsewhether to overwrite an existing saved view with the same name
def save_workspace(self, name, workspace, description=None, color=None, overwrite=False): (source)

Saves a workspace into this dataset under the given name so it can be loaded later via load_workspace.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

embeddings_panel = fo.Panel(
    type="Embeddings",
    state=dict(
        brainResult="img_viz",
        colorByField="metadata.size_bytes"
    ),
)
workspace = fo.Space(children=[embeddings_panel])

workspace_name = "embeddings-workspace"
description = "Show embeddings only"
dataset.save_workspace(
    workspace_name,
    workspace,
    description=description
)
assert dataset.has_workspace(workspace_name)

also_workspace = dataset.load_workspace(workspace_name)
assert workspace == also_workspace
Parameters
namea name for the saved workspace
workspacea fiftyone.core.odm.workspace.Space
description:Nonean optional string description
color:Nonean optional RGB hex string like '#FF6D04'
overwrite:Falsewhether to overwrite an existing workspace with the same name
Raises
ValueErrorif overwrite==False and workspace with name already exists
@skeletons.setter
def skeletons(self, skeletons): (source)

Undocumented

def stats(self, include_media=False, include_indexes=False, compressed=False): (source)

Returns stats about the dataset on disk.

The samples keys refer to the sample documents stored in the database.

For video datasets, the frames keys refer to the frame documents stored in the database.

The media keys refer to the raw media associated with each sample on disk.

The indexes keys refer to the indexes associated with the dataset.

Note that dataset-level metadata such as annotation runs are not included in this computation.

Parameters
include_media:Falsewhether to include stats about the size of the raw media in the dataset
include_indexes:Falsewhether to include stats on the dataset's indexes
compressed:Falsewhether to return the sizes of collections in their compressed form on disk (True) or the logical uncompressed size of the collections (False)
Returns
a stats dict
def summary(self): (source)

Returns a string summary of the dataset.

Returns
a string summary
def tail(self, num_samples=3): (source)

Returns a list of the last few samples in the dataset.

If fewer than num_samples samples are in the dataset, only the available samples are returned.

Parameters
num_samples:3the number of samples
Returns
a list of fiftyone.core.sample.Sample objects
def update_saved_view_info(self, name, info): (source)

Updates the editable information for the saved view with the given name.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

view = dataset.limit(10)
dataset.save_view("test", view)

# Update the saved view's name and add a description
info = dict(
    name="a new name",
    description="a description",
)
dataset.update_saved_view_info("test", info)
Parameters
namethe name of a saved view
infoa dict whose keys are a subset of the keys returned by get_saved_view_info
def update_summary_field(self, field_name): (source)

Updates the summary field based on the current values of its source field.

Parameters
field_namethe summary field
def update_workspace_info(self, name, info): (source)

Updates the editable information for the saved workspace with the given name.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

workspace = fo.Space()
dataset.save_workspace("test", workspace)

# Update the workspace's name and add a description, color
info = dict(
    name="a new name",
    color="#FF6D04",
    description="a description",
)
dataset.update_workspace_info("test", info)
Parameters
namethe name of a saved workspace
infoa dict whose keys are a subset of the keys returned by get_workspace_info
@property
group_slice = (source)

The current group slice of the dataset, or None if the dataset is not grouped.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_slices)
# ['left', 'right', 'pcd']

print(dataset.group_slice)
# left

# Change the current group slice
dataset.group_slice = "right"

print(dataset.group_slice)
# right

@property
media_type = (source)

The media type of the dataset.

@property
app_config = (source)

A fiftyone.core.odm.dataset.DatasetAppConfig that customizes how this dataset is visualized in the :ref:`FiftyOne App <fiftyone-app>`.

Examples:

import fiftyone as fo
import fiftyone.utils.image as foui
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

# View the dataset's current App config
print(dataset.app_config)

# Generate some thumbnail images
foui.transform_images(
    dataset,
    size=(-1, 32),
    output_field="thumbnail_path",
    output_dir="/tmp/thumbnails",
)

# Modify the dataset's App config
dataset.app_config.media_fields = ["filepath", "thumbnail_path"]
dataset.app_config.grid_media_field = "thumbnail_path"
dataset.save()  # must save after edits

session = fo.launch_app(dataset)

@property
classes = (source)

A dict mapping field names to lists of class label strings for the corresponding fields of the dataset.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Set classes for the `ground_truth` and `predictions` fields
dataset.classes = {
    "ground_truth": ["cat", "dog"],
    "predictions": ["cat", "dog", "other"],
}

# Edit an existing classes list
dataset.classes["ground_truth"].append("other")
dataset.save()  # must save after edits

@property
created_at = (source)

The datetime that the dataset was created.

@property
default_classes = (source)

A list of class label strings for all fiftyone.core.labels.Label fields of this dataset that do not have customized classes defined in classes.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Set default classes
dataset.default_classes = ["cat", "dog"]

# Edit the default classes
dataset.default_classes.append("rabbit")
dataset.save()  # must save after edits
@property
default_group_slice = (source)

The default group slice of the dataset, or None if the dataset is not grouped.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.default_group_slice)
# left

# Change the default group slice
dataset.default_group_slice = "right"

print(dataset.default_group_slice)
# right
@property
default_mask_targets = (source)

A dict defining a default mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks of all fiftyone.core.labels.Segmentation fields of this dataset that do not have customized mask targets defined in mask_targets.

Examples:

import fiftyone as fo

#
# 2D masks
#

dataset = fo.Dataset()

# Set default mask targets
dataset.default_mask_targets = {1: "cat", 2: "dog"}

# Or, for RGB mask targets
dataset.default_mask_targets = {"#3f0a44": "road", "#eeffee": "building", "#ffffff": "other"}

# Edit the default mask targets
dataset.default_mask_targets[255] = "other"
dataset.save()  # must save after edits

#
# 3D masks
#

dataset = fo.Dataset()

# Set default mask targets
dataset.default_mask_targets = {"#499CEF": "cat", "#6D04FF": "dog"}

# Edit the default mask targets
dataset.default_mask_targets["#FF6D04"] = "person"
dataset.save()  # must save after edits
@property
default_skeleton = (source)

A default fiftyone.core.odm.dataset.KeypointSkeleton defining the semantic labels and point connectivity for all fiftyone.core.labels.Keypoint fields of this dataset that do not have customized skeletons defined in skeletons.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Set default keypoint skeleton
dataset.default_skeleton = fo.KeypointSkeleton(
    labels=[
        "left hand", "left shoulder", "right shoulder", "right hand",
        "left eye", "right eye", "mouth",
    ],
    edges=[[0, 1, 2, 3], [4, 5, 6]],
)

# Edit the default skeleton
dataset.default_skeleton.labels[-1] = "lips"
dataset.save()  # must save after edits

@property
deleted = (source)

Whether the dataset is deleted.

@property
description = (source)

A string description on the dataset.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Store a description on the dataset
dataset.description = "Your description here"
@property
group_field = (source)

The group field of the dataset, or None if the dataset is not grouped.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_field)
# group
@property
group_media_types = (source)

A dict mapping group slices to media types, or None if the dataset is not grouped.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_media_types)
# {'left': 'image', 'right': 'image', 'pcd': 'point-cloud'}
@property
group_slices = (source)

The list of group slices of the dataset, or None if the dataset is not grouped.

Examples:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart-groups")

print(dataset.group_slices)
# ['left', 'right', 'pcd']
@property
has_saved_views = (source)

Whether this dataset has any saved views.

@property
has_workspaces = (source)

Whether this dataset has any saved workspaces.

@property
info = (source)

A user-facing dictionary of information about the dataset.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Store a class list in the dataset's info
dataset.info = {"classes": ["cat", "dog"]}

# Edit the info
dataset.info["other_classes"] = ["bird", "plane"]
dataset.save()  # must save after edits
@property
last_loaded_at = (source)

The datetime that the dataset was last loaded.

@property
last_modified_at = (source)

The datetime that the dataset was last modified.

@property
mask_targets = (source)

A dict mapping field names to mask target dicts, each of which defines a mapping between pixel values (2D masks) or RGB hex strings (3D masks) and label strings for the segmentation masks in the corresponding field of the dataset.

Examples:

import fiftyone as fo

#
# 2D masks
#

dataset = fo.Dataset()

# Set mask targets for the `ground_truth` and `predictions` fields
dataset.mask_targets = {
    "ground_truth": {1: "cat", 2: "dog"},
    "predictions": {1: "cat", 2: "dog", 255: "other"},
}

# Or, for RGB mask targets
dataset.mask_targets = {
    "segmentations": {"#3f0a44": "road", "#eeffee": "building", "#ffffff": "other"}
}

# Edit an existing mask target
dataset.mask_targets["ground_truth"][255] = "other"
dataset.save()  # must save after edits

#
# 3D masks
#

dataset = fo.Dataset()

# Set mask targets for the `ground_truth` and `predictions` fields
dataset.mask_targets = {
    "ground_truth": {"#499CEF": "cat", "#6D04FF": "dog"},
    "predictions": {
        "#499CEF": "cat", "#6D04FF": "dog", "#FF6D04": "person"
    },
}

# Edit an existing mask target
dataset.mask_targets["ground_truth"]["#FF6D04"] = "person"
dataset.save()  # must save after edits

@property
persistent = (source)

Whether the dataset persists in the database after a session is terminated.

@property
skeletons = (source)

A dict mapping field names to fiftyone.core.odm.dataset.KeypointSkeleton instances, each of which defines the semantic labels and point connectivity for the fiftyone.core.labels.Keypoint instances in the corresponding field of the dataset.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Set keypoint skeleton for the `ground_truth` field
dataset.skeletons = {
    "ground_truth": fo.KeypointSkeleton(
        labels=[
            "left hand", "left shoulder", "right shoulder", "right hand",
            "left eye", "right eye", "mouth",
        ],
        edges=[[0, 1, 2, 3], [4, 5, 6]],
    )
}

# Edit an existing skeleton
dataset.skeletons["ground_truth"].labels[-1] = "lips"
dataset.save()  # must save after edits
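The `edges` list above uses chained index lists: assuming each inner list is drawn as a polyline connecting consecutive points, a small helper (hypothetical, not part of FiftyOne's API) can expand the chains into explicit point-index pairs:

```python
# Hypothetical helper that expands chained `edges` lists into explicit
# (start, end) index pairs, assuming each inner list is a polyline chain
def edge_pairs(edges):
    return [
        (chain[i], chain[i + 1])
        for chain in edges
        for i in range(len(chain) - 1)
    ]

print(edge_pairs([[0, 1, 2, 3], [4, 5, 6]]))
# [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6)]
```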

@property
slug = (source)

The slug of the dataset.

@property
tags = (source)

A list of tags on the dataset.

Examples:

import fiftyone as fo

dataset = fo.Dataset()

# Add some tags
dataset.tags = ["test", "projectA"]

# Edit the tags
dataset.tags.pop()
dataset.tags.append("projectB")
dataset.save()  # must save after edits

@property
version = (source)

The version of the fiftyone package for which the dataset is formatted.

def _add_group_field(self, field_name, default=None, **kwargs): (source)

Undocumented

def _add_implied_frame_field(self, field_name, value, dynamic=False, validate=True): (source)

Undocumented

def _add_implied_sample_field(self, field_name, value, dynamic=False, validate=True): (source)

Undocumented

def _add_samples_batch(self, samples, expand_schema, dynamic, validate, batcher=None): (source)

Undocumented

def _add_view_stage(self, stage): (source)

Returns a fiftyone.core.view.DatasetView containing the contents of the collection with the given fiftyone.core.stages.ViewStage appended to its aggregation pipeline.

Subclasses are responsible for performing any validation on the view stage to ensure that it is a valid stage to add to this collection.

Parameters
stage: a fiftyone.core.stages.ViewStage
Returns
a fiftyone.core.view.DatasetView
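The pattern this method implements can be sketched with a toy class (names hypothetical, not FiftyOne internals): appending a stage returns a new view and leaves the original collection untouched.

```python
# Illustrative sketch of the view-stage pattern: add_stage() returns a new
# view with the stage appended; the original view is never mutated
class MiniView:
    def __init__(self, stages=None):
        self._stages = list(stages or [])

    def add_stage(self, stage):
        # subclasses would validate `stage` here before appending
        return MiniView(self._stages + [stage])

base = MiniView()
matched = base.add_stage({"$match": {"tags": "train"}})
print(len(base._stages), len(matched._stages))  # 0 1
```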
def _aggregate(self, pipeline=None, media_type=None, attach_frames=False, detach_frames=False, frames_only=False, support=None, group_slice=None, group_slices=None, detach_groups=False, groups_only=False, manual_group_select=False, post_pipeline=None): (source)

Runs the MongoDB aggregation pipeline on the collection and returns the result.

Parameters
pipeline (None): a MongoDB aggregation pipeline (list of dicts) to append to the current pipeline
media_type (None): the media type of the collection, if different than the source dataset's media type
attach_frames (False): whether to attach the frame documents immediately prior to executing pipeline. Only applicable to datasets that contain videos
detach_frames (False): whether to detach the frame documents at the end of the pipeline. Only applicable to datasets that contain videos
frames_only (False): whether to generate a pipeline that contains only the frames in the collection
support (None): an optional [first, last] range of frames to attach. Only applicable when attaching frames
group_slice (None): the current group slice of the collection, if different than the source dataset's group slice. Only applicable for grouped collections
group_slices (None): an optional list of group slices to attach when groups_only is True
detach_groups (False): whether to detach the group documents at the end of the pipeline. Only applicable to grouped collections
groups_only (False): whether to generate a pipeline that contains only the flattened group documents for the collection
manual_group_select (False): whether the pipeline has manually handled the initial group selection. Only applicable to grouped collections
post_pipeline (None): a MongoDB aggregation pipeline (list of dicts) to append to the very end of the pipeline, after all other arguments are applied
Returns
the aggregation result dict
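The `pipeline` and `post_pipeline` parameters take standard MongoDB aggregation pipelines. An illustrative sketch of their shape (the `$` operators are standard MongoDB; the field names are hypothetical):

```python
# An illustrative MongoDB aggregation pipeline (list of dicts) of the kind
# accepted by the `pipeline` parameter; field names are hypothetical
pipeline = [
    {"$match": {"tags": "validation"}},
    {"$project": {"filepath": 1, "ground_truth": 1}},
    {"$limit": 10},
]

# Every stage is a single-key dict whose key is a "$" operator
assert all(len(s) == 1 and next(iter(s)).startswith("$") for s in pipeline)
```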
def _apply_frame_field_schema(self, schema): (source)

Undocumented

def _apply_sample_field_schema(self, schema): (source)

Undocumented

def _attach_frames_pipeline(self, support=None): (source)

A pipeline that attaches the frame documents for each document.

def _attach_groups_pipeline(self, group_slices=None): (source)

A pipeline that attaches the requested group slice(s) for each document and stores them under groups.<slice> keys.

def _bulk_write(self, ops, ids=None, frames=False, ordered=False, progress=False): (source)

Undocumented

def _clear(self, view=None, sample_ids=None): (source)

Undocumented

def _clear_frame_fields(self, field_names, view=None): (source)

Undocumented

def _clear_frames(self, view=None, sample_ids=None, frame_ids=None): (source)

Undocumented

def _clear_groups(self, view=None, group_ids=None): (source)

Undocumented

def _clear_sample_fields(self, field_names, view=None): (source)

Undocumented

def _clone(self, name=None, persistent=False, view=None): (source)

Undocumented

def _clone_frame_fields(self, field_mapping, view=None): (source)

Undocumented

def _clone_sample_fields(self, field_mapping, view=None): (source)

Undocumented

def _delete(self): (source)

Undocumented

def _delete_frame_fields(self, field_names, error_level): (source)

Undocumented

def _delete_labels(self, labels, fields=None): (source)
def _delete_sample_fields(self, field_names, error_level): (source)

Undocumented

def _delete_saved_view(self, name): (source)

Undocumented

def _delete_summary_fields(self, field_names, error_level): (source)

Undocumented

def _delete_workspace(self, name): (source)

Undocumented

def _ensure_frames(self, view=None): (source)

Undocumented

def _ensure_label_field(self, label_field, label_cls): (source)

Undocumented

def _estimated_count(self, frames=False): (source)

Undocumented

def _expand_frame_schema(self, frames, dynamic): (source)

Undocumented

def _expand_group_schema(self, field_name, slice_name, media_type): (source)

Undocumented

def _expand_schema(self, samples, dynamic): (source)

Undocumented

def _frame_collstats(self): (source)

Undocumented

def _frame_dict_to_doc(self, d): (source)

Undocumented

def _get_default_summary_field_name(self, path): (source)

Undocumented

def _get_frame_collection(self, write_concern=None): (source)

Undocumented

def _get_sample_collection(self, write_concern=None): (source)

Undocumented

def _get_saved_view_doc(self, name, pop=False, slug=False): (source)

Undocumented

def _get_summarized_fields_map(self): (source)

Undocumented

def _get_workspace_doc(self, name, pop=False, slug=False): (source)

Undocumented

def _group_select_pipeline(self, slice_name): (source)

A pipeline that selects only the given slice's documents from the pipeline.

def _groups_only_pipeline(self, group_slices=None): (source)

A pipeline that looks up the requested group slices for each document and returns (only) the unwound group slices.

def _init_frames(self): (source)

Undocumented

def _iter_groups(self, group_slices=None, pipeline=None): (source)

Undocumented

def _iter_samples(self, pipeline=None): (source)

Undocumented

def _keep(self, view=None, sample_ids=None): (source)

Undocumented

def _keep_fields(self, view=None): (source)

Undocumented

def _keep_frames(self, view=None, frame_ids=None): (source)

Undocumented

def _load_saved_view_from_doc(self, view_doc): (source)

Undocumented

def _make_dict(self, sample, include_id=False, created_at=None, last_modified_at=None): (source)

Undocumented

def _make_frame(self, d): (source)

Undocumented

def _make_sample(self, d): (source)

Undocumented

def _merge_doc(self, doc, fields=None, omit_fields=None, expand_schema=True, merge_info=True, overwrite_info=False): (source)

Undocumented

def _merge_frame_field_schema(self, schema, expand_schema=True, recursive=True, validate=True): (source)

Undocumented

def _merge_sample_field_schema(self, schema, expand_schema=True, recursive=True, validate=True): (source)

Undocumented

def _pipeline(self, pipeline=None, media_type=None, attach_frames=False, detach_frames=False, frames_only=False, support=None, group_slice=None, group_slices=None, detach_groups=False, groups_only=False, manual_group_select=False, post_pipeline=None): (source)

Returns the MongoDB aggregation pipeline for the collection.

Parameters
pipeline (None): a MongoDB aggregation pipeline (list of dicts) to append to the current pipeline
media_type (None): the media type of the collection, if different than the source dataset's media type
attach_frames (False): whether to attach the frame documents immediately prior to executing pipeline. Only applicable to datasets that contain videos
detach_frames (False): whether to detach the frame documents at the end of the pipeline. Only applicable to datasets that contain videos
frames_only (False): whether to generate a pipeline that contains only the frames in the collection
support (None): an optional [first, last] range of frames to attach. Only applicable when attaching frames
group_slice (None): the current group slice of the collection, if different than the source dataset's group slice. Only applicable for grouped collections
group_slices (None): an optional list of group slices to attach when groups_only is True
detach_groups (False): whether to detach the group documents at the end of the pipeline. Only applicable to grouped collections
groups_only (False): whether to generate a pipeline that contains only the flattened group documents for the collection
manual_group_select (False): whether the pipeline has manually handled the initial group selection. Only applicable to grouped collections
post_pipeline (None): a MongoDB aggregation pipeline (list of dicts) to append to the very end of the pipeline, after all other arguments are applied
Returns
the aggregation pipeline
def _populate_summary_field(self, field_name, summary_info): (source)

Undocumented

def _reload(self, hard=False): (source)

Undocumented

def _reload_docs(self, hard=False): (source)

Undocumented

def _remove_dynamic_frame_fields(self, field_names, error_level): (source)

Undocumented

def _remove_dynamic_sample_fields(self, field_names, error_level): (source)

Undocumented

def _rename_frame_fields(self, field_mapping, view=None): (source)

Undocumented

def _rename_sample_fields(self, field_mapping, view=None): (source)

Undocumented

def _sample_collstats(self): (source)

Undocumented

def _sample_dict_to_doc(self, d): (source)

Undocumented

def _save(self, view=None, fields=None): (source)

Undocumented

def _save_field(self, field, _enforce_read_only=True): (source)

Undocumented

def _set_media_type(self, media_type): (source)

Undocumented

def _unwind_frames_pipeline(self): (source)

A pipeline that returns (only) the unwound frames documents.

def _unwind_groups_pipeline(self): (source)

A pipeline that returns (only) the unwound groups documents.

def _update_last_loaded_at(self, force=False): (source)

Undocumented

def _update_metadata_field(self, media_type): (source)

Undocumented

def _upsert_samples(self, samples, expand_schema=True, dynamic=False, validate=True, progress=None, num_samples=None): (source)

Undocumented

def _upsert_samples_batch(self, samples, expand_schema, dynamic, validate, batcher=None): (source)

Undocumented

def _validate_samples(self, samples): (source)

Undocumented

def _validate_saved_view_name(self, name, skip=None, overwrite=False): (source)

Undocumented

def _validate_workspace_name(self, name, skip=None, overwrite=False): (source)

Undocumented

_annotation_cache = (source)

Undocumented

_brain_cache = (source)

Undocumented

_deleted: bool = (source)

Undocumented

_evaluation_cache = (source)

Undocumented

_frame_doc_cls = (source)

Undocumented

_group_slice = (source)

Undocumented

_run_cache = (source)

Undocumented

_sample_doc_cls = (source)

Undocumented

@property
_dataset = (source)

The fiftyone.core.dataset.Dataset that serves the samples in this collection.

@property
_frame_collection = (source)

Undocumented

@property
_frame_collection_name = (source)

Undocumented

@property
_is_clips = (source)

Whether this collection contains clips.

@property
_is_dynamic_groups = (source)

Whether this collection contains dynamic groups.

@property
_is_frames = (source)

Whether this collection contains frames of a video dataset.

@property
_is_generated = (source)

Whether this collection's contents are generated from another collection.

@property
_is_patches = (source)

Whether this collection contains patches.

@property
_root_dataset = (source)

The root fiftyone.core.dataset.Dataset from which this collection is derived.

This is typically the same as _dataset but may differ in cases such as patches views.

@property
_sample_collection = (source)

Undocumented

@property
_sample_collection_name = (source)

Undocumented