package documentation

The FiftyOne Dataset Zoo.

This package defines a collection of open source datasets made available for download via FiftyOne.

Copyright 2017-2025, Voxel51, Inc.

Module base FiftyOne Zoo Datasets provided natively by the library.
Module tf FiftyOne Zoo Datasets provided by tensorflow_datasets.
Module torch FiftyOne Zoo Datasets provided by torchvision.datasets.

From __init__.py:

Class DeprecatedZooDataset Class representing a zoo dataset that no longer exists in the FiftyOne Dataset Zoo.
Class RemoteZooDataset Class for working with remotely-sourced datasets that are compatible with the FiftyOne Dataset Zoo.
Class ZooDataset Base class for datasets made available in the FiftyOne Dataset Zoo.
Class ZooDatasetInfo Class containing info about a dataset in the FiftyOne Dataset Zoo.
Class ZooDatasetSplitInfo Class containing info about a split of a dataset in the FiftyOne Dataset Zoo.
Function delete_zoo_dataset Deletes the zoo dataset from local disk, if necessary.
Function download_zoo_dataset Downloads the specified dataset from the FiftyOne Dataset Zoo.
Function find_zoo_dataset Returns the directory containing the given zoo dataset.
Function get_zoo_dataset Returns the ZooDataset instance for the given dataset.
Function list_downloaded_zoo_datasets Returns information about the zoo datasets that have been downloaded.
Function list_zoo_dataset_sources Returns the list of available zoo dataset sources.
Function list_zoo_datasets Lists the available datasets in the FiftyOne Dataset Zoo.
Function load_zoo_dataset Loads the specified dataset from the FiftyOne Dataset Zoo.
Function load_zoo_dataset_info Loads the ZooDatasetInfo for the specified zoo dataset.
Constant DATASET_METADATA_FILENAMES Undocumented
Variable logger Undocumented
Function _download_archive Undocumented
Function _download_dataset_metadata Undocumented
Function _find_dataset_metadata Undocumented
Function _get_zoo_dataset_dir Undocumented
Function _get_zoo_dataset_sources Undocumented
Function _get_zoo_datasets Undocumented
Function _init_zoo_datasets Undocumented
Function _list_zoo_datasets Undocumented
Function _load_dataset_metadata Undocumented
Function _load_zoo_dataset_manifest Undocumented
Function _migrate_zoo_dataset_info Undocumented
Function _normalize_ref Undocumented
Function _overwrite_download Undocumented
Function _parse_dataset_details Undocumented
Function _parse_dataset_identifier Undocumented
Function _parse_splits Undocumented
def delete_zoo_dataset(name_or_url, split=None):

Deletes the zoo dataset from local disk, if necessary.

If a split is provided, only that split is deleted.

Parameters
name_or_url

the name of the zoo dataset, or its remote source, which can be:

  • a GitHub repo URL like https://github.com/<user>/<repo>
  • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>
  • a GitHub ref string like <user>/<repo>[/<ref>]
  • a publicly accessible URL of an archive (eg zip or tar) file
split:None an optional split to delete. If provided, only that split is deleted
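
As a rough illustration of the behavior described above, the sketch below deletes a dataset (or a single split) only when the dataset appears in list_downloaded_zoo_datasets(). The helper name delete_if_downloaded and its zoo parameter are hypothetical and not part of FiftyOne; zoo is an injection point that defaults to the fiftyone.zoo module, which is assumed to be installed. The sketch also assumes that name is a zoo dataset name rather than a remote source.

```python
def delete_if_downloaded(name, split=None, zoo=None):
    """Deletes a zoo dataset (or one split) if the dataset is on disk.

    Hypothetical helper for illustration only. `zoo` is assumed to expose
    `list_downloaded_zoo_datasets()` and `delete_zoo_dataset()`.
    """
    if zoo is None:
        import fiftyone.zoo as zoo  # assumption: FiftyOne is installed

    # Keys of the returned dict are the names of downloaded datasets
    if name not in zoo.list_downloaded_zoo_datasets():
        return False

    # When `split` is None, the whole dataset is deleted
    zoo.delete_zoo_dataset(name, split=split)
    return True
```

Returning a boolean lets the caller distinguish "nothing to delete" from "deleted" without inspecting the disk.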
def download_zoo_dataset(name_or_url, split=None, splits=None, overwrite=False, cleanup=True, **kwargs):

Downloads the specified dataset from the FiftyOne Dataset Zoo.

Any dataset splits that have already been downloaded are not re-downloaded, unless overwrite == True is specified.

Note

To download from a private GitHub repository that you have access to, provide your GitHub personal access token by setting the GITHUB_TOKEN environment variable.

Parameters
name_or_url

the name of the zoo dataset to download, or the remote source to download it from, which can be:

  • a GitHub repo URL like https://github.com/<user>/<repo>
  • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>
  • a GitHub ref string like <user>/<repo>[/<ref>]
  • a publicly accessible URL of an archive (eg zip or tar) file
split:None a split to download, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are downloaded. Consult the documentation for the ZooDataset you specified to see the supported splits
splits:None a list of splits to download, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are downloaded. Consult the documentation for the ZooDataset you specified to see the supported splits
overwrite:False whether to overwrite any existing files
cleanup:True whether to clean up any temporary files generated during download
**kwargs optional arguments for the ZooDataset constructor or the remote dataset's download_and_prepare() method
Returns
a tuple of
  • info: the ZooDatasetInfo for the dataset
  • dataset_dir: the directory containing the dataset
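
A minimal sketch of unpacking the returned tuple is shown below. The wrapper download_splits and its zoo parameter are hypothetical; zoo defaults to the fiftyone.zoo module (assumed to be installed) and exists only so that the sketch can be exercised with a stand-in object.

```python
def download_splits(name_or_url, splits, overwrite=False, zoo=None):
    """Downloads the given splits and returns (info, dataset_dir).

    Hypothetical wrapper for illustration; it only forwards arguments that
    appear in the documented download_zoo_dataset() signature.
    """
    if zoo is None:
        import fiftyone.zoo as zoo  # assumption: FiftyOne is installed

    # Splits that are already on disk are skipped unless overwrite=True
    info, dataset_dir = zoo.download_zoo_dataset(
        name_or_url, splits=list(splits), overwrite=overwrite
    )
    return info, dataset_dir
```

For example, download_splits("cifar10", ["test"]) would request only the test split, assuming that dataset name is available and provides a split with that name.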
def find_zoo_dataset(name_or_url, split=None):

Returns the directory containing the given zoo dataset.

If a split is provided, the path to the dataset split is returned; otherwise, the path to the root directory is returned.

The dataset must be downloaded. Use download_zoo_dataset to download datasets.

Parameters
name_or_url

the name of the zoo dataset or its remote source, which can be:

  • a GitHub repo URL like https://github.com/<user>/<repo>
  • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>
  • a GitHub ref string like <user>/<repo>[/<ref>]
  • a publicly accessible URL of an archive (eg zip or tar) file
split:None a specific split to locate
Returns
the directory containing the dataset or split
Raises
ValueError if the dataset or split does not exist or has not been downloaded
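
Because a missing dataset is reported via ValueError, callers that only want to probe for a local copy may prefer a wrapper that returns None instead. The following is a sketch under that assumption; find_or_none and its zoo parameter are hypothetical names, and zoo defaults to the fiftyone.zoo module (assumed to be installed).

```python
def find_or_none(name_or_url, split=None, zoo=None):
    """Returns the dataset (or split) directory, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    if zoo is None:
        import fiftyone.zoo as zoo  # assumption: FiftyOne is installed

    try:
        return zoo.find_zoo_dataset(name_or_url, split=split)
    except ValueError:
        # Documented failure mode: the dataset or split does not exist
        # or has not been downloaded
        return None
```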
def get_zoo_dataset(name_or_url, overwrite=False, **kwargs):

Returns the ZooDataset instance for the given dataset.

If the dataset is available from multiple sources, the default source is used.

Parameters
name_or_url

the name of the zoo dataset, or its remote source, which can be:

  • a GitHub repo URL like https://github.com/<user>/<repo>
  • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>
  • a GitHub ref string like <user>/<repo>[/<ref>]
  • a publicly accessible URL of an archive (eg zip or tar) file
overwrite:False whether to overwrite existing metadata if it has already been downloaded. Only applicable when name_or_url is a remote source
**kwargs optional arguments for ZooDataset
Returns
the ZooDataset instance
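
To make the accepted name_or_url forms listed above concrete, the sketch below sorts an identifier into one of four rough categories. It is a guess at how the forms can be told apart, not the library's own parsing logic; the function name classify_identifier and the category labels are hypothetical, and it assumes that plain zoo dataset names never contain a slash.

```python
from urllib.parse import urlparse


def classify_identifier(name_or_url):
    """Roughly classifies a name_or_url value (illustrative only)."""
    parsed = urlparse(name_or_url)

    if parsed.scheme in ("http", "https"):
        # GitHub repo URLs and /tree/<branch> or /commit/<commit> refs
        if parsed.netloc.lower() == "github.com":
            return "github_url"

        # Any other URL is treated as a publicly accessible archive
        return "archive_url"

    # Strings like <user>/<repo>[/<ref>]
    if "/" in name_or_url:
        return "github_ref_string"

    # Assumption: everything else is the name of a zoo dataset
    return "name"
```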
def list_downloaded_zoo_datasets():

Returns information about the zoo datasets that have been downloaded.

Returns
a dict mapping dataset names to (dataset_dir, ZooDatasetInfo) tuples
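
The shape of the returned dict can be illustrated with a small helper that keeps only the directories. The function downloaded_dirs is hypothetical; it operates on any dict with the documented structure, so it does not call FiftyOne itself.

```python
def downloaded_dirs(downloaded):
    """Maps each dataset name to its directory, sorted by name.

    `downloaded` is assumed to have the documented shape:
    {name: (dataset_dir, ZooDatasetInfo)}.
    """
    return {
        name: dataset_dir
        for name, (dataset_dir, _info) in sorted(downloaded.items())
    }
```

With FiftyOne installed, this could be called as downloaded_dirs(foz.list_downloaded_zoo_datasets()), where foz is fiftyone.zoo.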
def list_zoo_dataset_sources():

Returns the list of available zoo dataset sources.

Returns
a list of sources
def list_zoo_datasets(tags=None, source=None):

Lists the available datasets in the FiftyOne Dataset Zoo.

Also includes any remotely-sourced zoo datasets that you've downloaded.

Example usage:

import fiftyone as fo
import fiftyone.zoo as foz

#
# List all zoo datasets
#

names = foz.list_zoo_datasets()
print(names)

#
# List all zoo datasets with (both of) the specified tags
#

names = foz.list_zoo_datasets(tags=["image", "detection"])
print(names)

#
# List all zoo datasets available via the given source
#

names = foz.list_zoo_datasets(source="torch")
print(names)
Parameters
tags:None only include datasets that have the specified tag or list of tags
source:None only include datasets available via the given source or list of sources
Returns
a sorted list of dataset names
def load_zoo_dataset(name_or_url, split=None, splits=None, label_field=None, dataset_name=None, download_if_necessary=True, drop_existing_dataset=False, persistent=False, overwrite=False, cleanup=True, progress=None, **kwargs):

Loads the specified dataset from the FiftyOne Dataset Zoo.

By default, the dataset will be downloaded if necessary.

Note

To download from a private GitHub repository that you have access to, provide your GitHub personal access token by setting the GITHUB_TOKEN environment variable.

If you do not specify a custom dataset_name and you have previously loaded the same zoo dataset and split(s) into FiftyOne, the existing dataset will be returned.

Parameters
name_or_url

the name of the zoo dataset to load, or the remote source to load it from, which can be:

  • a GitHub repo URL like https://github.com/<user>/<repo>
  • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>
  • a GitHub ref string like <user>/<repo>[/<ref>]
  • a publicly accessible URL of an archive (eg zip or tar) file
split:None a split to load, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are loaded. Consult the documentation for the ZooDataset you specified to see the supported splits
splits:None a list of splits to load, if applicable. Typical values are ("train", "validation", "test"). If neither split nor splits are provided, all available splits are loaded. Consult the documentation for the ZooDataset you specified to see the supported splits
label_field:None the label field (or prefix, if the dataset contains multiple label fields) in which to store the dataset's labels. By default, this is "ground_truth" if the dataset contains a single label field. If the dataset contains multiple label fields and this value is not provided, the labels will be stored under dataset-specific field names
dataset_name:None an optional name to give the returned fiftyone.core.dataset.Dataset. By default, a name will be constructed based on the dataset and split(s) you are loading
download_if_necessary:True whether to download the dataset if it is not found in the specified dataset directory
drop_existing_dataset:False whether to drop an existing dataset with the same name if it exists
persistent:False whether the dataset should persist in the database after the session terminates
overwrite:False whether to overwrite any existing files if the dataset is to be downloaded
cleanup:True whether to clean up any temporary files generated during download
progress:None whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
**kwargs optional arguments to pass to the fiftyone.utils.data.importers.DatasetImporter constructor or the remote dataset's load_dataset() method. If download_if_necessary == True, then kwargs can also contain arguments for download_zoo_dataset
Returns
a fiftyone.core.dataset.Dataset
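
A typical call combines a split with a custom dataset name. The sketch below wraps such a call; load_split and its zoo parameter are hypothetical, and zoo defaults to the fiftyone.zoo module (assumed to be installed). Only keyword arguments from the documented signature are forwarded.

```python
def load_split(name_or_url, split, dataset_name=None, persistent=False, zoo=None):
    """Loads one split of a zoo dataset, downloading it if necessary.

    Hypothetical wrapper for illustration only.
    """
    if zoo is None:
        import fiftyone.zoo as zoo  # assumption: FiftyOne is installed

    # If dataset_name is None and the same dataset and split were loaded
    # before, the existing dataset is returned instead of a new one
    return zoo.load_zoo_dataset(
        name_or_url,
        split=split,
        dataset_name=dataset_name,
        persistent=persistent,
    )
```

Passing an explicit dataset_name is one way to avoid silently receiving a previously loaded dataset, per the note above.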
def load_zoo_dataset_info(name_or_url):

Loads the ZooDatasetInfo for the specified zoo dataset.

The dataset must be downloaded. Use download_zoo_dataset to download datasets.

Parameters
name_or_url

the name of the zoo dataset or its remote source, which can be:

  • a GitHub repo URL like https://github.com/<user>/<repo>
  • a GitHub ref like https://github.com/<user>/<repo>/tree/<branch> or https://github.com/<user>/<repo>/commit/<commit>
  • a GitHub ref string like <user>/<repo>[/<ref>]
  • a publicly accessible URL of an archive (eg zip or tar) file
Returns
the ZooDatasetInfo for the dataset
Raises
ValueError if the dataset has not been downloaded
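
Since the info can only be loaded for a downloaded dataset, one possible pattern is to fall back to download_zoo_dataset, which also returns the info. The sketch below assumes that pattern; info_or_download and its zoo parameter are hypothetical, and zoo defaults to the fiftyone.zoo module (assumed to be installed).

```python
def info_or_download(name_or_url, zoo=None):
    """Returns the ZooDatasetInfo, downloading the dataset if needed.

    Hypothetical helper for illustration only.
    """
    if zoo is None:
        import fiftyone.zoo as zoo  # assumption: FiftyOne is installed

    try:
        return zoo.load_zoo_dataset_info(name_or_url)
    except ValueError:
        # Documented failure mode: the dataset has not been downloaded.
        # download_zoo_dataset() returns an (info, dataset_dir) tuple
        info, _dataset_dir = zoo.download_zoo_dataset(name_or_url)
        return info
```

Note that the fallback downloads all available splits, since neither split nor splits is passed.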
DATASET_METADATA_FILENAMES: tuple[str, ...]

Undocumented

Value
('fiftyone.yml', 'fiftyone.yaml')

logger

Undocumented
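
The value of DATASET_METADATA_FILENAMES suggests that a dataset's metadata lives in a file named fiftyone.yml or fiftyone.yaml. As a guess at how such a lookup could work (this is not the library's _find_dataset_metadata, whose behavior is undocumented here), the sketch below checks a directory for either filename. The names METADATA_FILENAMES and find_metadata_file are hypothetical, and only the top level of the directory is searched.

```python
import os

# Copied from the documented value of DATASET_METADATA_FILENAMES
METADATA_FILENAMES = ("fiftyone.yml", "fiftyone.yaml")


def find_metadata_file(root_dir):
    """Returns the path to the first metadata file found, or None.

    Illustrative only; checks the top level of `root_dir`.
    """
    for filename in METADATA_FILENAMES:
        path = os.path.join(root_dir, filename)
        if os.path.isfile(path):
            return path

    return None
```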

def _download_archive(url, outdir):

Undocumented

def _download_dataset_metadata(url_or_gh_repo, overwrite=False):

Undocumented

def _find_dataset_metadata(root_dir):

Undocumented

def _get_zoo_dataset_dir(name):

Undocumented

def _get_zoo_dataset_sources(zoo_datasets):

Undocumented

def _get_zoo_datasets():

Undocumented

def _init_zoo_datasets(datasets):

Undocumented

def _list_zoo_datasets(tags=None, source=None):

Undocumented

def _load_dataset_metadata(dataset_dir):

Undocumented

def _load_zoo_dataset_manifest(manifest_path):

Undocumented

def _migrate_zoo_dataset_info(d):

Undocumented

def _normalize_ref(url_or_gh_repo):

Undocumented

def _overwrite_download(name_or_url, split=None, splits=None):

Undocumented

def _parse_dataset_details(name_or_url, overwrite=False, **kwargs):

Undocumented

def _parse_dataset_identifier(name_or_url):

Undocumented

def _parse_splits(split, splits):

Undocumented