fiftyone.zoo.datasets

package documentation

The FiftyOne Dataset Zoo.

This package defines a collection of open source datasets made available for download via FiftyOne.

Module	`base`	FiftyOne Zoo Datasets provided natively by the library.
Module	`tf`	FiftyOne Zoo Datasets provided by `tensorflow_datasets`.
Module	`torch`	FiftyOne Zoo Datasets provided by `torchvision.datasets`.

From __init__.py:

Class	`DeprecatedZooDataset`	Class representing a zoo dataset that no longer exists in the FiftyOne Dataset Zoo.
Class	`RemoteZooDataset`	Class for working with remotely-sourced datasets that are compatible with the FiftyOne Dataset Zoo.
Class	`ZooDataset`	Base class for datasets made available in the FiftyOne Dataset Zoo.
Class	`ZooDatasetInfo`	Class containing info about a dataset in the FiftyOne Dataset Zoo.
Class	`ZooDatasetSplitInfo`	Class containing info about a split of a dataset in the FiftyOne Dataset Zoo.
Function	`delete_zoo_dataset`	Deletes the zoo dataset from local disk, if necessary.
Function	`download_zoo_dataset`	Downloads the specified dataset from the FiftyOne Dataset Zoo.
Function	`find_zoo_dataset`	Returns the directory containing the given zoo dataset.
Function	`get_zoo_dataset`	Returns the `ZooDataset` instance for the given dataset.
Function	`list_downloaded_zoo_datasets`	Returns information about the zoo datasets that have been downloaded.
Function	`list_zoo_dataset_sources`	Returns the list of available zoo dataset sources.
Function	`list_zoo_datasets`	Lists the available datasets in the FiftyOne Dataset Zoo.
Function	`load_zoo_dataset`	Loads the specified dataset from the FiftyOne Dataset Zoo.
Function	`load_zoo_dataset_info`	Loads the `ZooDatasetInfo` for the specified zoo dataset.
Constant	`DATASET_METADATA_FILENAMES`	Undocumented
Variable	`logger`	Undocumented
Function	`_download_archive`	Undocumented
Function	`_download_dataset_metadata`	Undocumented
Function	`_find_dataset_metadata`	Undocumented
Function	`_get_zoo_dataset_dir`	Undocumented
Function	`_get_zoo_dataset_sources`	Undocumented
Function	`_get_zoo_datasets`	Undocumented
Function	`_init_zoo_datasets`	Undocumented
Function	`_list_zoo_datasets`	Undocumented
Function	`_load_dataset_metadata`	Undocumented
Function	`_load_zoo_dataset_manifest`	Undocumented
Function	`_migrate_zoo_dataset_info`	Undocumented
Function	`_normalize_ref`	Undocumented
Function	`_overwrite_download`	Undocumented
Function	`_parse_dataset_details`	Undocumented
Function	`_parse_dataset_identifier`	Undocumented
Function	`_parse_splits`	Undocumented

def delete_zoo_dataset(name_or_url, split=None):

Deletes the zoo dataset from local disk, if necessary.

If a split is provided, only that split is deleted.

Parameters
name_or_url	the name of the zoo dataset, or its remote source, which can be: a GitHub repo URL like `https://github.com/<user>/<repo>`; a GitHub ref like `https://github.com/<user>/<repo>/tree/<branch>` or `https://github.com/<user>/<repo>/commit/<commit>`; a GitHub ref string like `<user>/<repo>[/<ref>]`; or a publicly accessible URL of an archive (e.g. zip or tar) file
split:`None`	a specific split to delete
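
The accepted forms of `name_or_url` can be told apart by inspecting the string. The following is a minimal sketch, not FiftyOne's actual parsing logic: `classify_zoo_identifier` and its return labels are hypothetical names introduced for illustration, and treating any non-URL string that contains a slash as a GitHub ref string is an assumption.

```python
from urllib.parse import urlparse


def classify_zoo_identifier(name_or_url):
    """Hypothetical helper: label which documented form an identifier takes."""
    parsed = urlparse(name_or_url)

    if parsed.scheme in ("http", "https"):
        if parsed.netloc == "github.com":
            parts = [p for p in parsed.path.split("/") if p]
            # https://github.com/<user>/<repo>
            if len(parts) == 2:
                return "github_repo_url"
            # https://github.com/<user>/<repo>/tree/<branch> or .../commit/<commit>
            if len(parts) >= 4 and parts[2] in ("tree", "commit"):
                return "github_ref_url"
        # Any other URL is assumed to point at an archive file
        return "archive_url"

    # Assumption: built-in zoo dataset names never contain a slash
    if "/" in name_or_url:
        return "github_ref_string"  # <user>/<repo>[/<ref>]

    return "zoo_name"
```

For instance, `classify_zoo_identifier("<user>/<repo>")` with real values substituted would be labeled a GitHub ref string, while a bare name such as `"quickstart"` would be labeled a zoo name.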

def download_zoo_dataset(name_or_url, split=None, splits=None, overwrite=False, cleanup=True, **kwargs):

Downloads the specified dataset from the FiftyOne Dataset Zoo.

Any dataset splits that have already been downloaded are not re-downloaded unless `overwrite=True` is specified.

Note

To download from a private GitHub repository that you have access to, provide your GitHub personal access token by setting the GITHUB_TOKEN environment variable.

Parameters
name_or_url	the name of the zoo dataset to download, or the remote source to download it from, which can be: a GitHub repo URL like `https://github.com/<user>/<repo>`; a GitHub ref like `https://github.com/<user>/<repo>/tree/<branch>` or `https://github.com/<user>/<repo>/commit/<commit>`; a GitHub ref string like `<user>/<repo>[/<ref>]`; or a publicly accessible URL of an archive (e.g. zip or tar) file
split:`None`	a split to download, if applicable. Typical values are `("train", "validation", "test")`. If neither `split` nor `splits` is provided, all available splits are downloaded. Consult the documentation for the `ZooDataset` you specified to see the supported splits
splits:`None`	a list of splits to download, if applicable. Typical values are `("train", "validation", "test")`. If neither `split` nor `splits` is provided, all available splits are downloaded. Consult the documentation for the `ZooDataset` you specified to see the supported splits
overwrite:`False`	whether to overwrite any existing files
cleanup:`True`	whether to clean up any temporary files generated during download
**kwargs	optional arguments for the `ZooDataset` constructor or the remote dataset's `download_and_prepare()` method
Returns
a tuple of (`info`, `dataset_dir`), where `info` is the `ZooDatasetInfo` for the dataset and `dataset_dir` is the directory containing the dataset
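
The interaction between `split` and `splits` can be illustrated with a small sketch. This is not the library's implementation: `resolve_splits` and its `available` argument are hypothetical, and merging both arguments into one ordered list when both are given is an assumption. The only behavior taken from the documentation above is that omitting both selects every available split.

```python
def resolve_splits(split=None, splits=None,
                   available=("train", "validation", "test")):
    """Hypothetical helper: turn `split`/`splits` into one list of splits."""
    # Documented behavior: neither argument means "all available splits"
    if split is None and splits is None:
        return list(available)

    requested = []
    if split is not None:
        requested.append(split)
    if splits is not None:
        # Tolerate a single string passed as `splits` (assumption)
        if isinstance(splits, str):
            requested.append(splits)
        else:
            requested.extend(splits)

    unknown = [s for s in requested if s not in available]
    if unknown:
        raise ValueError(f"Unsupported splits: {unknown}")

    # Drop duplicates while preserving the requested order
    return list(dict.fromkeys(requested))
```

Under these assumptions, `resolve_splits(split="train", splits=["train", "test"])` yields `["train", "test"]`.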

def find_zoo_dataset(name_or_url, split=None):

Returns the directory containing the given zoo dataset.

If a split is provided, the path to the dataset split is returned; otherwise, the path to the root directory is returned.

The dataset must be downloaded. Use download_zoo_dataset to download datasets.

Parameters
name_or_url	the name of the zoo dataset or its remote source, which can be: a GitHub repo URL like `https://github.com/<user>/<repo>`; a GitHub ref like `https://github.com/<user>/<repo>/tree/<branch>` or `https://github.com/<user>/<repo>/commit/<commit>`; a GitHub ref string like `<user>/<repo>[/<ref>]`; or a publicly accessible URL of an archive (e.g. zip or tar) file
split:`None`	a specific split to locate
Returns
the directory containing the dataset or split
Raises
`ValueError`	if the dataset or split does not exist or has not been downloaded
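
Because `find_zoo_dataset` raises `ValueError` for datasets that have not been downloaded, a common pattern is to fall back to a download. The sketch below shows one way this might look. `locate_or_download` is a hypothetical helper, and the `find` and `download` callables are passed in so the logic can be exercised without FiftyOne installed; they are assumed to behave like `find_zoo_dataset` and `download_zoo_dataset` as documented here.

```python
def locate_or_download(name_or_url, find, download, split=None):
    """Hypothetical helper: return a dataset directory, downloading if needed.

    `find` is expected to behave like `find_zoo_dataset` and `download`
    like `download_zoo_dataset`.
    """
    try:
        # Raises ValueError when the dataset or split is not on disk
        return find(name_or_url, split=split)
    except ValueError:
        # download() returns (info, dataset_dir); the directory is looked up
        # again below so that a split-specific path is returned if requested
        download(name_or_url, split=split)
        return find(name_or_url, split=split)
```

With FiftyOne installed, this could be called as `locate_or_download("quickstart", foz.find_zoo_dataset, foz.download_zoo_dataset)` after `import fiftyone.zoo as foz`.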

def get_zoo_dataset(name_or_url, overwrite=False, **kwargs):

Returns the ZooDataset instance for the given dataset.

If the dataset is available from multiple sources, the default source is used.

Parameters
name_or_url	the name of the zoo dataset, or its remote source, which can be: a GitHub repo URL like `https://github.com/<user>/<repo>`; a GitHub ref like `https://github.com/<user>/<repo>/tree/<branch>` or `https://github.com/<user>/<repo>/commit/<commit>`; a GitHub ref string like `<user>/<repo>[/<ref>]`; or a publicly accessible URL of an archive (e.g. zip or tar) file
overwrite:`False`	whether to overwrite existing metadata if it has already been downloaded. Only applicable when `name_or_url` is a remote source
**kwargs	optional arguments for `ZooDataset`
Returns
the `ZooDataset` instance

def list_downloaded_zoo_datasets():

Returns information about the zoo datasets that have been downloaded.

Returns
a dict mapping dataset names to (`dataset_dir`, `ZooDatasetInfo`) tuples
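
Note the order of the tuple values: the directory comes first and the info object second. As a small sketch of consuming this return value, the hypothetical helper `format_downloaded` below accepts any dict of that shape and does not inspect the info object, since its attributes are not described in this section.

```python
def format_downloaded(downloaded):
    """Hypothetical helper: one "name: directory" line per downloaded dataset.

    `downloaded` is expected to have the shape returned by
    `list_downloaded_zoo_datasets()`: {name: (dataset_dir, info)}.
    """
    lines = []
    for name in sorted(downloaded):
        dataset_dir, _info = downloaded[name]  # directory first, info second
        lines.append(f"{name}: {dataset_dir}")
    return lines
```

With FiftyOne installed, this could be used as `print("\n".join(format_downloaded(foz.list_downloaded_zoo_datasets())))`.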

def list_zoo_dataset_sources():

Returns the list of available zoo dataset sources.

Returns
a list of sources

def list_zoo_datasets(tags=None, source=None, license=None):

Lists the available datasets in the FiftyOne Dataset Zoo.

Also includes any remotely-sourced zoo datasets that you've downloaded.

Example usage:

import fiftyone as fo
import fiftyone.zoo as foz

#
# List all zoo datasets
#

names = foz.list_zoo_datasets()
print(names)

#
# List all zoo datasets with (both of) the specified tags
#

names = foz.list_zoo_datasets(tags=["image", "detection"])
print(names)

#
# List all zoo datasets available via the given source
#

names = foz.list_zoo_datasets(source="torch")
print(names)

Parameters
tags:`None`	only include datasets that have the specified tag or list of tags
source:`None`	only include datasets available via the given source or list of sources
license:`None`	only include datasets that are distributed under the specified license or any of the specified list of licenses. Run `fiftyone zoo datasets list` to see the available licenses
Returns
a sorted list of dataset names
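
The filters combine differently: per the example above, a dataset must carry all of the given tags, whereas the `license` description says any of the listed licenses is sufficient. The sketch below mimics those semantics on a toy catalog. It is not the library's implementation; `filter_catalog`, `TOY_CATALOG`, and the dataset entries are made up for illustration, and matching any of several sources is an assumption.

```python
# Made-up catalog entries, used only to demonstrate the filter semantics
TOY_CATALOG = {
    "toy-a": {"tags": ["image", "detection"], "sources": ["base"], "license": "MIT"},
    "toy-b": {"tags": ["image", "classification"], "sources": ["torch", "tf"], "license": "CC-BY-4.0"},
    "toy-c": {"tags": ["video", "detection"], "sources": ["base"], "license": "CC-BY-4.0"},
}


def _as_set(value):
    """Normalize None, a single string, or an iterable of strings."""
    if value is None:
        return None
    if isinstance(value, str):
        return {value}
    return set(value)


def filter_catalog(catalog, tags=None, source=None, license=None):
    """Hypothetical helper: return the sorted names that pass every filter."""
    tags, source, license = _as_set(tags), _as_set(source), _as_set(license)
    names = []
    for name, entry in catalog.items():
        # Tags: the dataset must have ALL requested tags
        if tags is not None and not tags <= set(entry["tags"]):
            continue
        # Sources: ANY requested source suffices (assumption)
        if source is not None and not source & set(entry["sources"]):
            continue
        # Licenses: ANY requested license suffices
        if license is not None and entry["license"] not in license:
            continue
        names.append(name)
    return sorted(names)
```

On the toy catalog, `filter_catalog(TOY_CATALOG, tags=["image", "detection"])` keeps only `"toy-a"`, the single entry that has both tags.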

def load_zoo_dataset(name_or_url, split=None, splits=None, label_field=None, dataset_name=None, download_if_necessary=True, drop_existing_dataset=False, persistent=False, overwrite=False, cleanup=True, progress=None, **kwargs):

Loads the specified dataset from the FiftyOne Dataset Zoo.

By default, the dataset will be downloaded if necessary.

Note

To download from a private GitHub repository that you have access to, provide your GitHub personal access token by setting the GITHUB_TOKEN environment variable.

If you do not specify a custom dataset_name and you have previously loaded the same zoo dataset and split(s) into FiftyOne, the existing dataset will be returned.

Parameters
name_or_url	the name of the zoo dataset to load, or the remote source to load it from, which can be: a GitHub repo URL like `https://github.com/<user>/<repo>`; a GitHub ref like `https://github.com/<user>/<repo>/tree/<branch>` or `https://github.com/<user>/<repo>/commit/<commit>`; a GitHub ref string like `<user>/<repo>[/<ref>]`; or a publicly accessible URL of an archive (e.g. zip or tar) file
split:`None`	a split to load, if applicable. Typical values are `("train", "validation", "test")`. If neither `split` nor `splits` is provided, all available splits are loaded. Consult the documentation for the `ZooDataset` you specified to see the supported splits
splits:`None`	a list of splits to load, if applicable. Typical values are `("train", "validation", "test")`. If neither `split` nor `splits` is provided, all available splits are loaded. Consult the documentation for the `ZooDataset` you specified to see the supported splits
label_field:`None`	the label field (or prefix, if the dataset contains multiple label fields) in which to store the dataset's labels. By default, this is `"ground_truth"` if the dataset contains a single label field. If the dataset contains multiple label fields and this value is not provided, the labels will be stored under dataset-specific field names
dataset_name:`None`	an optional name to give the returned `fiftyone.core.dataset.Dataset`. By default, a name will be constructed based on the dataset and split(s) you are loading
download_if_necessary:`True`	whether to download the dataset if it is not found in the specified dataset directory
drop_existing_dataset:`False`	whether to drop an existing dataset with the same name if it exists
persistent:`False`	whether the dataset should persist in the database after the session terminates
overwrite:`False`	whether to overwrite any existing files if the dataset is to be downloaded
cleanup:`True`	whether to clean up any temporary files generated during download
progress:`None`	whether to render a progress bar (`True`/`False`), use the default value `fiftyone.config.show_progress_bars` (`None`), or a progress callback function to invoke instead
**kwargs	optional arguments to pass to the `fiftyone.utils.data.importers.DatasetImporter` constructor or the remote dataset's `load_dataset()` method. If `download_if_necessary == True`, then `kwargs` can also contain arguments for `download_zoo_dataset`
Returns
a `fiftyone.core.dataset.Dataset`
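
A typical call might be wrapped as follows. This is a minimal sketch that uses only parameters documented above. `load_zoo_split` is a hypothetical wrapper, the import is deferred so that the module can be imported without FiftyOne installed, and the dataset name and split in the usage comment are assumed to be available in your installed version of the zoo.

```python
def load_zoo_split(name, split=None, dataset_name=None, persistent=False):
    """Hypothetical wrapper around `load_zoo_dataset` for a single split."""
    # Deferred import: FiftyOne is only required when the function is called
    import fiftyone.zoo as foz

    # Downloads the split first if necessary (download_if_necessary=True by
    # default), then returns a fiftyone.core.dataset.Dataset
    return foz.load_zoo_dataset(
        name,
        split=split,
        dataset_name=dataset_name,
        persistent=persistent,
    )


# Example usage (requires FiftyOne and network access on first run):
# dataset = load_zoo_split("cifar10", split="test")
# print(dataset)
```

Leaving `dataset_name` unset means a repeated call for the same dataset and split returns the existing dataset rather than creating a new one, as noted above.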

def load_zoo_dataset_info(name_or_url):

Loads the ZooDatasetInfo for the specified zoo dataset.

The dataset must be downloaded. Use download_zoo_dataset to download datasets.

Parameters
name_or_url	the name of the zoo dataset or its remote source, which can be: a GitHub repo URL like `https://github.com/<user>/<repo>`; a GitHub ref like `https://github.com/<user>/<repo>/tree/<branch>` or `https://github.com/<user>/<repo>/commit/<commit>`; a GitHub ref string like `<user>/<repo>[/<ref>]`; or a publicly accessible URL of an archive (e.g. zip or tar) file
Returns
the `ZooDatasetInfo` for the dataset
Raises
`ValueError`	if the dataset has not been downloaded

DATASET_METADATA_FILENAMES: tuple[str, ...] = ('fiftyone.yml', 'fiftyone.yaml')

Undocumented

logger

Undocumented

def _download_archive(url, outdir):

Undocumented

def _download_dataset_metadata(url_or_gh_repo, overwrite=False):

Undocumented

def _find_dataset_metadata(root_dir):

Undocumented

def _get_zoo_dataset_dir(name):

Undocumented

def _get_zoo_dataset_sources(zoo_datasets):

Undocumented

def _get_zoo_datasets():

Undocumented

def _init_zoo_datasets(datasets):

Undocumented

def _list_zoo_datasets(tags=None, source=None, license=None):

Undocumented

def _load_dataset_metadata(dataset_dir):

Undocumented

def _load_zoo_dataset_manifest(manifest_path):

Undocumented

def _migrate_zoo_dataset_info(d):

Undocumented

def _normalize_ref(url_or_gh_repo):

Undocumented

def _overwrite_download(name_or_url, split=None, splits=None):

Undocumented

def _parse_dataset_details(name_or_url, overwrite=False, **kwargs):

Undocumented

def _parse_dataset_identifier(name_or_url):

Undocumented

def _parse_splits(split, splits):

Undocumented