Dataset Zoo API ¶¶
You can interact with the Dataset Zoo either via the Python library or the CLI:
Listing zoo datasets ¶¶
Getting information about zoo datasets ¶¶
Downloading zoo datasets ¶¶
Loading zoo datasets ¶¶
Loading zoo datasets with manual downloads ¶¶
Some zoo datasets such as
BDD100K
and Cityscapes
require
that you create accounts on a website and manually download the source files.
In such cases, the ZooDataset
class
will provide additional argument(s) that let you specify the paths to these
files that you have manually downloaded on disk.
You can load these datasets into FiftyOne by first calling
download_zoo_dataset()
with the appropriate keyword arguments (which are passed to the underlying
ZooDataset
constructor) to wrangle
the raw download into FiftyOne format, and then calling
load_zoo_dataset()
or using
fiftyone zoo datasets load to load the
dataset into FiftyOne.
For example, the following snippet shows how to load the BDD100K dataset from the zoo:
import fiftyone.zoo as foz
# First parse the manually downloaded files in `source_dir`
foz.download_zoo_dataset(
"bdd100k", source_dir="/path/to/dir-with-bdd100k-files"
)
# Now load into FiftyOne
dataset = foz.load_zoo_dataset("bdd100k", split="validation")
Controlling where zoo datasets are downloaded ¶¶
By default, zoo datasets are downloaded into subdirectories of
fiftyone.config.dataset_zoo_dir
corresponding to their names.
You can customize this backend by modifying the dataset_zoo_dir
setting
of your FiftyOne config.
Deleting zoo datasets ¶¶
Adding datasets to the zoo ¶¶
We frequently add new built-in datasets to the Dataset Zoo, which will automatically become accessible to you when you update your FiftyOne package.
Note
FiftyOne is open source! You are welcome to contribute datasets to the public dataset zoo by submitting a pull request to the GitHub repository.
You can also add your own datasets to your local dataset zoo, enabling you to
work with these datasets via the fiftyone.zoo.datasets
package and the
CLI using the same syntax that you would with publicly available datasets.
To add dataset(s) to your local zoo, you simply write a JSON manifest file in
the format below to tell FiftyOne about the dataset. For example, the manifest
below adds a second copy of the quickstart
dataset to the zoo under the
alias quickstart-copy
:
{
"custom": {
"quickstart-copy": "fiftyone.zoo.datasets.base.QuickstartDataset"
}
}
In the above, custom
specifies the source of the dataset, which can be an
arbitrary string and simply controls the column of the
fiftyone zoo datasets list listing in
which the dataset is annotated; quickstart-copy
is the name of the new
dataset; and fiftyone.zoo.datasets.base.QuickstartDataset
is the
fully-qualified class name of the
ZooDataset class
for the dataset,
which specifies how to download and load the dataset into FiftyOne. This class
can be defined anywhere that is importable at runtime in your environment.
Finally, expose your new dataset(s) to FiftyOne by adding your manifest to the
dataset_zoo_manifest_paths
parameter of your
FiftyOne config. One way to do this is to set the
FIFTYONE_DATASET_ZOO_MANIFEST_PATHS
environment variable:
export FIFTYONE_DATASET_ZOO_MANIFEST_PATHS=/path/to/custom/manifest.json
Now you can access the quickstart-copy
dataset as you would any other zoo
dataset:
# Will contain `quickstart-copy`
fiftyone zoo datasets list
# Load custom dataset into FiftyOne
fiftyone zoo datasets load quickstart-copy
Customizing your ML backend ¶¶
Behind the scenes, FiftyOne uses either TensorFlow Datasets or TorchVision Datasets libraries to download and wrangle some zoo datasets, depending on which ML library you have installed. In order to load datasets using TF, you must have the tensorflow-datasets package installed on your machine. In order to load datasets using PyTorch, you must have the torch and torchvision packages installed.
Note that the ML backends may expose different datasets.
For datasets that require an ML backend, FiftyOne will use whichever ML backend
is necessary to download the requested zoo dataset. If a dataset is available
through both backends, it will use the backend specified by the
fo.config.default_ml_backend
setting in your FiftyOne config.
You can customize this backend by modifying the default_ml_backend
setting
of your FiftyOne config.