module documentation

Utilities for working with the Open Images dataset.

Copyright 2017-2025, Voxel51, Inc.

Class OpenImagesDatasetImporter Base class for importing datasets in Open Images format.
Class OpenImagesV6DatasetImporter Base class for importing datasets in Open Images V6 format.
Class OpenImagesV7DatasetImporter Base class for importing datasets in Open Images V7 format.
Function download_open_images_split Utility that downloads full or partial splits of the Open Images dataset.
Function get_attributes Gets the list of relationship attributes in the Open Images dataset.
Function get_classes Gets the boxable classes that exist in classifications, detections, points, and relationships in the Open Images dataset.
Function get_point_classes Gets the list of classes that are labeled with points in the Open Images V7 dataset.
Function get_segmentation_classes Gets the list of classes (350) that are labeled with segmentations in the Open Images V6/V7 dataset.
Variable logger Undocumented
Function _create_classifications Undocumented
Function _create_detections Undocumented
Function _create_points Undocumented
Function _create_relationships Undocumented
Function _create_segmentations Undocumented
Function _download Undocumented
Function _download_file_if_necessary Undocumented
Function _download_images_if_necessary Undocumented
Function _download_masks_if_necessary Undocumented
Function _get_all_label_data Undocumented
Function _get_attrs_map Undocumented
Function _get_classes_map Undocumented
Function _get_dataframe_rows Undocumented
Function _get_downloaded_image_ids Undocumented
Function _get_general_metadata_file Undocumented
Function _get_hierarchy Undocumented
Function _get_label_data Undocumented
Function _get_pnt_classes_map Undocumented
Function _get_seg_classes Undocumented
Function _load_all_image_ids Undocumented
Function _parse_and_verify_image_ids Undocumented
Function _parse_csv Undocumented
Function _parse_image_ids Undocumented
Function _parse_label_types Undocumented
Function _rename_subcategories Undocumented
Function _setup Undocumented
Function _verify_image_ids Undocumented
Function _verify_version Undocumented
Constant _ANNOTATION_DOWNLOAD_URLS Undocumented
Constant _BUCKET_NAME Undocumented
Constant _CSV_DELIMITERS Undocumented
Constant _SUPPORTED_LABEL_TYPES Undocumented
Constant _SUPPORTED_LABEL_TYPES_V6 Undocumented
Constant _SUPPORTED_LABEL_TYPES_V7 Undocumented
Constant _SUPPORTED_SPLITS Undocumented
Constant _SUPPORTED_VERSIONS Undocumented
def download_open_images_split(dataset_dir, split, version='v7', label_types=None, classes=None, attrs=None, image_ids=None, num_workers=None, shuffle=False, seed=None, max_samples=None): (source)

Utility that downloads full or partial splits of the Open Images dataset.

See fiftyone.types.OpenImagesDataset for the format in which dataset_dir will be arranged.

Any existing files are not re-downloaded.

This method specifically downloads the subsets of annotations corresponding to the 600 boxable classes of Open Images. See here for other download options.

Parameters
dataset_dir: the directory in which to download the dataset
split: the split to download. Supported values are ("train", "validation", "test")
version ("v7"): the version of the Open Images dataset to download. Supported values are ("v6", "v7")
label_types (None): a label type or list of label types to load. The supported values are ("detections", "classifications", "relationships", "segmentations") for "v6" and ("detections", "classifications", "points", "relationships", "segmentations") for "v7". By default, all label types are loaded
classes (None): a string or list of strings specifying required classes to load. If provided, only samples containing at least one instance of a specified class will be loaded
attrs (None): a string or list of strings specifying required relationship attributes to load. Only applicable when label_types includes "relationships". If provided, only samples containing at least one instance of a specified attribute will be loaded
image_ids (None): an optional list of specific image IDs to load. Can be provided in any of the following formats:

  • a list of <image-id> strings
  • a list of <split>/<image-id> strings
  • the path to a text (newline-separated), JSON, or CSV file containing the list of image IDs to load in either of the first two formats
num_workers (None): a suggested number of threads to use when downloading individual images
shuffle (False): whether to randomly shuffle the order in which samples are chosen for partial downloads
seed (None): a random seed to use when shuffling
max_samples (None): a maximum number of samples to load per split. If label_types, classes, and/or attrs are also specified, first priority will be given to samples that contain all of the specified label types, classes, and/or attributes, followed by samples that contain at least one of the specified label types or classes. The actual number of samples loaded may be less than this maximum value if the dataset does not contain sufficient samples matching your requirements. By default, all matching samples are loaded
Returns
a tuple of
  • num_samples: the total number of downloaded images, or None if everything was already downloaded
  • classes: the list of all classes, or None if everything was already downloaded
  • did_download: whether any content was downloaded (True) or if all necessary files were already downloaded (False)
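
As a usage sketch, the function below requests a small, class-filtered slice of the validation split and unpacks the documented return tuple. It assumes this module is importable as fiftyone.utils.openimages; the download_demo wrapper, the target directory, and the chosen classes, seed, and sample count are illustrative and not part of the API.

```python
# Keyword arguments taken from the parameter list documented above.
# The specific values (classes, seed, sample count) are illustrative only.
DOWNLOAD_KWARGS = {
    "version": "v7",
    "label_types": ["detections", "segmentations"],
    "classes": ["Cat", "Dog"],
    "shuffle": True,
    "seed": 51,
    "max_samples": 25,
}


def download_demo(dataset_dir="/tmp/open-images-demo", split="validation"):
    """Hypothetical wrapper: downloads a partial split and reports the result."""
    # Imported lazily so this sketch can be loaded without FiftyOne installed.
    # Assumption: the module path is fiftyone.utils.openimages.
    import fiftyone.utils.openimages as fouo

    num_samples, classes, did_download = fouo.download_open_images_split(
        dataset_dir, split, **DOWNLOAD_KWARGS
    )

    if did_download:
        print(f"Downloaded {num_samples} images to {dataset_dir}")
    else:
        # Per the Returns section, num_samples and classes are None here
        print("All requested files were already present")

    return num_samples, classes, did_download
```

Calling download_demo() a second time with the same arguments should take the second branch, since existing files are not re-downloaded.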
def get_attributes(version='v7', dataset_dir=None): (source)

Gets the list of relationship attributes in the Open Images dataset.

Parameters
version ("v7"): the version of the Open Images dataset. Supported values are ("v6", "v7")
dataset_dir (None): an optional root directory in which the dataset is downloaded
Returns
a sorted list of attribute names
def get_classes(version='v7', dataset_dir=None): (source)

Gets the boxable classes that exist in classifications, detections, points, and relationships in the Open Images dataset.

This method can be called in isolation without downloading the dataset.

Parameters
version ("v7"): the version of the Open Images dataset. Supported values are ("v6", "v7")
dataset_dir (None): an optional root directory in which the dataset is downloaded
Returns
a sorted list of class name strings
def get_point_classes(version='v7', dataset_dir=None): (source)

Gets the list of classes that are labeled with points in the Open Images V7 dataset.

This method can be called in isolation without downloading the dataset.

Parameters
version ("v7"): the version of the Open Images dataset. Supported values are ("v7")
dataset_dir (None): an optional root directory in which the dataset is downloaded
Returns
a sorted list of point class name strings
def get_segmentation_classes(version='v6', dataset_dir=None): (source)

Gets the list of classes (350) that are labeled with segmentations in the Open Images V6/V7 dataset.

This method can be called in isolation without downloading the dataset.

Parameters
version ("v6"): the version of the Open Images dataset. Supported values are ("v6")
dataset_dir (None): an optional root directory in which the dataset is downloaded
Returns
a sorted list of segmentation class name strings

logger = (source)

Undocumented

def _create_classifications(cls_data, image_id, classes_map): (source)

Undocumented

def _create_detections(det_data, image_id, classes_map): (source)

Undocumented

def _create_points(pnt_data, image_id, classes_map, dataset_dir): (source)

Undocumented

def _create_relationships(rel_data, image_id, classes_map, attrs_map): (source)

Undocumented

def _create_segmentations(seg_data, image_id, classes_map, dataset_dir): (source)

Undocumented

def _download(image_ids, downloaded_ids, oi_classes, oi_attrs, seg_classes, pnt_classes, dataset_dir, split, label_types=None, classes=None, attrs=None, max_samples=None, shuffle=False, num_workers=None, download=True, version=None): (source)

Undocumented
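
The max_samples description for download_open_images_split outlines a two-tier priority: samples containing all requested classes come first, followed by samples containing at least one. The sketch below illustrates that ordering on plain Python data. It is not the implementation of _download, which is undocumented here; prioritize_samples and its arguments are hypothetical names.

```python
import random


def prioritize_samples(sample_labels, required, max_samples=None, shuffle=False, seed=None):
    """Orders image IDs so that full matches precede partial matches.

    sample_labels: dict mapping image ID -> set of class names present
    required: iterable of requested class names
    """
    required = set(required)

    # Tier 1 contains every requested class; tier 2 contains at least one
    all_match = [i for i, labels in sample_labels.items() if required <= labels]
    any_match = [
        i
        for i, labels in sample_labels.items()
        if required & labels and not required <= labels
    ]

    if shuffle:
        # Shuffle within each tier so the priority order is preserved
        rng = random.Random(seed)
        rng.shuffle(all_match)
        rng.shuffle(any_match)

    selected = all_match + any_match
    if max_samples is not None:
        selected = selected[:max_samples]

    return selected
```

Fewer than max_samples IDs are returned when too few samples match, which mirrors the documented behavior.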

def _download_file_if_necessary(filepath, url, is_zip=False, quiet=-1, download=True): (source)

Undocumented

def _download_images_if_necessary(image_ids, split, dataset_dir, num_workers=None, download=True): (source)

Undocumented

def _download_masks_if_necessary(image_ids, dataset_dir, split, download=True): (source)

Undocumented

def _get_all_label_data(dataset_dir, image_ids, label_types=None, classes=None, oi_classes=None, attrs=None, oi_attrs=None, seg_classes=None, pnt_classes=None, download_only=False, ids_only=False, track_all_ids=True, only_matching=False, split=None, download=False, version=None): (source)

Undocumented

def _get_attrs_map(dataset_dir, download=True): (source)

Undocumented

def _get_classes_map(dataset_dir, download=True): (source)

Undocumented

def _get_dataframe_rows(df, image_id): (source)

Undocumented

def _get_downloaded_image_ids(dataset_dir): (source)

Undocumented

def _get_general_metadata_file(dataset_dir, filename, url, download=True): (source)

Undocumented

def _get_hierarchy(dataset_dir, classes_map=None, download=True): (source)

Undocumented

def _get_label_data(dataset_dir, image_ids, label_type, classes=None, oi_classes=None, download_only=False, ids_only=False, track_all_ids=True, only_matching=False, url=None, download=True): (source)

Undocumented

def _get_pnt_classes_map(dataset_dir, classes_map=None, download=True): (source)

Undocumented

def _get_seg_classes(dataset_dir, classes_map=None, download=True): (source)

Undocumented

def _load_all_image_ids(dataset_dir, split=None, download=True): (source)

Undocumented

def _parse_and_verify_image_ids(image_ids, dataset_dir, split, download=True): (source)

Undocumented

def _parse_csv(filename, dataframe=False, index_col=None): (source)

Undocumented

def _parse_image_ids(image_ids, ignore_split=False): (source)

Undocumented
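
The image_ids parameter of download_open_images_split accepts both <image-id> and <split>/<image-id> strings. A minimal sketch of normalizing the two forms follows. It is not the body of _parse_image_ids, which is undocumented here; split_image_ids and default_split are hypothetical names, and the IDs in the usage note are made up.

```python
def split_image_ids(image_ids, default_split=None):
    """Normalizes IDs given as "<image-id>" or "<split>/<image-id>".

    Returns a list of (split, image_id) tuples, where split is
    default_split for IDs that did not name one.
    """
    parsed = []
    for raw in image_ids:
        raw = raw.strip()
        if not raw:
            continue  # skip blank entries, e.g. from a newline-separated file

        if "/" in raw:
            split, image_id = raw.split("/", 1)
        else:
            split, image_id = default_split, raw

        parsed.append((split, image_id))

    return parsed
```

For example, split_image_ids(["abc123", "test/def456"], default_split="train") pairs the first ID with "train" and the second with "test".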

def _parse_label_types(version, label_types): (source)

Undocumented

def _rename_subcategories(hierarchy, classes_map): (source)

Undocumented

def _setup(dataset_dir, label_types=None, classes=None, attrs=None, seed=None, download=False): (source)

Undocumented

def _verify_image_ids(selected_split_ids, unspecified_ids, dataset_dir, split, download=True): (source)

Undocumented

def _verify_version(version): (source)

Undocumented

_ANNOTATION_DOWNLOAD_URLS: dict = (source)

Undocumented

Value
{'general': {'class_names': 'https://storage.googleapis.com/openimages/v5/class-descriptions-boxable.csv',
             'attr_names': 'https://storage.googleapis.com/openimages/v6/oidv6-attributes-description.csv',
             'hierarchy': 'https://storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy.json',
             'segmentation_classes': 'https://storage.googleapis.com/openimages/
...
_BUCKET_NAME: str = (source)

Undocumented

Value
'open-images-dataset'
_CSV_DELIMITERS: list[str] = (source)

Undocumented

Value
[',', ';', ':', ' ', '\t', '\n']
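
How this list is consumed is not documented here. One plausible use, shown purely as an assumption, is to restrict delimiter detection to these candidates when reading a user-supplied CSV file. The sketch uses the standard library's csv.Sniffer; parse_csv_text is a hypothetical helper, not part of this module.

```python
import csv
import io

# Mirrors the _CSV_DELIMITERS value documented above
CSV_DELIMITERS = [",", ";", ":", " ", "\t", "\n"]


def parse_csv_text(text):
    """Parses CSV text, guessing the delimiter from the candidate list."""
    try:
        # Sniffer accepts the candidate delimiters as a single string
        dialect = csv.Sniffer().sniff(text[:10240], delimiters="".join(CSV_DELIMITERS))
        reader = csv.reader(io.StringIO(text), dialect)
    except csv.Error:
        # Sniffing failed (e.g. single-column data); fall back to the default
        reader = csv.reader(io.StringIO(text))

    return [row for row in reader]
```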
_SUPPORTED_LABEL_TYPES = (source)

Undocumented

Value
{'v6': _SUPPORTED_LABEL_TYPES_V6, 'v7': _SUPPORTED_LABEL_TYPES_V7}
_SUPPORTED_LABEL_TYPES_V6: list[str] = (source)

Undocumented

Value
['classifications', 'detections', 'relationships', 'segmentations']
_SUPPORTED_LABEL_TYPES_V7: list[str] = (source)

Undocumented

Value
['classifications', 'detections', 'points', 'relationships', 'segmentations']
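
The per-version lists above suggest how a requested label_types value could be checked before any download begins. The following is a sketch under that assumption, not the body of _parse_label_types; resolve_label_types is a hypothetical name, and the values are copied from the constants documented above.

```python
# Values copied from the constants documented above
SUPPORTED_LABEL_TYPES = {
    "v6": ["classifications", "detections", "relationships", "segmentations"],
    "v7": ["classifications", "detections", "points", "relationships", "segmentations"],
}


def resolve_label_types(version, label_types=None):
    """Returns the label types to load, validating them against the version."""
    if version not in SUPPORTED_LABEL_TYPES:
        raise ValueError(f"Unsupported version {version!r}")

    supported = SUPPORTED_LABEL_TYPES[version]

    # By default, all label types for the version are loaded
    if label_types is None:
        return list(supported)

    # A single label type may be passed as a bare string
    if isinstance(label_types, str):
        label_types = [label_types]

    unsupported = [t for t in label_types if t not in supported]
    if unsupported:
        raise ValueError(
            f"Label types {unsupported} are not supported for {version}; "
            f"supported values are {supported}"
        )

    return list(label_types)
```

Under this sketch, requesting "points" with version "v6" raises a ValueError, since points appear only in the "v7" list.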
_SUPPORTED_SPLITS: list[str] = (source)

Undocumented

Value
['train', 'test', 'validation']
_SUPPORTED_VERSIONS: list[str] = (source)

Undocumented

Value
['v6', 'v7']