module documentation

Utilities for working with the Kinetics dataset.

Copyright 2017-2025, Voxel51, Inc.

Class ClasswiseS3KineticsDatasetInfo Undocumented
Class Kinetics400DatasetInfo Kinetics 400-specific dataset info.
Class Kinetics600DatasetInfo Kinetics 600-specific dataset info.
Class Kinetics7002020DatasetInfo Kinetics 700-2020-specific dataset info.
Class Kinetics700DatasetInfo Kinetics 700-specific dataset info.
Class KineticsDatasetDownloader Class that downloads and extracts Kinetics tars from AWS.
Class KineticsDatasetInfo Class that contains information such as paths, labels, and sample IDs for a Kinetics download.
Class KineticsDatasetManager Class that manages the sample IDs and labels that need to be downloaded and performs the actual downloading.
Class KineticsDownloadConfig Config class for a Kinetics download run.
Function download_kinetics_split Utility that downloads full or partial splits of the Kinetics dataset.
Variable logger Undocumented
Function _flatten_list Undocumented
Constant _ANNOTATION_DOWNLOAD_LINKS Undocumented
Constant _INFO_VERSION_MAP Undocumented
Constant _SPLIT_MAP Undocumented
def download_kinetics_split(dataset_dir, split, classes=None, num_workers=None, shuffle=None, seed=None, max_samples=None, retry_errors=False, scratch_dir=None, version='700-2020'):

Utility that downloads full or partial splits of the Kinetics dataset.

The downloaded splits are stored on disk in VideoClassificationDirectoryTree format.

Parameters
dataset_dir: the directory in which to download the dataset
split: the split to download. Supported values are ("train", "validation", "test")
classes (None): a string or list of strings specifying required classes to load. If provided, only samples containing at least one instance of a specified class will be loaded
num_workers (None): a suggested number of threads to use when downloading individual videos
shuffle (False): whether to randomly shuffle the order in which samples are chosen for partial downloads
seed (None): a random seed to use when shuffling
max_samples (None): a maximum number of samples to load per split. If classes are also specified, only up to the number of samples that contain at least one specified class will be loaded. By default, all matching samples are loaded
retry_errors (False): whether to retry downloading samples from YouTube that have previously raised an error
scratch_dir (None): a scratch directory to use to store temporary files
version ("700-2020"): the version of the Kinetics dataset to download ("400", "600", "700", or "700-2020")
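
The way classes, shuffle, seed, and max_samples combine for a partial download can be pictured with a small pure-Python sketch. This is not the library's implementation: the select_samples helper, the (sample_id, label) pair layout, the made-up video IDs, and the order of operations (filter by class, then shuffle, then truncate) are all assumptions made for illustration.

```python
import random


def select_samples(samples, classes=None, shuffle=False, seed=None, max_samples=None):
    """Hypothetical helper: pick which (sample_id, label) pairs to download."""
    # Accept a single class name or a list of class names
    if isinstance(classes, str):
        classes = [classes]

    # Keep only samples whose label is one of the requested classes
    if classes is not None:
        wanted = set(classes)
        samples = [s for s in samples if s[1] in wanted]
    else:
        samples = list(samples)

    # Shuffle reproducibly when a seed is given
    if shuffle:
        random.Random(seed).shuffle(samples)

    # Truncate to at most max_samples; None keeps every matching sample
    if max_samples is not None:
        samples = samples[:max_samples]

    return samples


# Made-up sample IDs paired with class labels
annotations = [
    ("vid_a", "abseiling"),
    ("vid_b", "archery"),
    ("vid_c", "abseiling"),
    ("vid_d", "bowling"),
]

subset = select_samples(annotations, classes="abseiling", max_samples=1)
print(subset)  # [('vid_a', 'abseiling')]
```

Under these assumptions, max_samples caps the result only after class filtering, which matches the documented behavior that at most the number of matching samples is loaded.
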
Returns
a tuple of
  • num_samples: the total number of downloaded videos, or None if everything was already downloaded
  • classes: the list of all classes, or None if everything was already downloaded
  • did_download: whether any content was downloaded (True) or if all necessary files were already downloaded (False)
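
A call that unpacks the documented return tuple might look like the sketch below. The import path fiftyone.utils.kinetics is an assumption (this page does not state the module path), and the two wrapper functions are hypothetical names introduced only for this example.

```python
def summarize_result(num_samples, classes, did_download):
    """Turn the documented return tuple into a short status message."""
    if not did_download:
        # num_samples and classes are documented as None in this case
        return "nothing to download; all files already present"
    return f"downloaded {num_samples} videos; {len(classes)} classes available"


def download_validation_subset(dataset_dir):
    """Download up to 10 validation videos from two classes of Kinetics 400."""
    # Imported lazily so this sketch can be loaded without FiftyOne installed.
    # The module path is an assumption, not something this page states.
    from fiftyone.utils.kinetics import download_kinetics_split

    num_samples, classes, did_download = download_kinetics_split(
        dataset_dir,
        "validation",
        classes=["abseiling", "archery"],
        max_samples=10,
        shuffle=True,
        seed=51,
        version="400",
    )
    return summarize_result(num_samples, classes, did_download)
```

Calling download_validation_subset with a directory of your choice would start the download; checking did_download first avoids treating the None values returned for an already complete download as an error.
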

def _flatten_list(l):

Undocumented

_ANNOTATION_DOWNLOAD_LINKS: dict[str, str]

Undocumented

Value
{'400': 'https://storage.googleapis.com/deepmind-media/Datasets/kinetics400.tar.gz',
 '600': 'https://storage.googleapis.com/deepmind-media/Datasets/kinetics600.tar.gz',
 '700': 'https://storage.googleapis.com/deepmind-media/Datasets/kinetics700.tar.gz',
 '700-2020': 'https://storage.googleapis.com/deepmind-media/Datasets/kinetics700
...
_INFO_VERSION_MAP
_SPLIT_MAP: dict[str, str]

Undocumented

Value
{'test': 'test', 'train': 'train', 'validation': 'validate'}
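
The value above suggests that the user-facing split name "validation" is translated to "validate" internally. Since the constant is undocumented, that reading is an inference. A minimal sketch of such a lookup, with resolve_split as a hypothetical helper that is not part of the module:

```python
# Local copy of the mapping shown above
_SPLIT_MAP = {"test": "test", "train": "train", "validation": "validate"}


def resolve_split(split):
    """Hypothetical helper: translate a user-facing split name."""
    try:
        return _SPLIT_MAP[split]
    except KeyError:
        supported = ", ".join(sorted(_SPLIT_MAP))
        raise ValueError(
            f"Unsupported split '{split}'; supported values are: {supported}"
        ) from None


print(resolve_split("validation"))  # validate
```
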