module documentation

Data utilities.

Copyright 2017-2025, Voxel51, Inc.

Function download_image_classification_dataset: Downloads the classification dataset specified by the given CSV file of labels and image URLs.
Function download_images: Downloads the images from the given URLs.
Function parse_image_classification_dir_tree: Parses the contents of the given image classification dataset directory tree.
Function parse_images_dir: Parses the contents of the given directory of images.
Function parse_videos_dir: Parses the contents of the given directory of videos.
Variable logger: Undocumented
Function _download_image: Undocumented
Function _download_images: Undocumented
Function _download_images_multi: Undocumented

def download_image_classification_dataset(csv_path, dataset_dir, classes=None, num_workers=None):

Downloads the classification dataset specified by the given CSV file, which should have the following format:

<label1>,<image_url1>
<label2>,<image_url2>
...

The image filenames are the basenames of the URLs, which are assumed to be unique.

The dataset is written to disk in fiftyone.types.FiftyOneImageClassificationDataset format.

Parameters
csv_path: a CSV file containing the labels and image URLs
dataset_dir: the directory in which to write the dataset
classes=None: an optional list of classes. By default, the classes are inferred from the contents of csv_path
num_workers=None: a suggested number of threads to use to download images
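The CSV parsing step described above can be sketched with the standard library. This is a minimal sketch, not FiftyOne's implementation: `plan_classification_download` and `url_basename` are hypothetical names, sorting the inferred classes and using integer class indices as targets are assumptions, and the actual downloading and writing of the FiftyOneImageClassificationDataset files are omitted.

```python
import csv
import posixpath
import urllib.parse


def url_basename(url):
    # Basename of the URL path; dropping query strings and fragments is an
    # assumption (the docs only say "the basename of the URL")
    return posixpath.basename(urllib.parse.urlparse(url).path)


def plan_classification_download(lines, classes=None):
    """Parses "<label>,<image_url>" rows into (url, filename, target) tuples."""
    # Skip blank rows; each remaining row is expected to be (label, url)
    rows = [(row[0].strip(), row[1].strip()) for row in csv.reader(lines) if row]

    # Infer classes when none are given (sorted order is an assumption)
    if classes is None:
        classes = sorted({label for label, _ in rows})

    class_to_index = {c: i for i, c in enumerate(classes)}

    plan = []
    seen = set()
    for label, url in rows:
        filename = url_basename(url)
        # The docs assume basenames are unique; fail loudly if they are not
        if filename in seen:
            raise ValueError("Duplicate image filename: %s" % filename)
        seen.add(filename)
        # A label absent from an explicit `classes` list raises KeyError here
        plan.append((url, filename, class_to_index[label]))

    return plan, classes
```

Passing an explicit `classes` list fixes the label-to-index mapping up front, which keeps targets stable across datasets built from different CSV files.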

def download_images(image_urls, output_dir, num_workers=None):

Downloads the images from the given URLs.

The filenames in output_dir are the basenames of the URLs, which are assumed to be unique.

Parameters
image_urls: a list of image URLs to download
output_dir: the directory in which to write the images
num_workers=None: a suggested number of threads to use
Returns
the list of downloaded image paths
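A multithreaded download along these lines can be sketched with `concurrent.futures`. This is a hedged sketch, not FiftyOne's code: `download_images_sketch` and `fetch_bytes` are hypothetical names, and the injectable `fetch` parameter exists only so the logic can be exercised without network access. The private helpers `_download_images` and `_download_images_multi` suggest separate serial and threaded paths; the sketch mirrors that split, but the rule for choosing between them is an assumption.

```python
import os
import posixpath
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_bytes(url):
    # Default fetcher: a plain HTTP(S) GET using the standard library
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_images_sketch(image_urls, output_dir, num_workers=None, fetch=fetch_bytes):
    os.makedirs(output_dir, exist_ok=True)

    # Each file is named after the basename of its URL path, assumed unique
    tasks = [
        (url, os.path.join(output_dir, posixpath.basename(urllib.parse.urlparse(url).path)))
        for url in image_urls
    ]

    def download_one(task):
        url, path = task
        data = fetch(url)
        with open(path, "wb") as f:
            f.write(data)
        return path

    # Serial path for a single worker (assumed dispatch rule)
    if num_workers == 1:
        return [download_one(task) for task in tasks]

    # num_workers=None lets ThreadPoolExecutor choose its default pool size
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # map() yields results in input order, so paths line up with image_urls
        return list(executor.map(download_one, tasks))
```

Threads suit this workload because downloads are I/O-bound, so the GIL is released while waiting on the network.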

def parse_image_classification_dir_tree(dataset_dir):

Parses the contents of the given image classification dataset directory tree, which should have the following format:

<dataset_dir>/
    <classA>/
        <image1>.<ext>
        <image2>.<ext>
        ...
    <classB>/
        <image1>.<ext>
        <image2>.<ext>
        ...

Parameters
dataset_dir: the dataset directory
Returns
a tuple (samples, classes), where
    samples: a list of (image_path, target) pairs
    classes: a list of class label strings
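Parsing the directory layout above can be sketched as follows. This is a minimal sketch under stated assumptions: `parse_dir_tree_sketch` is a hypothetical name, classes are taken to be the sorted names of the immediate subdirectories, each `target` is assumed to be the index of its class in that list, and hidden files and directories are skipped.

```python
import os


def parse_dir_tree_sketch(dataset_dir):
    # Each non-hidden immediate subdirectory is a class; sorting gives stable indices
    classes = sorted(
        entry.name
        for entry in os.scandir(dataset_dir)
        if entry.is_dir() and not entry.name.startswith(".")
    )

    samples = []
    for target, class_name in enumerate(classes):
        class_dir = os.path.join(dataset_dir, class_name)
        for filename in sorted(os.listdir(class_dir)):
            path = os.path.join(class_dir, filename)
            # Skip hidden files (e.g. .DS_Store) and anything that is not a regular file
            if filename.startswith(".") or not os.path.isfile(path):
                continue
            samples.append((path, target))

    return samples, classes
```

Returning integer targets alongside the `classes` list keeps each sample compact; the label string is recovered as `classes[target]`.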

def parse_images_dir(dataset_dir, recursive=True):

Parses the contents of the given directory of images.

Parameters
dataset_dir: the dataset directory
recursive=True: whether to recursively traverse subdirectories
Returns
a list of image paths

def parse_videos_dir(dataset_dir, recursive=True):

Parses the contents of the given directory of videos.

Parameters
dataset_dir: the dataset directory
recursive=True: whether to recursively traverse subdirectories
Returns
a list of video paths
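parse_images_dir and parse_videos_dir are documented with the same parameters and appear to differ only in the kind of file they keep, so both can be sketched with one helper. This is a hedged sketch, not FiftyOne's implementation: `parse_media_dir`, `parse_images_dir_sketch`, and `parse_videos_dir_sketch` are hypothetical names, and classifying files by the MIME type guessed from their extension is an assumption.

```python
import mimetypes
import os


def parse_media_dir(dataset_dir, media_type, recursive=True):
    """Lists files whose guessed MIME type starts with "<media_type>/"."""
    paths = []
    for root, dirs, files in os.walk(dataset_dir):
        # Sort in place so traversal order (and the result) is deterministic
        dirs.sort()
        for filename in sorted(files):
            mime_type, _ = mimetypes.guess_type(filename)
            if mime_type is not None and mime_type.startswith(media_type + "/"):
                paths.append(os.path.join(root, filename))
        if not recursive:
            # os.walk yields dataset_dir first, so stopping here lists only the top level
            break
    return paths


def parse_images_dir_sketch(dataset_dir, recursive=True):
    return parse_media_dir(dataset_dir, "image", recursive=recursive)


def parse_videos_dir_sketch(dataset_dir, recursive=True):
    return parse_media_dir(dataset_dir, "video", recursive=recursive)
```

Guessing from the extension avoids opening every file, at the cost of misclassifying files whose extensions are missing or wrong.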

Variable logger

Undocumented

def _download_image(args):

Undocumented

def _download_images(inputs):

Undocumented

def _download_images_multi(inputs, num_workers):

Undocumented