module documentation

Sama utilities.

Copyright 2017-2025, Voxel51, Inc.

Function download_sama_coco_dataset_split Utility that downloads full or partial data splits of the COCO dataset with annotation splits found at https://www.sama.com/sama-coco-dataset.
Variable logger Undocumented
Function _merge_annotations Undocumented
Constant _ANNOTATION_DOWNLOAD_LINKS Undocumented
Constant _ANNOTATION_PATHS Undocumented
Constant _CSV_DELIMITERS Undocumented
Constant _IMAGE_DOWNLOAD_LINKS Undocumented
Constant _SPLIT_SIZES Undocumented
Constant _SUPPORTED_LABEL_TYPES Undocumented
Constant _SUPPORTED_SPLITS Undocumented
Constant _TEST_INFO_DOWNLOAD_LINK Undocumented
Constant _TEST_INFO_PATHS Undocumented
def download_sama_coco_dataset_split(dataset_dir, split, label_types=None, classes=None, image_ids=None, num_workers=None, shuffle=None, seed=None, max_samples=None, raw_dir=None, scratch_dir=None): (source)

Utility that downloads full or partial data splits of the COCO dataset with annotation splits found at https://www.sama.com/sama-coco-dataset.

See the COCODetectionDataset export documentation (reference target COCODetectionDataset-export) for the format in which dataset_dir will be arranged.

Files that already exist are not re-downloaded.

Parameters
dataset_dir: the directory in which to download the dataset
split: the split to download. Supported values are ("train", "validation", "test")
label_types (None): a label type or list of label types to load. The supported values are ("detections", "segmentations"). By default, all label types are loaded
classes (None): a string or list of strings specifying required classes to load. Only samples containing at least one instance of a specified class will be loaded
image_ids (None): an optional list of specific image IDs to load. Can be provided in any of the following formats:

  • a list of <image-id> ints or strings
  • a list of <split>/<image-id> strings
  • the path to a text (newline-separated), JSON, or CSV file containing the list of image IDs to load in either of the first two formats

num_workers (None): a suggested number of threads to use when downloading individual images
shuffle (False): whether to randomly shuffle the order in which samples are chosen for partial downloads
seed (None): a random seed to use when shuffling
max_samples (None): a maximum number of samples to load. If label_types and/or classes are also specified, first priority will be given to samples that contain all of the specified label types and/or classes, followed by samples that contain at least one of the specified label types or classes. The actual number of samples loaded may be less than this maximum value if the dataset does not contain sufficient samples matching your requirements. By default, all matching samples are loaded
raw_dir (None): a directory in which full annotations files may be stored to avoid re-downloads in the future
scratch_dir (None): a scratch directory to use to download any necessary temporary files
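
The priority rule described for max_samples can be illustrated with a small sketch. This is not the module's implementation: select_samples and the sample_classes mapping are hypothetical names introduced here, and the sketch assumes that ties within a priority tier keep their input order.

```python
def select_samples(sample_classes, required, max_samples=None):
    """Order sample IDs by how well they match `required`, then truncate.

    sample_classes: dict mapping a sample ID to the set of classes it contains
    required: iterable of requested class names
    (Both names are hypothetical and used only for this illustration.)
    """
    required = set(required)

    # First priority: samples that contain every requested class
    all_match = [sid for sid, cls in sample_classes.items() if required <= cls]

    # Second priority: samples that contain some, but not all, requested classes
    any_match = [
        sid
        for sid, cls in sample_classes.items()
        if (required & cls) and not (required <= cls)
    ]

    selected = all_match + any_match
    if max_samples is not None:
        selected = selected[:max_samples]
    return selected


samples = {
    1: {"cat", "dog"},
    2: {"cat"},
    3: {"car"},
    4: {"dog", "cat", "car"},
}
print(select_samples(samples, ["cat", "dog"], max_samples=3))  # [1, 4, 2]
```

Sample 3 is never selected because it contains none of the requested classes, which mirrors the note that fewer than max_samples samples may be returned.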
Returns
a tuple of
  • num_samples: the total number of downloaded images
  • classes: the list of all classes
  • did_download: whether any content was downloaded (True) or if all necessary files were already downloaded (False)
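
A minimal usage sketch follows, under stated assumptions: it assumes the module is importable as fiftyone.utils.sama (the import path is not given on this page), and build_download_kwargs and download_subset are hypothetical helper names. The import is deferred into the function so that nothing is downloaded until download_subset is called.

```python
def build_download_kwargs(max_samples=50, seed=51):
    """Collects keyword arguments for a small, reproducible partial download."""
    return dict(
        split="validation",
        label_types=["detections"],
        classes=["cat", "dog"],
        shuffle=True,
        seed=seed,
        max_samples=max_samples,
    )


def download_subset(dataset_dir):
    """Downloads a partial validation split into `dataset_dir`."""
    # Assumption: this is the import path of the module documented here
    import fiftyone.utils.sama as fous

    num_samples, classes, did_download = fous.download_sama_coco_dataset_split(
        dataset_dir, **build_download_kwargs()
    )

    if did_download:
        print("Downloaded content; %d images available" % num_samples)
    else:
        print("All %d images were already present" % num_samples)

    return num_samples, classes, did_download
```

Calling download_subset a second time with the same dataset_dir should report did_download as False, since existing files are not re-downloaded.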

def _merge_annotations(merge_dir, output): (source)

Undocumented

_ANNOTATION_DOWNLOAD_LINKS: dict[str, str] = (source)

Undocumented

Value
{'train': 'https://sama-documentation-assets.s3.amazonaws.com/sama-coco/sama-coco-train.zip',
 'validation': 'https://sama-documentation-assets.s3.amazonaws.com/sama-coco/sama-coco-val.zip'}
_ANNOTATION_PATHS: dict[str, str] = (source)

Undocumented

Value
{'train': 'annotations/sama_coco_train.json',
 'validation': 'annotations/sama_coco_validation.json'}
_CSV_DELIMITERS: list[str] = (source)

Undocumented

Value
[',', ';', ':', ' ', '\t', '\n']
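
How the module uses this constant is undocumented. One plausible use, sketched here purely as an assumption, is tokenizing an image ID file whose entries may be separated by any of these characters, then normalizing each entry from the <image-id> or <split>/<image-id> formats accepted by image_ids. The names split_image_ids and parse_image_id are hypothetical, and the sketch assumes image IDs are integers.

```python
import re

# Copied from the documented value of _CSV_DELIMITERS
CSV_DELIMITERS = [",", ";", ":", " ", "\t", "\n"]


def split_image_ids(text, delimiters=CSV_DELIMITERS):
    """Splits raw file contents into tokens on any run of listed delimiters."""
    pattern = "[" + "".join(re.escape(d) for d in delimiters) + "]+"
    return [token for token in re.split(pattern, text) if token]


def parse_image_id(token):
    """Returns (split, image_id) for '<split>/<image-id>' or '<image-id>'.

    The split is None when the token carries no '<split>/' prefix.
    """
    split, sep, image_id = str(token).rpartition("/")
    return (split if sep else None), int(image_id)


tokens = split_image_ids("validation/000000000139;285\n632")
print([parse_image_id(t) for t in tokens])
# [('validation', 139), (None, 285), (None, 632)]
```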
_IMAGE_DOWNLOAD_LINKS: dict[str, str] = (source)

Undocumented

Value
{'train': 'http://images.cocodataset.org/zips/train2017.zip',
 'validation': 'http://images.cocodataset.org/zips/val2017.zip',
 'test': 'http://images.cocodataset.org/zips/test2017.zip'}
_SPLIT_SIZES: dict[str, int] = (source)

Undocumented

Value
{'train': 118287, 'test': 40670, 'validation': 5000}
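
The role of these sizes inside the module is undocumented. As a hedged illustration only, they give an upper bound on how many images a request can return for a split; expected_num_samples is a hypothetical helper, and class or label filters may reduce the real count further.

```python
# Copied from the documented value of _SPLIT_SIZES
SPLIT_SIZES = {"train": 118287, "test": 40670, "validation": 5000}


def expected_num_samples(split, max_samples=None):
    """Upper bound on the number of images a download of `split` can yield."""
    total = SPLIT_SIZES[split]
    if max_samples is None:
        return total
    return min(max_samples, total)


print(expected_num_samples("validation"))        # 5000
print(expected_num_samples("validation", 100))   # 100
print(expected_num_samples("validation", 9999))  # 5000
```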
_SUPPORTED_LABEL_TYPES: list[str] = (source)

Undocumented

Value
['detections', 'segmentations']
_SUPPORTED_SPLITS: list[str] = (source)

Undocumented

Value
['train', 'validation', 'test']
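
The supported splits and label types can be used to validate a request before any download starts. The sketch below is an assumption about how such a check might look, not the module's code: normalize_request is a hypothetical helper, and raising ValueError on unsupported values is a design choice made for this example. It follows the documented rule that label_types may be a single string or a list and defaults to all supported types.

```python
# Copied from the documented values of _SUPPORTED_SPLITS and
# _SUPPORTED_LABEL_TYPES
SUPPORTED_SPLITS = ["train", "validation", "test"]
SUPPORTED_LABEL_TYPES = ["detections", "segmentations"]


def normalize_request(split, label_types=None):
    """Validates `split` and returns `label_types` as a list."""
    if split not in SUPPORTED_SPLITS:
        raise ValueError(
            "Unsupported split %r; supported values are %s"
            % (split, SUPPORTED_SPLITS)
        )

    if label_types is None:
        # By default, all label types are loaded
        label_types = list(SUPPORTED_LABEL_TYPES)
    elif isinstance(label_types, str):
        label_types = [label_types]
    else:
        label_types = list(label_types)

    unsupported = [t for t in label_types if t not in SUPPORTED_LABEL_TYPES]
    if unsupported:
        raise ValueError(
            "Unsupported label types %s; supported values are %s"
            % (unsupported, SUPPORTED_LABEL_TYPES)
        )

    return split, label_types


print(normalize_request("train", "detections"))  # ('train', ['detections'])
```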
_TEST_INFO_DOWNLOAD_LINK: str = (source)

Undocumented

Value
'http://images.cocodataset.org/annotations/image_info_test2017.zip'
_TEST_INFO_PATHS: str = (source)

Undocumented

Value
'annotations/image_info_test2017.json'