fiftyone.zoo.datasets.base.SamaCOCODataset

class documentation

class SamaCOCODataset(FiftyOneDataset): (source)

Constructor: SamaCOCODataset(label_types, classes, image_ids, num_workers, ...)

Sama-COCO is a large-scale object detection, segmentation, and captioning dataset based on COCO-2017. It is a relabeling of the original training and validation sets with tighter polygons and more individually annotated crowds.

This version contains the images, bounding boxes, and segmentations from Sama's relabeling of the 2017 version of the dataset.

This dataset supports partial downloads:

You can specify subsets of data to download via the label_types, classes, and max_samples parameters.
You can load specific images via the image_ids parameter.

See :ref:`this page <dataset-zoo-sama-coco>` for more information about partial downloads of this dataset.
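
As a minimal sketch of the image_ids option, the snippet below writes a newline-separated text file of `<split>/<image-id>` entries and passes its path to `load_zoo_dataset`. The helper names `write_image_ids` and `load_by_ids` and the IDs themselves are placeholders chosen for illustration; they are not part of FiftyOne.

```python
import os
import tempfile


def write_image_ids(ids, split, path):
    """Writes one `<split>/<image-id>` entry per line and returns `path`."""
    with open(path, "w") as f:
        for image_id in ids:
            f.write(f"{split}/{image_id}\n")
    return path


def load_by_ids(ids_path, split="validation"):
    """Loads only the listed images. FiftyOne is imported lazily so the
    file-writing helper above can be used without it installed."""
    import fiftyone.zoo as foz

    return foz.load_zoo_dataset("sama-coco", split=split, image_ids=ids_path)


ids_path = write_image_ids(
    [139, 285, 632],  # placeholder image IDs (assumption)
    "validation",
    os.path.join(tempfile.mkdtemp(), "image_ids.txt"),
)

# Uncomment to download (if necessary) and load the listed images:
# dataset = load_by_ids(ids_path)
```

Passing a plain Python list such as `image_ids=[139, 285, 632]` should be equivalent according to the parameter description; the file form is mainly convenient when the ID list is produced by another tool.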

Full split stats:

Train split: 118,287 images
Test split: 40,670 images
Validation split: 5,000 images

Notes:

COCO defines 91 classes but the data only uses 80 classes
Some images from the train and validation sets don't have annotations
The test set does not have annotations
Sama-COCO may have some discrepancies with COCO-2017 in terms of the instances labeled

Example usage:

import fiftyone as fo
import fiftyone.zoo as foz

#
# Load 50 random samples from the validation split
#
# By default, only detections are loaded
#

dataset = foz.load_zoo_dataset(
    "sama-coco",
    split="validation",
    max_samples=50,
    shuffle=True,
)

session = fo.launch_app(dataset)

#
# Load segmentations for 25 samples from the validation split that
# contain cats and dogs
#
# Images that contain all `classes` will be prioritized first, followed
# by images that contain at least one of the required `classes`. If
# there are not enough images matching `classes` in the split to meet
# `max_samples`, only the available images will be loaded.
#
# Images will only be downloaded if necessary
#

dataset = foz.load_zoo_dataset(
    "sama-coco",
    split="validation",
    label_types=["segmentations"],
    classes=["cat", "dog"],
    max_samples=25,
)

session.dataset = dataset

#
# Download the entire validation split and load both detections and
# segmentations
#
# Subsequent partial loads of the validation split will never require
# downloading any images
#

dataset = foz.load_zoo_dataset(
    "sama-coco",
    split="validation",
    label_types=["detections", "segmentations"],
)

session.dataset = dataset

Dataset size: 25.67 GB
Source: https://www.sama.com/sama-coco-dataset/

Parameters
label_types	a label type or list of label types to load. The supported values are `("detections", "segmentations")`. By default, only "detections" are loaded
classes	a string or list of strings specifying required classes to load. If provided, only samples containing at least one instance of a specified class will be loaded
image_ids	an optional list of specific image IDs to load. Can be provided in any of the following formats:
	a list of `<image-id>` ints or strings
	a list of `<split>/<image-id>` strings
	the path to a text (newline-separated), JSON, or CSV file containing the list of image IDs to load in either of the first two formats
num_workers	a suggested number of threads to use when downloading individual images
shuffle	whether to randomly shuffle the order in which samples are chosen for partial downloads
seed	a random seed to use when shuffling
max_samples	a maximum number of samples to load per split. If `label_types` and/or `classes` are also specified, first priority will be given to samples that contain all of the specified label types and/or classes, followed by samples that contain at least one of the specified label types or classes. The actual number of samples loaded may be less than this maximum value if the dataset does not contain sufficient samples matching your requirements. By default, all matching samples are loaded
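
The priority rule described for max_samples can be illustrated with a small, self-contained sketch. This is not FiftyOne's implementation: the function `prioritize` and the toy annotation data are hypothetical, and only the ordering rule (all required classes first, then at least one, then truncation) is taken from the description above.

```python
def prioritize(samples, classes, max_samples=None):
    """Orders image IDs by how well they match the required classes.

    `samples` maps an image ID to the set of class names it contains.
    Images containing every required class come first, followed by images
    containing at least one; images with none are dropped.
    """
    required = set(classes)
    all_match = [i for i, found in samples.items() if required <= found]
    any_match = [
        i
        for i, found in samples.items()
        if required & found and not required <= found
    ]
    ordered = all_match + any_match
    return ordered if max_samples is None else ordered[:max_samples]


# Toy data for illustration only
toy = {
    1: {"cat", "dog"},
    2: {"cat"},
    3: {"person"},
    4: {"dog", "person"},
    5: {"cat", "dog", "person"},
}

selected = prioritize(toy, ["cat", "dog"], max_samples=3)
print(selected)  # [1, 5, 2]
```

If fewer images match than max_samples allows, the sketch simply returns what is available, which mirrors the note that the number of loaded samples may be below the maximum.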

Method	`__init__`	Undocumented
Instance Variable	`classes`	Undocumented
Instance Variable	`image_ids`	Undocumented
Instance Variable	`label_types`	Undocumented
Instance Variable	`max_samples`	Undocumented
Instance Variable	`num_workers`	Undocumented
Instance Variable	`seed`	Undocumented
Instance Variable	`shuffle`	Undocumented
Property	`importer_kwargs`	A dict of default kwargs to pass to this dataset's `fiftyone.utils.data.importers.DatasetImporter`.
Property	`license`	The license or list of licenses under which the dataset is distributed, or None if unknown.
Property	`name`	The name of the dataset.
Property	`supported_splits`	A tuple of supported splits for the dataset, or None if the dataset does not have splits.
Property	`supports_partial_downloads`	Whether the dataset supports downloading partial subsets of its splits.
Property	`tags`	A tuple of tags for the dataset.
Method	`_download_and_prepare`	Internal implementation of downloading the dataset and preparing it for use in the given directory.
Method	`_get_raw_dir`	Undocumented

Inherited from ZooDataset (via FiftyOneDataset):

Static Method	`get_info_path`	Returns the path to the `ZooDatasetInfo` for the dataset.
Static Method	`has_info`	Determines whether the directory contains `ZooDatasetInfo`.
Static Method	`load_info`	Loads the `ZooDatasetInfo` from the given dataset directory.
Method	`download_and_prepare`	Downloads the dataset and prepares it for use.
Method	`get_split_dir`	Returns the directory for the given split of the dataset.
Method	`has_split`	Whether the dataset has the given split.
Method	`has_tag`	Whether the dataset has the given tag.
Property	`has_patches`	Whether the dataset has patches that may need to be applied to already downloaded files.
Property	`has_splits`	Whether the dataset has splits.
Property	`has_tags`	Whether the dataset has tags.
Property	`is_remote`	Whether the dataset is remotely-sourced.
Property	`parameters`	An optional dict of parameters describing the configuration of the zoo dataset when it was downloaded.
Property	`requires_manual_download`	Whether this dataset requires some files to be manually downloaded by the user before the dataset can be loaded.
Method	`_get_splits_to_download`	Undocumented
Method	`_is_dataset_ready`	Undocumented
Method	`_is_split_ready`	Undocumented
Method	`_patch_if_necessary`	Internal method called when an already downloaded dataset may need to be patched.

def __init__(self, label_types=None, classes=None, image_ids=None, num_workers=None, shuffle=None, seed=None, max_samples=None): (source) ¶

Undocumented

classes: None = (source) ¶

Undocumented

image_ids: None = (source) ¶

Undocumented

label_types: None = (source) ¶

Undocumented

max_samples: None = (source) ¶

Undocumented

num_workers: None = (source) ¶

Undocumented

seed: None = (source) ¶

Undocumented

shuffle: False = (source) ¶

Undocumented

@property
importer_kwargs = (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset.importer_kwargs

A dict of default kwargs to pass to this dataset's fiftyone.utils.data.importers.DatasetImporter.

@property
license = (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset.license

The license or list of licenses under which the dataset is distributed, or None if unknown.

@property
name = (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset.name

The name of the dataset.

@property
supported_splits = (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset.supported_splits

A tuple of supported splits for the dataset, or None if the dataset does not have splits.

@property
supports_partial_downloads = (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset.supports_partial_downloads

Whether the dataset supports downloading partial subsets of its splits.

@property
tags = (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset.tags

A tuple of tags for the dataset.

def _download_and_prepare(self, dataset_dir, scratch_dir, split): (source) ¶

overrides fiftyone.zoo.datasets.ZooDataset._download_and_prepare

Internal implementation of downloading the dataset and preparing it for use in the given directory.

Parameters
dataset_dir	the directory in which to construct the dataset. If a `split` is provided, this is the directory for the split
scratch_dir	a scratch directory to use to download and prepare any required intermediate files
split	the split to download, or None if the dataset does not have splits
Returns
tuple of
	dataset_type: the `fiftyone.types.Dataset` type of the dataset
	num_samples: the number of samples in the split. For datasets that support partial downloads, this can be `None`, which indicates that all content was already downloaded
	classes: an optional list of class label strings

def _get_raw_dir(self, dataset_dir): (source) ¶

Undocumented