Using Sample Parsers
This page describes how to use the SampleParser interface to add samples to your FiftyOne datasets.
The SampleParser interface provides native support for loading samples in a variety of common formats, and it can be easily extended to import datasets in custom formats, allowing you to automate the dataset loading process.
Warning
Using the SampleParser interface is not recommended. You’ll likely prefer adding samples manually or using dataset importers to load data into FiftyOne.
Adding samples to datasets
Basic recipe
The basic recipe for using the SampleParser interface to add samples to a Dataset is to create a parser of the appropriate type and then pass the parser, along with an iterable of samples, to the appropriate Dataset method.
Note
A typical use case is that samples in the above recipe is a torch.utils.data.Dataset or an iterable generated by tf.data.Dataset.as_numpy_iterator().
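Any object that Python can iterate over works as samples. As a minimal, dependency-free sketch, the hypothetical ToyClassificationDataset below mimics a map-style torch.utils.data.Dataset by defining only __len__ and __getitem__. Python's legacy iteration protocol calls __getitem__ with 0, 1, 2, ... until an IndexError is raised, so a plain for loop yields the (image, target) tuples that a parser such as ImageClassificationSampleParser would then consume. This assumes that __getitem__ raises IndexError past the end, as indexing into a Python list does.

```python
from typing import List, Tuple


class ToyClassificationDataset:
    """Hypothetical stand-in for a map-style torch.utils.data.Dataset.

    Each item is an (image, target) tuple; the "image" is a tiny nested list
    of pixel values so that the sketch needs no third-party packages.
    """

    def __init__(self) -> None:
        self._items: List[Tuple[list, int]] = [
            ([[0, 0], [0, 0]], 0),
            ([[255, 255], [255, 255]], 1),
        ]

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, idx: int) -> Tuple[list, int]:
        # Indexing a list past its end raises IndexError, which is what
        # terminates Python's __getitem__-based iteration protocol
        return self._items[idx]


samples = ToyClassificationDataset()

# Iterating calls __getitem__(0), __getitem__(1), ... until IndexError
for image, target in samples:
    print(target, image)
```

Real PyTorch datasets usually return tensors rather than nested lists, so the parser you choose must be able to convert whatever the dataset yields.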
Adding unlabeled images
FiftyOne provides a few convenient ways to add unlabeled images to FiftyOne datasets.
Adding a directory of images
Use Dataset.add_images_dir() to add a directory of images to a dataset:
import fiftyone as fo
dataset = fo.Dataset()
# A directory of images to add
images_dir = "/path/to/images"
# Add images to the dataset
dataset.add_images_dir(images_dir)
Adding a glob pattern of images
Use Dataset.add_images_patt() to add a glob pattern of images to a dataset:
import fiftyone as fo
dataset = fo.Dataset()
# A glob pattern of images to add
images_patt = "/path/to/images/*.jpg"
# Add images to the dataset
dataset.add_images_patt(images_patt)
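Before adding images by pattern, it can help to preview which files a pattern selects. The sketch below assumes the pattern follows the same shell-style rules as Python's standard glob module: "*" does not descend into subdirectories, and "**" only matches nested directories when recursive matching is enabled. Whether add_images_patt() enables recursive matching is not stated on this page, so check the API reference before relying on "**". The temporary directory layout is purely illustrative.

```python
import glob
import os
import tempfile

# Build a throwaway directory tree to illustrate glob matching
with tempfile.TemporaryDirectory() as root:
    for relpath in ["a.jpg", "b.jpg", "c.png", os.path.join("nested", "d.jpg")]:
        path = os.path.join(root, relpath)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()  # empty placeholder file

    # Non-recursive: "*" matches only files directly inside root
    flat = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(root, "*.jpg"))
    )

    # "**" with recursive=True also matches files in subdirectories
    deep = sorted(
        os.path.relpath(p, root)
        for p in glob.glob(os.path.join(root, "**", "*.jpg"), recursive=True)
    )

print(flat)  # ['a.jpg', 'b.jpg']
print(deep)
```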
Adding images using a SampleParser
Use Dataset.add_images() to add an iterable of unlabeled images that can be parsed via a specified UnlabeledImageSampleParser to a dataset.
Example
FiftyOne provides an ImageSampleParser that handles samples that contain either an image that can be converted to numpy format via np.asarray() or the path to an image on disk.
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of images or image paths and the UnlabeledImageSampleParser
# to use to parse them
samples = ...
sample_parser = foud.ImageSampleParser()
# Add images to the dataset
dataset.add_images(samples, sample_parser)
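To make the "image or path" contract concrete, here is a minimal sketch of the dispatch that a parser like ImageSampleParser has to perform. The helper split_image_or_path() is hypothetical and is not FiftyOne's implementation; it only assumes that str or os.PathLike values denote paths on disk and that everything else should be converted with np.asarray().

```python
import os
from typing import Any, Optional, Tuple

import numpy as np


def split_image_or_path(sample: Any) -> Tuple[Optional[str], Optional[np.ndarray]]:
    """Hypothetical helper: classify a sample as an image path or an image.

    Returns (path, None) for str/os.PathLike inputs and (None, array) for
    anything else, converting the latter with np.asarray().
    """
    if isinstance(sample, (str, os.PathLike)):
        return os.fspath(sample), None
    return None, np.asarray(sample)


path, img = split_image_or_path("/path/to/image.jpg")
print(path, img)  # /path/to/image.jpg None

path, img = split_image_or_path([[0, 128], [255, 64]])
print(path, img.shape)  # None (2, 2)
```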
Adding labeled images
Use Dataset.add_labeled_images() to add an iterable of samples that can be parsed via a specified LabeledImageSampleParser to a dataset.
Example
FiftyOne provides an ImageClassificationSampleParser that handles samples that contain (image_or_path, target) tuples, where:
- image_or_path is either an image that can be converted to numpy format via np.asarray() or the path to an image on disk
- target is either a class ID or a label string
The snippet below adds an iterable of image classification data in the above format to a dataset:
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of `(image_or_path, target)` tuples and the
# LabeledImageSampleParser to use to parse them
samples = ...
sample_parser = foud.ImageClassificationSampleParser()
# Add labeled images to the dataset
dataset.add_labeled_images(samples, sample_parser)
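The target element of each tuple can be a class ID or a label string. One plausible way to resolve it is sketched below, assuming that class IDs index into a list of class names (ImageClassificationSampleParser accepts an optional classes list for this kind of mapping; check its API reference for the exact behavior). The helper target_to_label() is hypothetical: integers are looked up in classes, strings pass through unchanged, and IDs without a class list fall back to their string form.

```python
from typing import Optional, Sequence, Union


def target_to_label(
    target: Union[int, str], classes: Optional[Sequence[str]] = None
) -> str:
    """Hypothetical helper mapping a classification target to a label string."""
    if isinstance(target, str):
        return target  # already a label string
    if classes is None:
        # Without a class list, fall back to the stringified ID
        return str(target)
    return classes[int(target)]


classes = ["cat", "dog"]
samples = [("/path/to/img1.jpg", 0), ("/path/to/img2.jpg", "dog")]
for image_or_path, target in samples:
    print(image_or_path, target_to_label(target, classes))
# /path/to/img1.jpg cat
# /path/to/img2.jpg dog
```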
Adding unlabeled videos
FiftyOne provides a few convenient ways to add unlabeled videos to FiftyOne datasets.
Adding a directory of videos
Use Dataset.add_videos_dir() to add a directory of videos to a dataset:
import fiftyone as fo
dataset = fo.Dataset()
# A directory of videos to add
videos_dir = "/path/to/videos"
# Add videos to the dataset
dataset.add_videos_dir(videos_dir)
Adding a glob pattern of videos
Use Dataset.add_videos_patt() to add a glob pattern of videos to a dataset:
import fiftyone as fo
dataset = fo.Dataset()
# A glob pattern of videos to add
videos_patt = "/path/to/videos/*.mp4"
# Add videos to the dataset
dataset.add_videos_patt(videos_patt)
Adding videos using a SampleParser
Use Dataset.add_videos() to add an iterable of unlabeled videos that can be parsed via a specified UnlabeledVideoSampleParser to a dataset.
Example
FiftyOne provides a VideoSampleParser that handles samples that directly contain the path to the video on disk.
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of video paths and the UnlabeledVideoSampleParser to use to
# parse them
samples = ...
sample_parser = foud.VideoSampleParser()
# Add videos to the dataset
dataset.add_videos(samples, sample_parser)
Adding labeled videos
Use Dataset.add_labeled_videos() to add an iterable of samples that can be parsed via a specified LabeledVideoSampleParser to a dataset.
Example
FiftyOne provides a VideoLabelsSampleParser that handles samples that contain (video_path, video_labels_or_path) tuples, where:
- video_path is the path to a video on disk
- video_labels_or_path is an eta.core.video.VideoLabels instance, a serialized dict representation of one, or the path to one on disk
The snippet below adds an iterable of labeled video samples in the above format to a dataset:
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of `(video_path, video_labels_or_path)` tuples and the
# LabeledVideoSampleParser to use to parse them
samples = ...
sample_parser = foud.VideoLabelsSampleParser()
# Add labeled videos to the dataset
dataset.add_labeled_videos(samples, sample_parser)
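If your videos and their labels live side by side on disk, one way to build the (video_path, video_labels_or_path) tuples is to pair files by stem. The sketch below assumes a hypothetical layout in which each video.mp4 has a matching video.json labels file in a parallel directory, and it skips videos that have no labels file. The helper name, extensions, and directory layout are all illustrative; adapt them to your data.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_videos_with_labels(
    videos_dir: str, labels_dir: str
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each *.mp4 with a same-stem *.json file."""
    pairs = []
    for video_path in sorted(Path(videos_dir).glob("*.mp4")):
        labels_path = Path(labels_dir) / (video_path.stem + ".json")
        if labels_path.is_file():  # skip videos that have no labels file
            pairs.append((str(video_path), str(labels_path)))
    return pairs


# Illustrative throwaway layout: two videos, only one of which has labels
with tempfile.TemporaryDirectory() as root:
    videos_dir = os.path.join(root, "videos")
    labels_dir = os.path.join(root, "labels")
    os.makedirs(videos_dir)
    os.makedirs(labels_dir)
    for name in ["a.mp4", "b.mp4"]:
        Path(videos_dir, name).touch()
    Path(labels_dir, "a.json").write_text("{}")

    samples = pair_videos_with_labels(videos_dir, labels_dir)
    names = [(Path(v).name, Path(lab).name) for v, lab in samples]

print(names)  # [('a.mp4', 'a.json')]
```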
Ingesting samples into datasets
Creating FiftyOne datasets typically does not create copies of the source media, since Sample instances store the filepath to the media, not the media itself.
However, in certain circumstances, such as loading data from binary sources like TFRecords or creating a FiftyOne dataset from unorganized and/or temporary files on disk, it can be desirable to ingest the raw media for each sample into a common backing location.
FiftyOne supports ingesting samples and their underlying source media in common formats, and this support can be extended to import datasets in custom formats.
Basic recipe
The basic recipe for ingesting samples and their source media into a Dataset is to create a SampleParser of the appropriate type for the samples that you’re loading and then pass the parser, along with an iterable of samples, to the appropriate Dataset method.
Note
A typical use case is that samples in the above recipe is a torch.utils.data.Dataset or an iterable generated by tf.data.Dataset.as_numpy_iterator().
Ingesting unlabeled images
Use Dataset.ingest_images() to ingest an iterable of unlabeled images that can be parsed via a specified UnlabeledImageSampleParser into a dataset.
The has_image_path property of the parser may be either True or False. If the parser provides image paths, the source images are copied directly from their source locations into the backing directory for the dataset; otherwise, each image is read into memory via get_image() and then written to the backing directory.
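The copy-or-write decision described above can be sketched in plain Python. This is a hypothetical illustration, not FiftyOne's implementation: ingest_one(), the toy PathParser and MemoryParser classes, and the write_image callable are assumptions, and get_image_path() is assumed by analogy with get_image(). Real in-memory images would be numpy arrays encoded with an image library; raw bytes stand in for them here so the example stays dependency-free.

```python
import os
import shutil
import tempfile
from typing import Any, Callable


def ingest_one(
    parser: Any,
    backing_dir: str,
    filename: str,
    write_image: Callable[[Any, str], None],
) -> str:
    """Hypothetical sketch: ingest the parser's current image into backing_dir."""
    os.makedirs(backing_dir, exist_ok=True)
    out_path = os.path.join(backing_dir, filename)
    if parser.has_image_path:
        # A path is available: copy the source file byte-for-byte
        shutil.copyfile(parser.get_image_path(), out_path)
    else:
        # Only an in-memory image is available: serialize it ourselves
        write_image(parser.get_image(), out_path)
    return out_path


class PathParser:
    """Toy parser whose samples are image paths."""

    has_image_path = True

    def __init__(self, path: str) -> None:
        self._path = path

    def get_image_path(self) -> str:
        return self._path


class MemoryParser:
    """Toy parser whose samples are in-memory images (bytes, for simplicity)."""

    has_image_path = False

    def __init__(self, data: bytes) -> None:
        self._data = data

    def get_image(self) -> bytes:
        return self._data


def write_bytes(data: bytes, path: str) -> None:
    # Stand-in for a real image encoder (e.g. a PNG or JPEG writer)
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src.jpg")
    write_bytes(b"fake-jpeg-bytes", src)
    backing = os.path.join(root, "dataset_dir")

    copied = ingest_one(PathParser(src), backing, "000001.jpg", write_bytes)
    written = ingest_one(MemoryParser(b"raw-pixels"), backing, "000002.jpg", write_bytes)

    with open(copied, "rb") as f:
        copied_bytes = f.read()
    with open(written, "rb") as f:
        written_bytes = f.read()

print(copied_bytes, written_bytes)  # b'fake-jpeg-bytes' b'raw-pixels'
```

Copying a file that already exists on disk avoids a decode/re-encode round trip, which is why providing paths is usually the cheaper option when they are available.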
Example
FiftyOne provides an ImageSampleParser that handles samples that contain either an image that can be converted to numpy format via np.asarray() or the path to an image on disk.
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of images or image paths and the UnlabeledImageSampleParser
# to use to parse them
samples = ...
sample_parser = foud.ImageSampleParser()
# A directory in which the images will be written; if `None`, a default directory
# based on the dataset's `name` will be used
dataset_dir = ...
# Ingest the images into the dataset
# The source images are copied into `dataset_dir`
dataset.ingest_images(samples, sample_parser, dataset_dir=dataset_dir)
Ingesting labeled images
Use Dataset.ingest_labeled_images() to ingest an iterable of samples that can be parsed via a specified LabeledImageSampleParser into a dataset.
The has_image_path property of the parser may be either True or False. If the parser provides image paths, the source images are copied directly from their source locations into the backing directory for the dataset; otherwise, each image is read into memory via get_image() and then written to the backing directory.
Example
FiftyOne provides an ImageClassificationSampleParser that handles samples that contain (image_or_path, target) tuples, where:
- image_or_path is either an image that can be converted to numpy format via np.asarray() or the path to an image on disk
- target is either a class ID or a label string
The snippet below ingests an iterable of image classification data in the above format into a FiftyOne dataset:
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of `(image_or_path, target)` tuples and the
# LabeledImageSampleParser to use to parse them
samples = ...
sample_parser = foud.ImageClassificationSampleParser()  # for example
# A directory in which the images will be written; if `None`, a default directory
# based on the dataset's `name` will be used
dataset_dir = ...
# Ingest the labeled images into the dataset
# The source images are copied into `dataset_dir`
dataset.ingest_labeled_images(samples, sample_parser, dataset_dir=dataset_dir)
Ingesting unlabeled videos
Use Dataset.ingest_videos() to ingest an iterable of unlabeled videos that can be parsed via a specified UnlabeledVideoSampleParser into a dataset.
The source videos will be directly copied from their source locations into the backing directory for the dataset.
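Copying many videos into a single backing directory raises one practical issue: two source files named clip.mp4 in different folders would overwrite each other. The sketch below shows one hypothetical way to avoid this by appending a numeric suffix until the destination name is free. The helper copy_unique() and its naming scheme are assumptions, and FiftyOne's own naming strategy may differ; the check-then-copy approach is also not safe against concurrent writers.

```python
import os
import shutil
import tempfile


def copy_unique(src: str, dst_dir: str) -> str:
    """Hypothetical helper: copy src into dst_dir without clobbering files."""
    stem, ext = os.path.splitext(os.path.basename(src))
    dst = os.path.join(dst_dir, stem + ext)
    idx = 1
    while os.path.exists(dst):
        # Name taken: try clip-2.mp4, clip-3.mp4, ...
        idx += 1
        dst = os.path.join(dst_dir, f"{stem}-{idx}{ext}")
    shutil.copyfile(src, dst)
    return dst


with tempfile.TemporaryDirectory() as root:
    dst_dir = os.path.join(root, "dataset_dir")
    os.makedirs(dst_dir)

    # Two different source videos that share the same filename
    sources = []
    for sub in ["day1", "day2"]:
        os.makedirs(os.path.join(root, sub))
        src = os.path.join(root, sub, "clip.mp4")
        open(src, "wb").close()
        sources.append(src)

    names = [os.path.basename(copy_unique(s, dst_dir)) for s in sources]
    on_disk = sorted(os.listdir(dst_dir))

print(names)  # ['clip.mp4', 'clip-2.mp4']
```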
Example
FiftyOne provides a VideoSampleParser that handles samples that directly contain the paths to videos on disk.
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of video paths and the UnlabeledVideoSampleParser
# to use to parse them
samples = ...
sample_parser = foud.VideoSampleParser()
# A directory in which the videos will be written; if `None`, a default directory
# based on the dataset's `name` will be used
dataset_dir = ...
# Ingest the videos into the dataset
# The source videos are copied into `dataset_dir`
dataset.ingest_videos(samples, sample_parser, dataset_dir=dataset_dir)
Ingesting labeled videos
Use Dataset.ingest_labeled_videos() to ingest an iterable of samples that can be parsed via a specified LabeledVideoSampleParser into a dataset.
The source videos will be directly copied from their source locations into the backing directory for the dataset.
Example
FiftyOne provides a VideoLabelsSampleParser that handles samples that contain (video_path, video_labels_or_path) tuples, where:
- video_path is the path to a video on disk
- video_labels_or_path is an eta.core.video.VideoLabels instance, a serialized dict representation of one, or the path to one on disk
The snippet below ingests an iterable of labeled videos in the above format into a FiftyOne dataset:
import fiftyone as fo
import fiftyone.utils.data as foud
dataset = fo.Dataset()
# An iterable of `(video_path, video_labels_or_path)` tuples and the
# LabeledVideoSampleParser to use to parse them
samples = ...
sample_parser = foud.VideoLabelsSampleParser()  # for example
# A directory in which the videos will be written; if `None`, a default directory
# based on the dataset's `name` will be used
dataset_dir = ...
# Ingest the labeled videos into the dataset
# The source videos are copied into `dataset_dir`
dataset.ingest_labeled_videos(samples, sample_parser, dataset_dir=dataset_dir)
Built-in SampleParser classes
The table below lists the common data formats for which FiftyOne provides built-in SampleParser implementations. You can also write a custom SampleParser to automate the parsing of samples in your own custom data format.
You can use a SampleParser to add samples to datasets and to ingest samples into datasets.
SampleParser | Description |
---|---|
ImageSampleParser | A sample parser that parses raw image samples. |
VideoSampleParser | A sample parser that parses raw video samples. |
ImageClassificationSampleParser | Generic parser for image classification samples whose labels are represented as Classification instances. |
ImageDetectionSampleParser | Generic parser for image detection samples whose labels are represented as Detections instances. |
ImageLabelsSampleParser | Generic parser for image samples whose labels are stored in ETA ImageLabels format. |
FiftyOneImageClassificationSampleParser | Parser for samples in FiftyOne image classification datasets. See FiftyOneImageClassificationDataset for format details. |
FiftyOneImageDetectionSampleParser | Parser for samples in FiftyOne image detection datasets. See FiftyOneImageDetectionDataset for format details. |
FiftyOneImageLabelsSampleParser | Parser for samples in FiftyOne image labels datasets. See FiftyOneImageLabelsDataset for format details. |
FiftyOneVideoLabelsSampleParser | Parser for samples in FiftyOne video labels datasets. See FiftyOneVideoLabelsDataset for format details. |
TFImageClassificationSampleParser | Parser for image classification samples stored as TFRecords. |
TFObjectDetectionSampleParser | Parser for image detection samples stored in TF Object Detection API format. |
Writing a custom SampleParser
FiftyOne provides a variety of built-in SampleParser classes to parse data in common formats. However, if your samples are stored in a custom format, you can write a custom SampleParser class and pass it to FiftyOne when adding or ingesting samples into your datasets.
The SampleParser interface provides a mechanism for defining methods that parse a data sample stored in a particular format (external to FiftyOne) and return various elements of the sample in a format that FiftyOne understands.
SampleParser itself is an abstract interface; the concrete interface that you should implement is determined by the type of samples that you are importing. For example, LabeledImageSampleParser defines an interface for parsing information from a labeled image sample, such as the path to the image on disk, the image itself, metadata about the image, and the label (e.g., classification or object detections) associated with the image.
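As a rough illustration of the kind of class you would write, here is a dependency-free sketch for a hypothetical CSV format with filepath and label columns. It only mimics the shape of a labeled image parser: the class name and CSV layout are made up, and the method names (with_sample(), has_image_path, has_image_metadata, get_image_path(), get_image(), get_image_metadata(), get_label()) are modeled on the elements described above. Check the exact abstract methods and signatures in the fiftyone.utils.data API reference before subclassing LabeledImageSampleParser for real.

```python
import csv
import io
from typing import Any, Dict, List, Optional, Tuple


class CSVRowClassificationParser:
    """Hypothetical duck-typed sketch of a labeled image sample parser.

    Each sample is a dict from csv.DictReader with "filepath" and "label"
    columns. A real implementation would subclass
    fiftyone.utils.data.LabeledImageSampleParser and return a FiftyOne label
    (e.g. fo.Classification) from get_label().
    """

    def __init__(self) -> None:
        self.current_sample: Optional[Dict[str, str]] = None

    def with_sample(self, sample: Dict[str, str]) -> None:
        # Load the sample that subsequent get_*() calls will read from
        self.current_sample = sample

    @property
    def has_image_path(self) -> bool:
        return True  # every row stores a path to an image on disk

    @property
    def has_image_metadata(self) -> bool:
        return False  # the CSV carries no width/height/size information

    def get_image_path(self) -> str:
        return self.current_sample["filepath"]

    def get_image(self) -> Any:
        # Decoding pixels needs an image library; this sketch only hands
        # out paths, so it does not load images itself
        raise NotImplementedError("this sketch only provides image paths")

    def get_image_metadata(self) -> None:
        return None

    def get_label(self) -> str:
        # Stand-in for returning e.g. fo.Classification(label=...)
        return self.current_sample["label"]


csv_text = "filepath,label\n/path/to/img1.jpg,cat\n/path/to/img2.jpg,dog\n"
parser = CSVRowClassificationParser()
parsed: List[Tuple[str, str]] = []
for row in csv.DictReader(io.StringIO(csv_text)):
    parser.with_sample(row)
    parsed.append((parser.get_image_path(), parser.get_label()))

print(parsed)  # [('/path/to/img1.jpg', 'cat'), ('/path/to/img2.jpg', 'dog')]
```

Because has_image_path is True here, an ingestion routine could copy each file directly instead of calling get_image().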