class documentation

A flexible CSV importer that represents slice(s) of field values of a dataset as columns of a CSV file.

See :ref:`this page <CSVDataset-import>` for format details.

Parameters
dataset_dir

the dataset directory. If omitted, data_path and/or labels_path must be provided

data_path

an optional parameter that enables explicit control over the location of the media. Can be any of the following:

  • a folder name like "data" or "data/" specifying a subfolder of dataset_dir where the media files reside
  • an absolute directory path where the media files reside. In this case, the dataset_dir has no effect on the location of the data
  • a filename like "data.json" specifying the filename of the JSON data manifest file in dataset_dir
  • an absolute filepath specifying the location of the JSON data manifest. In this case, dataset_dir has no effect on the location of the data
  • a dict mapping filenames to absolute filepaths

If None, this parameter will default to whichever of data/ or data.json exists in the dataset directory
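
The accepted forms of data_path can be illustrated with a small standard-library sketch. The helper name resolve_data_path, the (kind, location) return shape, the ".json" suffix check, and the preference for data/ over data.json when both exist are all assumptions made for illustration; this is not the library's own resolution code.

```python
import os


def resolve_data_path(dataset_dir, data_path=None):
    """Hypothetical helper: classify `data_path` and return (kind, location).

    kind is "map" (dict of filename -> filepath), "manifest" (JSON file),
    or "dir" (folder of media files).
    """
    if isinstance(data_path, dict):
        # A dict mapping filenames to absolute filepaths is used as-is
        return "map", data_path

    if data_path is None:
        if dataset_dir is None:
            raise ValueError("Either dataset_dir or data_path is required")
        data_dir = os.path.join(dataset_dir, "data")
        manifest = os.path.join(dataset_dir, "data.json")
        # Assumption: prefer an existing data/ folder, then data.json
        if not os.path.isdir(data_dir) and os.path.isfile(manifest):
            return "manifest", manifest
        return "dir", data_dir

    # Assumption: a ".json" suffix distinguishes a manifest from a folder
    kind = "manifest" if data_path.endswith(".json") else "dir"
    if not os.path.isabs(data_path):
        if dataset_dir is None:
            raise ValueError("A relative data_path requires dataset_dir")
        # Relative names are interpreted inside dataset_dir
        data_path = os.path.join(dataset_dir, data_path)
    return kind, data_path
```

Absolute paths pass through unchanged, which mirrors the statement above that dataset_dir has no effect on the location of the data in that case.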

labels_path

an optional parameter that enables explicit control over the location of the labels. Can be any of the following:

  • a filename like "labels.csv" specifying the location of the labels in dataset_dir
  • an absolute filepath to the labels. In this case, dataset_dir has no effect on the location of the labels

If None, the parameter will default to labels.csv

media_field

the name of the column containing the media path for each sample. The media paths in this column may be:

  • filenames or relative paths to media files in data_path
  • absolute media paths, in which case data_path has no effect
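
As a rough sketch of the two cases above, a value read from the media_field column could be turned into a media path as follows. The function name resolve_media_path and its data_dir argument are hypothetical and only illustrate the rule; the importer's actual logic may differ.

```python
import os


def resolve_media_path(value, data_dir):
    """Hypothetical helper: map a `media_field` value to a media path."""
    if os.path.isabs(value):
        # Absolute media paths are used directly; data_dir has no effect
        return value
    # Filenames and relative paths are interpreted relative to data_dir
    return os.path.join(data_dir, value)
```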
fields

an optional parameter that specifies the columns to read and parse from the CSV file. Can be any of the following:

  • an iterable of column names to parse as strings
  • a dict mapping column names to functions that parse the column values into the appropriate type. Any keys with None values in this case are directly loaded as strings

If not provided, all columns are parsed as strings
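
The three forms of fields can be mimicked with the standard csv module. This is a minimal sketch of the described behavior, not the importer's code; parse_rows and CSV_TEXT are hypothetical names introduced for the example.

```python
import csv
import io


def parse_rows(csv_text, fields=None):
    """Hypothetical helper: parse CSV text according to a `fields` spec."""
    reader = csv.DictReader(io.StringIO(csv_text))

    if fields is None:
        # Not provided: every column is loaded as a string
        parsers = {name: None for name in reader.fieldnames}
    elif isinstance(fields, dict):
        # Dict: column name -> parsing function (None means "keep string")
        parsers = dict(fields)
    else:
        # Iterable of column names: each is loaded as a string
        parsers = {name: None for name in fields}

    rows = []
    for row in reader:
        parsed = {}
        for name, parser in parsers.items():
            value = row[name]
            parsed[name] = value if parser is None else parser(value)
        rows.append(parsed)
    return rows


CSV_TEXT = "filepath,label,score\nimg1.jpg,cat,0.9\nimg2.jpg,dog,0.4\n"

# "filepath" stays a string, "score" is converted with float()
rows = parse_rows(CSV_TEXT, fields={"filepath": None, "score": float})
```

Columns that are not named in fields are simply not read in this sketch, so "label" is absent from rows in the last call.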

skip_missing_media

whether to skip (True) or raise an error (False) when rows with no media_field are encountered

include_all_data

whether to generate samples for all media in the data directory (True) rather than only creating samples for media with CSV rows (False)

shuffle

whether to randomly shuffle the order in which the samples are imported

seed

a random seed to use when shuffling

max_samples

a maximum number of samples to import. By default, all samples are imported

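
The interaction of skip_missing_media, shuffle, seed, and max_samples can be sketched on a plain list of row dicts. The helper select_rows is hypothetical, and the choice of ValueError as well as the order of operations (filter, then shuffle, then truncate) are assumptions made for illustration rather than documented behavior.

```python
import random


def select_rows(rows, media_field="filepath", skip_missing_media=False,
                shuffle=False, seed=None, max_samples=None):
    """Hypothetical helper: choose which CSV rows would become samples."""
    kept = []
    for row in rows:
        if not row.get(media_field):
            if skip_missing_media:
                continue  # silently drop rows with no media path
            # Assumption: the exception type is illustrative only
            raise ValueError("Row has no '%s' value" % media_field)
        kept.append(row)

    if shuffle:
        # A dedicated Random instance makes the order reproducible per seed
        random.Random(seed).shuffle(kept)

    if max_samples is not None:
        kept = kept[:max_samples]

    return kept
```

With shuffle=True and a fixed seed, repeated calls return the same order, which is the usual reason for exposing a seed parameter.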
Method __init__ Undocumented
Method __iter__ Undocumented
Method __len__ The total number of samples that will be imported.
Method __next__ Returns information about the next sample in the dataset.
Method setup Performs any necessary setup before importing the first sample in the dataset.
Instance Variable data_path Undocumented
Instance Variable fields Undocumented
Instance Variable include_all_data Undocumented
Instance Variable labels_path Undocumented
Instance Variable media_field Undocumented
Instance Variable skip_missing_media Undocumented
Property has_dataset_info Whether this importer produces a dataset info dictionary.
Property has_sample_field_schema Whether this importer produces a sample field schema.
Instance Variable _fields Undocumented
Instance Variable _filepaths Undocumented
Instance Variable _iter_filepaths Undocumented
Instance Variable _media_paths_map Undocumented
Instance Variable _num_samples Undocumented
Instance Variable _rows_map Undocumented

Inherited from GenericSampleDatasetImporter:

Method get_sample_field_schema Returns a dictionary describing the field schema of the samples loaded by this importer.

Inherited from DatasetImporter (via GenericSampleDatasetImporter):

Method __enter__ Undocumented
Method __exit__ Undocumented
Method close Performs any necessary actions after the last sample has been imported.
Method get_dataset_info Returns the dataset info for the dataset.
Method _preprocess_list Internal utility that preprocesses the given list---which is presumed to be a list defining the samples that should be imported---by applying the values of the shuffle, seed, and max_samples parameters of the importer.

Inherited from ImportPathsMixin (via GenericSampleDatasetImporter, DatasetImporter):

Static Method _load_data_map Helper function that parses either a data directory or a data manifest file into a UUID -> filepath map.
Static Method _parse_data_path Helper function that computes default values for the data_path parameter supported by many importers.
Static Method _parse_labels_path Helper function that computes default values for the labels_path parameter supported by many importers.
def __init__(self, dataset_dir=None, data_path=None, labels_path=None, media_field='filepath', fields=None, skip_missing_media=False, include_all_data=False, shuffle=False, seed=None, max_samples=None): (source)
def __len__(self): (source)

The total number of samples that will be imported.

Raises
TypeError if the total number is not known
def __next__(self): (source)

Returns information about the next sample in the dataset.

Returns
a fiftyone.core.sample.Sample instance
Raises
StopIteration if there are no more samples to import
def setup(self): (source)

Performs any necessary setup before importing the first sample in the dataset.

This method is called when the importer's context manager interface is entered, DatasetImporter.__enter__.
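
The protocol described by setup, __len__, __iter__, __next__, and close can be sketched with a toy class. ToyImporter is a hypothetical stand-in, not a FiftyOne class, and the assumption that __exit__ calls close is inferred from the description of close rather than stated on this page.

```python
class ToyImporter:
    """Hypothetical stand-in that mimics the documented importer protocol."""

    def __init__(self, items):
        self._items = list(items)
        self._iter = None
        self.events = []  # records the order of lifecycle calls

    def __enter__(self):
        # Entering the context manager triggers setup()
        self.setup()
        return self

    def __exit__(self, *args):
        # Assumption: leaving the context manager triggers close()
        self.close()

    def __iter__(self):
        self._iter = iter(self._items)
        return self

    def __len__(self):
        return len(self._items)

    def __next__(self):
        # Raises StopIteration when there are no more items
        return next(self._iter)

    def setup(self):
        self.events.append("setup")

    def close(self):
        self.events.append("close")


with ToyImporter(["a.jpg", "b.jpg"]) as importer:
    total = len(importer)
    samples = [sample for sample in importer]
```

Running setup inside __enter__ means any expensive work, such as reading the CSV file, is deferred until the import actually starts.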

data_path: None = (source)

Undocumented

fields = (source)

Undocumented

include_all_data: False = (source)

Undocumented

labels_path: None = (source)

Undocumented

media_field: "filepath" = (source)

Undocumented

skip_missing_media: False = (source)

Undocumented

@property
has_dataset_info = (source)

Whether this importer produces a dataset info dictionary.

@property
has_sample_field_schema = (source)

Whether this importer produces a sample field schema.

_fields = (source)

Undocumented

_filepaths = (source)

Undocumented

_iter_filepaths = (source)

Undocumented

_media_paths_map = (source)

Undocumented

_num_samples = (source)

Undocumented

_rows_map = (source)

Undocumented