fiftyone.utils.csv.CSVDatasetImporter

class documentation

class CSVDatasetImporter(foud.GenericSampleDatasetImporter, foud.ImportPathsMixin):

Constructor: CSVDatasetImporter(dataset_dir, data_path, labels_path, media_field, ...)

A flexible CSV importer that represents slice(s) of field values of a dataset as columns of a CSV file.

See the CSVDataset import documentation for format details.

Parameters
dataset_dir	the dataset directory. If omitted, `data_path` and/or `labels_path` must be provided
data_path	an optional parameter that enables explicit control over the location of the media. Can be any of the following: (a) a folder name like `"data"` or `"data/"` specifying a subfolder of `dataset_dir` where the media files reside; (b) an absolute directory path where the media files reside, in which case `dataset_dir` has no effect on the location of the data; (c) a filename like `"data.json"` specifying the filename of the JSON data manifest file in `dataset_dir`; (d) an absolute filepath specifying the location of the JSON data manifest, in which case `dataset_dir` has no effect on the location of the data; (e) a dict mapping filenames to absolute filepaths. If None, this parameter defaults to whichever of `data/` or `data.json` exists in the dataset directory
labels_path	an optional parameter that enables explicit control over the location of the labels. Can be any of the following: (a) a filename like `"labels.csv"` specifying the location of the labels in `dataset_dir`; (b) an absolute filepath to the labels, in which case `dataset_dir` has no effect on the location of the labels. If None, this parameter defaults to `labels.csv`
media_field	the name of the column containing the media path for each sample. The media paths in this column may be: (a) filenames or relative paths to media files in `data_path`; (b) absolute media paths, in which case `data_path` has no effect
fields	an optional parameter that specifies the columns to read and parse from the CSV file. Can be any of the following: (a) an iterable of column names to parse as strings; (b) a dict mapping column names to functions that parse the column values into the appropriate type, where any keys with `None` values are loaded directly as strings. If not provided, all columns are parsed as strings
skip_missing_media	whether to skip (True) or raise an error (False) when rows with no value for `media_field` are encountered
include_all_data	whether to generate samples for all media in the data directory (True) rather than only creating samples for media with CSV rows (False)
shuffle	whether to randomly shuffle the order in which the samples are imported
seed	a random seed to use when shuffling
max_samples	a maximum number of samples to import. By default, all samples are imported
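
The `media_field`, `fields`, and `skip_missing_media` parameters together describe a per-row parsing step. The following is a minimal standard-library sketch of those documented semantics, not FiftyOne's implementation; the helper name `parse_rows`, the `/data` directory, and the `label` and `score` columns are hypothetical.

```python
import csv
import io
import os


def parse_rows(csv_text, data_dir, media_field="filepath", fields=None,
               skip_missing_media=False):
    """Hypothetical helper that mimics the documented parameter semantics."""
    reader = csv.DictReader(io.StringIO(csv_text))
    samples = []
    for row in reader:
        media = row.get(media_field)
        if not media:
            # Row has no media path: skip it or fail, depending on the flag
            if skip_missing_media:
                continue
            raise ValueError("row has no value for %r" % media_field)

        # Relative paths are resolved against data_dir; absolute paths are kept
        if not os.path.isabs(media):
            media = os.path.join(data_dir, media)

        if fields is None:
            # No `fields` given: every column is kept as a string
            columns = {name: None for name in row}
        elif isinstance(fields, dict):
            # Dict: column name -> parser function (None means "keep string")
            columns = fields
        else:
            # Iterable of column names: each is kept as a string
            columns = {name: None for name in fields}

        sample = {media_field: media}
        for name, parser in columns.items():
            if name == media_field:
                continue
            value = row[name]
            sample[name] = parser(value) if parser is not None else value
        samples.append(sample)
    return samples


CSV_TEXT = "filepath,label,score\nimg1.jpg,cat,0.9\n,dog,0.4\n"

samples = parse_rows(
    CSV_TEXT,
    data_dir="/data",
    fields={"label": None, "score": float},
    skip_missing_media=True,
)
```

Here the second row has an empty `filepath` column, so it is skipped; with `skip_missing_media=False` the same input would raise an error instead.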

Method	`__init__`	Undocumented
Method	`__iter__`	Undocumented
Method	`__len__`	The total number of samples that will be imported.
Method	`__next__`	Returns information about the next sample in the dataset.
Method	`setup`	Performs any necessary setup before importing the first sample in the dataset.
Instance Variable	`data_path`	Undocumented
Instance Variable	`fields`	Undocumented
Instance Variable	`include_all_data`	Undocumented
Instance Variable	`labels_path`	Undocumented
Instance Variable	`media_field`	Undocumented
Instance Variable	`skip_missing_media`	Undocumented
Property	`has_dataset_info`	Whether this importer produces a dataset info dictionary.
Property	`has_sample_field_schema`	Whether this importer produces a sample field schema.
Instance Variable	`_fields`	Undocumented
Instance Variable	`_filepaths`	Undocumented
Instance Variable	`_iter_filepaths`	Undocumented
Instance Variable	`_media_paths_map`	Undocumented
Instance Variable	`_num_samples`	Undocumented
Instance Variable	`_rows_map`	Undocumented

Inherited from GenericSampleDatasetImporter:

Method	`get_sample_field_schema`	Returns a dictionary describing the field schema of the samples loaded by this importer.

Inherited from DatasetImporter (via GenericSampleDatasetImporter):

Method	`__enter__`	Undocumented
Method	`__exit__`	Undocumented
Method	`close`	Performs any necessary actions after the last sample has been imported.
Method	`get_dataset_info`	Returns the dataset info for the dataset.
Method	`_preprocess_list`	Internal utility that preprocesses the given list---which is presumed to be a list defining the samples that should be imported---by applying the values of the `shuffle`, `seed`, and `max_samples` parameters of the importer.
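
The effect of `shuffle`, `seed`, and `max_samples` on the list of samples can be illustrated with a short sketch. It assumes that shuffling is applied before truncation; the function name `preprocess_list` is hypothetical and the code does not reproduce FiftyOne's internal `_preprocess_list`.

```python
import random


def preprocess_list(items, shuffle=False, seed=None, max_samples=None):
    """Hypothetical stand-in: optionally shuffle, then optionally truncate."""
    items = list(items)  # work on a copy so the caller's list is untouched
    if shuffle:
        # A seeded generator makes the shuffled order reproducible
        rng = random.Random(seed)
        rng.shuffle(items)
    if max_samples is not None:
        items = items[:max_samples]
    return items


paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]

first_two = preprocess_list(paths, max_samples=2)
run_a = preprocess_list(paths, shuffle=True, seed=51, max_samples=3)
run_b = preprocess_list(paths, shuffle=True, seed=51, max_samples=3)
```

Under this assumption, `max_samples` combined with `shuffle=True` yields a random subset rather than the first rows of the file, and a fixed `seed` makes that subset repeatable across runs.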

Inherited from ImportPathsMixin (via GenericSampleDatasetImporter, DatasetImporter):

Static Method	`_load_data_map`	Helper function that parses either a data directory or a data manifest file into a UUID -> filepath map.
Static Method	`_parse_data_path`	Helper function that computes default values for the `data_path` parameter supported by many importers.
Static Method	`_parse_labels_path`	Helper function that computes default values for the `labels_path` parameter supported by many importers.

def __init__(self, dataset_dir=None, data_path=None, labels_path=None, media_field='filepath', fields=None, skip_missing_media=False, include_all_data=False, shuffle=False, seed=None, max_samples=None):

overrides fiftyone.utils.data.importers.DatasetImporter.__init__

Undocumented
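
As an illustration of the constructor arguments, the sketch below writes a small `labels.csv` and defines a function that builds the importer. It assumes FiftyOne is installed and that the media files named in the CSV actually exist under `data/`; the file names, the `label` and `score` columns, and the helpers `write_labels_csv` and `load_csv_dataset` are hypothetical. The importer is consumed with `fo.Dataset.from_importer()`.

```python
import csv
import os
import tempfile


def write_labels_csv(dataset_dir):
    """Writes a tiny labels.csv; columns other than `filepath` are made up."""
    os.makedirs(os.path.join(dataset_dir, "data"), exist_ok=True)
    labels_path = os.path.join(dataset_dir, "labels.csv")
    with open(labels_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filepath", "label", "score"])
        writer.writerow(["img1.jpg", "cat", "0.9"])
        writer.writerow(["img2.jpg", "dog", "0.4"])
    return labels_path


def load_csv_dataset(dataset_dir):
    # Imported lazily so the rest of the sketch runs without FiftyOne
    import fiftyone as fo
    import fiftyone.utils.csv as fouc

    importer = fouc.CSVDatasetImporter(
        dataset_dir=dataset_dir,
        data_path="data",          # subfolder of dataset_dir with the media
        labels_path="labels.csv",  # CSV file inside dataset_dir
        media_field="filepath",    # column holding each sample's media path
        fields={"label": None, "score": float},  # None keeps the string
        skip_missing_media=True,
        shuffle=True,
        seed=51,
        max_samples=100,
    )
    return fo.Dataset.from_importer(importer)


dataset_dir = tempfile.mkdtemp()
labels_path = write_labels_csv(dataset_dir)
# dataset = load_csv_dataset(dataset_dir)  # needs FiftyOne and real media
```

The final call is left commented out because it only succeeds when FiftyOne is available and `img1.jpg` and `img2.jpg` are real media files in the `data/` subfolder.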

def __iter__(self):

overrides fiftyone.utils.data.importers.DatasetImporter.__iter__

Undocumented

def __len__(self):

overrides fiftyone.utils.data.importers.DatasetImporter.__len__

The total number of samples that will be imported.