class documentation

A class that accepts a FiftyOne dataset and creates a corresponding torch.utils.data.Dataset.

Notes

General:

  • This only works with persistent datasets. To make a dataset persistent, set dataset.persistent = True.

  • Process start methods

    It is recommended to use the 'spawn' or 'forkserver' start methods over 'fork':

    - 'spawn' and 'forkserver' are safer when dealing with code that may be threaded (which a lot of code is, for example NumPy).
    - MongoDB, which backs FiftyOne's database, is not fork safe. In theory nothing here should break, but you will see a lot of warnings.
    - When using persistent_workers=True, the overhead of 'spawn' and 'forkserver' is low.
    - When using Jupyter notebooks, if you want all of your code to live in the notebook rather than calling it from a python file, you'll have to use 'fork'. Do so at your own risk.

    You can set the start method for all of your torch code with torch.multiprocessing.set_start_method('forkserver') (see the sketch after these notes).

  • Do not touch self.samples or subscript this object until after all workers are initialized. This avoids unnecessary memory usage, and if you're using DDP, it will keep your code from crashing.
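For reference, a minimal setup sketch honoring the notes above ("quickstart" is only an example dataset name):

    import fiftyone as fo
    import torch.multiprocessing

    # This class only works with persistent datasets
    dataset = fo.load_dataset("quickstart")  # example dataset name
    dataset.persistent = True

    # Prefer 'forkserver' (or 'spawn') over 'fork'; MongoDB, which backs
    # FiftyOne's database, is not fork safe
    torch.multiprocessing.set_start_method("forkserver")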

DDP:

  • A helper function :method:`distributed_init` is provided, to be called by each trainer process at the beginning of DDP training. This function:

    - Safely creates a database connection for each trainer
    - Shares an authkey between all local processes to ensure they can communicate

torch.utils.data.DataLoader use:

  • DO NOT use torch.Tensor.to in your get_item function; do so in the train loop, or use the pin_memory argument of your torch.utils.data.DataLoader (a sketch follows these notes).
  • If using a dataloader with many workers, remember to pass :method:`fo.utils.torch.FiftyOneTorchDataset.worker_init` to the worker_init_fn argument. This class will not work otherwise.
  • Using persistent_workers=True is a good idea.
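As a sketch, assuming the class lives at fiftyone.utils.torch (per the reference above) and that torch_dataset is an instance of it:

    import torch
    import fiftyone.utils.torch as fout

    loader = torch.utils.data.DataLoader(
        torch_dataset,  # a FiftyOneTorchDataset
        batch_size=16,
        num_workers=4,
        worker_init_fn=fout.FiftyOneTorchDataset.worker_init,  # required
        persistent_workers=True,  # amortizes 'spawn'/'forkserver' startup cost
        pin_memory=True,  # instead of calling torch.Tensor.to in get_item
    )

    for images, labels in loader:
        # device transfers belong in the train loop, not in get_item
        images = images.to("cuda", non_blocking=True)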

On reading from and writing to the FO object during training:

  • Reading

    - Try to have as many of the reads as possible happen in get_item or when caching fields, rather than in the main training loop (see the sketch below). Reads from the FO object during training may slow your code down significantly.

  • Writing

    - Writing currently happens on the process from which it is called. This shows moderate slowdown, and will be addressed.
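For example, a get_item that keeps all FO reads inside the workers might look like this (the ground_truth field name is hypothetical; note that get_item must be a serializable, module-level function):

    from PIL import Image
    import torchvision.transforms.functional as F

    def get_item(sample):
        # All reads from the FO object happen here, inside a worker,
        # rather than in the main training loop
        image = Image.open(sample.filepath).convert("RGB")
        label = sample.ground_truth.label  # hypothetical Classification field
        return F.to_tensor(image), label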

Parameters

samples: a fo.core.collections.SampleCollection
get_item: a Callable[[fo.core.sample.SampleView], Any]. Must be a serializable function.
cache_field_names: a list of strings. Fields to cache in memory. If this argument is passed, get_item should map a dict, whose keys and values correspond to the sample's field names and values, to the model input. This argument is highly recommended, as it offers a significant performance boost. Please note: the field values must be pickle serializable, i.e. pickle.dumps(field_value) should not raise an error, and pickle.loads(pickle.dumps(field_value)) should have all of the functionality of the original field value that you would need in your get_item function.
local_process_group: the process group with each of the processes running the main train script on the machine this object is on.
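To illustrate the cache_field_names contract, a sketch (module path and field names are assumptions, as above):

    import fiftyone.utils.torch as fout

    def get_item(d):
        # With cache_field_names, get_item receives a dict of cached
        # field values rather than a SampleView, e.g.
        # {"filepath": "/path/to/img.jpg", "ground_truth": <Classification>}
        ...

    torch_dataset = fout.FiftyOneTorchDataset(
        dataset,
        get_item,
        cache_field_names=["filepath", "ground_truth"],
    )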
Static Method distributed_init Function to be called by each trainer process in DDP training. Facilitates communication between processes. Safely creates database connection for each trainer.
Static Method worker_init Worker initialization function; pass it as the worker_init_fn argument of any torch.utils.data.DataLoader that wraps this dataset.
Method __getitem__ Returns get_item applied to the sample (or dict of cached field values) at the given index.
Method __init__ Undocumented
Method __len__ Returns the number of samples in the collection.
Instance Variable cache_field_names Undocumented
Instance Variable cached_fields Undocumented
Instance Variable get_item Undocumented
Instance Variable ids Undocumented
Instance Variable name Undocumented
Instance Variable stages Undocumented
Property samples The fo.core.collections.SampleCollection backing this dataset.
Method _load_cached_fields Undocumented
Method _load_samples Undocumented
Instance Variable _dataset Undocumented
Instance Variable _samples Undocumented
@staticmethod
def distributed_init(dataset_name, local_process_group, view_name=None): (source)

Function to be called by each trainer process in DDP training. Facilitates communication between processes. Safely creates database connection for each trainer.

This function should be called at the beginning of the training script.

Parameters

dataset_name: the name of the dataset to load
local_process_group: the process group with all the processes running the main training script
view_name: the name of the view to load (default None). If None, the whole dataset is loaded.
Returns
The loaded fiftyone.core.dataset.Dataset or fiftyone.core.view.DatasetView
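A usage sketch from within a DDP trainer process (the process-group construction and dataset name are placeholders):

    import torch.distributed as dist
    import fiftyone.utils.torch as fout

    def train(rank, world_size):
        dist.init_process_group("nccl", rank=rank, world_size=world_size)

        # a group containing the processes running this script on this machine
        local_group = dist.new_group(ranks=list(range(world_size)))

        # safely connect this trainer to the FiftyOne database
        samples = fout.FiftyOneTorchDataset.distributed_init(
            "my-dataset", local_group
        )
        ...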
@staticmethod
def worker_init(worker_id): (source)

Worker initialization function; pass it as the worker_init_fn argument of any torch.utils.data.DataLoader that wraps this dataset (see the notes above).

def __getitem__(self, index): (source)

Returns get_item applied to the sample (or dict of cached field values) at the given index.

def __init__(self, samples: focol.SampleCollection, get_item: Callable[fos.SampleView, Any], cache_field_names: list[str] = None, local_process_group=None): (source)

Undocumented

def __len__(self): (source)

Returns the number of samples in the collection.

cache_field_names: None = (source)

Undocumented

cached_fields: dict = (source)

Undocumented

get_item = (source)

Undocumented

ids = (source)

Undocumented

name = (source)

Undocumented

stages = (source)

Undocumented

@property
def samples(self): (source)

The fo.core.collections.SampleCollection backing this dataset.

def _load_cached_fields(self, samples, cache_field_names, local_process_group): (source)

Undocumented

def _load_samples(self): (source)

Undocumented

_dataset = (source)

Undocumented

_samples = (source)

Undocumented