class documentation
class TorchImageModelConfig(foc.Config): (source)
Known subclasses: fiftyone.utils.clip.zoo.TorchCLIPModelConfig, fiftyone.utils.open_clip.TorchOpenClipModelConfig, fiftyone.utils.sam.SegmentAnythingModelConfig, fiftyone.utils.sam2.SegmentAnything2ImageModelConfig, fiftyone.utils.sam2.SegmentAnything2VideoModelConfig, fiftyone.utils.super_gradients.TorchYoloNasModelConfig, fiftyone.zoo.models.torch.TorchvisionImageModelConfig
Constructor: TorchImageModelConfig(d)
Configuration for running a TorchImageModel.
Models are represented by this class via the following three components:
Model:

    # Directly specify a model
    model

    # Load model from an entrypoint
    model = entrypoint_fcn(**entrypoint_args)

Transforms:

    # Directly provide transforms
    transforms

    # Load transforms from a function
    transforms = transforms_fcn(**transforms_args)

    # Use the `image_XXX` parameters defined below to build a transform
    transforms = build_transforms(image_XXX, ...)

OutputProcessor:

    # Directly provide an OutputProcessor
    output_processor

    # Load an OutputProcessor from a function
    output_processor = output_processor_cls(**output_processor_args)
Given these components, inference happens as follows:
    def predict_all(imgs):
        imgs = [transforms(img) for img in imgs]

        if not raw_inputs:
            imgs = torch.stack(imgs)

        output = model(imgs)
        return output_processor(output, ...)
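For example, here is a minimal sketch of wiring all three components together via a config dict; it reuses the inception_v3 entrypoint and transforms strings from the parameter descriptions below, and assumes the config is run via fiftyone.utils.torch.TorchImageModel:

    import fiftyone.utils.torch as fout

    # All three components are specified as importable strings, so the
    # model, transforms, and output processor are loaded lazily
    config = fout.TorchImageModelConfig(
        {
            # Model: loaded via an entrypoint function
            "entrypoint_fcn": "torchvision.models.inception_v3",
            "entrypoint_args": {"weights": "DEFAULT"},
            # Transforms: loaded from a function that returns a transform
            "transforms_fcn": "torchvision.models.Inception_V3_Weights.DEFAULT.transforms",
            # OutputProcessor: constructed as output_processor_cls(classes=classes, ...)
            "output_processor_cls": "fiftyone.utils.torch.ClassifierOutputProcessor",
            # A real config would also supply `classes`, `labels_string`, or
            # `labels_path` so that predictions can be labeled
        }
    )

    model = fout.TorchImageModel(config)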
Parameters | |
model | a torch.nn.Module instance to use |
entrypoint_fcn | a function or string like "torchvision.models.inception_v3" specifying the entrypoint function that loads the model |
entrypoint_args | a dictionary of arguments for entrypoint_fcn |
transforms | a preprocessing transform to apply |
transforms_fcn | a function or string like "torchvision.models.Inception_V3_Weights.DEFAULT.transforms" specifying a function that returns a preprocessing transform to apply |
transforms_args | a dictionary of arguments for transforms_fcn |
ragged_batches | whether the provided transforms or transforms_fcn may return tensors of different sizes. This must be set to False to enable batch inference, if it is desired |
raw_inputs | whether to feed the raw list of images to the model rather than stacking them as a Torch tensor |
output_processor | an OutputProcessor instance to use |
output_processor_cls | a class or string like "fiftyone.utils.torch.ClassifierOutputProcessor" specifying the OutputProcessor to use |
output_processor_args | a dictionary of arguments for output_processor_cls(classes=classes, **kwargs) |
confidence_thresh | an optional confidence threshold to apply to any applicable predictions generated by the model |
classes | a list of class names for the model, if applicable |
labels_string | a comma-separated list of the class names for the model, if applicable |
labels_path | the path to the labels map for the model, if applicable |
mask_targets | a mask targets dict for the model, if applicable |
mask_targets_path | the path to a mask targets map for the model, if applicable |
skeleton | a keypoint skeleton dict for the model, if applicable |
image_min_size | resize the input images during preprocessing, if necessary, so that the image dimensions are at least this (width, height) |
image_min_dim | resize input images during preprocessing, if necessary, so that the smaller image dimension is at least this value |
image_max_size | resize the input images during preprocessing, if necessary, so that the image dimensions are at most this (width, height) |
image_max_dim | resize input images during preprocessing, if necessary, so that the largest image dimension is at most this value |
image_size | a (width, height) to which to resize the input images during preprocessing |
image_dim | resize the smaller input dimension to this value during preprocessing |
image_patch_size | crop the input images during preprocessing, if necessary, so that the image dimensions are a multiple of this patch size |
image_mean | a 3-array of mean values in [0, 1] for preprocessing the input images |
image_std | a 3-array of std values in [0, 1] for preprocessing the input images |
embeddings_layer | the name of a layer whose output to expose as embeddings. Prepend "<" to save the input tensor instead |
as_feature_extractor | whether to operate the model as a feature extractor. If embeddings_layer is provided, this layer is passed to torchvision's create_feature_extractor() function. If no embeddings_layer is provided, the model's output is used as-is for feature extraction |
use_half_precision | whether to use half precision (only supported when using GPU) |
cudnn_benchmark | a value to use for torch.backends.cudnn.benchmark while the model is running |
device | a string specifying the device to use, e.g. "cuda:0", "mps", or "cpu". By default, CUDA is used if available, else CPU is used |
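As an illustration of the embeddings-related parameters, here is a hedged sketch of a config that operates a model as a feature extractor; the "avgpool" node name and the normalization constants are illustrative values for a ResNet-50, and layer names can be verified with torchvision's get_graph_node_names():

    import fiftyone.utils.torch as fout

    config = fout.TorchImageModelConfig(
        {
            "entrypoint_fcn": "torchvision.models.resnet50",
            "entrypoint_args": {"weights": "DEFAULT"},
            # Build transforms from the image_XXX parameters above
            "image_size": [224, 224],
            "image_mean": [0.485, 0.456, 0.406],  # standard ImageNet stats
            "image_std": [0.229, 0.224, 0.225],
            # Pass this layer to torchvision's create_feature_extractor()
            "embeddings_layer": "avgpool",
            "as_feature_extractor": True,
        }
    )

    model = fout.TorchImageModel(config)
    # embeddings = dataset.compute_embeddings(model)  # for example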
Method | __init__ | Undocumented |
Instance Variable | as_feature_extractor | Undocumented |
Instance Variable | classes | Undocumented |
Instance Variable | confidence_thresh | Undocumented |
Instance Variable | cudnn_benchmark | Undocumented |
Instance Variable | device | Undocumented |
Instance Variable | embeddings_layer | Undocumented |
Instance Variable | entrypoint_args | Undocumented |
Instance Variable | entrypoint_fcn | Undocumented |
Instance Variable | image_dim | Undocumented |
Instance Variable | image_max_dim | Undocumented |
Instance Variable | image_max_size | Undocumented |
Instance Variable | image_mean | Undocumented |
Instance Variable | image_min_dim | Undocumented |
Instance Variable | image_min_size | Undocumented |
Instance Variable | image_patch_size | Undocumented |
Instance Variable | image_size | Undocumented |
Instance Variable | image_std | Undocumented |
Instance Variable | labels_path | Undocumented |
Instance Variable | labels_string | Undocumented |
Instance Variable | mask_targets | Undocumented |
Instance Variable | mask_targets_path | Undocumented |
Instance Variable | model | Undocumented |
Instance Variable | output_processor | Undocumented |
Instance Variable | output_processor_args | Undocumented |
Instance Variable | output_processor_cls | Undocumented |
Instance Variable | ragged_batches | Undocumented |
Instance Variable | raw_inputs | Undocumented |
Instance Variable | skeleton | Undocumented |
Instance Variable | transforms | Undocumented |
Instance Variable | transforms_args | Undocumented |
Instance Variable | transforms_fcn | Undocumented |
Instance Variable | use_half_precision | Undocumented |
Inherited from Config:
Method | __repr__ | Undocumented |