class documentation

Sorts a collection by similarity to a specified query.

To use this stage, you must first index your dataset by similarity with fiftyone.brain.compute_similarity.

Examples:

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

fob.compute_similarity(
    dataset, model="clip-vit-base32-torch", brain_key="clip"
)

#
# Sort samples by their similarity to a sample by its ID
#

query_id = dataset.first().id

stage = fo.SortBySimilarity(query_id, k=5)
view = dataset.add_stage(stage)

#
# Sort samples by their similarity to a manually computed vector
#

model = foz.load_zoo_model("clip-vit-base32-torch")
embeddings = dataset.take(2, seed=51).compute_embeddings(model)
query = embeddings.mean(axis=0)

stage = fo.SortBySimilarity(query, k=5)
view = dataset.add_stage(stage)

#
# Sort samples by their similarity to a text prompt
#

query = "kites high in the air"

stage = fo.SortBySimilarity(query, k=5)
view = dataset.add_stage(stage)
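
The examples above leave reverse, dist_field, and brain_key at their defaults. The sketch below shows one way to set them explicitly and read the stored distances back. The helper name nearest_with_distances and the field name "clip_dist" are hypothetical and chosen only for illustration. The sketch assumes the dataset already has a similarity index under the given brain key, such as the "clip" run created above.

```python
def nearest_with_distances(
    dataset, query_id, brain_key="clip", dist_field="clip_dist", k=5
):
    """Hypothetical helper: returns (view, distances) for the k nearest samples.

    Assumes `dataset` already has a similarity index stored under `brain_key`.
    """
    # Imported lazily so the helper can be defined before FiftyOne is loaded
    import fiftyone as fo

    stage = fo.SortBySimilarity(
        query_id,
        k=k,
        reverse=False,          # most similar samples first
        dist_field=dist_field,  # float field that receives each distance
        brain_key=brain_key,    # select the similarity index explicitly
    )
    view = dataset.add_stage(stage)

    # Read back the distances that the stage stored for the returned samples
    return view, view.values(dist_field)
```

A call such as nearest_with_distances(dataset, dataset.first().id) would then return the five nearest samples together with their distances to the query sample.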

Parameters

query
    the query, which can be any of the following:

      • an ID or iterable of IDs
      • a num_dims vector or num_queries x num_dims array of vectors
      • a prompt or iterable of prompts (if supported by the index)
k
    the number of matches to return. By default, the entire collection is sorted
reverse
    whether to sort by least similarity (True) or greatest similarity (False). Some backends may not support least similarity
dist_field
    the name of a float field in which to store the distance of each example to the specified query. The field is created if necessary
brain_key
    the brain key of an existing fiftyone.brain.compute_similarity run on the dataset. If not specified, the dataset must have an applicable run, which will be used by default
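
To make the roles of k and reverse concrete, here is a conceptual pure-Python sketch of ranking items by distance to a single query vector. It is not FiftyOne's implementation. The helper names are hypothetical, and cosine distance is an assumption, since the actual metric depends on how the similarity index was built.

```python
import math


def cosine_distance(u, v):
    """Returns 1 - cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sort_by_similarity_sketch(embeddings, query, k=None, reverse=False):
    """Hypothetical helper: ranks IDs by their distance to `query`.

    embeddings: dict mapping an ID to its embedding vector
    Returns a list of (id, distance) pairs.
    """
    dists = [
        (_id, cosine_distance(vec, query)) for _id, vec in embeddings.items()
    ]

    # reverse=False puts the smallest distance (greatest similarity) first;
    # reverse=True puts the largest distance (least similarity) first
    dists.sort(key=lambda pair: pair[1], reverse=reverse)

    # k=None keeps the entire sorted collection
    return dists if k is None else dists[:k]


embeddings = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
query = [1.0, 0.0]

nearest = sort_by_similarity_sketch(embeddings, query, k=2)
farthest = sort_by_similarity_sketch(embeddings, query, k=1, reverse=True)
```

Here nearest contains the IDs "a" and "c" in that order, and farthest contains only "b". The distance paired with each ID plays the role of the value that dist_field would store.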
Method __init__ Undocumented
Method to_mongo Returns the MongoDB aggregation pipeline for the stage.
Method validate Validates that the stage can be applied to the given collection.
Property brain_key The brain key of the similarity index to use.
Property dist_field The field to store similarity distances, if any.
Property k The number of matches to return.
Property query The query.
Property reverse Whether to sort by least similarity.
Class Method _params Returns a list of JSON dicts describing the stage's supported parameters.
Method _kwargs Returns a list of [name, value] lists describing the parameters of this stage instance.
Method _make_pipeline Undocumented
Instance Variable _brain_key Undocumented
Instance Variable _dist_field Undocumented
Instance Variable _is_prompt Undocumented
Instance Variable _k Undocumented
Instance Variable _pipeline Undocumented
Instance Variable _query Undocumented
Instance Variable _query_kwarg Undocumented
Instance Variable _reverse Undocumented
Instance Variable _state Undocumented

Inherited from ViewStage:

Method __eq__ Undocumented
Method __repr__ Undocumented
Method __str__ Undocumented
Method get_edited_fields Returns a list of names of fields or embedded fields that may have been edited by the stage, if any.
Method get_excluded_fields Returns a list of fields that have been excluded by the stage, if any.
Method get_filtered_fields Returns a list of names of fields or embedded fields that contain arrays that have been filtered by the stage, if any.
Method get_group_expr Returns the dynamic group expression for the given stage, if any.
Method get_media_type Returns the media type outputted by this stage when applied to the given collection, if and only if it is different from the input type.
Method get_selected_fields Returns a list of fields that have been selected by the stage, if any.
Method load_view Loads the fiftyone.core.view.DatasetView containing the output of the stage.
Property has_view Whether this stage's output view should be loaded via load_view rather than appending stages to an aggregation pipeline via to_mongo.
Property outputs_dynamic_groups Whether this stage outputs or flattens dynamic groups.
Class Method _from_dict Creates a ViewStage instance from a serialized JSON dict representation of it.
Method _needs_frames Whether the stage requires frame labels of video samples to be attached.
Method _needs_group_slices Whether the stage requires group slice(s) to be attached.
Method _serialize Returns a JSON dict representation of the ViewStage.
Instance Variable _uuid Undocumented

def __init__(self, query, k=None, reverse=False, dist_field=None, brain_key=None, _state=None):

Undocumented

def to_mongo(self, _):

Returns the MongoDB aggregation pipeline for the stage.

Only usable if has_view is False.

Parameters
_
    Undocumented
sample_collection
    the fiftyone.core.collections.SampleCollection to which the stage is being applied
Returns
    a MongoDB aggregation pipeline (list of dicts)

def validate(self, sample_collection):

Validates that the stage can be applied to the given collection.

Parameters
sample_collection
    a fiftyone.core.collections.SampleCollection
Raises
ViewStageError
    if the stage cannot be applied to the collection

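@property
def brain_key(self):

The brain key of the similarity index to use.

@property
def dist_field(self):

The field to store similarity distances, if any.

@property
def k(self):

The number of matches to return.

@property
def query(self):

The query.

@property
def reverse(self):

Whether to sort by least similarity.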

@classmethod
def _params(cls):

Returns a list of JSON dicts describing the stage's supported parameters.

Returns
    a list of JSON dicts

def _kwargs(self):

Returns a list of [name, value] lists describing the parameters of this stage instance.

Returns
    a list of [name, value] lists

def _make_pipeline(self, sample_collection):

Undocumented

_brain_key

Undocumented

_dist_field

Undocumented

_is_prompt

Undocumented

_k

Undocumented

_pipeline

Undocumented

_query

Undocumented

_query_kwarg

Undocumented

_reverse

Undocumented

_state

Undocumented