
Filtering Cheat Sheet

This cheat sheet shows how to perform common matching and filtering operations in FiftyOne using dataset views.

Strings and pattern matching

The formulas in this section use the following example data:

import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("quickstart")
Operation Command
Filepath starts with “/Users”:
    ds.match(F("filepath").starts_with("/Users"))
Filepath ends with “10.jpg” or “10.png”:
    ds.match(F("filepath").ends_with(("10.jpg", "10.png")))
Label contains string “be”:
    ds.filter_labels(
        "predictions",
        F("label").contains_str("be"),
    )
Filepath contains “088” and is JPEG:
    ds.match(F("filepath").re_match(r"088.*\.jpg$"))

Reference: match() and filter_labels().
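A regex such as "088*.jpg" reads like a glob but is not one. As a regex it means "08", then zero or more "8"s, then any single character, then "jpg". The sketch below runs Python's re.search on a few hypothetical paths to compare that pattern with an explicit one. It assumes re_match() performs an unanchored regex search, as MongoDB's $regexMatch does. For patterns this simple, Python's re and MongoDB's PCRE engine should agree.

```python
import re

# Hypothetical file paths, for illustration only
paths = [
    "/data/images/000880.jpg",   # contains "088" and is a JPEG
    "/data/images/frame08.jpg",  # no "088" at all
    "/data/images/000088.png",   # contains "088" but is a PNG
]

glob_like = "088*.jpg"      # "08", zero or more "8", any char, "jpg"
explicit = r"088.*\.jpg$"   # "088", anything, then a literal ".jpg" at the end


def matches(pattern, path):
    # re.search scans the whole string, like an unanchored regex match
    return re.search(pattern, path) is not None


glob_like_hits = [p for p in paths if matches(glob_like, p)]
explicit_hits = [p for p in paths if matches(explicit, p)]
```

Here the glob-like pattern both misses the intended file and matches one without "088". The explicit pattern selects only the intended file.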

Dates and times

The formulas in this section use the following example data:

from datetime import datetime, timedelta

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

filepaths = ["image%d.jpg" % i for i in range(5)]
dates = [
    datetime(2021, 8, 24, 1, 0, 0),
    datetime(2021, 8, 24, 2, 0, 0),
    datetime(2021, 8, 25, 3, 11, 12),
    datetime(2021, 9, 25, 4, 22, 23),
    datetime(2022, 9, 27, 5, 0, 0),
]

ds = fo.Dataset()
ds.add_samples(
    [fo.Sample(filepath=f, date=d) for f, d in zip(filepaths, dates)]
)

# Example data
query_date = datetime(2021, 8, 24, 2, 0, 1)
query_delta = timedelta(minutes=30)
Operation Command
After 2021-08-24 02:00:01:
    ds.match(F("date") > query_date)
Within 30 minutes of 2021-08-24 02:00:01:
    ds.match(abs(F("date") - query_date) < query_delta)
On the 24th of the month:
    ds.match(F("date").day_of_month() == 24)
On an even day of the week:
    ds.match(F("date").day_of_week() % 2 == 0)
On the 268th day of the year:
    ds.match(F("date").day_of_year() == 268)
In the 9th month of the year (September):
    ds.match(F("date").month() == 9)
In the 38th week of the year:
    ds.match(F("date").week() == 38)
In the year 2022:
    ds.match(F("date").year() == 2022)
With minute not equal to 0:
    ds.match(F("date").minute() != 0)

Reference: match().
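The following plain-Python sketch re-implements each condition with the standard datetime module. It lets you check which of the five example samples each row should return. It does not call FiftyOne. It assumes that day_of_week() follows MongoDB's $dayOfWeek (1 = Sunday through 7 = Saturday). It also assumes that week() follows MongoDB's $week, which numbers weeks the same way as strftime("%U").

```python
from datetime import datetime, timedelta

dates = [
    datetime(2021, 8, 24, 1, 0, 0),
    datetime(2021, 8, 24, 2, 0, 0),
    datetime(2021, 8, 25, 3, 11, 12),
    datetime(2021, 9, 25, 4, 22, 23),
    datetime(2022, 9, 27, 5, 0, 0),
]
query_date = datetime(2021, 8, 24, 2, 0, 1)
query_delta = timedelta(minutes=30)


def mongo_day_of_week(d):
    # Assumed MongoDB convention: 1 = Sunday ... 7 = Saturday
    # (isoweekday() is 1 = Monday ... 7 = Sunday)
    return d.isoweekday() % 7 + 1


def mongo_week(d):
    # Weeks start on Sunday; days before the first Sunday are week 0
    return int(d.strftime("%U"))


def matching(pred):
    # Indices of the example samples that satisfy the predicate
    return [i for i, d in enumerate(dates) if pred(d)]


results = {
    "after": matching(lambda d: d > query_date),
    "within_30_min": matching(lambda d: abs(d - query_date) < query_delta),
    "day_of_month_24": matching(lambda d: d.day == 24),
    "even_day_of_week": matching(lambda d: mongo_day_of_week(d) % 2 == 0),
    "day_of_year_268": matching(lambda d: d.timetuple().tm_yday == 268),
    "september": matching(lambda d: d.month == 9),
    "week_38": matching(lambda d: mongo_week(d) == 38),
    "year_2022": matching(lambda d: d.year == 2022),
    "minute_nonzero": matching(lambda d: d.minute != 0),
}
```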

Geospatial

The formulas in this section use the following example data:

import fiftyone.zoo as foz

TIMES_SQUARE = [-73.9855, 40.7580]
MANHATTAN = [
    [
        [-73.949701, 40.834487],
        [-73.896611, 40.815076],
        [-73.998083, 40.696534],
        [-74.031751, 40.715273],
        [-73.949701, 40.834487],
    ]
]

ds = foz.load_zoo_dataset("quickstart-geo")
Operation Command
Within 5km of Times Square:
    ds.geo_near(TIMES_SQUARE, max_distance=5000)
Within Manhattan:
    ds.geo_within(MANHATTAN)

Reference: geo_near() and geo_within().
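geo_near() takes max_distance in meters and sorts samples by their proximity to the given point. geo_within() keeps points that fall inside a polygon, given as a list of closed [longitude, latitude] rings. The sketch below approximates both tests in plain Python. It uses a haversine great-circle distance, assuming a spherical Earth with a radius of 6,371 km, and a planar ray-casting point-in-polygon check. MongoDB evaluates these queries on a sphere, so treat this as an illustration only. The coordinates for the Empire State Building and JFK airport are approximate reference points added for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an approximation

TIMES_SQUARE = [-73.9855, 40.7580]
EMPIRE_STATE = [-73.9857, 40.7484]  # approximate
JFK_AIRPORT = [-73.7781, 40.6413]   # approximate
MANHATTAN = [
    [
        [-73.949701, 40.834487],
        [-73.896611, 40.815076],
        [-73.998083, 40.696534],
        [-74.031751, 40.715273],
        [-73.949701, 40.834487],
    ]
]


def haversine_m(p, q):
    """Great-circle distance in meters between two [lon, lat] points."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_in_ring(point, ring):
    """Planar ray casting over a closed ring; treats lon/lat as flat x/y."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside
```

For example, haversine_m(TIMES_SQUARE, EMPIRE_STATE) is roughly 1.07 km, so that point falls within max_distance=5000. point_in_ring(TIMES_SQUARE, MANHATTAN[0]) is True.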

Detections

The formulas in this section use the following example data:

import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("quickstart")
Operation Command
Predictions with confidence > 0.95:
    ds.filter_labels("predictions", F("confidence") > 0.95)
Exactly 10 ground truth detections:
    ds.match(F("ground_truth.detections").length() == 10)
At least one dog:
    ds.match(
        F("ground_truth.detections.label").contains("dog")
    )
Images that do not contain dogs:
    ds.match(
        ~F("ground_truth.detections.label").contains("dog")
    )
Only dog detections:
    ds.filter_labels("ground_truth", F("label") == "dog")
Images that only contain dogs:
    ds.match(
        F("ground_truth.detections.label").is_subset(
            ["dog"]
        )
    )
Contains either a cat or a dog:
    ds.match(
        F("predictions.detections.label").contains(
            ["cat", "dog"]
        )
    )
Contains a cat and a dog prediction:
    ds.match(
        F("predictions.detections.label").contains(
            ["cat", "dog"], all=True
        )
    )
Contains a cat or dog but not both:
    field = "predictions.detections.label"
    one_expr = F(field).contains(["cat", "dog"])
    both_expr = F(field).contains(["cat", "dog"], all=True)
    ds.match(one_expr & ~both_expr)

Reference: match() and filter_labels().
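The match expressions above rely on three array semantics:

- contains() with a list is true if any of the values is present.
- contains(..., all=True) is true only if every value is present.
- is_subset() is true if every label appears in the given list.

The sketch below restates these rules as plain Python set logic over hypothetical per-image label lists. It assumes FiftyOne's operators follow ordinary set semantics. Under that assumption, an image with an empty detection list counts as a subset of ["dog"]. The "only contain dogs" row may therefore also return images with no ground truth detections.

```python
def contains_any(labels, values):
    # Like contains([...]): at least one value is present
    return any(v in labels for v in values)


def contains_all(labels, values):
    # Like contains([...], all=True): every value is present
    return all(v in labels for v in values)


def is_subset(labels, values):
    # Like is_subset([...]): no label falls outside the given values
    return set(labels) <= set(values)


# Hypothetical per-image detection labels
images = {
    "dogs_only": ["dog", "dog"],
    "cat_and_dog": ["cat", "dog", "person"],
    "cat_only": ["cat"],
    "empty": [],
}


def names(pred):
    return sorted(name for name, labels in images.items() if pred(labels))


pets = ["cat", "dog"]
cat_or_dog = names(lambda labels: contains_any(labels, pets))
cat_and_dog = names(lambda labels: contains_all(labels, pets))
exactly_one = names(
    lambda labels: contains_any(labels, pets) and not contains_all(labels, pets)
)
only_dogs = names(lambda labels: is_subset(labels, ["dog"]))
no_dogs = names(lambda labels: "dog" not in labels)
```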

Bounding boxes

The formulas in this section assume the following code has been run:

import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("quickstart")

box_width, box_height = F("bounding_box")[2], F("bounding_box")[3]
rel_bbox_area = box_width * box_height

im_width, im_height = F("$metadata.width"), F("$metadata.height")
abs_area = rel_bbox_area * im_width * im_height
Bounding box query Command
Larger than absolute size:
    ds.filter_labels("predictions", abs_area > 96**2)
Between two relative sizes:
    good_bboxes = (rel_bbox_area > 0.25) & (rel_bbox_area < 0.75)
    good_expr = rel_bbox_area.let_in(good_bboxes)
    ds.filter_labels("predictions", good_expr)
Approximately square:
    rectangleness = abs(
        box_width * im_width - box_height * im_height
    )
    ds.select_fields("predictions").filter_labels(
        "predictions", rectangleness <= 1
    )
Aspect ratio > 2:
    aspect_ratio = (
        (box_width * im_width) / (box_height * im_height)
    )
    ds.select_fields("predictions").filter_labels(
        "predictions", aspect_ratio > 2
    )

Reference: filter_labels() and select_fields().
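FiftyOne stores bounding boxes as relative [top-left-x, top-left-y, width, height] coordinates in [0, 1]. Pixel-level quantities therefore need the image dimensions. That is why the expressions above use $metadata.width and $metadata.height, where the $ prefix refers to a root-level sample field from inside the label. The hypothetical helper below computes the same quantities in plain Python for a single box, so you can check thresholds by hand.

```python
def box_stats(bbox, im_width, im_height):
    """Hypothetical helper: size metrics for one relative [x, y, w, h] box."""
    _x, _y, w, h = bbox
    rel_area = w * h                           # rel_bbox_area
    abs_area = rel_area * im_width * im_height  # abs_area, in pixels
    abs_w, abs_h = w * im_width, h * im_height
    return {
        "rel_area": rel_area,
        "abs_area": abs_area,
        "rectangleness": abs(abs_w - abs_h),  # pixel gap between width and height
        "aspect_ratio": abs_w / abs_h,
    }


# A 320x120 pixel box on a 640x480 image
wide = box_stats([0.1, 0.2, 0.5, 0.25], 640, 480)
# A 120x120 pixel box on a 400x300 image
square = box_stats([0.0, 0.0, 0.3, 0.4], 400, 300)
```

The wide box passes the "larger than 96**2 pixels" and "aspect ratio > 2" filters but not the 0.25 to 0.75 relative-area filter. The second box is square in pixels even though its relative width and height differ.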

Evaluating detections

The formulas in this section assume the following code has been run on a dataset ds with detections in its predictions field:

import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("quickstart")

ds.evaluate_detections("predictions", eval_key="eval")

fob.compute_uniqueness(ds)
fob.compute_mistakenness(ds, "predictions", label_field="ground_truth")
ep = ds.to_evaluation_patches("eval")
Operation Command
Uniqueness > 0.9:
    ds.match(F("uniqueness") > 0.9)
10 most unique images:
    ds.sort_by("uniqueness", reverse=True)[:10]
Predictions with confidence > 0.95:
    ds.filter_labels("predictions", F("confidence") > 0.95)
10 most “wrong” predictions:
    ds.sort_by("mistakenness", reverse=True)[:10]
Images with more than 10 false positives:
    ds.match(F("eval_fp") > 10)
False positive “dog” detections:
    ep.match_labels(
        filter=(F("eval") == "fp") & (F("label") == "dog"),
        fields="predictions",
    )
Predictions with IoU > 0.9:
    ep.match(F("iou") > 0.9)

Reference: match(), sort_by(), filter_labels(), and match_labels().
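The iou field on evaluation patches holds the intersection-over-union of a matched prediction and ground truth box. As a reference for what that number means, here is a minimal IoU function for boxes in FiftyOne's relative [x, y, width, height] format. It is a sketch for intuition only. FiftyOne's evaluator computes IoU internally and may apply additional conventions, such as crowd handling.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two relative [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (0 if the boxes do not overlap)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Two half-width boxes shifted by a quarter: overlap 0.125, union 0.375
half_overlap = iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.0, 0.5, 0.5])
```

A pair has to overlap almost completely to pass the IoU > 0.9 row. Even this sizable overlap scores only about 0.33.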

Classifications

Evaluating classifications

The formulas in the following table assume the following code has been run on a dataset ds whose predictions field is populated with classification predictions that have their logits attribute set:

import fiftyone.brain as fob
import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("cifar10", split="test")

# TODO: add your own predicted classifications

ds.evaluate_classifications("predictions", gt_field="ground_truth")

fob.compute_uniqueness(ds)
fob.compute_hardness(ds, "predictions")
fob.compute_mistakenness(ds, "predictions", label_field="ground_truth")
Operation Command
10 most unique incorrect predictions:
    ds.match(
        F("predictions.label") != F("ground_truth.label")
    ).sort_by("uniqueness", reverse=True)[:10]
10 most “wrong” predictions:
    ds.sort_by("mistakenness", reverse=True)[:10]
10 most likely annotation mistakes:
    ds.match_tags("train").sort_by(
        "mistakenness", reverse=True
    )[:10]

Reference: match(), sort_by(), and match_tags().
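The chained view in the first row filters, then sorts, then slices. The plain-Python sketch below mirrors that order on hypothetical records; the ids, labels, and uniqueness scores are made up. Filtering and sorting can be swapped without changing the result. The [:10] slice, however, must come last. Otherwise you take the top entries overall and only then discard the correct predictions among them.

```python
# Hypothetical records: predicted label, ground truth label, uniqueness score
records = [
    {"id": "a", "pred": "cat", "gt": "cat", "uniqueness": 0.95},
    {"id": "b", "pred": "dog", "gt": "cat", "uniqueness": 0.40},
    {"id": "c", "pred": "ship", "gt": "airplane", "uniqueness": 0.80},
    {"id": "d", "pred": "truck", "gt": "automobile", "uniqueness": 0.60},
]


def top_k_unique_mistakes(records, k=10):
    # 1. Keep incorrect predictions (the match() step)
    wrong = [r for r in records if r["pred"] != r["gt"]]
    # 2. Most unique first (the sort_by(..., reverse=True) step)
    wrong.sort(key=lambda r: r["uniqueness"], reverse=True)
    # 3. Slice last (the [:k] step)
    return [r["id"] for r in wrong[:k]]


# Slicing before filtering gives a different, smaller answer
by_uniqueness = sorted(records, key=lambda r: r["uniqueness"], reverse=True)
slice_first = [r["id"] for r in by_uniqueness[:2] if r["pred"] != r["gt"]]
```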

Built-in filter and match functions

FiftyOne provides special-purpose methods for matching and filtering on specific data types. Each example in this section shows the idiomatic call using one of these methods, followed by a brute-force implementation of the same operation for comparison.

The tables in this section use the following example data:

from bson import ObjectId

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("quickstart")

# Tag the predicted labels of a few random samples
ds.take(3).tag_labels("potential_mistake", label_fields="predictions")

# Grab a few label IDs
label_ids = [
    ds.first().ground_truth.detections[0].id,
    ds.last().predictions.detections[0].id,
]
ds.select_labels(ids=label_ids).tag_labels("error")

len_filter = F("label").strlen() < 3
id_filter = F("_id").is_in([ObjectId(_id) for _id in label_ids])

Filtering labels

Operation: Get predicted detections that have confidence > 0.9
Idiomatic:
    ds.filter_labels("predictions", F("confidence") > 0.9)
Brute force:
    ds.set_field(
        "predictions.detections",
        F("detections").filter(F("confidence") > 0.9),
    )

Reference: filter_labels().

Matching labels

Operation: Samples that have labels with IDs in the list label_ids
Idiomatic:
    ds.match_labels(ids=label_ids)
Brute force:
    pred_expr = F("predictions.detections").filter(id_filter).length() > 0
    gt_expr = F("ground_truth.detections").filter(id_filter).length() > 0
    ds.match(pred_expr | gt_expr)
Operation: Samples that have labels satisfying len_filter in the predictions or ground_truth field
Idiomatic:
    ds.match_labels(
        filter=len_filter,
        fields=["predictions", "ground_truth"],
    )
Brute force:
    pred_expr = F("predictions.detections").filter(len_filter).length() > 0
    gt_expr = F("ground_truth.detections").filter(len_filter).length() > 0
    ds.match(pred_expr | gt_expr)
Operation: Samples that have labels with tag “error” in the predictions or ground_truth field
Idiomatic:
    ds.match_labels(tags="error")
Brute force:
    tag_expr = F("tags").contains("error")
    pred_expr = F("predictions.detections").filter(tag_expr).length() > 0
    gt_expr = F("ground_truth.detections").filter(tag_expr).length() > 0
    ds.match(pred_expr | gt_expr)

Reference: match_labels().
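All three brute-force expressions share one shape: a sample matches if any label in any of the listed fields satisfies the predicate. The sketch below expresses that shape in plain Python. It uses hypothetical samples stored as dicts of label lists, and sample_matches is a hypothetical helper. It only illustrates the logic and does not reproduce match_labels() itself.

```python
def sample_matches(sample, fields, predicate):
    """True if any label in any of the given fields satisfies the predicate."""
    return any(
        predicate(label)
        for field in fields
        for label in sample.get(field, [])
    )


# Hypothetical samples: each field holds a list of label dicts
samples = [
    {"predictions": [{"label": "cat", "tags": []}],
     "ground_truth": [{"label": "person", "tags": ["error"]}]},
    {"predictions": [{"label": "tv", "tags": []}],
     "ground_truth": []},
    {"predictions": [{"label": "dog", "tags": []}],
     "ground_truth": [{"label": "dog", "tags": []}]},
]
fields = ["predictions", "ground_truth"]

# Analogue of len_filter: label strings shorter than 3 characters
short_label = [i for i, s in enumerate(samples)
               if sample_matches(s, fields, lambda lab: len(lab["label"]) < 3)]
# Analogue of tags="error"
tagged_error = [i for i, s in enumerate(samples)
                if sample_matches(s, fields, lambda lab: "error" in lab["tags"])]
```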

Matching tags

Operation: Samples that have the tag “validation”
Idiomatic:
    ds.match_tags("validation")
Brute force:
    ds.match(F("tags").contains("validation"))

Reference: match_tags().

Matching frames

The following table uses this example data:

import fiftyone.zoo as foz
from fiftyone import ViewField as F

ds = foz.load_zoo_dataset("quickstart-video")
num_objects = F("detections.detections").length()
Operation: Frames with more than 10 detections
Idiomatic:
    ds.match_frames(num_objects > 10)
Brute force:
    ds.match(F("frames").filter(num_objects > 10).length() > 0)

Reference: match_frames().
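The two rows may not produce the same view. As we understand it, match_frames() filters the frames within each video and, by default, omits videos left with no frames. The brute-force match(), by contrast, keeps every frame of any video that has at least one qualifying frame. The plain-Python sketch below models each video as a hypothetical list of per-frame detection counts to show the difference. The helper names are hypothetical.

```python
# Hypothetical videos: number of detections in each frame
videos = {
    "v1": [3, 12, 15],
    "v2": [1, 2],
    "v3": [11],
}


def frame_level(videos, threshold=10):
    """match_frames()-style: keep only qualifying frames, drop empty videos."""
    kept = {vid: [n for n in frames if n > threshold]
            for vid, frames in videos.items()}
    return {vid: frames for vid, frames in kept.items() if frames}


def sample_level(videos, threshold=10):
    """Brute-force match()-style: keep whole videos with any qualifying frame."""
    return {vid: frames for vid, frames in videos.items()
            if any(n > threshold for n in frames)}
```

Both versions select videos v1 and v3. Only the frame-level version drops v1's 3-detection frame.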

Filtering keypoints

You can use filter_keypoints() to retrieve individual keypoints within a Keypoint instance that match a specified condition.

The following table uses this example data:

import fiftyone as fo
from fiftyone import ViewField as F

ds = fo.Dataset()
ds.add_samples(
    [
        fo.Sample(
            filepath="image1.jpg",
            predictions=fo.Keypoints(
                keypoints=[
                    fo.Keypoint(
                        label="person",
                        points=[(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)],
                        confidence=[0.7, 0.8, 0.95, 0.99],
                    )
                ]
            )
        ),
        fo.Sample(filepath="image2.jpg"),
    ]
)

ds.default_skeleton = fo.KeypointSkeleton(
    labels=["nose", "left eye", "right eye", "left ear", "right ear"],
    edges=[[0, 1, 2, 0], [0, 3], [0, 4]],
)
Operation: Only include predicted keypoints with confidence > 0.9
Idiomatic:
    ds.filter_keypoints("predictions", filter=F("confidence") > 0.9)
Brute force:
    tmp = ds.clone()
    for sample in tmp.iter_samples(autosave=True):
        if sample.predictions is None:
            continue
        for keypoint in sample.predictions.keypoints:
            for i, confidence in enumerate(keypoint.confidence):
                if confidence <= 0.9:
                    keypoint.points[i] = [None, None]

Reference: filter_keypoints().
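Keypoint confidence is stored per point, so filtering works element-wise inside each Keypoint rather than dropping the whole object. The sketch below applies the same rule as the brute-force loop to plain lists; filter_points is a hypothetical helper. Points at or below the threshold are blanked with [None, None] rather than deleted. This keeps each point's index aligned with the skeleton's labels and edges.

```python
def filter_points(points, confidences, threshold=0.9):
    """Blank out points whose confidence is <= threshold, keeping positions."""
    return [
        list(point) if conf > threshold else [None, None]
        for point, conf in zip(points, confidences)
    ]


# Same values as the "person" keypoint in the example data
points = [(0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.9, 0.1)]
confidences = [0.7, 0.8, 0.95, 0.99]
filtered = filter_points(points, confidences)
```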