module documentation

Random sampling utilities.

Copyright 2017-2025, Voxel51, Inc.

Functions:

  • balanced_sample: Generates a random sample of size k from the given collection such that the expected histogram of path values in the sample is uniform.
  • random_split: Generates a random partition of the samples in the collection according to the specified split fractions.
  • weighted_sample: Generates a random sample of size k from the given collection such that the probability of selecting each sample is proportional to the given per-sample weights.
def balanced_sample(sample_collection, k, path, tag=None, exact=True, seed=None):

Generates a random sample of size k from the given collection such that the expected histogram of path values in the sample is uniform.


import fiftyone as fo
import fiftyone.utils.random as four
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("cifar10", split="train")

# Sample proportional to label length
weights = dataset.values(F("ground_truth.label").strlen())
view1 = four.weighted_sample(dataset, 10000, weights)

# Now take a balanced sample from this unbalanced sample
view2 = four.balanced_sample(view1, 2000, "ground_truth.label")

# Plot results
plot1 = fo.CategoricalHistogram("ground_truth.label", init_view=dataset)
plot2 = fo.CategoricalHistogram(
    "ground_truth.label",
    order=lambda kv: -len(kv[0]),  # order by label length
    init_view=view1,
)
plot3 = fo.CategoricalHistogram("ground_truth.label", init_view=view2)
plot = fo.ViewGrid([plot1, plot2, plot3])

Parameters:
  sample_collection: a fiftyone.core.collections.SampleCollection
  k: the number of samples to select
  path: the categorical field against which to sample, e.g., "ground_truth.label"
  tag (None): an optional sample tag to use to encode the results
  exact (True): whether to tag exactly k samples (True) or sample so that the expected number of samples is k (False)
  seed (None): an optional random seed to use

Returns:
  a fiftyone.core.view.DatasetView containing the sample
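
One way to see why the expected histogram comes out uniform is inverse-frequency weighting: if every sample is weighted by one over the number of samples sharing its path value, each distinct value carries the same total weight. The sketch below illustrates that idea in plain Python. It is not taken from the FiftyOne source, and `balanced_weights` is a hypothetical helper introduced only for this illustration.

```python
from collections import Counter


def balanced_weights(values):
    """Hypothetical helper: weight each item by 1 / (count of its value)."""
    counts = Counter(values)
    return [1.0 / counts[v] for v in values]


# A deliberately unbalanced list of labels
labels = ["cat"] * 6 + ["dog"] * 3 + ["bird"] * 1
weights = balanced_weights(labels)

# Sum the weights per label; each label ends up with (approximately) the
# same total, so a weight-proportional draw treats all labels equally
totals = {}
for label, w in zip(labels, weights):
    totals[label] = totals.get(label, 0.0) + w

print(totals)
```

Weights of this form could then be handed to a weight-proportional sampler such as weighted_sample, which is how the two functions relate conceptually.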
def random_split(sample_collection, split_fracs, seed=None):

Generates a random partition of the samples in the collection according to the specified split fractions.


import fiftyone as fo
import fiftyone.utils.random as four
import fiftyone.zoo as foz

# A dataset with `ground_truth` detections and no tags
dataset = (
    foz.load_zoo_dataset("quickstart")  # assumed here: a 200-sample zoo dataset
    .set_field("tags", [])
).clone()

# Generate a random sample and encode results via tags

four.random_split(dataset, {"train": 0.7, "test": 0.2, "val": 0.1})

print(dataset.count_sample_tags())
# {'train': 140, 'test': 40, 'val': 20}

# Generate a random sample in-memory

view1, view2 = four.random_split(dataset, [0.5, 0.5])

assert len(view1) + len(view2) == len(dataset)
assert set(view1.values("id")).isdisjoint(set(view2.values("id")))

Parameters:
  sample_collection: a fiftyone.core.collections.SampleCollection
  split_fracs: can be either of the following:

    • a dict mapping tag strings to split fractions in [0, 1]. In this case, the partition is denoted by tagging each sample with its assigned split
    • a list of split fractions in [0, 1]. In this case, a corresponding list of fiftyone.core.view.DatasetView instances containing the partition is returned

    In either case, the split fractions are normalized so that they sum to 1, if necessary
  seed (None): an optional random seed

Returns:
  one of the following:

    • None, if split_fracs is a dict (the partition is encoded via sample tags)
    • a list of fiftyone.core.view.DatasetView instances, if split_fracs is a list
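
The normalization and partitioning described above can be sketched on plain indices. This is a minimal illustration under stated assumptions, not the library's implementation: `split_indices` is a hypothetical helper, and the use of cumulative rounded boundaries is one possible way to turn fractions into split sizes.

```python
import random


def split_indices(n, split_fracs, seed=None):
    """Hypothetical helper: partition range(n) by normalized fractions."""
    # Normalize so the fractions sum to 1
    total = sum(split_fracs)
    fracs = [f / total for f in split_fracs]

    # Shuffle the indices with a dedicated, optionally seeded generator
    indices = list(range(n))
    random.Random(seed).shuffle(indices)

    # Cumulative boundaries; the last one is forced to n so that every
    # index is assigned to exactly one split
    bounds, acc = [], 0.0
    for f in fracs:
        acc += f
        bounds.append(round(acc * n))
    bounds[-1] = n

    splits, start = [], 0
    for end in bounds:
        splits.append(indices[start:end])
        start = end

    return splits


# Unnormalized fractions are accepted; [7, 2, 1] is treated as 0.7/0.2/0.1
train, test, val = split_indices(200, [7, 2, 1], seed=51)
print(len(train), len(test), len(val))
```

Because the splits are slices of one shuffled list, they are disjoint and together cover every index, mirroring the assertions in the example above.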

def weighted_sample(sample_collection, k, weights, tag=None, exact=True, seed=None):

Generates a random sample of size k from the given collection such that the probability of selecting each sample is proportional to the given per-sample weights.


import fiftyone as fo
import fiftyone.utils.random as four
import fiftyone.zoo as foz
from fiftyone import ViewField as F

dataset = foz.load_zoo_dataset("cifar10", split="train")

# Sample proportional to label length
weights = dataset.values(F("ground_truth.label").strlen())
sample = four.weighted_sample(dataset, 10000, weights)

# Plot results
plot = fo.CategoricalHistogram(
    "ground_truth.label",
    order=lambda kv: -len(kv[0]),  # order by label length
    init_view=sample,
)

Parameters:
  sample_collection: a fiftyone.core.collections.SampleCollection
  k: the number of samples to select
  weights: an array of per-sample weights
  tag (None): an optional sample tag to use to encode the results
  exact (True): whether to tag exactly k samples (True) or sample so that the expected number of samples is k (False)
  seed (None): an optional random seed to use

Returns:
  a fiftyone.core.view.DatasetView containing the sample
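
The difference between the two values of exact can be sketched on a plain list of weights. This is an illustration under stated assumptions rather than FiftyOne's actual code: `weighted_sample_indices` is a hypothetical helper, the exact branch uses the Efraimidis-Spirakis key method for weighted sampling without replacement, and the non-exact branch uses independent draws whose probabilities sum to k.

```python
import random


def weighted_sample_indices(weights, k, exact=True, seed=None):
    """Hypothetical helper returning the indices of the selected items."""
    rng = random.Random(seed)

    if exact:
        # Efraimidis-Spirakis: give each item the key u ** (1 / w) with u
        # uniform in [0, 1) and keep the k largest keys. Items with zero
        # weight are never selected.
        keyed = [
            (rng.random() ** (1.0 / w), i)
            for i, w in enumerate(weights)
            if w > 0
        ]
        keyed.sort(reverse=True)
        return sorted(i for _, i in keyed[:k])

    # Independent draws: item i is kept with probability k * w_i / sum(w),
    # capped at 1, so the expected number of selected items is k whenever
    # no probability is capped
    total = sum(weights)
    return [
        i
        for i, w in enumerate(weights)
        if rng.random() < min(1.0, k * w / total)
    ]


exact_idx = weighted_sample_indices([1, 2, 3, 4], 2, seed=0)
loose_idx = weighted_sample_indices([1.0] * 1000, 100, exact=False, seed=1)
print(len(exact_idx), len(loose_idx))
```

The exact branch always returns k indices when at least k weights are positive, while the size of the non-exact result varies from run to run around k.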