module documentation

Database utilities.

Copyright 2017-2025, Voxel51, Inc.

Class DatabaseConfigDocument Backing document for the database config.
Function aggregate Executes one or more aggregations on a collection.
Function bulk_write Performs a batch of write operations on a collection.
Function count_documents Undocumented
Function delete_annotation_run Deletes the annotation run with the given key from the dataset with the given name.
Function delete_annotation_runs Deletes all annotation runs from the dataset with the given name.
Function delete_brain_run Deletes the brain method run with the given key from the dataset with the given name.
Function delete_brain_runs Deletes all brain method runs from the dataset with the given name.
Function delete_dataset Deletes the dataset with the given name.
Function delete_evaluation Deletes the evaluation run with the given key from the dataset with the given name.
Function delete_evaluations Deletes all evaluations from the dataset with the given name.
Function delete_run Deletes the run with the given key from the dataset with the given name.
Function delete_runs Deletes all runs from the dataset with the given name.
Function delete_saved_view Deletes the saved view with the given name from the given dataset.
Function delete_saved_views Deletes all saved views from the dataset with the given name.
Function drop_collection Drops specified collection from the database.
Function drop_database Drops the database.
Function drop_orphan_collections Drops all orphan collections from the database.
Function drop_orphan_runs Drops all orphan runs from the database.
Function drop_orphan_saved_views Drops all orphan saved views from the database.
Function drop_orphan_stores Drops all orphan execution stores from the database.
Function ensure_connection Ensures that a database connection exists.
Function establish_db_conn Establishes the database connection.
Function export_collection Exports the collection to disk in JSON format.
Function export_document Exports the document to disk in JSON format.
Function get_async_db_client Returns an async database client.
Function get_async_db_conn Returns an async connection to the database.
Function get_collection_stats Returns stats about the collection.
Function get_db_client Returns a database client.
Function get_db_config Retrieves the database config.
Function get_db_conn Returns a connection to the database.
Function import_collection Imports the collection from JSON on disk.
Function import_document Imports a document from JSON on disk.
Function insert_documents Inserts documents into a collection.
Function list_collections Returns a list of all collection names in the database.
Function list_datasets Returns the list of available FiftyOne datasets.
Function patch_annotation_runs Ensures that the annotation runs in the runs collection for the given dataset exactly match the values in its dataset document.
Function patch_brain_runs Ensures that the brain method runs in the runs collection for the given dataset exactly match the values in its dataset document.
Function patch_evaluations Ensures that the evaluation runs in the runs collection for the given dataset exactly match the values in its dataset document.
Function patch_runs Ensures that the runs in the runs collection for the given dataset exactly match the values in its dataset document.
Function patch_saved_views Ensures that the saved view documents in the views collection for the given dataset exactly match the IDs in its dataset document.
Function patch_workspaces Ensures that the workspace documents in the workspaces collection for the given dataset exactly match the IDs in its dataset document.
Function stream_collection Streams the contents of the collection to stdout.
Function sync_database Syncs all pending database writes to disk.
Variable foa Undocumented
Variable fob Undocumented
Variable fod Undocumented
Variable foe Undocumented
Variable fors Undocumented
Variable logger Undocumented
Class _DryRunLoggerAdapter Undocumented
Function _apply_options Undocumented
Function _async_connect Undocumented
Function _connect Undocumented
Function _delete_non_persistent_datasets_if_allowed Deletes all non-persistent datasets if and only if we are the only client currently connected to the database.
Function _delete_run Undocumented
Function _delete_run_docs Undocumented
Function _delete_run_results Undocumented
Function _delete_runs Undocumented
Function _delete_saved_views Undocumented
Function _delete_stores Undocumented
Async Function _do_async_aggregate Undocumented
Async Function _do_async_pooled_aggregate Undocumented
Function _do_pooled_aggregate Undocumented
Function _export_collection_multi Undocumented
Function _export_collection_single Undocumented
Function _get_logger Undocumented
Function _get_result_ids Undocumented
Function _get_run_ids Undocumented
Function _get_saved_view_ids Undocumented
Function _handle_multiple_config_docs Undocumented
Function _import_collection_multi Undocumented
Function _import_collection_single Undocumented
Function _patch_referenced_docs Ensures that the referenced documents in the collection for the given dataset exactly match the IDs in its dataset document.
Function _patch_runs Undocumented
Function _validate_db_version Undocumented
Constant _RUNS_FIELDS Undocumented
Variable _async_client Undocumented
Variable _client Undocumented
Variable _connection_kwargs Undocumented
Variable _db_service Undocumented
def aggregate(collection, pipelines): (source)

Executes one or more aggregations on a collection.

Multiple aggregations are executed using multiple threads, and their results are returned as lists rather than cursors.

Parameters
collection: a pymongo.collection.Collection or motor.motor_asyncio.AsyncIOMotorCollection
pipelines: a MongoDB aggregation pipeline or a list of pipelines
Returns
  • If a single pipeline is provided, a pymongo.command_cursor.CommandCursor or motor.motor_asyncio.AsyncIOMotorCommandCursor is returned
  • If multiple pipelines are provided, each cursor is extracted into a list and the list of lists is returned
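A minimal sketch of both calling modes, assuming these helpers are importable from fiftyone.core.odm.database and that the collection name below exists in your database (both are illustrative assumptions):

from fiftyone.core.odm.database import aggregate, get_db_conn

db = get_db_conn()
coll = db["samples.example"]  # hypothetical collection name

# Single pipeline: a command cursor is returned and can be iterated lazily
pipeline = [{"$match": {"tags": "validation"}}, {"$count": "n"}]
for doc in aggregate(coll, pipeline):
    print(doc)

# Multiple pipelines: executed via a thread pool; results come back as lists
pipelines = [
    [{"$count": "n"}],
    [{"$project": {"_id": 1}}, {"$limit": 5}],
]
counts, ids = aggregate(coll, pipelines)
print(counts, ids)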
def bulk_write(ops, coll, ordered=False, progress=False): (source)

Performs a batch of write operations on a collection.

Parameters
ops: a list of pymongo operations
coll: a pymongo collection
ordered (False): whether the operations must be performed in order
progress (False): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
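A sketch that batches pymongo UpdateOne operations, assuming the module path and collection name below (both illustrative):

from pymongo import UpdateOne

from fiftyone.core.odm.database import bulk_write, get_db_conn

db = get_db_conn()
coll = db["samples.example"]  # hypothetical collection name

# Build one update per document missing a "reviewed" field
ops = [
    UpdateOne({"_id": d["_id"]}, {"$set": {"reviewed": False}})
    for d in coll.find({"reviewed": {"$exists": False}}, {"_id": 1})
]
if ops:
    bulk_write(ops, coll, ordered=False, progress=True)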
def count_documents(coll, pipeline): (source)

Undocumented

def delete_annotation_run(name, anno_key, dry_run=False): (source)

Deletes the annotation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_annotation_run, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
anno_key: the annotation key
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_annotation_runs(name, dry_run=False): (source)

Deletes all annotation runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_annotation_runs, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_brain_run(name, brain_key, dry_run=False): (source)

Deletes the brain method run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_brain_run, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
brain_key: the brain key
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_brain_runs(name, dry_run=False): (source)

Deletes all brain method runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_brain_runs, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_dataset(name, dry_run=False): (source)

Deletes the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters
name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
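The delete_* helpers in this module share the same calling pattern; a sketch for delete_dataset with a hypothetical dataset name, previewing with dry_run before applying (the import path is an assumption):

from fiftyone.core.odm.database import delete_dataset

# Log the collections and documents that would be removed, without deleting
delete_dataset("corrupted-dataset", dry_run=True)  # hypothetical dataset name

# Perform the deletion
delete_dataset("corrupted-dataset")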
def delete_evaluation(name, eval_key, dry_run=False): (source)

Deletes the evaluation run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_evaluation, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
eval_key: the evaluation key
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_evaluations(name, dry_run=False): (source)

Deletes all evaluations from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_evaluations, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_run(name, run_key, dry_run=False): (source)

Deletes the run with the given key from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_run, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
run_key: the run key
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_runs(name, dry_run=False): (source)

Deletes all runs from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.delete_runs, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Note that, as this method does not load fiftyone.core.runs.Run instances, it does not call fiftyone.core.runs.Run.cleanup.

Parameters
name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_saved_view(dataset_name, view_name, dry_run=False): (source)

Deletes the saved view with the given name from the given dataset.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.load_saved_view, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters
dataset_name: the name of the dataset
view_name: the name of the saved view
dry_run (False): whether to log the actions that would be taken but not perform them
def delete_saved_views(dataset_name, dry_run=False): (source)

Deletes all saved views from the dataset with the given name.

This is a low-level implementation of deletion that does not call fiftyone.core.dataset.load_dataset or fiftyone.core.collections.SampleCollection.load_saved_view, which is helpful if a dataset's backing document or collections are corrupted and cannot be loaded via the normal pathways.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def drop_collection(collection_name): (source)

Drops specified collection from the database.

Parameters
collection_name: the collection name
def drop_database(): (source)

Drops the database.

def drop_orphan_collections(dry_run=False): (source)

Drops all orphan collections from the database.

Orphan collections are collections that are not associated with any known dataset or other collections used by FiftyOne.

Parameters
dry_run (False): whether to log the actions that would be taken but not perform them
def drop_orphan_runs(dry_run=False): (source)

Drops all orphan runs from the database.

Orphan runs are runs that are not associated with any known dataset or other collections used by FiftyOne.

Parameters
dry_run (False): whether to log the actions that would be taken but not perform them
def drop_orphan_saved_views(dry_run=False): (source)

Drops all orphan saved views from the database.

Orphan saved views are saved view documents that are not associated with any known dataset or other collections used by FiftyOne.

Parameters
dry_run (False): whether to log the actions that would be taken but not perform them
def drop_orphan_stores(dry_run=False): (source)

Drops all orphan execution stores from the database.

Orphan stores are those that are associated with a dataset that no longer exists in the database.

Parameters
dry_run (False): whether to log the actions that would be taken but not perform them
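A sketch of a maintenance sweep over the drop_orphan_* helpers above, previewing with dry_run before applying (the import path is an assumption):

from fiftyone.core.odm.database import (
    drop_orphan_collections,
    drop_orphan_runs,
    drop_orphan_saved_views,
    drop_orphan_stores,
)

cleanups = (
    drop_orphan_collections,
    drop_orphan_runs,
    drop_orphan_saved_views,
    drop_orphan_stores,
)

# Log what would be removed
for cleanup in cleanups:
    cleanup(dry_run=True)

# Apply the cleanup
for cleanup in cleanups:
    cleanup()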
def ensure_connection(): (source)

Ensures that a database connection exists.

def establish_db_conn(config): (source)

Establishes the database connection.

If fiftyone.config.database_uri is defined, then we connect to that URI. Otherwise, a fiftyone.core.service.DatabaseService is created.

Parameters
config: a fiftyone.core.config.FiftyOneConfig
Raises
ConnectionError: if a connection to mongod could not be established
FiftyOneConfigError: if fiftyone.config.database_uri is not defined and mongod could not be found
ServiceExecutableNotFound: if fiftyone.core.service.DatabaseService startup was attempted, but mongod was not found in fiftyone.db.bin
RuntimeError: if the mongod found does not meet FiftyOne's requirements, or validation could not occur
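This is normally invoked for you when FiftyOne connects to the database; the sketch below shows an explicit call using the current FiftyOne config (the import path is an assumption):

import fiftyone as fo

from fiftyone.core.odm.database import establish_db_conn, get_db_conn

# Connects to fiftyone.config.database_uri if set; otherwise starts a
# managed fiftyone.core.service.DatabaseService
establish_db_conn(fo.config)

db = get_db_conn()
print(db.name)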
def export_collection(docs, json_dir_or_path, key='documents', patt='{idx:06d}-{id}.json', num_docs=None, progress=None): (source)

Exports the collection to disk in JSON format.

Parameters
docs: an iterable containing the documents to export
json_dir_or_path: the path to write a single JSON file containing the entire collection, or a directory in which to write per-document JSON files
key ("documents"): the field name under which to store the documents when json_dir_or_path is a single JSON file
patt ("{idx:06d}-{id}.json"): a filename pattern to use when json_dir_or_path is a directory. The pattern may contain idx to refer to the index of the document in docs or id to refer to the document's ID
num_docs (None): the total number of documents. If omitted, this must be computable via len(docs)
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
def export_document(doc, json_path): (source)

Exports the document to disk in JSON format.

Parameters
doc: a BSON document dict
json_path: the path to write the JSON file
def get_async_db_client(use_global=False): (source)

Returns an async database client.

Parameters
use_global (False): whether to use the global client singleton
Returns
a motor.motor_asyncio.AsyncIOMotorClient
def get_async_db_conn(use_global=False): (source)

Returns an async connection to the database.

Returns
a motor.motor_asyncio.AsyncIOMotorDatabase
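A sketch of async usage with asyncio; the import path and collection name are assumptions, and count_documents is the standard motor coroutine:

import asyncio

from fiftyone.core.odm.database import get_async_db_conn


async def count_samples():
    db = get_async_db_conn()  # a motor.motor_asyncio.AsyncIOMotorDatabase
    coll = db["samples.example"]  # hypothetical collection name
    return await coll.count_documents({})


print(asyncio.run(count_samples()))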
def get_collection_stats(collection_name): (source)

Returns stats about the collection.

Parameters
collection_name: the name of the collection
Returns
a stats dict
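A sketch that reports per-collection stats, assuming the returned dict mirrors MongoDB's collStats output (count, size, ...) and that the module is importable as shown:

from fiftyone.core.odm.database import get_collection_stats, list_collections

for name in list_collections():
    stats = get_collection_stats(name)
    print(name, stats.get("count"), stats.get("size"))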
def get_db_client(): (source)

Returns a database client.

Returns
a pymongo.mongo_client.MongoClient
def get_db_config(): (source)

Retrieves the database config.

Returns
a DatabaseConfigDocument
def get_db_conn(): (source)

Returns a connection to the database.

Returns
a pymongo.database.Database
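A sketch of raw pymongo access through the returned database handle; the import path and the "datasets" collection name are assumptions for illustration:

from fiftyone.core.odm.database import get_db_conn

db = get_db_conn()  # a pymongo.database.Database

print(db.list_collection_names())
print(db["datasets"].count_documents({}))  # assumed collection name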
def import_collection(json_dir_or_path, key='documents'): (source)

Imports the collection from JSON on disk.

Parameters
json_dir_or_path: the path to a JSON file on disk, or a directory containing per-document JSON files
key ("documents"): the field name under which the documents are stored when json_dir_or_path is a single JSON file
Returns

a tuple of

  • an iterable of BSON documents
  • the number of documents
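A sketch that pairs import_collection with insert_documents to restore a previously exported collection; the import path, source directory, and destination collection name are hypothetical:

from fiftyone.core.odm.database import (
    get_db_conn,
    import_collection,
    insert_documents,
)

docs, num_docs = import_collection("/tmp/samples_export", key="documents")

db = get_db_conn()
coll = db["samples.restored"]  # hypothetical destination collection

ids = insert_documents(docs, coll, progress=True, num_docs=num_docs)
print(len(ids), "documents inserted")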
def import_document(json_path): (source)

Imports a document from JSON on disk.

Parameters
json_path: the path to the document
Returns
a BSON document dict
def insert_documents(docs, coll, ordered=False, progress=None, num_docs=None): (source)

Inserts documents into a collection.

The _id field of the input documents will be populated if it is not already set.

Parameters
docs: an iterable of BSON document dicts
coll: a pymongo collection
ordered (False): whether the documents must be inserted in order
progress (None): whether to render a progress bar (True/False), use the default value fiftyone.config.show_progress_bars (None), or a progress callback function to invoke instead
num_docs (None): the total number of documents. Only used when progress=True. If omitted, this will be computed via len(docs), if possible
Returns
a list of IDs of the inserted documents
def list_collections(): (source)

Returns a list of all collection names in the database.

Returns
a list of all collection names
def list_datasets(): (source)

Returns the list of available FiftyOne datasets.

This is a low-level implementation of dataset listing that does not call fiftyone.core.dataset.list_datasets, which is helpful if a database may be corrupted.

Returns
a list of Dataset names
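A minimal sketch (the import path is an assumption):

from fiftyone.core.odm.database import list_datasets

# Reads dataset documents directly, so it works even when some datasets
# cannot be loaded via the normal pathways
for name in sorted(list_datasets()):
    print(name)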
def patch_annotation_runs(dataset_name, dry_run=False): (source)

Ensures that the annotation runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def patch_brain_runs(dataset_name, dry_run=False): (source)

Ensures that the brain method runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def patch_evaluations(dataset_name, dry_run=False): (source)

Ensures that the evaluation runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def patch_runs(dataset_name, dry_run=False): (source)

Ensures that the runs in the runs collection for the given dataset exactly match the values in its dataset document.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def patch_saved_views(dataset_name, dry_run=False): (source)

Ensures that the saved view documents in the views collection for the given dataset exactly match the IDs in its dataset document.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
def patch_workspaces(dataset_name, dry_run=False): (source)

Ensures that the workspace documents in the workspaces collection for the given dataset exactly match the IDs in its dataset document.

Parameters
dataset_name: the name of the dataset
dry_run (False): whether to log the actions that would be taken but not perform them
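A sketch that runs the patch_* helpers above for a single dataset, previewing with dry_run before applying; the import path and dataset name are hypothetical:

from fiftyone.core.odm.database import (
    patch_annotation_runs,
    patch_brain_runs,
    patch_evaluations,
    patch_runs,
    patch_saved_views,
    patch_workspaces,
)

name = "my-dataset"  # hypothetical dataset name

patchers = (
    patch_annotation_runs,
    patch_brain_runs,
    patch_evaluations,
    patch_runs,
    patch_saved_views,
    patch_workspaces,
)

# Log the repairs that would be made
for patch in patchers:
    patch(name, dry_run=True)

# Apply the repairs
for patch in patchers:
    patch(name)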
def stream_collection(collection_name): (source)

Streams the contents of the collection to stdout.

Parameters
collection_name: the name of the collection
def sync_database(): (source)

Syncs all pending database writes to disk.

foa = (source)

Undocumented

fob = (source)

Undocumented

fod = (source)

Undocumented

foe = (source)

Undocumented

fors = (source)

Undocumented

logger = (source)

Undocumented

def _apply_options(db): (source)

Undocumented

def _async_connect(use_global=False): (source)

Undocumented

def _connect(): (source)

Undocumented

def _delete_non_persistent_datasets_if_allowed(): (source)

Deletes all non-persistent datasets if and only if we are the only client currently connected to the database.

def _delete_run(dataset_name, run_key, runs_field, run_str, dry_run=False): (source)

Undocumented

def _delete_run_docs(conn, run_ids): (source)

Undocumented

def _delete_run_results(conn, result_ids): (source)

Undocumented

def _delete_runs(dataset_name, runs_field, run_str, dry_run=False): (source)

Undocumented

def _delete_saved_views(conn, view_ids): (source)

Undocumented

def _delete_stores(conn, dataset_ids): (source)

Undocumented

async def _do_async_aggregate(collection, pipeline): (source)

Undocumented

async def _do_async_pooled_aggregate(collection, pipelines): (source)

Undocumented

def _do_pooled_aggregate(collection, pipelines): (source)

Undocumented

def _export_collection_multi(docs, json_dir, patt, num_docs, progress=None): (source)

Undocumented

def _export_collection_single(docs, json_path, key, num_docs, progress=None): (source)

Undocumented

def _get_logger(dry_run=False): (source)

Undocumented

def _get_result_ids(conn, dataset_dict): (source)

Undocumented

def _get_run_ids(dataset_dict): (source)

Undocumented

def _get_saved_view_ids(dataset_dict): (source)

Undocumented

def _handle_multiple_config_docs(conn, config_docs): (source)

Undocumented

def _import_collection_multi(json_dir): (source)

Undocumented

def _import_collection_single(json_path, key): (source)

Undocumented

def _patch_referenced_docs(dataset_name, collection_name, field_name, dry_run=False): (source)

Ensures that the referenced documents in the collection for the given dataset exactly match the IDs in its dataset document.

def _patch_runs(dataset_name, runs_field, run_cls, run_str, dry_run=False): (source)

Undocumented

def _validate_db_version(config, client): (source)

Undocumented

_RUNS_FIELDS: list[str] = (source)

Undocumented

Value
['annotation_runs', 'brain_methods', 'evaluations', 'runs']
_async_client = (source)

Undocumented

_client = (source)

Undocumented

_connection_kwargs: dict = (source)

Undocumented

_db_service = (source)

Undocumented