Finding Classification Mistakes with FiftyOne¶
Annotation mistakes create an artificial ceiling on the performance of your models. However, finding these mistakes by hand is at least as arduous as the original annotation work! Enter FiftyOne.
In this tutorial, we explore how FiftyOne can be used to help you find mistakes in your classification annotations. To detect mistakes in detection datasets, check out this tutorial.
We'll cover the following concepts:
- Loading your existing dataset into FiftyOne
- Adding model predictions to your dataset
- Computing insights into your dataset relating to possible label mistakes
- Visualizing mistakes in the FiftyOne App
So, what's the takeaway?
FiftyOne can help you find and correct label mistakes in your datasets, enabling you to curate higher quality datasets and, ultimately, train better models!
Setup¶
If you haven't already, install FiftyOne:
!pip install fiftyone
We'll also need torch and torchvision installed:
!pip install torch torchvision
In this tutorial, we'll use a pretrained CIFAR-10 PyTorch model (a ResNet-50) from the web:
# Download the software
!git clone --depth 1 --branch v2.1 https://github.com/huyvnphan/PyTorch_CIFAR10.git
# Download the pretrained model (90MB)
!eta gdrive download --public \
    1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu \
    PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt
Manipulating the data¶
For this walkthrough, we will artificially perturb an existing dataset with mistakes on the labels. Of course, in your normal workflow, you would not add labeling mistakes; this is only for the sake of the walkthrough.
The code block below loads the test split of the CIFAR-10 dataset into FiftyOne and randomly breaks 10% (1000 samples) of the labels:
import random
import fiftyone as fo
import fiftyone.zoo as foz
# Load the CIFAR-10 test split
# Downloads the dataset from the web if necessary
dataset = foz.load_zoo_dataset("cifar10", split="test")
# Get the CIFAR-10 classes list
classes = dataset.default_classes
# Artificially corrupt 10% of the labels
_num_mistakes = int(0.1 * len(dataset))
for sample in dataset.take(_num_mistakes):
    mistake = random.randint(0, 9)
    while classes[mistake] == sample.ground_truth.label:
        mistake = random.randint(0, 9)

    sample.tags.append("mistake")
    sample.ground_truth = fo.Classification(label=classes[mistake])
    sample.save()
Let's print some information about the dataset to verify the operation that we performed:
# Verify that the `mistake` tag is now in the dataset's schema
print(dataset)
Name:           cifar10-test
Media type:     image
Num samples:    10000
Persistent:     False
Tags:           ['mistake', 'test']
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
# Count the number of samples with the `mistake` tag
num_mistakes = len(dataset.match_tags("mistake"))
print("%d ground truth labels are now mistakes" % num_mistakes)
1000 ground truth labels are now mistakes
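As an aside, this walkthrough loads CIFAR-10 from the FiftyOne Dataset Zoo, but the same workflow applies to your own data. Here is a minimal sketch of loading an existing classification dataset, assuming a hypothetical /path/to/dataset directory tree with one subfolder per class:
import fiftyone as fo

# Minimal sketch: load a classification dataset organized as
# /path/to/dataset/<class>/<image>.jpg (the path is hypothetical)
my_dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/dataset",
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name="my-classification-dataset",
)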
Add predictions to the dataset¶
Using an off-the-shelf model, let's now add predictions to the dataset. These predictions (and their logits) are what we'll use to estimate which ground truth labels may be mistakes.
The code block below adds model predictions to another randomly chosen 10% (1000 samples) of the dataset:
import sys
import numpy as np
import torch
import torchvision
from torch.utils.data import DataLoader
import fiftyone.utils.torch as fout
sys.path.insert(1, "PyTorch_CIFAR10")
from cifar10_models import resnet50
def make_cifar10_data_loader(image_paths, sample_ids, batch_size):
    # Standard CIFAR-10 normalization statistics
    mean = [0.4914, 0.4822, 0.4465]
    std = [0.2023, 0.1994, 0.2010]
    transforms = torchvision.transforms.Compose(
        [
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize(mean, std),
        ]
    )
    dataset = fout.TorchImageDataset(
        image_paths, sample_ids=sample_ids, transform=transforms
    )
    return DataLoader(dataset, batch_size=batch_size, num_workers=4)

def predict(model, imgs):
    logits = model(imgs).detach().cpu().numpy()
    predictions = np.argmax(logits, axis=1)

    # Softmax confidence of the predicted class
    odds = np.exp(logits)
    confidences = np.max(odds, axis=1) / np.sum(odds, axis=1)

    return predictions, confidences, logits
#
# Load a model
#
# Model performance numbers are available at:
# https://github.com/huyvnphan/PyTorch_CIFAR10
#
model = resnet50(pretrained=True)
model_name = "resnet50"
#
# Extract a few images to process
# (some of these will have been manipulated above)
#
num_samples = 1000
batch_size = 20
view = dataset.take(num_samples)
image_paths, sample_ids = zip(*[(s.filepath, s.id) for s in view.iter_samples()])
data_loader = make_cifar10_data_loader(image_paths, sample_ids, batch_size)
#
# Perform prediction and store results in dataset
#
with fo.ProgressBar() as pb:
    for imgs, sample_ids in pb(data_loader):
        predictions, _, logits_ = predict(model, imgs)

        # Add predictions to your FiftyOne dataset
        for sample_id, prediction, logits in zip(sample_ids, predictions, logits_):
            sample = dataset[sample_id]
            sample.tags.append("processed")
            sample[model_name] = fo.Classification(
                label=classes[prediction], logits=logits,
            )
            sample.save()
 100% |███████████████████| 50/50 [11.0s elapsed, 0s remaining, 4.7 samples/s]
Let's print some information about the predictions that were generated and how many of them correspond to samples whose ground truth labels were corrupted:
# Count the number of samples with the `processed` tag
num_processed = len(dataset.match_tags("processed"))
# Count the number of samples with both `processed` and `mistake` tags
num_corrupted = len(dataset.match_tags("processed").match_tags("mistake"))
print("Added predictions to %d samples" % num_processed)
print("%d of these samples have label mistakes" % num_corrupted)
Added predictions to 1000 samples 86 of these samples have label mistakes
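Raw model disagreement with the (possibly corrupted) ground truth is a coarse first signal of label quality. As a quick sanity check, here is a small sketch that counts such disagreements using FiftyOne's view expressions:
from fiftyone import ViewField as F

# Count processed samples where the model's prediction disagrees with
# the (possibly corrupted) ground truth label
disagreements = dataset.match_tags("processed").match(
    F("resnet50.label") != F("ground_truth.label")
)
print("Model disagrees with the ground truth on %d samples" % len(disagreements))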
Find the mistakes¶
Now we can run a method from FiftyOne that estimates the mistakenness of the ground truth labels for the samples for which we generated predictions:
import fiftyone.brain as fob
# Get samples for which we added predictions
h_view = dataset.match_tags("processed")
# Compute mistakenness
fob.compute_mistakenness(h_view, model_name, label_field="ground_truth", use_logits=True)
Computing mistakenness...
 100% |███████████████| 1000/1000 [2.4s elapsed, 0s remaining, 446.1 samples/s]
Mistakenness computation complete
The above method added a mistakenness field to all samples for which we added predictions. We can easily sort by likelihood of mistakenness from code:
# Sort by likelihood of mistake (most likely first)
mistake_view = (dataset
    .match_tags("processed")
    .sort_by("mistakenness", reverse=True)
)
# Print some information about the view
print(mistake_view)
Dataset:        cifar10-test
Media type:     image
Num samples:    1000
Tags:           ['mistake', 'processed', 'test']
Sample fields:
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    resnet50:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
    mistakenness: fiftyone.core.fields.FloatField
View stages:
    1. MatchTags(tags=['processed'])
    2. SortBy(field_or_expr='mistakenness', reverse=True)
# Inspect the first few samples
print(mistake_view.head())
[<SampleView: {
    'id': '6064c24201257d68b7b046d7',
    'media_type': 'image',
    'filepath': '/home/ben/fiftyone/cifar10/test/data/001326.jpg',
    'tags': BaseList(['test', 'mistake', 'processed']),
    'metadata': None,
    'ground_truth': <Classification: {
        'id': '6064c24c01257d68b7b0b34c',
        'tags': BaseList([]),
        'label': 'ship',
        'confidence': None,
        'logits': None,
    }>,
    'resnet50': <Classification: {
        'id': '6064c26e01257d68b7b0be64',
        'tags': BaseList([]),
        'label': 'deer',
        'confidence': None,
        'logits': array([-0.925419  , -1.2076195 , -0.37321544, -0.2750331 ,  6.723097  ,
                         -0.44599843, -0.7555994 , -0.43585306, -1.1593063 , -1.1450499 ],
                        dtype=float32),
    }>,
    'mistakenness': 0.9778614850560818,
}>,
<SampleView: {
    'id': '6064c24201257d68b7b04977',
    'media_type': 'image',
    'filepath': '/home/ben/fiftyone/cifar10/test/data/001550.jpg',
    'tags': BaseList(['test', 'mistake', 'processed']),
    'metadata': None,
    'ground_truth': <Classification: {
        'id': '6064c24c01257d68b7b0b20e',
        'tags': BaseList([]),
        'label': 'deer',
        'confidence': None,
        'logits': None,
    }>,
    'resnet50': <Classification: {
        'id': '6064c26701257d68b7b0b94a',
        'tags': BaseList([]),
        'label': 'automobile',
        'confidence': None,
        'logits': array([-0.6696544 ,  6.331352  , -0.90380824, -0.8609426 , -0.97413117,
                         -0.8693008 , -0.8035213 , -0.9215686 , -0.48488098,  0.15646096],
                        dtype=float32),
    }>,
    'mistakenness': 0.967886808991774,
}>,
<SampleView: {
    'id': '6064c24401257d68b7b060c9',
    'media_type': 'image',
    'filepath': '/home/ben/fiftyone/cifar10/test/data/003540.jpg',
    'tags': BaseList(['test', 'processed']),
    'metadata': None,
    'ground_truth': <Classification: {
        'id': '6064c24401257d68b7b060c8',
        'tags': BaseList([]),
        'label': 'cat',
        'confidence': None,
        'logits': None,
    }>,
    'resnet50': <Classification: {
        'id': '6064c26e01257d68b7b0bdfe',
        'tags': BaseList([]),
        'label': 'ship',
        'confidence': None,
        'logits': array([ 0.74897313, -0.7627302 , -0.79189354, -0.78844124, -1.0206403 ,
                         -1.0742921 , -0.9762771 , -1.0660601 ,  6.3457403 , -0.6143737 ],
                        dtype=float32),
    }>,
    'mistakenness': 0.9653186284617471,
}>]
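Rather than scanning the entire ranked list, you can restrict attention to the most suspicious samples. Here is a small sketch (the 0.95 threshold is an arbitrary value chosen for illustration):
from fiftyone import ViewField as F

# Keep only samples whose estimated mistakenness exceeds a threshold
high_mistakenness_view = mistake_view.match(F("mistakenness") > 0.95)
print("%d samples have mistakenness > 0.95" % len(high_mistakenness_view))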
Let's launch the FiftyOne App to visually inspect the results:
# Launch the App and connect it to the dataset
session = fo.launch_app(dataset)

# Show only the samples for which we added label mistakes
session.view = dataset.match_tags("mistake")

# Show the samples we processed in rank order by the mistakenness
session.view = mistake_view

session.freeze()  # screenshot the active App for sharing
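From here, one possible way to act on the findings (a sketch under assumptions, not part of the workflow above; the export directory is hypothetical) is to tag the most suspicious samples and export them for re-annotation:
from fiftyone import ViewField as F

# Tag the most suspicious samples for review
review_view = dataset.match_tags("processed").match(F("mistakenness") > 0.95)
review_view.tag_samples("requires_review")

# Export the tagged samples for re-annotation (directory is hypothetical)
review_view.export(
    export_dir="/tmp/requires-review",
    dataset_type=fo.types.FiftyOneImageClassificationDataset,
)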