Voxel51 Integration

The integration between V7 and Voxel51 helps you get more value from your data and annotation effort while reducing the volume of data you need to label. You can curate and improve datasets, annotate and reannotate data, and transfer it between the two platforms.

Key Benefits of the Integration

Dataset curation for smarter annotation

FiftyOne by Voxel51 is the leading open-source toolkit for building high-quality datasets and computer vision models. It helps you visualize, curate, manage, and QA data, and automate the workflows that support enterprise machine learning. FiftyOne helps you identify the most relevant samples in a dataset to send to V7 for annotation by providing a variety of tools and workflows to:

  • explore and balance your datasets by class and metadata distribution
  • visualize, de-duplicate, sample, and pre-label your data distributions using embeddings
  • perform automated pre-labeling with off-the-shelf or custom models

These workflows enable the creation of diverse, representative data subsets while minimizing data volume—getting you the most out of your annotation budget and boosting model performance.
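As a rough illustration of the first workflow, the sketch below inspects the class distribution of a label field and lists the classes that fall below a chosen share of all labels. It assumes a FiftyOne dataset whose detections live in a ground_truth field, as in the quickstart dataset used later in this guide. The helper names and the 5% threshold are illustrative choices, not part of FiftyOne or the integration; the only FiftyOne call used is count_values().

```python
def underrepresented_classes(counts, min_fraction=0.05):
    """Return the classes whose share of all labels is below min_fraction.

    `counts` maps class name -> number of labels, for example the dict
    returned by FiftyOne's `dataset.count_values(...)`.
    """
    # Ignore a possible None key (samples with no value for the field)
    counts = {label: n for label, n in counts.items() if label is not None}
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(label for label, n in counts.items() if n / total < min_fraction)


def rare_classes_in(dataset, field="ground_truth.detections.label", min_fraction=0.05):
    """Count labels in `field` of a FiftyOne dataset or view and list the rare ones."""
    return underrepresented_classes(dataset.count_values(field), min_fraction)
```

Classes reported here are candidates for targeted sampling before you send a batch to V7, so that the annotation budget is not spent only on classes that are already well covered.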

Dataset improvements for model fine-tuning and evaluation

This integration clears the way for efficiently optimizing and augmenting existing datasets to boost model performance even further.

FiftyOne enables powerful image and object-level annotation review and QA workflows. The platform’s embedding visualization, compatible with off-the-shelf and custom models, can be used to help analyze the quality of the annotations and the dataset—to weed out mistakes and find areas for improvement.

By adding model predictions to a dataset for comparison with ground truth, or using the built-in similarity search features and vector database integrations, you can identify the difficult samples with targeted precision and pinpoint inaccurate annotations. FiftyOne’s sample- and label-level tags, as well as saved views, make it easy to mark samples for reannotation back in V7.

Seamless data transfers and support for all data formats

Data can be easily sent back and forth between Voxel51 and V7 via an API—to reduce the time and effort needed for transfers and ensure top data security. The integration will allow for seamless conversion of all data formats, retaining all existing annotations (including labels made in other tools).

V7 supports all data in its native formats, be it images, videos, or medical imagery. With auto-annotation, specialized video labeling, and SAM integration, labeling teams can annotate data faster without sacrificing quality.

Notably, V7 supports DICOM, NIfTI, and WSI imagery, showcasing its commitment to delivering fit-for-purpose infrastructure for industries of all types.

Using V7 and Voxel51 together

The integration between Voxel51’s FiftyOne and V7 Darwin is provided by darwin_fiftyone.

It enables FiftyOne users to send subsets of their datasets to Darwin for annotation and review. The annotated data can then be imported back into FiftyOne.

Let’s go through a quick rundown of how to set it up.

Set up FiftyOne

To start, you need to install FiftyOne, an open-source tool for building high-quality datasets and computer vision models.

pip install fiftyone

Darwin V7 configuration

Connect FiftyOne with V7’s Darwin to start annotating files. Here’s how to integrate with the Darwin backend:

  1. Install the backend
pip install darwin-fiftyone
  2. Configure FiftyOne to use darwin-fiftyone
cat ~/.fiftyone/annotation_config.json
{
  "backends": {
    "darwin": {
      "config_cls": "darwin_fiftyone.DarwinBackendConfig",
      "api_key": "d8mLUXQ.**********************"
    }
  }
}
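If you would rather create this file from a script than edit it by hand, a minimal sketch follows. It uses only the Python standard library and writes the same keys shown above. The function name write_darwin_backend_config is hypothetical, and the API key you pass in is your own V7 key.

```python
import json
from pathlib import Path


def write_darwin_backend_config(api_key, config_path=None):
    """Add or update the "darwin" backend entry in FiftyOne's annotation config.

    Entries for other backends that already exist in the file are left untouched.
    """
    if config_path is None:
        config_path = Path.home() / ".fiftyone" / "annotation_config.json"
    path = Path(config_path)

    # Start from the existing config if there is one
    config = json.loads(path.read_text()) if path.exists() else {}

    darwin = config.setdefault("backends", {}).setdefault("darwin", {})
    darwin["config_cls"] = "darwin_fiftyone.DarwinBackendConfig"
    darwin["api_key"] = api_key

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path
```

Calling write_darwin_backend_config("YOUR_V7_API_KEY") with no second argument targets ~/.fiftyone/annotation_config.json, the file shown above.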

Loading example data in FiftyOne

Let’s start by loading example data into FiftyOne.

import fiftyone.zoo as foz
import fiftyone as fo

dataset = foz.load_zoo_dataset("quickstart", dataset_name="quickstart-example")
session = fo.launch_app(dataset)

Annotation

Now let’s load this data into V7 for annotation and refinement.

To illustrate, let's upload all samples from this dataset into a Darwin dataset named "quickstart-example".

If the dataset doesn't already exist in Darwin, it will be created.

Anno Key

The anno_key is a unique alphanumeric string that identifies an annotation run in the V7 platform. It should start with a letter.

You can list annotation runs with the list_annotation_runs method on your dataset or view, e.g. dataset.list_annotation_runs().

Note: The anno_key is not related to the V7 API key.
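One way to produce a key that meets these rules is sketched below. make_anno_key and is_valid_anno_key are hypothetical helpers, not part of FiftyOne or darwin-fiftyone, and the check encodes only the two constraints stated above (alphanumeric, starting with a letter), restricted to ASCII as a simplifying assumption.

```python
import re
import uuid

# A letter followed by any number of letters or digits (ASCII only, by assumption)
_ANNO_KEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_anno_key(anno_key):
    """Return True if the key is alphanumeric and starts with a letter."""
    return bool(_ANNO_KEY_PATTERN.fullmatch(anno_key))


def make_anno_key(prefix="run"):
    """Build a unique key from a letter-led prefix plus a random hex suffix."""
    if not is_valid_anno_key(prefix):
        raise ValueError("prefix must be alphanumeric and start with a letter")
    return prefix + uuid.uuid4().hex
```

For example, make_anno_key("quickstart") returns a string that begins with "quickstart" followed by 32 hexadecimal characters. Store the key somewhere durable, since you need the same value later to load the annotations back.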

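# Example value: any unique alphanumeric string that starts with a letter
anno_key = "quickstartRun1"

dataset.annotate(
    anno_key,
    label_field="ground_truth",
    attributes=["iscrowd"],
    launch_editor=True,
    backend="darwin",
    dataset_slug="quickstart-example",
    # Slug of the Darwin external storage that holds the media
    external_storage="example-darwin-storage-slug",
)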
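Because annotate() can be called on a view as well as on a dataset, you can send a subset instead of every sample. The sketch below draws a random sample with FiftyOne's take() and passes it to the Darwin backend with the same arguments as above. The helper name annotate_random_subset, the sample size, and the seed are illustrative assumptions.

```python
def annotate_random_subset(dataset, anno_key, dataset_slug, num_samples=25, seed=51):
    """Send a random subset of `dataset` to Darwin and return the view that was sent."""
    # take() returns a view with `num_samples` randomly chosen samples;
    # a fixed seed makes the selection reproducible
    view = dataset.take(num_samples, seed=seed)
    view.annotate(
        anno_key,
        label_field="ground_truth",
        launch_editor=False,
        backend="darwin",
        dataset_slug=dataset_slug,
    )
    return view
```

Any other view works the same way, so a view built from tags, filters, or a similarity search can replace the random sample here.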

Annotating Videos

The integration also supports video annotation. Call the ensure_frames method on your dataset or view before annotation so that unannotated videos have frame IDs:

dataset.ensure_frames()

dataset.annotate(
    anno_key,
    label_field="frames.detections",
    launch_editor=True,
    backend="darwin",
    dataset_slug="quickstart-example-video",
)

After the annotations and reviews are completed in Darwin, you can fetch the updated data as follows:

dataset.load_annotations(anno_key)
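If a script might run with a mistyped or stale key, a guarded version can fail with a clearer message. The sketch below relies only on list_annotation_runs() and load_annotations(), both mentioned in this guide; the helper name is hypothetical.

```python
def load_annotations_if_run_exists(dataset, anno_key):
    """Load annotations for `anno_key`, or raise KeyError if no such run is registered."""
    known_runs = dataset.list_annotation_runs()
    if anno_key not in known_runs:
        raise KeyError(
            f"No annotation run {anno_key!r} on this dataset; "
            f"known runs: {sorted(known_runs)}"
        )
    dataset.load_annotations(anno_key)
```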

Checking Annotation Job Status

To check the progress of your annotation job, you can call the check_status method:

dataset.load_annotations(anno_key)
results = dataset.load_annotation_results(anno_key)
results.check_status()

API

In addition to the standard arguments provided by dataset.annotate(), we also support:

  • backend="darwin": Selects the Darwin annotation backend
  • dataset_slug: The name of the dataset to use or create in Darwin
  • external_storage: The sluggified name of the Darwin external storage; all samples should exist in this external storage

Supported Annotation types

V7 top-level annotation types

  • Bounding boxes
  • Polygons (closed polylines)
  • Keypoints
  • Tags

V7 sub-annotation types

  • Attributes
  • Properties
  • Text
  • Instance ID

Model training and next steps

The data annotated in V7 can now be further reviewed and refined before it’s used for model training.

FiftyOne has a variety of tools to enable easy integration with your model training pipelines. You can export your data in common formats such as COCO or YOLO, which are suitable for use with most training packages. FiftyOne also provides workflows for popular model training libraries such as PyTorch, PyTorch Lightning Flash, and TensorFlow.
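As an illustration of the export step, the sketch below writes a label field to disk in COCO or YOLOv5 layout using dataset.export(). The helper export_for_training and its short format names are assumptions made for this example; the dataset type classes themselves come from fiftyone.types.

```python
# Short names used by this example -> dataset type class names in fiftyone.types
EXPORT_TYPES = {
    "coco": "COCODetectionDataset",
    "yolo": "YOLOv5Dataset",
}


def export_for_training(dataset, export_dir, fmt="coco", label_field="ground_truth"):
    """Export `label_field` of a FiftyOne dataset or view in the chosen format."""
    if fmt not in EXPORT_TYPES:
        raise ValueError(
            f"Unsupported format {fmt!r}; choose from {sorted(EXPORT_TYPES)}"
        )

    # Imported lazily so the helper can be defined without FiftyOne installed
    import fiftyone.types as fot

    dataset.export(
        export_dir=export_dir,
        dataset_type=getattr(fot, EXPORT_TYPES[fmt]),
        label_field=label_field,
    )
```

Exporting a view rather than the full dataset keeps the training set limited to the samples you have reviewed.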

With FiftyOne’s new plugins architecture, custom training workflows can be directly integrated into the FiftyOne App and become available at the click of a button. The delegated execution feature lets you process the workflows on dedicated compute nodes.

Once a model is trained, you can easily run inference, load the model predictions back into FiftyOne, and evaluate them against the ground truth annotation.

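import fiftyone.zoo as foz

# Load an existing dataset with predictions
dataset = foz.load_zoo_dataset("quickstart")

# Evaluate model predictions against the ground truth
results = dataset.evaluate_detections(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)

# Print per-class precision, recall, and F1 score
results.print_report()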

FiftyOne’s evaluation and filtering capabilities make it easy to spot discrepancies between model predictions and ground truth, including annotation errors or difficult samples where your model underperforms.

Tag annotation mistakes in FiftyOne for reannotation in V7, and add the difficult samples to the next iteration of your training set. Take a snapshot of your dataset and move on to the next round of improvements.
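A minimal sketch of this tagging step is shown below. It assumes the evaluation above was run with eval_key="eval", in which case FiftyOne stores a per-sample false-positive count in the eval_fp field. The helper name, the tag "reannotate", and the choice of false positives as the ranking signal are illustrative, not a prescribed workflow.

```python
def tag_for_reannotation(dataset, eval_key="eval", num_samples=10, tag="reannotate"):
    """Tag the samples with the most false positives and return them as a view."""
    # Samples with many false positives often contain missing or wrong labels
    worst = dataset.sort_by(f"{eval_key}_fp", reverse=True).limit(num_samples)
    worst.tag_samples(tag)
    return worst
```

The tagged samples can later be selected with dataset.match_tags("reannotate") and sent back to V7 by calling annotate() on that view with a new anno_key.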

Loading cloud-backed media and working with dataset versioning

If you’re a FiftyOne Teams customer and work with cloud-backed files, you will be able to load items directly from your cloud storage.

The two steps you need to complete are:

  • Set up V7 external storage settings with your cloud provider
  • Add the sluggified V7 storage name to your darwin-fiftyone config in the external_storage config field

The name will be identical to the name field in the V7 external storage settings.

Once you follow these steps, you are ready to connect your cloud-backed media to FiftyOne Teams and V7 Darwin.

Note: The Darwin cloud-backed media registration process is handled as a part of the annotate method.
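Since registration happens inside annotate(), it can help to confirm beforehand that every sample actually lives in the storage you configured. The sketch below is a simple pre-flight check under stated assumptions: the helper name is hypothetical, and the prefix you pass (for example a bucket URI such as "s3://my-bucket/") is whatever your own storage uses. The only FiftyOne call is values("filepath").

```python
def samples_outside_storage(dataset, storage_prefix):
    """Return the filepaths in `dataset` that do not start with `storage_prefix`."""
    return [
        path
        for path in dataset.values("filepath")
        if not path.startswith(storage_prefix)
    ]
```

An empty result means every sample path matches the prefix; any paths returned should be moved or excluded from the view before you call annotate() with external_storage set.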

FiftyOne Teams also includes dataset versioning, which means every annotation you add and every model you run can be captured and versioned in a history of dataset snapshots. Dataset snapshots in FiftyOne Teams can be created, browsed, linked to, and re-materialized with ease, without complex naming conventions or manual tracking of versions.