Registering Items From External Storage

After you've connected your external storage (AWS, Azure, GCP), you're ready to register files so they can be accessed in a Darwin dataset. This is done through our REST API, so you'll need an API key to continue. Steps to generate your own key are here.

If your storage configuration is read-write, see the step-by-step instructions directly below. If it's read-only, skip to the Read-Only Registration section further down.

Stuck? Check out our troubleshooting guide to resolve common errors.

Read-Write Registration

Registering any read-write file involves sending a POST request to the below API endpoint with a payload containing instructions for Darwin on where to access the item:

f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing"

The Basics

Below is a Python script covering the simplest case: registering a single image file as a dataset item. A breakdown of every field follows the script.

import requests

# Define constants
api_key = "your-api-key-here"
team_slug = "your-team-slug-here"
dataset_slug = "your-dataset-slug-here"
storage_name = "your-storage-bucket-name-here"

# Populate request headers
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}"
}

# Define registration payload
payload = {
    "items": [
        {
            "path": "/",
            "type": "image",
            "storage_key": "car_folder/car_1.png",
            "name": "car_1.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

# Send the request
response = requests.post(
  f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
  headers=headers,
  json=payload
)

# Inspect the response for errors
body = response.json()
if response.status_code != 200:
    print("request failed", response.text)
elif 'blocked_items' in body and len(body['blocked_items']) > 0:
    print("failed to register items:")
    for item in body['blocked_items']:
        print("\t - ", item)
    if len(body['items']) > 0:
        print("successfully registered items:")
        for item in body['items']:
            print("\t - ", item)
else:
    print("success")

Script constants:

  • api_key: Your API key

  • team_slug: Your sluggified team name

  • dataset_slug: The sluggified name of the dataset to register the file in

  • storage_name: The name of your storage integration in your configuration

Payload-specific fields & concepts:

  • items: It's possible to register multiple items in the same request; items is therefore a list of dictionaries, each corresponding to one dataset item.
  • path: The folder path within the Darwin dataset that this item should be registered at
  • type: The type of file being registered. It can be image, video, pdf or dicom. This instructs us on how to treat the file so it can be viewed correctly. The type field can be omitted if you include the file extension as part of the slots.file_name field. Please see here for further details
  • storage_key: The exact file path to the file in your external storage. This file path is case sensitive, cannot start with a forward slash, and is entered slightly differently depending on your cloud provider:
    • For AWS S3, exclude the bucket name. For example if the full path to your file is s3://example-bucket/darwin/sub_folder/example_image.jpg then your storage_key must be darwin/sub_folder/example_image.jpg
    • For Azure blobs, include the container name. For example if the full path to your file is https://myaccount.blob.core.windows.net/mycontainer/sub_folder/myblob.jpg then your storage_key must be mycontainer/sub_folder/myblob.jpg
    • For GCP Buckets, exclude the bucket name. For example if the full path to your file is gs://example-bucket/darwin/sub_folder/example_image.jpg, then your storage_key must be darwin/sub_folder/example_image.jpg
  • name: The name of the resulting dataset item as it appears in Darwin. This can be any name you choose, but we strongly recommend giving files the same or similar names to the externally stored files
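
The bucket-specific storage_key rules above can be captured in a small helper. This is an illustrative sketch, not part of the Darwin API; the function name and the URI formats handled are our assumptions:

```python
def to_storage_key(uri: str) -> str:
    """Derive a storage_key from a full cloud object URI (illustrative only)."""
    for scheme in ("s3://", "gs://"):
        if uri.startswith(scheme):
            # AWS S3 and GCP: drop the scheme and the bucket name
            return uri[len(scheme):].partition("/")[2]
    if ".blob.core.windows.net/" in uri:
        # Azure: keep the container name, drop the account host
        return uri.split(".blob.core.windows.net/", 1)[1]
    raise ValueError(f"unrecognised URI format: {uri}")
```

For instance, to_storage_key("s3://example-bucket/darwin/sub_folder/example_image.jpg") returns darwin/sub_folder/example_image.jpg.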

Every image uploaded in read-write will generate a thumbnail in your external storage at the location specified by your configured storage prefix.

Registering Videos

Registering videos in read-write is nearly identical to registering images. The only difference is that you can specify an optional fps parameter to determine the frequency that frames are sampled from the video at. If fps is left out of the payload, then frames will be extracted at the video's native framerate.

This results in a slightly different payload to above as follows:

payload = {
    "items": [
        {
            "path": "/",
            "type": "video",
            "fps": 5,
            "storage_key": "video_folder/car_video.mp4",
            "name": "car_video.mp4",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

Every video uploaded in read-write will generate a series of files in your external storage at the location specified by your configured storage prefix. These include high-quality and low-quality frames, as well as metadata necessary for video playback. Low-quality frames are used during video playback, and high-quality frames are displayed for annotation when playback is paused.

These files are necessary for Darwin to access the video correctly.

Registering Files in Multiple Slots

If you need to display multiple files next to each other simultaneously, you'll need to register them in different slots. Please refer to this article to gain an understanding of the concept of slots.

To register a dataset with multiple slots from external storage, the registration payload changes in structure as follows:

payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "slot_name": "0",
                    "type": "image",
                    "storage_key": "car_folder/car_1.png",
                    "file_name": "car_1.png",
                },
                {
                    "slot_name": "1",
                    "type": "image",
                    "storage_key": "car_folder/car_2.png",
                    "file_name": "car_2.png",
                },
                {
                    "slot_name": "2",
                    "type": "video",
                    "storage_key": "car_folder/cars.mp4",
                    "file_name": "cars.mp4",
                },
            ],
            "name": "cars",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

Important points are:

  • Because the dataset item now contains multiple files, we need to break the item up into separate slots each with a different slot_name. Slots can be named any string, so long as they are unique for items that need to go into separate slots
  • Each item in slots is given a new file_name field. This is distinct from the name field, which is the name of the resulting dataset item in Darwin. file_name should match the exact file name of the file in that slot (i.e. the last part of storage_key). If file_name correctly specifies the file's extension, you can omit the type field: our processing pipeline will infer the filetype from the extension of file_name
  • Only DICOM (.dcm) slices can be registered within the same slot_name, resulting in concatenation of those slices. Other file types must occupy their own slot. Please see the section below for further detail
  • It's possible to register different filetypes in different slots within the same dataset item. For example, above we have 2 slots containing images and a third containing a video
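
Building the slots list by hand gets repetitive for many files. As a sketch (the helper name is ours, not part of any SDK), you can derive one slot per storage key, letting the type be inferred from each file_name extension:

```python
def build_slots(storage_keys):
    """One file per slot; slot names are stringified indices.

    file_name is the last path component of each storage_key, so the
    type field can be omitted and inferred from the file extension.
    """
    return [
        {
            "slot_name": str(index),
            "storage_key": key,
            "file_name": key.rsplit("/", 1)[-1],
        }
        for index, key in enumerate(storage_keys)
    ]
```

For example, build_slots(["car_folder/car_1.png", "car_folder/cars.mp4"]) produces slots "0" and "1" with file names car_1.png and cars.mp4.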

Registering DICOM Files

DICOM (.dcm) files can be either individual slices, or a series of slices. A series of slices as a single .dcm is registered similarly to a video. The only differences are that:

  • No fps value can be passed
  • The type is dicom

payload = {
    "items": [
        {
            "path": "/",
            "type": "dicom",
            "storage_key": "dicom_folder/my_dicom_series.dcm",
            "name": "my_dicom_series.dcm",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

A series of DICOM slices can be uploaded as a sequence by registering them in the same slot_name. For example:

payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "slot_name": "0",
                    "type": "dicom",
                    "storage_key": "dicom_slices/slice_1.dcm",
                    "file_name": "slice_1.dcm",
                },
                {
                    "slot_name": "0",
                    "type": "dicom",
                    "storage_key": "dicom_slices/slice_2.dcm",
                    "file_name": "slice_2.dcm",
                },
                {
                    "slot_name": "0",
                    "type": "dicom",
                    "storage_key": "dicom_slices/slice_3.dcm",
                    "file_name": "slice_3.dcm",
                },
            ],
            "name": "my_dicom_series",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

🚧

Uploading DICOM Slices as Series

When uploading DICOM slices as a sequence, the order in which the slices appear is determined by the following file metadata, in order of significance:

  • 1: SeriesNumber
  • 2: InstanceNumber
  • 3: SliceLocation
  • 4: ImagePositionPatient
  • 5: FileName

Additionally, all files passed as slices that contain more than 1 volume will be assigned their own slot.
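
The precedence above amounts to a lexicographic sort over those tags. A pure-Python sketch of the idea (metadata is represented here as plain dicts; real files would be read with a DICOM library):

```python
# Tags in order of significance, per the list above
ORDER_TAGS = ("SeriesNumber", "InstanceNumber", "SliceLocation",
              "ImagePositionPatient", "FileName")

def slice_sort_key(meta):
    """Sort key following the tag precedence above.

    Missing tags sort last; assumes each tag has a consistent type
    across the slices being compared.
    """
    return tuple((meta.get(tag) is None, meta.get(tag)) for tag in ORDER_TAGS)
```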

You can override these behaviours by either:

  • 1: Forcing each .dcm file into the series as a slice, regardless of whether it contains multiple volumes
  • 2: Forcing the series of slices to respect the order passed in the registration payload

To do so, add an optional argument to the base of the payload as follows:

payload = {
    "items": [
        ...
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
    "options": {"ignore_dicom_layout": True}
}

Multi-Planar View

To register medical volumes and extract the axial, sagittal, and coronal views:

  • 1: Include the "extract_views": "true" payload field
  • 2: The specified slot_name must be 0

payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "type": "dicom",
                    "slot_name": "0",
                    "storage_key": "001/slice1.dcm",
                    "file_name": "slice1.dcm",
                    "extract_views": "true"
                }
            ],
            "name": "001.dcm"
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

Registration Through darwin-py

If you're using read-write registration, you can simplify item registration using the darwin-py SDK. Below is an example Python script demonstrating how to register single-slotted items with darwin-py:

from darwin.client import Client

# Define your storage keys
storage_keys = [
    "path/to/first/image.png",
    "path/to/second/image.png",
    "path/to/third/image.png",
]

# Populate your Darwin API key, team slug, target dataset slug, and storage configuration name in Darwin
API_KEY = "YOUR_API_KEY_HERE"
team_slug = "team_slug"
dataset_slug = "dataset_slug"
storage_config_name = "your_bucket_name"

# Retrieve the dataset and connect to your bucket
client = Client.from_api_key(API_KEY)
dataset = client.get_remote_dataset(dataset_identifier=f"{team_slug}/{dataset_slug}")
my_storage_config = client.get_external_storage(name=storage_config_name, team_slug=team_slug)

# Register each storage key as a dataset item
results = dataset.register(my_storage_config, storage_keys)

# Optionally inspect the results of each item
print(results)

Note: The first step is to define your storage keys. These can be read in from a file, or returned from the SDK of your cloud provider (see below), but they must be structured as a list of strings.
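
For example, if your keys live in a plain-text file with one key per line (the filename and format here are assumptions), they can be loaded like this:

```python
def load_storage_keys(path):
    """Read storage keys from a text file, one per line, skipping blanks."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# e.g. storage_keys = load_storage_keys("storage_keys.txt")
```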

By default, darwin-py will register every item in the root directory of the chosen dataset. You can recreate the folder structure defined by your storage keys in the Darwin dataset using the preserve_folders option:

results = dataset.register(my_storage_config, storage_keys, preserve_folders=True)

If you're registering videos, you can specify the rate at which frames are sampled from each video with the optional fps argument:

fps = 10
results = dataset.register(my_storage_config, storage_keys, fps=fps)

If you're registering DICOM volumes and wish to use multi-planar view, you can use the optional multi_planar_view argument:

results = dataset.register(my_storage_config, storage_keys, multi_planar_view=True)

If you want to register multi-slotted items, you can use the multi_slotted argument. Note that in this case your storage keys will need to be formatted as a dictionary of lists, where:

  • Each dictionary key is an item name
  • Each dictionary value is a list of storage keys for the item

storage_keys = {
    "item1": ["path/to/first/image.png", "path/to/second/image.png"],
    "item2": ["path/to/third/image.png", "path/to/fourth/image.png"],
    "item3": ["my/sample/image.png", "my/sample/video.mp4", "my/sample/pdf.pdf"]
}

results = dataset.register(my_storage_config, storage_keys, multi_slotted=True)
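
If your flat key list already groups related files by folder, one way to build that dictionary (a sketch; the helper name and grouping rule are our assumptions, not part of darwin-py) is to use each key's immediate parent folder as the item name:

```python
from collections import defaultdict

def group_keys_by_folder(storage_keys):
    """Group flat storage keys into {item_name: [keys]} using each key's
    immediate parent folder as the item name."""
    grouped = defaultdict(list)
    for key in storage_keys:
        # Keys with no folder go into a catch-all "ungrouped" item
        parent = key.rsplit("/", 2)[-2] if "/" in key else "ungrouped"
        grouped[parent].append(key)
    return dict(grouped)
```
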

📘

Slot names & folder structures

If using darwin-py to register multi-slotted items, please note that:

  • Each slot will be given a name equivalent to the filename in the storage key
  • If using preserve_folders=True, the item will be registered in the dataset directory specified by the first storage key in each list

If you'd prefer to read your storage keys directly from your external storage, you can do so using your cloud provider's SDK. Below is an example showing how to get all storage keys from a specific AWS S3 bucket directory using AWS boto3:

import boto3

def list_keys_in_bucket(bucket_name):
    all_keys = []
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix="my/bucket/directory/")
    for page in pages:
        # "Contents" is absent from pages with no matching objects
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("/"):
                all_keys.append(key)
    return all_keys
  
storage_keys = list_keys_in_bucket("s3-bucket-name")

Read-Only Registration

Registering any read-only file involves sending a POST request to the below API endpoint with a payload containing instructions for Darwin on where to access the item:

f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing_readonly"

Please be aware that registering read-only items requires that:

  • 1: A thumbnail file for each item is generated and available in your external storage
  • 2: Video files have a set of high-quality and low-quality frames pre-extracted and available in your external storage

We recommend using mogrify for thumbnail generation:

> mogrify -resize "356x200>" -format jpg -quality 50 -write thumbnail.jpg large.png

❗️

Don't use Original Images as Thumbnails

It is strongly recommended that you don't use the original image as your thumbnail. This can lead to CORS issues in some browsers, preventing access to the item.

The Basics

Below is a Python script covering the simplest case: registering a single image file as a dataset item. A breakdown of every field follows the script.

import requests

# Define constants
api_key = "your_api_key_here"
team_slug = "your-team-slug-here"
dataset_slug = "your-dataset-slug-here"
storage_name = "your-storage-bucket-name-here"

# Populate request headers
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}"
}

# Define registration payload
payload = {
    "items": [
        {
            "path": "/",
            "type": "image",
            "storage_key": "car_folder/car_1.png",
            "storage_thumbnail_key": "thumbnails/car_1_thumbnail.png",
            "height": 1080,
            "width": 1920,
            "name": "car_1.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

# Send the request
response = requests.post(
  f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing_readonly",
  headers=headers,
  json=payload
)

# Inspect the response for errors
body = response.json()
if response.status_code != 200:
    print("request failed", response.text)
elif 'blocked_items' in body and len(body['blocked_items']) > 0:
    print("failed to register items:")
    for item in body['blocked_items']:
        print("\t - ", item)
    if len(body['items']) > 0:
        print("successfully registered items:")
        for item in body['items']:
            print("\t - ", item)
else:
    print("success")

Script constants:

  • api_key: Your API key

  • team_slug: Your sluggified team name

  • dataset_slug: The sluggified name of the dataset to register the file in

  • storage_name: The name of your storage integration in your configuration

Payload-specific fields & concepts:

  • items: It's possible to register multiple items in the same request; items is therefore a list of dictionaries, each corresponding to one dataset item
  • path: The folder path within the Darwin dataset that this item should be registered at
  • type: The type of file being registered. It can be image, video, or pdf. This instructs us on how to treat the file so it can be viewed correctly
  • storage_key and storage_thumbnail_key: The exact file paths to the file and its corresponding thumbnail in your external storage. These file paths are case sensitive, cannot start with a forward slash, and are entered slightly differently depending on your cloud provider:
    • For AWS S3, exclude the bucket name. For example if the full path to your file is s3://example-bucket/darwin/sub_folder/example_image.jpg then your storage_key must be darwin/sub_folder/example_image.jpg
    • For Azure blobs, include the container name. For example if the full path to your file is https://myaccount.blob.core.windows.net/mycontainer/sub_folder/myblob.jpg then your storage_key must be mycontainer/sub_folder/myblob.jpg
    • For GCP Buckets, exclude the bucket name. For example if the full path to your file is gs://example-bucket/darwin/sub_folder/example_image.jpg, then your storage_key must be darwin/sub_folder/example_image.jpg
  • height and width: The exact height and width of the main image. If these values are incorrect, uploaded annotations will appear in the wrong part of the screen or at the wrong scale
  • name: The name of the resulting dataset item as it appears in Darwin. This can be any name you choose, but we strongly recommend giving files the same or similar names to the externally stored files
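
Because incorrect height and width values mis-place annotations, it's worth reading them from the file itself rather than hard-coding them. For PNGs this can be done with the standard library alone (a sketch; for other formats you'd need an imaging library such as Pillow):

```python
import struct

def png_dimensions(path):
    """Read (width, height) from a PNG's IHDR chunk without decoding pixels."""
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # IHDR: width and height are big-endian 32-bit ints at bytes 16-24
    return struct.unpack(">II", header[16:24])
```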

Registering Videos

Registering videos in read-only requires that sets of high-quality and low-quality frames are pre-extracted and available in your external storage. Low-quality frames are used during video playback, and high-quality frames are displayed for annotation when playback is paused.

This results in a registration payload structured as follows:

payload = {
    "items": [
        {
            "path": "/",
            "type": "video",
            "storage_key": "video_folder/car_video.mp4",
            "storage_thumbnail_key": "thumbnails/car_video_thumbnail.png",
            "name": "car_video.mp4",
            "sections": [
                {
                    "section_index": 1,
                    "height": 1080,
                    "width": 1920,
                    "storage_hq_key": "video_folder/car_video/frame_1_hq.png",
                    "storage_lq_key": "video_folder/car_video/frame_1_lq.png",
                },
                {
                    "section_index": 2,
                    "height": 1080,
                    "width": 1920,
                    "storage_hq_key": "video_folder/car_video/frame_2_hq.png",
                    "storage_lq_key": "video_folder/car_video/frame_2_lq.png",
                },
            ]
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
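
If your pre-extracted frames follow a predictable naming scheme, the sections list can be generated rather than written out by hand. A sketch, assuming hypothetical frame keys like video_folder/car_video/frame_1_hq.png:

```python
def build_sections(frame_count, prefix, width, height):
    """Build a sections list for frame keys named f'{prefix}/frame_{i}_hq.png'
    and f'{prefix}/frame_{i}_lq.png', with 1-based section indices."""
    return [
        {
            "section_index": i,
            "height": height,
            "width": width,
            "storage_hq_key": f"{prefix}/frame_{i}_hq.png",
            "storage_lq_key": f"{prefix}/frame_{i}_lq.png",
        }
        for i in range(1, frame_count + 1)
    ]
```
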

Registering Files in Multiple Slots

If you need to display multiple files next to each other simultaneously, you'll need to register them in different slots. Please refer to this article to gain an understanding of the concept of slots.

To register a dataset with multiple slots from external storage, the registration payload changes in structure as follows:

payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "slot_name": "0",
                    "type": "image",
                    "storage_key": "car_folder/car_1.png",
                    "storage_thumbnail_key": "thumbnails/car_1_thumbnail.png",
                    "height": 1080,
                    "width": 1920,
                    "file_name": "car_1.png",
                },
                {
                    "slot_name": "1",
                    "type": "image",
                    "storage_key": "car_folder/car_2.png",
                    "storage_thumbnail_key": "thumbnails/car_2_thumbnail.png",
                    "height": 1080,
                    "width": 1920,
                    "file_name": "car_2.png",
                },
                {
                    "slot_name": "2",
                    "type": "video",
                    "storage_key": "video_folder/car_video.mp4",
                    "storage_thumbnail_key": "thumbnails/car_video_thumbnail.png",
                    "file_name": "cars.mp4",
                    "sections": [
                        {
                            "section_index": 1,
                            "height": 1080,
                            "width": 1920,
                            "storage_hq_key": "video_folder/car_video/frame_1_hq.png",
                            "storage_lq_key": "video_folder/car_video/frame_1_lq.png",
                        },
                        {
                            "section_index": 2,
                            "height": 1080,
                            "width": 1920,
                            "storage_hq_key": "video_folder/car_video/frame_2_hq.png",
                            "storage_lq_key": "video_folder/car_video/frame_2_lq.png",
                        },
                    ],
                },
            ],
            "name": "cars",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

Important points are:

  • Because the dataset item now contains multiple files, we need to break the item up into separate slots each with a different slot_name. Slots can be named any string, so long as they are unique for items that need to go into separate slots
  • Each item in slots is given a new file_name field. This is distinct from the name field which will be the name of the resulting dataset item in Darwin. file_name should match the exact file name of the file in that slot (i.e. it should match the last part of storage_key)
  • No two files can be registered to the same slot
  • It's possible to register different filetypes in different slots within the same dataset item. For example, above we have 2 slots containing images and a third containing a video

Registering DICOM Files

Unlike when using read-write storage, DICOM (.dcm) files cannot be registered directly in read-only. Instead, DICOM slices and series must first be converted to images and stored in your external bucket. Individual slices can then be registered as image items, and series of slices can be registered as video items.