Registering Items From External Storage
After you've connected your external storage (AWS, Azure, GCP), you're ready to register files so they can be accessed in a Darwin dataset. This is done through our REST API, so you'll need an API key to continue. Steps to generate your own key are here.
Rate Limiting
Please note that API requests for external storage registration are rate limited. Exceeding the limit will result in HTTP 429 responses, so please implement an appropriate retry and back-off strategy.
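As a sketch of such a strategy, the helper below retries a POST on HTTP 429 with exponential back-off, preferring the server's Retry-After header when present. The function name and default values are illustrative, not part of the Darwin API:

```python
import time

import requests

def post_with_backoff(url, headers, payload, max_retries=5, base_delay=1.0):
    """POST with exponential back-off on HTTP 429 responses."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            return response
        # Prefer the server's Retry-After header over our computed delay
        delay = float(response.headers.get("Retry-After", base_delay * 2 ** attempt))
        time.sleep(delay)
    return response
```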
If your storage configuration is read-write, please see the section directly below for step-by-step instructions. Otherwise if using read-only, please navigate to the Read-Only Registration section further down.
Stuck? Check out our troubleshooting guide to resolve common errors.
Read-Write Registration
Registering any read-write file involves sending a POST request to the below API endpoint with a payload containing instructions for Darwin on where to access the item:
f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing"
The Basics
Below is a Python script covering the simplest case: registering a single image file as a dataset item. A breakdown of every field is available below the script.
import requests

# Define constants
api_key = "your-api-key-here"
team_slug = "your-team-slug-here"
dataset_slug = "your-dataset-slug-here"
storage_name = "your-storage-bucket-name-here"

# Populate request headers
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}"
}

# Define registration payload
payload = {
    "items": [
        {
            "path": "/",
            "type": "image",
            "storage_key": "car_folder/car_1.png",
            "name": "car_1.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

# Send the request
response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing",
    headers=headers,
    json=payload
)

# Inspect the response for errors
body = response.json()
if response.status_code != 200:
    print("request failed", response.text)
elif 'blocked_items' in body and len(body['blocked_items']) > 0:
    print("failed to register items:")
    for item in body['blocked_items']:
        print("\t - ", item)
    if len(body['items']) > 0:
        print("successfully registered items:")
        for item in body['items']:
            print("\t - ", item)
else:
    print("success")
- api_key: Your API key
- team_slug: Your sluggified team name
- dataset_slug: The sluggified name of the dataset to register the file in
- storage_name: The name of your storage integration in your configuration
Payload-specific fields & concepts:
- items: It's possible to register multiple items in the same request, therefore items is a list of dictionaries where each dictionary corresponds to one dataset item
- path: The folder path within the Darwin dataset that this item should be registered at
- type: The type of file being registered. It can be image, video, pdf, or dicom. This instructs us on how to treat the file so it can be viewed correctly. The type field can be omitted if you include the file extension as part of the slots.file_name field. Please see here for further details
- storage_key: The exact path to the file in your external storage. This path is case sensitive, cannot start with a forward slash, and is entered slightly differently depending on your cloud provider:
  - For AWS S3, exclude the bucket name. For example, if the full path to your file is s3://example-bucket/darwin/sub_folder/example_image.jpg, then your storage_key must be darwin/sub_folder/example_image.jpg
  - For Azure blobs, include the container name. For example, if the full path to your file is https://myaccount.blob.core.windows.net/mycontainer/sub_folder/myblob.jpg, then your storage_key must be mycontainer/sub_folder/myblob.jpg
  - For GCP buckets, exclude the bucket name. For example, if the full path to your file is gs://example-bucket/darwin/sub_folder/example_image.jpg, then your storage_key must be darwin/sub_folder/example_image.jpg
- name: The name of the resulting dataset item as it appears in Darwin. This can be any name you choose, but we strongly recommend giving files the same or similar names to the externally stored files
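To make these storage_key rules concrete, here's a small helper (hypothetical, not part of the Darwin API or SDK) that derives a storage_key from a full cloud URI under the assumptions above:

```python
def storage_key_from_uri(uri: str) -> str:
    """Derive a Darwin storage_key from a full cloud storage URI."""
    if uri.startswith(("s3://", "gs://")):
        # AWS and GCP: strip the scheme and the bucket name
        return uri.split("://", 1)[1].split("/", 1)[1]
    if ".blob.core.windows.net/" in uri:
        # Azure: keep the container name, drop the account hostname
        return uri.split(".blob.core.windows.net/", 1)[1]
    raise ValueError(f"Unrecognised storage URI: {uri}")
```

For example, calling it with s3://example-bucket/darwin/sub_folder/example_image.jpg returns darwin/sub_folder/example_image.jpg.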
Every image registered in read-write will generate a thumbnail in your external storage at the location specified by your configured storage prefix, for example in an AWS S3 bucket.
Registering Videos
Registering videos in read-write is nearly identical to registering images. The only difference is that you can specify an optional fps parameter to determine the frequency at which frames are sampled from the video. If fps is left out of the payload, frames will be extracted at the video's native framerate.
This results in a slightly different payload to above as follows:
payload = {
    "items": [
        {
            "path": "/",
            "type": "video",
            "fps": 5,
            "storage_key": "video_folder/car_video.mp4",
            "name": "car_video.mp4",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
Every video uploaded in read-write will generate a series of files in your external storage at the location specified by your configured storage prefix. These include high-quality and low-quality frames, as well as metadata necessary for video playback. Low-quality frames are used during video playback, and high-quality frames are displayed for annotation when playback is paused.
These are necessary to access the video file correctly in Darwin. For example, in an AWS S3 bucket:
Registering Files in Multiple Slots
If you need to display multiple files next to each other simultaneously, you'll need to register them in different slots. Please refer to this article to gain an understanding of the concept of slots.
To register a dataset with multiple slots from external storage, the registration payload changes in structure as follows:
payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "slot_name": "0",
                    "type": "image",
                    "storage_key": "car_folder/car_1.png",
                    "file_name": "car_1.png",
                },
                {
                    "slot_name": "1",
                    "type": "image",
                    "storage_key": "car_folder/car_2.png",
                    "file_name": "car_2.png",
                },
                {
                    "slot_name": "2",
                    "type": "video",
                    "storage_key": "car_folder/cars.mp4",
                    "file_name": "cars.mp4",
                },
            ],
            "name": "cars",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
Important points are:
- Because the dataset item now contains multiple files, we need to break the item up into separate slots, each with a different slot_name. Slots can be named any string, so long as the names are unique for files that need to go into separate slots
- Each entry in slots is given a new file_name field. This is distinct from the name field, which will be the name of the resulting dataset item in Darwin. file_name should match the exact file name of the file in that slot (i.e. it should match the last part of storage_key). When including file_name, if it correctly specifies the extension of the file in external storage, you can omit the type field. This is because without type, our processing pipeline infers the filetype from the extension of file_name
- Only DICOM (.dcm) slices can be registered within the same slot_name, resulting in concatenation of those slices. Other file types must occupy their own slot. Please see the section below for further detail
- It's possible to register different filetypes in different slots within the same dataset item. For example, above we have two slots containing images and a third containing a video
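Putting these points together, a multi-slotted payload can be generated programmatically. The helper below is a sketch (the function name is ours, not part of the API) that builds one item from a list of storage keys, relying on file extensions so the type field can be omitted:

```python
from pathlib import PurePosixPath

def build_multi_slot_item(name, storage_keys, path="/"):
    """Build one multi-slotted item dict from a list of storage keys."""
    slots = []
    for index, key in enumerate(storage_keys):
        slots.append({
            "slot_name": str(index),
            "storage_key": key,
            # file_name matches the last part of storage_key, so the
            # filetype is inferred from its extension
            "file_name": PurePosixPath(key).name,
        })
    return {"path": path, "slots": slots, "name": name}
```

The returned dict can be placed directly in the "items" list of the registration payload.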
Registering DICOM Files
DICOM (.dcm) files can be either individual slices or a series of slices. A series of slices stored as a single .dcm file is registered similarly to a video. The only differences are that:
- No fps value can be passed
- The type is dicom
payload = {
    "items": [
        {
            "path": "/",
            "type": "dicom",
            "storage_key": "dicom_folder/my_dicom_series.dcm",
            "name": "my_dicom_series.dcm",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
A series of DICOM slices can be uploaded as a sequence by registering them in the same slot_name. For example:
payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "slot_name": "0",
                    "type": "dicom",
                    "storage_key": "dicom_slices/slice_1.dcm",
                    "file_name": "slice_1.dcm",
                },
                {
                    "slot_name": "0",
                    "type": "dicom",
                    "storage_key": "dicom_slices/slice_2.dcm",
                    "file_name": "slice_2.dcm",
                },
                {
                    "slot_name": "0",
                    "type": "dicom",
                    "storage_key": "dicom_slices/slice_3.dcm",
                    "file_name": "slice_3.dcm",
                },
            ],
            "name": "my_dicom_series",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
Uploading DICOM Slices as Series
When uploading DICOM slices as a sequence, the order in which the slices appear is determined by the following file metadata, in order of significance:
- 1: SeriesNumber
- 2: InstanceNumber
- 3: SliceLocation
- 4: ImagePositionPatient
- 5: FileName
Additionally, all files passed as slices that contain more than 1 volume will be assigned their own slot.
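As an illustration of this precedence (not Darwin's actual implementation), ordering a list of slice metadata dicts might look like the sketch below. ImagePositionPatient is omitted here for brevity, since comparing it requires projecting a 3D position onto the slice normal:

```python
def order_slices(slices):
    """Sort slice metadata dicts by the precedence listed above.

    Each dict may carry SeriesNumber, InstanceNumber, SliceLocation,
    and FileName; missing values fall back to neutral defaults.
    """
    def precedence(meta):
        return (
            meta.get("SeriesNumber", 0),
            meta.get("InstanceNumber", 0),
            meta.get("SliceLocation", 0.0),
            meta.get("FileName", ""),
        )
    return sorted(slices, key=precedence)
```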
If you'd prefer to override these behaviours, by either:
- 1: Forcing each .dcm file into the series as a slice, regardless of whether it contains multiple volumes
- 2: Forcing the series of slices to respect the order passed in the registration payload
you can do so by adding an optional argument to the base of the payload as follows:
payload = {
    "items": [ ... ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
    "options": {"ignore_dicom_layout": True}
}
Multi-Planar View
To register medical volumes and extract the axial, sagittal, and coronal views:
- 1: Include the "extract_views": "true" payload field
- 2: The specified slot_name must be 0
payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "type": "dicom",
                    "slot_name": "0",
                    "storage_key": "001/slice1.dcm",
                    "file_name": "slice1.dcm",
                    "extract_views": "true"
                }
            ],
            "name": "001.dcm"
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
Registration Through darwin-py
If you're using read-write registration, you can simplify item registration using the darwin-py SDK. Below is an example Python script demonstrating how to register single-slotted items with darwin-py:
from darwin.client import Client
# Define your storage keys
storage_keys = [
"path/to/first/image.png",
"path/to/second/image.png",
"path/to/third/image.png",
]
# Populate your Darwin API key, team slug, target dataset slug, and storage configuration name in Darwin
API_KEY = "YOUR_API_KEY_HERE"
team_slug = "team_slug"
dataset_slug = "dataset_slug"
storage_config_name = "your_bucket_name"
# Retrieve the dataset and connect to your bucket
client = Client.from_api_key(API_KEY)
dataset = client.get_remote_dataset(dataset_identifier=f"{team_slug}/{dataset_slug}")
my_storage_config = client.get_external_storage(name=storage_config_name, team_slug=team_slug)
# Register each storage key as a dataset item
results = dataset.register(my_storage_config, storage_keys)
# Optionally inspect the results of each item
print(results)
Note: The first step is to define your storage keys. These can be read in from a file, or returned from the SDK of your cloud provider (see below), but they must be structured as a list of strings.
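For instance, if you keep your storage keys in a plain-text file (a hypothetical keys.txt with one key per line), reading them into the required list of strings is straightforward:

```python
from pathlib import Path

def read_storage_keys(file_path):
    """Read storage keys from a text file, one key per line."""
    lines = Path(file_path).read_text().splitlines()
    # Drop blank lines and surrounding whitespace
    return [line.strip() for line in lines if line.strip()]
```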
By default, darwin-py will register every item in the root directory of the chosen dataset. You can recreate the folder structure defined by your storage keys in the Darwin dataset using the preserve_folders option:
results = dataset.register(my_storage_config, storage_keys, preserve_folders=True)
If you're registering videos, you can specify the FPS at which frames should be sampled from each video with the optional fps argument:
fps = 10
results = dataset.register(my_storage_config, storage_keys, fps=fps)
If you're registering DICOM volumes and wish to use multi-planar view, you can use the optional multi_planar_view argument:
results = dataset.register(my_storage_config, storage_keys, multi_planar_view=True)
If you want to register multi-slotted items, you can use the multi_slotted argument. Note that in this case your storage keys will need to be formatted as a dictionary of lists, where:
- Each dictionary key is an item name
- Each dictionary value is a list of storage keys for the item
storage_keys = {
    "item1": ["path/to/first/image.png", "path/to/second/image.png"],
    "item2": ["path/to/third/image.png", "path/to/fourth/image.png"],
    "item3": ["my/sample/image.png", "my/sample/video.mp4", "my/sample/pdf.pdf"]
}
results = dataset.register(my_storage_config, storage_keys, multi_slotted=True)
Slot names & folder structures
If using darwin-py to register multi-slotted items, please note that:
- Each slot will be given a name equivalent to the filename in the storage key
- If using preserve_folders=True, the item will be registered in the dataset directory specified by the first storage key in each list
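One way to assemble the multi_slotted dictionary is to group storage keys by their parent folder and use the folder name as the item name. This is a sketch under the assumption that all files for one item share a parent folder:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def group_keys_by_folder(storage_keys):
    """Group storage keys into {item_name: [keys]} by parent folder name."""
    grouped = defaultdict(list)
    for key in storage_keys:
        # Use the immediate parent folder as the item name;
        # keys with no folder fall back to a "root" item
        grouped[PurePosixPath(key).parent.name or "root"].append(key)
    return dict(grouped)
```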
If you'd prefer to read your storage keys directly from your external storage, you can do so using your cloud provider's SDK. Below is an example showing how to get all storage keys from a specific AWS S3 bucket directory using AWS boto3:
import boto3

def list_keys_in_bucket(bucket_name):
    all_keys = []
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix="my/bucket/directory/")
    for page in pages:
        # "Contents" is absent if the prefix matches no objects
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("/"):
                all_keys.append(key)
    return all_keys
storage_keys = list_keys_in_bucket("s3-bucket-name")
Read-Only Registration
Registering any read-only file involves sending a POST request to the below API endpoint with a payload containing instructions for Darwin on where to access the item:
f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing_readonly"
Please be aware that registering read-only items requires that:
- 1: A thumbnail file for each item is generated and available in your external storage
- 2: Video files have a set of high-quality and low-quality frames pre-extracted and available in your external storage
We recommend using mogrify for thumbnail generation:
> mogrify -resize "356x200>" -format jpg -quality 50 -write thumbnail.jpg large.png
Don't use Original Images as Thumbnails
It is strongly recommended that you don't use the original image as your thumbnail. This can lead to CORS issues in some browsers, preventing access to the item.
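If you'd rather generate thumbnails from Python, the mogrify command above can be approximated with Pillow. This is a sketch assuming RGB-convertible source images; Pillow's thumbnail() only ever shrinks, matching the "356x200>" qualifier:

```python
from PIL import Image

def write_thumbnail(source_path, thumbnail_path, max_size=(356, 200)):
    """Write a reduced-quality JPEG thumbnail of a source image."""
    with Image.open(source_path) as image:
        image = image.convert("RGB")
        # Shrinks in place, preserving aspect ratio; never enlarges
        image.thumbnail(max_size)
        image.save(thumbnail_path, "JPEG", quality=50)
```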
The Basics
Below is a Python script covering the simplest case: registering a single image file as a dataset item. A breakdown of every field is available below the script.
import requests

# Define constants
api_key = "your_api_key_here"
team_slug = "your-team-slug-here"
dataset_slug = "your-dataset-slug-here"
storage_name = "your-storage-bucket-name-here"

# Populate request headers
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"ApiKey {api_key}"
}

# Define registration payload
payload = {
    "items": [
        {
            "path": "/",
            "type": "image",
            "storage_key": "car_folder/car_1.png",
            "storage_thumbnail_key": "thumbnails/car_1_thumbnail.png",
            "height": 1080,
            "width": 1920,
            "name": "car_1.png",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}

# Send the request
response = requests.post(
    f"https://darwin.v7labs.com/api/v2/teams/{team_slug}/items/register_existing_readonly",
    headers=headers,
    json=payload
)

# Inspect the response for errors
body = response.json()
if response.status_code != 200:
    print("request failed", response.text)
elif 'blocked_items' in body and len(body['blocked_items']) > 0:
    print("failed to register items:")
    for item in body['blocked_items']:
        print("\t - ", item)
    if len(body['items']) > 0:
        print("successfully registered items:")
        for item in body['items']:
            print("\t - ", item)
else:
    print("success")
- api_key: Your API key
- team_slug: Your sluggified team name
- dataset_slug: The sluggified name of the dataset to register the file in
- storage_name: The name of your storage integration in your configuration
Payload-specific fields & concepts:
- items: It's possible to register multiple items in the same request, therefore items is a list of dictionaries where each dictionary corresponds to one dataset item
- path: The folder path within the Darwin dataset that this item should be registered at
- type: The type of file being registered. It can be image, video, or pdf. This instructs us on how to treat the file so it can be viewed correctly
- storage_key and storage_thumbnail_key: The exact paths to the file and its corresponding thumbnail in your external storage. These paths are case sensitive, cannot start with a forward slash, and are entered slightly differently depending on your cloud provider:
  - For AWS S3, exclude the bucket name. For example, if the full path to your file is s3://example-bucket/darwin/sub_folder/example_image.jpg, then your storage_key must be darwin/sub_folder/example_image.jpg
  - For Azure blobs, include the container name. For example, if the full path to your file is https://myaccount.blob.core.windows.net/mycontainer/sub_folder/myblob.jpg, then your storage_key must be mycontainer/sub_folder/myblob.jpg
  - For GCP buckets, exclude the bucket name. For example, if the full path to your file is gs://example-bucket/darwin/sub_folder/example_image.jpg, then your storage_key must be darwin/sub_folder/example_image.jpg
- height and width: The exact height and width of the main image. If these are incorrect, uploaded annotations will appear in the wrong part of the screen or be incorrectly scaled
- name: The name of the resulting dataset item as it appears in Darwin. This can be any name you choose, but we strongly recommend giving files the same or similar names to the externally stored files
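Because incorrect height and width values misplace annotations, it's worth reading them from the file rather than hard-coding them. A sketch using Pillow, assuming a local copy of the image is available:

```python
from PIL import Image

def image_dimensions(path):
    """Return (height, width) for an image file, matching the payload fields."""
    with Image.open(path) as image:
        width, height = image.size  # Pillow reports (width, height)
    return height, width
```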
Registering Videos
Registering videos in read-only requires that sets of high-quality and low-quality frames are pre-extracted and available in your external storage. Low-quality frames are used during video playback, and high-quality frames are displayed for annotation when playback is paused.
This results in a registration payload structured as follows:
payload = {
    "items": [
        {
            "path": "/",
            "type": "video",
            "storage_key": "video_folder/car_video.mp4",
            "storage_thumbnail_key": "thumbnails/car_video_thumbnail.png",
            "name": "car_video.mp4",
            "sections": [
                {
                    "section_index": 1,
                    "height": 1080,
                    "width": 1920,
                    "storage_hq_key": "video_folder/car_video/frame_1_hq.png",
                    "storage_lq_key": "video_folder/car_video/frame_1_lq.png",
                },
                {
                    "section_index": 2,
                    "height": 1080,
                    "width": 1920,
                    "storage_hq_key": "video_folder/car_video/frame_2_hq.png",
                    "storage_lq_key": "video_folder/car_video/frame_2_lq.png",
                },
            ]
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
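If your pre-extracted frames follow a numbered naming scheme, the sections list can be built programmatically rather than written out by hand. The helper below is a sketch assuming hypothetical "{}"-style templates for the HQ and LQ frame keys:

```python
def build_sections(frame_count, height, width, hq_template, lq_template):
    """Build the sections list for a read-only video registration."""
    return [
        {
            # section_index is 1-based
            "section_index": index,
            "height": height,
            "width": width,
            "storage_hq_key": hq_template.format(index),
            "storage_lq_key": lq_template.format(index),
        }
        for index in range(1, frame_count + 1)
    ]
```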
Registering Files in Multiple Slots
If you need to display multiple files next to each other simultaneously, you'll need to register them in different slots. Please refer to this article to gain an understanding of the concept of slots.
To register a dataset with multiple slots from external storage, the registration payload changes in structure as follows:
payload = {
    "items": [
        {
            "path": "/",
            "slots": [
                {
                    "slot_name": "0",
                    "type": "image",
                    "storage_key": "car_folder/car_1.png",
                    "storage_thumbnail_key": "thumbnails/car_1_thumbnail.png",
                    "height": 1080,
                    "width": 1920,
                    "file_name": "car_1.png",
                },
                {
                    "slot_name": "1",
                    "type": "image",
                    "storage_key": "car_folder/car_2.png",
                    "storage_thumbnail_key": "thumbnails/car_2_thumbnail.png",
                    "height": 1080,
                    "width": 1920,
                    "file_name": "car_2.png",
                },
                {
                    "slot_name": "2",
                    "type": "video",
                    "storage_key": "video_folder/car_video.mp4",
                    "storage_thumbnail_key": "thumbnails/car_video_thumbnail.png",
                    "file_name": "cars.mp4",
                    "sections": [
                        {
                            "section_index": 1,
                            "height": 1080,
                            "width": 1920,
                            "storage_hq_key": "video_folder/car_video/frame_1_hq.png",
                            "storage_lq_key": "video_folder/car_video/frame_1_lq.png",
                        },
                        {
                            "section_index": 2,
                            "height": 1080,
                            "width": 1920,
                            "storage_hq_key": "video_folder/car_video/frame_2_hq.png",
                            "storage_lq_key": "video_folder/car_video/frame_2_lq.png",
                        },
                    ],
                },
            ],
            "name": "cars",
        }
    ],
    "dataset_slug": dataset_slug,
    "storage_slug": storage_name,
}
Important points are:
- Because the dataset item now contains multiple files, we need to break the item up into separate slots, each with a different slot_name. Slots can be named any string, so long as the names are unique for files that need to go into separate slots
- Each entry in slots is given a new file_name field. This is distinct from the name field, which will be the name of the resulting dataset item in Darwin. file_name should match the exact file name of the file in that slot (i.e. it should match the last part of storage_key)
- No two files can be registered to the same slot
- It's possible to register different filetypes in different slots within the same dataset item. For example, above we have two slots containing images and a third containing a video
Registering DICOM Files
Unlike when using read-write storage, DICOM (.dcm) files cannot be registered directly in read-only. Instead, DICOM slices and series must first be converted to images and stored in your external bucket. Individual slices can then be registered as image items, and series of slices can be registered as video items.