Exports/Releases from Datasets (SDK)

Once you have a Dataset ready, you will probably want to download your images/videos from it.
The first step to do that is to create an Export.

Exports are immutable snapshots of all the completed images for a given Dataset at the time the Export was created.

This means that if you create an Export now and your Dataset has no completed images, your Export will be empty. As a result, when you try to download your images, you will get nothing.

If you add more images or complete existing ones later, you will need to create a new Export to download them.
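The snapshot behaviour can be illustrated with a toy model. This is only a sketch in plain Python: ToyDataset and ToyExport are hypothetical stand-ins invented for this illustration and are not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToyExport:
    """Immutable snapshot: the completed file names at creation time."""
    name: str
    files: Tuple[str, ...]


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a Dataset: maps file name -> completed?"""
    items: Dict[str, bool] = field(default_factory=dict)

    def export(self, name: str) -> ToyExport:
        # Only files that are completed right now end up in the snapshot
        completed = tuple(sorted(f for f, done in self.items.items() if done))
        return ToyExport(name=name, files=completed)


dataset = ToyDataset({"car_1.jpg": False, "car_2.jpg": False})
first = dataset.export("v1")        # nothing is completed yet
dataset.items["car_1.jpg"] = True   # an image is completed afterwards
second = dataset.export("v2")       # only a new export picks it up

print(first.files)   # ()
print(second.files)  # ('car_1.jpg',)
```

The first export stays empty even after the image is completed, which is why a new Export is needed.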

In this section we will focus on:

  • export
  • get_releases
  • get_release
  • pull
  • export_annotation

By the end of this section you should understand:

  • What Exports/Releases are,
  • How to create Exports/Releases and what they contain,
  • How to list Exports/Releases,
  • How to access a specific Export/Release,
  • How to download your files from an Export/Release,
  • How to convert the annotations of an Export from V7's proprietary format to other formats.

1. export

Creates an Export/Release with the given name.
It can optionally filter by annotation_class, and it can optionally include tokens in the images' URLs so that people outside the team can access them.

from darwin.client import Client

# Authenticate
client = Client.local()

# Get remote dataset
dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

# Create an export
release_name = 'all-cars-v3'
release = dataset.export(name=release_name)
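The optional filter and URL tokens mentioned above could be passed roughly as follows. This is a sketch under assumptions: the keyword names annotation_class_ids and include_url_token are not given in this guide, so check them against your installed SDK version. The wrapper create_filtered_export and the stub class are hypothetical and exist only so the example runs without network access.

```python
from typing import List, Optional


def create_filtered_export(dataset, name: str,
                           class_ids: Optional[List[str]] = None,
                           share_urls: bool = False):
    """Create an export, optionally restricted to some annotation classes.

    ASSUMPTION: the keyword names annotation_class_ids and
    include_url_token are not confirmed by this guide.
    """
    return dataset.export(
        name=name,
        annotation_class_ids=class_ids,
        include_url_token=share_urls,
    )


class RecordingStub:
    """Hypothetical stand-in for a remote dataset; it only records the call."""

    def export(self, **kwargs):
        self.kwargs = kwargs


stub = RecordingStub()
create_filtered_export(stub, "all-cars-v3", class_ids=["123"], share_urls=True)
print(stub.kwargs)
```

With a real dataset obtained from client.get_remote_dataset, you would pass that object instead of the stub.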

2. get_releases

Gets a list of releases, sorted with the most recent first.

from darwin.client import Client

# Authenticate
client = Client.local()

# Get remote dataset
dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

# Access all releases
all_releases = dataset.get_releases()
for release in all_releases:
    print(release.name)
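If you need one particular release out of that list, you can search it by name. The helper below is a minimal sketch: find_release is a hypothetical function, and it relies only on the name attribute printed in the example above. The SimpleNamespace objects stand in for real releases so that the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_release(releases: Iterable, name: str) -> Optional[object]:
    """Return the first release whose .name matches, or None."""
    for release in releases:
        if release.name == name:
            return release
    return None


# Stand-in objects; in practice you would pass dataset.get_releases()
fake_releases = [
    SimpleNamespace(name="all-cars-v3"),
    SimpleNamespace(name="all-cars-v2"),
]
match = find_release(fake_releases, "all-cars-v2")
print(match.name if match else "not found")  # all-cars-v2
```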

3. get_release

Gets a specific release for this dataset. By default, it gets the latest release.

from darwin.client import Client

# Authenticate
client = Client.local()

# Get remote dataset
dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

# Access the latest release
release = dataset.get_release()
print(release.name)

4. pull

Downloads a remote dataset (images and annotations) into the datasets directory at ~/.darwin/datasets/.

from darwin.client import Client

# Authenticate
client = Client.local()


# Get remote dataset
dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

# Get latest release
release = dataset.get_release()

# Download the completed files from the given release
dataset.pull(release=release)

5. export_annotation

Once you have created an Export/Release, you can convert it to other formats. Currently, the SDK supports conversion from V7's proprietary format to the following formats:

  • coco
  • cvat
  • dataloop
  • instance_mask
  • pascalvoc
  • semantic_mask

The file to convert is inside your ~/.darwin/datasets folder, as shown below:

/home/user/.darwin/datasets/my-team-slug/my-dataset-slug/
├── images
│   ├── SampleVideo_1280x720_2mb
│   │   ├── 0000000.png
│   │   ├── 0000001.png
│   │   ├── 0000002.png
│   │   ├── 0000003.png
│   │   └── 0000004.png
│   └── SampleVideo_1280x720_2mb.mp4
└── releases
    ├── my-release
    │   ├── annotations
    │   │   └── SampleVideo_1280x720_2mb.json
    │   └── lists
    │       ├── classes_bounding_box.txt
    │       └── classes_polygon.txt
    └── latest -> /home/user/.darwin/datasets/my-team-slug/my-dataset-slug/releases/my-release

The following example converts the annotations from a small video to the coco format. Be mindful that you cannot use the ~ character to represent your home directory:

from darwin.exporter import export_annotations, get_exporter
from pathlib import Path

output_dir = Path('/home/user/Workplace/my_python_app/')
files = [
    Path('/home/user/.darwin/datasets/my-team-slug/my-dataset-slug/releases/my-release/annotations/SampleVideo_1280x720_2mb.json')
]

parser = get_exporter('coco')
export_annotations(parser, files, output_dir)

When you open the /home/user/Workplace/my_python_app/ directory, you can now see an output.json file with the conversion done.
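Because the ~ character is not accepted, one way to avoid hard-coding your home directory is to expand it with pathlib before building the file list. The helper below is a sketch: release_annotation_files and its root parameter are hypothetical names, and the folder layout is assumed to match the tree shown earlier.

```python
from pathlib import Path
from typing import List


def release_annotation_files(team: str, dataset: str,
                             release: str = "latest",
                             root: str = "~/.darwin/datasets") -> List[Path]:
    """Absolute paths of the .json annotation files of a pulled release."""
    # expanduser() turns "~" into the real home directory
    annotations_dir = (
        Path(root).expanduser()
        / team / dataset / "releases" / release / "annotations"
    )
    # glob() yields nothing if the directory does not exist yet
    return sorted(annotations_dir.glob("*.json"))


files = release_annotation_files("my-team-slug", "my-dataset-slug", "my-release")
print(files)  # an empty list if the release has not been pulled yet
```

The resulting list can be passed as the files argument of export_annotations.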

Importing data from other formats

The Darwin SDK does not allow you to convert data from other formats into V7's proprietary format. However, you can import such data directly.
We currently support importing data from the following formats:

  • coco
  • csvtags
  • csvtagsvideo
  • dataloop
  • pascalvoc
  • labelbox
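  • superannotate

To parse a single annotation file, use the parse_file function of the matching format module. The example below writes and then parses a Pascal VOC file:

from darwin.importer.formats.pascalvoc import parse_file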
from pathlib import Path

content = """
<root>
    <filename>image.jpg</filename>
    <object>
        <name>Class</name>
        <bndbox>
            <xmin>10</xmin>
            <xmax>10</xmax>
            <ymin>10</ymin>
            <ymax>10</ymax>
        </bndbox>
    </object>
</root>
"""

# Write the sample annotation to disk
file_path = Path('pascalvoc.xml')
file_path.write_text(content)

annotation = parse_file(file_path)
print(annotation)

Once you have the Annotation Class, you can manipulate it within the Dataset, as we will see in the Annotation Classes section.

If you have a folder with several annotations, there is a more efficient way to import them into your dataset: the import_annotations function.

import darwin.importer as importer
from darwin.client import Client
from darwin.importer import get_importer
from pathlib import Path

# Authenticate
client = Client.local()

# Get the remote dataset
dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

format_name = 'darwin'
annotation_paths = [Path('folder/the_annotation.json')]

parser = get_importer(format_name)
# The fourth argument is the append flag
importer.import_annotations(dataset, parser, annotation_paths, append=False)
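For a whole folder, the list of paths does not need to be typed by hand. The sketch below gathers the files with pathlib; collect_annotation_paths is a hypothetical helper, and the *.json pattern is an assumption that fits the darwin format used above (other formats may use other extensions).

```python
from pathlib import Path
from typing import List


def collect_annotation_paths(folder: str, pattern: str = "*.json") -> List[Path]:
    """Recursively gather annotation files under folder, sorted for a stable order."""
    return sorted(p for p in Path(folder).rglob(pattern) if p.is_file())


annotation_paths = collect_annotation_paths("folder")
print(len(annotation_paths), "annotation file(s) found")
```

The resulting list can be passed to import_annotations in place of a hand-written annotation_paths list.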

Good to know:

Note that the paths in the annotation files you are importing must match the paths in the V7 platform. If you pushed images using the path parameter, make sure the paths match.