Once you have a Dataset ready, you will probably want to download your images/videos from it.
The first step to do that is to create an Export.
Exports are immutable snapshots of all the completed images for a given Dataset at the time the Export was created.
This means that if you create an Export now, and your Dataset has no completed images, your export will be empty. As a result, when you try to download your images, you will get nothing.
Later on, if you add or complete more images, you will need to create a new Export to download them.
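The snapshot behaviour described above can be illustrated with a small toy model. This is only a sketch: `ToyDataset` and its `export` method are hypothetical names invented for this illustration and are not part of the Darwin SDK. The point it shows is that an export taken before an image is completed never picks that image up later.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset; not part of the Darwin SDK."""

    # Maps image name -> True if the image is completed.
    images: Dict[str, bool] = field(default_factory=dict)

    def export(self) -> Tuple[str, ...]:
        # An export freezes the names of the images completed right now.
        return tuple(sorted(name for name, done in self.images.items() if done))


dataset = ToyDataset()
dataset.images["car_1.png"] = False   # uploaded but not completed

first_export = dataset.export()       # nothing is completed yet
dataset.images["car_1.png"] = True    # completed after the first export

second_export = dataset.export()      # a new export is needed to include it

print(first_export)   # ()
print(second_export)  # ('car_1.png',)
```

The first export stays empty even after the image is completed, which is why a new Export has to be created.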
By the end of this section you should understand:
- What Exports/Releases are,
- How to create Exports/Releases and what they contain,
- How to list Exports/Releases,
- How to access a specific Export/Release,
- How to download your files from an Export/Release,
- How to convert an annotation_export from V7's proprietary format to other formats.
Creates an Export/Release with the given name.
It can optionally filter by annotation_class, and it can optionally include tokens in the image URLs so that people who are not part of the team can access them.

    from darwin.client import Client

    # Authenticate
    client = Client.local()

    # Get remote dataset
    dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

    # Create an export
    release_name = 'all-cars-v3'
    release = dataset.export(name=release_name)
Get a sorted list of releases with the most recent first.

    from darwin.client import Client

    # Authenticate
    client = Client.local()

    # Get remote dataset
    dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

    # Access all releases
    all_releases = dataset.get_releases()
    for release in all_releases:
        print(release.name)
Gets a specific release for this dataset. By default, it gets the latest release.

    from darwin.client import Client

    # Authenticate
    client = Client.local()

    # Get remote dataset
    dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

    # Access the latest release
    release = dataset.get_release()
    print(release.name)
Downloads a remote project (images and annotations) in the datasets directory at

    from darwin.client import Client

    # Authenticate
    client = Client.local()

    # Get remote dataset
    dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

    # Get latest release
    release = dataset.get_release()

    # Download the completed files from the given release
    dataset.pull(release=release)
Once you have created an Export/Release, you can convert it to other formats. Currently the SDK supports conversion from V7's proprietary format to the following formats:
The annotation files to convert are inside your ~/.darwin/datasets folder, as shown below:

    /home/user/.darwin/datasets/my-team-slug/my-dataset-slug/
    ├── images
    │   ├── SampleVideo_1280x720_2mb
    │   │   ├── 0000000.png
    │   │   ├── 0000001.png
    │   │   ├── 0000002.png
    │   │   ├── 0000003.png
    │   │   └── 0000004.png
    │   └── SampleVideo_1280x720_2mb.mp4
    └── releases
        ├── my-release
        │   ├── annotations
        │   │   └── SampleVideo_1280x720_2mb.json
        │   └── lists
        │       ├── classes_bounding_box.txt
        │       └── classes_polygon.txt
        └── latest -> /home/user/.darwin/datasets/my-team-slug/my-dataset-slug/releases/my-release
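Given the folder layout shown above, the annotation files of a release can be collected with the standard library before converting them. The helper below is a minimal sketch: `list_release_annotations` is a hypothetical name, not an SDK function, and it assumes only the releases/<release-name>/annotations structure from the listing.

```python
from pathlib import Path
from typing import List


def list_release_annotations(dataset_dir: Path, release_name: str) -> List[Path]:
    """Return the annotation JSON files of one release, sorted by name.

    Hypothetical helper, not part of the Darwin SDK. It relies only on the
    releases/<release-name>/annotations layout of the local dataset folder.
    """
    annotations_dir = dataset_dir / "releases" / release_name / "annotations"
    # glob() yields nothing if the directory does not exist, so the result
    # is an empty list in that case.
    return sorted(annotations_dir.glob("*.json"))


dataset_dir = Path("/home/user/.darwin/datasets/my-team-slug/my-dataset-slug")
files = list_release_annotations(dataset_dir, "my-release")
print(files)
```

The resulting list of paths has the same shape as the list of files passed to a converter, so it can stand in for a hand-written list when a release contains many annotation files.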
The following example converts the annotations of a small video to the coco.json format. Be mindful that you cannot use the ~ character to represent your home directory:

    from darwin.exporter import export_annotations, get_exporter
    from pathlib import Path

    # Directory where the converted file will be written
    output_dir = Path('/home/user/Workplace/my_python_app/')

    # Annotation files to convert (absolute paths, no ~)
    files = [
        Path('/home/user/.darwin/datasets/my-team-slug/my-dataset-slug/releases/my-release/annotations/SampleVideo_1280x720_2mb.json')
    ]

    parser = get_exporter('coco')
    export_annotations(parser, files, output_dir)
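Because the ~ character is not expanded in the paths of the example above, one way to avoid hard-coding the home directory is to build the absolute path with the standard library. This is a sketch using only `pathlib`; the team, dataset, release, and file names are the placeholders from the example above.

```python
from pathlib import Path

# Path.home() returns the absolute home directory of the current user,
# so no "~" ever appears in the resulting path.
release_dir = (
    Path.home()
    / ".darwin" / "datasets"
    / "my-team-slug" / "my-dataset-slug"
    / "releases" / "my-release"
)
annotation_file = release_dir / "annotations" / "SampleVideo_1280x720_2mb.json"

# Path("~/...").expanduser() is an equivalent way to expand "~" explicitly.
same_file = Path(
    "~/.darwin/datasets/my-team-slug/my-dataset-slug/releases/my-release"
    "/annotations/SampleVideo_1280x720_2mb.json"
).expanduser()

print(annotation_file)
```

Either form yields a path that can be placed in the list of files to convert.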
When opening the /home/user/Workplace/my_python_app/ directory you can now see an output.json file with the conversion done.
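To check the converted file, it can be loaded with the standard `json` module. The sketch below assumes that output.json follows the usual COCO layout, with top-level "categories" and "annotations" lists in which each annotation carries a "category_id"; `summarize_coco` is a hypothetical helper name, not an SDK function.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def summarize_coco(coco_path: Path) -> Dict[str, int]:
    """Count annotations per category name in a COCO-style JSON file.

    Hypothetical helper; assumes the standard COCO keys are present.
    """
    with coco_path.open() as handle:
        coco = json.load(handle)

    # COCO links each annotation to a category through "category_id".
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter(
        names.get(ann["category_id"], "unknown")
        for ann in coco.get("annotations", [])
    )
    return dict(counts)
```

Calling `summarize_coco(Path('/home/user/Workplace/my_python_app/output.json'))` would return a dictionary mapping each class name to its number of annotations, which is a quick way to confirm that the conversion picked up the expected classes.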
The Darwin SDK does not allow you to convert data from other formats into V7's proprietary format, however you can import it directly.
We currently support importing data from the following formats:

    from darwin.importer.formats.pascalvoc import parse_file
    from pathlib import Path

    # A minimal Pascal VOC annotation with one bounding box
    content = """
    <root>
      <filename>image.jpg</filename>
      <object>
        <name>Class</name>
        <bndbox>
          <xmin>10</xmin>
          <xmax>10</xmax>
          <ymin>10</ymin>
          <ymax>10</ymax>
        </bndbox>
      </object>
    </root>
    """

    # Write the annotation to disk so it can be parsed
    with open("pascalvoc.xml", "w") as file:
        file.write(content)

    file_path = Path('pascalvoc.xml')
    annotation = parse_file(file_path)
    print(annotation)
Once you have the Annotation Class, you can then manipulate it within the Dataset as we will see in section Annotation Classes.
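For reference, the Pascal VOC fields used in the example above can also be read directly with the standard library. The sketch below is not the SDK's parser and makes no claim about what `parse_file` returns: `read_boxes` is a hypothetical helper, the sample coordinates are made up for illustration, and the x/y/w/h output is just one convenient way to express a box given by its corner coordinates.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Made-up sample annotation, in the same shape as the example above.
SAMPLE = """
<root>
  <filename>image.jpg</filename>
  <object>
    <name>Class</name>
    <bndbox>
      <xmin>10</xmin>
      <xmax>30</xmax>
      <ymin>20</ymin>
      <ymax>60</ymax>
    </bndbox>
  </object>
</root>
"""


def read_boxes(xml_text: str) -> List[Dict[str, object]]:
    """Extract every <object> as a name plus an x/y/w/h bounding box."""
    root = ET.fromstring(xml_text.strip())
    boxes: List[Dict[str, object]] = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin, xmax = int(box.findtext("xmin")), int(box.findtext("xmax"))
        ymin, ymax = int(box.findtext("ymin")), int(box.findtext("ymax"))
        boxes.append({
            "name": obj.findtext("name"),
            "x": xmin,
            "y": ymin,
            "w": xmax - xmin,   # width from the two x corners
            "h": ymax - ymin,   # height from the two y corners
        })
    return boxes


print(read_boxes(SAMPLE))
# [{'name': 'Class', 'x': 10, 'y': 20, 'w': 20, 'h': 40}]
```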
If you have a folder with several annotations there is a more efficient way to import them into your dataset. The
import_annotations function allows that.

    import darwin.importer as importer
    from darwin.client import Client
    from darwin.importer import get_importer
    from pathlib import Path

    # Authenticate
    client = Client.local()

    # Get the remote dataset
    dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

    # Choose the parser for the format of the annotation files
    format_name = 'darwin'
    annotation_paths = [Path('folder/the_annotation.json')]
    parser = get_importer(format_name)

    # Import the annotations into the dataset
    importer.import_annotations(dataset, parser, annotation_paths, False)
Good to know:
It is important to mention that the paths in the annotation files you are importing must be the same as the paths in the V7 platform. So if you pushed images using the path parameter, make sure it matches.