Export your data
An SDK how-to guide. SDK power-users can refer to our full SDK docs generated from our source code here
Once your data is annotated, you'll likely want to export it. To do so, you need to create and download an export. Here's a quick refresher of what generating a release looks like in the UI:
View releases
If you've already generated one or more releases in V7, you can view each of them using the following CLI command:
darwin dataset releases [DATASET_NAME]
You can generate a new release using the snippet below:
release_name = "name_of_export_here"
dataset.export(release_name)
This release will contain all of the completed images and videos within your dataset. You can also restrict the export to specific classes by passing the IDs of those annotation classes:
dataset.export(release_name, annotation_class_ids=[...])
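The parameter is called annotation_class_ids, so if you only know your class names you may need to translate them into IDs first. The sketch below shows one way to do that lookup. It assumes you already hold a list of class records as dictionaries with "id" and "name" keys (the record shape is an assumption, not something this guide confirms), and class_ids_for_names is a hypothetical helper rather than part of the SDK.

```python
from typing import Any, Dict, Iterable, List


def class_ids_for_names(classes: Iterable[Dict[str, Any]], names: Iterable[str]) -> List[Any]:
    """Return the IDs of the classes whose names are listed in `names`.

    Hypothetical helper: assumes each class record has "id" and "name" keys.
    Raises KeyError if any requested name has no matching class.
    """
    wanted = list(names)  # materialise once so generators are safe to reuse
    id_by_name = {record["name"]: record["id"] for record in classes}
    missing = [name for name in wanted if name not in id_by_name]
    if missing:
        raise KeyError(f"Unknown annotation classes: {missing}")
    # Preserve the order in which the names were requested.
    return [id_by_name[name] for name in wanted]
```

The returned list could then be passed as annotation_class_ids when calling dataset.export. Failing loudly on an unknown name is a deliberate choice here, since a silently dropped class would produce an export that looks complete but is not.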
From there, you can specify which release you'd like to pull:
from darwin.exceptions import NotFound

release_name = "name_of_export_here"
try:
    release = dataset.get_release(release_name)
except NotFound:
    print(f"Dataset release {release_name} not found")
Once you have generated your release object, it's time to pull that release. This will pull all completed images and videos and their annotations. Note that currently, the SDK only supports pulling releases in the Darwin JSON 2.0 format:
dataset.pull(release=release)
Multi-Processing Errors
By default, pull() uses the Python multiprocessing library to significantly increase the speed of the download. Because the multiprocessing library imports and runs the script that invoked it when spawning processes, it's necessary to protect all code that invokes it in an if __name__ == "__main__" block. Otherwise, processes will be spawned in a loop.
if __name__ == "__main__":
    dataset.pull(release=release, multi_processed=True)
Note that there may be a delay of a few seconds between exporting a release and being able to pull it.
Waiting for Release Creation
It may be necessary to factor in the release creation time (typically a few seconds) before attempting to pull the release. Otherwise, you may run into a release not found exception.
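One way to handle this delay is to retry the lookup for a short while instead of failing on the first miss. The following is a minimal sketch, not part of the SDK: wait_for_release is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions. It takes the lookup function and the "not found" exception types as arguments, so it makes no assumptions about the SDK beyond what is shown above.

```python
import time
from typing import Any, Callable, Tuple, Type


def wait_for_release(
    get_release: Callable[[str], Any],
    release_name: str,
    not_found: Tuple[Type[BaseException], ...] = (Exception,),
    timeout: float = 60.0,
    interval: float = 2.0,
) -> Any:
    """Call get_release(release_name) until it succeeds or `timeout` seconds pass.

    Hypothetical helper: only exceptions listed in `not_found` trigger a retry;
    anything else propagates immediately.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return get_release(release_name)
        except not_found:
            if time.monotonic() >= deadline:
                raise  # give up and surface the last "not found" error
            time.sleep(interval)
```

With the objects used in this guide, a call might look like release = wait_for_release(dataset.get_release, release_name, not_found=(NotFound,)), placed right after dataset.export(release_name).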
You can also copy the pre-populated command above to your clipboard by clicking the copy icon for any release from the GUI:
You can pull just the annotations by adding the only_annotations argument:
dataset.pull(release=release, only_annotations=True)
If your dataset has multiple folders, you can keep that structure by using the use_folders argument:
dataset.pull(release=release, use_folders=True)
Finally, if you're exporting video, you can choose to pull it either as a video or as individual frames. By default, the video_frames parameter is False; if it's set to True, your video will be pulled with each frame as its own image:
dataset.pull(release=release, video_frames=True)
Video frame extraction
When you upload videos to Darwin at a non-native framerate, specific video frames are extracted and displayed. When pulling videos uploaded at a non-native framerate, please pass video_frames=True to guarantee a match between your annotations and the resulting frames. If you instead extract frames from the video files later on, you might experience a mismatch between annotations and frames.
Alternatively, the release can be downloaded as a zip archive using the lines below instead of dataset.pull:
from pathlib import Path

release.download_zip(Path(f"./{release_name}.zip"))
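Unlike dataset.pull, a zip download leaves the unpacking to you. Here is a minimal sketch using only the Python standard library. extract_release is a hypothetical helper, and the assumption that the annotation files inside the archive are JSON files follows from the Darwin JSON format mentioned earlier; the exact archive layout is not confirmed by this guide.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_release(zip_path: Path, target_dir: Path) -> List[Path]:
    """Extract a release archive and return the JSON files found under target_dir.

    Hypothetical helper: assumes annotations are stored as .json files
    somewhere inside the archive.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    # Search recursively so nested folders in the archive are included.
    return sorted(target_dir.rglob("*.json"))
```

For example, extract_release(Path(f"./{release_name}.zip"), Path("./annotations")) would unpack the archive downloaded above into an annotations folder (the folder name is arbitrary).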
Putting all of the above together, the full code for locating, creating, and pulling a release can be found on the recipes page.