Import Visual Data with Python
How to upload or import images, videos or any other supported visual file into V7 Darwin for dataset and annotation workflow management.
Uploading your local visual data is the first step toward using the dataset management and annotation features of V7 Darwin.
In this guide, we go through the supported methods for importing visual data into Darwin using the V7 Darwin Python SDK.
Example: Upload 10 local images to an existing Dataset, tagging and organizing them in appropriate folders
Imagine you have a local folder on your laptop, /Users/darwin/Desktop, containing images you'd like to import into an existing Dataset on V7 Darwin. Your Team is called Life and your Dataset is called Animals; it contains images of animals, organized into folders by species and tagged by breed.
Your goal is to upload these local cat and dog images to V7 Darwin, organized in the right folders and correctly tagged.
Before you can write and run your Python script, you need three values: an API Key, and the slugged names of your Team and Dataset.
How to find your Team and Dataset slugged names with your API Key
The rest of this guide requires an API Key and your Team and Dataset slugged names. Once you have generated an API Key, you can find both slugged names using the CLI.
from pathlib import Path
from darwin.client import Client
from darwin.dataset.upload_manager import LocalFile, UploadHandlerV2
# Your personal API Key, needed to authenticate into your V7 Darwin Team.
API_KEY = "..."
# The directory where the data you want to import is stored.
DATA_DIRECTORY = Path.home() / "Desktop"
# Initialize the V7 Darwin Client and RemoteDataset objects.
client = Client.from_api_key(API_KEY)
dataset = client.get_remote_dataset("life/animals")
# Point to the files you wish to upload.
local_files = [
    LocalFile(DATA_DIRECTORY / "cat_1.jpg", path="cats", tags=["shorthair"]),
    LocalFile(DATA_DIRECTORY / "cat_2.jpg", path="cats", tags=["ragdoll"]),
    LocalFile(DATA_DIRECTORY / "cat_3.jpg", path="cats", tags=["persian"]),
    LocalFile(DATA_DIRECTORY / "cat_4.jpg", path="cats", tags=["sphynx"]),
    LocalFile(DATA_DIRECTORY / "cat_5.jpg", path="cats", tags=["unknown"]),
    LocalFile(DATA_DIRECTORY / "dog_1.jpg", path="dogs", tags=["labrador"]),
    LocalFile(DATA_DIRECTORY / "dog_2.jpg", path="dogs", tags=["unknown"]),
    LocalFile(DATA_DIRECTORY / "dog_3.jpg", path="dogs", tags=["german-shepherd"]),
    LocalFile(DATA_DIRECTORY / "dog_4.jpg", path="dogs", tags=["beagle"]),
    LocalFile(DATA_DIRECTORY / "dog_5.jpg", path="dogs", tags=["bulldog"]),
]
# Upload your files to your remote dataset.
upload_handler = UploadHandlerV2(dataset, local_files)
upload_handler.upload()
If everything goes as expected, you will see your files populating the specified dataset.
In this example, we also specified the folders in which to organize your files.
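Writing one LocalFile line per image gets tedious as the folder grows. The sketch below shows one possible way to derive the folder and tags from the filename instead. It is an illustration only: the plan_uploads helper, the PREFIX_TO_FOLDER mapping and the "prefix_number.jpg" naming convention are assumptions made for this example and are not part of the SDK. It returns plain tuples so it can run without the SDK installed.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical mapping from filename prefix to remote folder.
PREFIX_TO_FOLDER: Dict[str, str] = {"cat": "cats", "dog": "dogs"}


def plan_uploads(
    filenames: List[str], breed_tags: Dict[str, str]
) -> List[Tuple[str, str, List[str]]]:
    """Return (filename, folder, tags) for every file with a known prefix."""
    plan = []
    for name in sorted(filenames):
        # "cat_1.jpg" -> stem "cat_1" -> prefix "cat"
        prefix = Path(name).stem.split("_")[0]
        folder = PREFIX_TO_FOLDER.get(prefix)
        if folder is None:
            continue  # skip files that match no known species prefix
        # Fall back to the "unknown" tag when no breed is listed.
        tags = [breed_tags.get(name, "unknown")]
        plan.append((name, folder, tags))
    return plan


if __name__ == "__main__":
    names = ["dog_1.jpg", "cat_1.jpg", "notes.txt"]
    for name, folder, tags in plan_uploads(names, {"cat_1.jpg": "shorthair"}):
        print(name, folder, tags)
```

Each resulting tuple can then be turned into LocalFile(DATA_DIRECTORY / name, path=folder, tags=tags), as in the script above.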
Error handling and monitoring of file uploads
UploadHandler keeps track of all the files you want to upload and of their status throughout the upload process. Let's see how you can count the files in each stage.
# The "team-slug/dataset-slug" identifier of the dataset used above.
DATASET_IDENTIFIER = "life/animals"

print(f"\nUploaded {upload_handler.total_count} files to {DATASET_IDENTIFIER}.\n")
# Files rejected before the upload started, with the reason for each.
print(f"{upload_handler.blocked_count} files can't be uploaded. Here's a breakdown:")
for item in upload_handler.blocked_items:
    print(f"{item.path}\t{item.filename}\t{item.reason}")
# Files that failed during one of the upload stages.
print(f"\n{upload_handler.error_count} files can't be uploaded because an error occurred in one of the upload stages:")
for error in upload_handler.errors:
    print(f"{error.file_path}\t{error.stage}")
Note that the most common reasons for a file to be blocked are that a file with the same name has already been uploaded to the same path in the same dataset, or that it specifies tags that do not exist in that dataset.
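When many files are blocked, it can help to group them by reason rather than print them one by one. The following is a minimal sketch, assuming each blocked item exposes the path, filename and reason attributes used in the loop above. The group_blocked_by_reason helper is hypothetical, and the stand-in items and their reason strings are made up for illustration; they are not values documented by the SDK.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def group_blocked_by_reason(blocked_items: Iterable) -> Dict[str, List[str]]:
    """Map each block reason to the remote paths of the files it affected."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in blocked_items:
        # Assumes .path, .filename and .reason exist on every item.
        remote_path = f"{item.path.rstrip('/')}/{item.filename}"
        grouped[item.reason].append(remote_path)
    return dict(grouped)


# Stand-in objects for illustration only; the reason strings are invented.
# In practice you would pass upload_handler.blocked_items instead.
example_items = [
    SimpleNamespace(path="/cats", filename="cat_1.jpg", reason="ALREADY_EXISTS"),
    SimpleNamespace(path="/dogs", filename="dog_2.jpg", reason="ALREADY_EXISTS"),
    SimpleNamespace(path="/cats", filename="cat_5.jpg", reason="UNKNOWN_TAGS"),
]

if __name__ == "__main__":
    for reason, paths in group_blocked_by_reason(example_items).items():
        print(f"{reason}: {len(paths)} file(s)")
        for remote_path in paths:
            print(f"  {remote_path}")
```

Grouping this way makes it easier to decide on a single fix per reason, for example renaming duplicates or creating the missing tags before retrying.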
Track live upload progress
Let's see how we can use a progress library to keep track of the upload progress. We use the rich library in this example, but any other progress API would do.
To make progress tracking work, you need to implement a progress_callback function that defines how your progress indicator is updated.
Once that's defined, pass progress_callback to the upload method of your UploadHandler object.
from rich.live import Live
from rich.progress_bar import ProgressBar

# A progress bar with one step per file to upload.
pbar = ProgressBar(total=len(local_files))

def progress_callback(_total_file_count: int, file_advancement: int):
    # Advance the bar by the reported file advancement.
    pbar.completed += file_advancement

with Live(pbar):
    upload_handler.upload(progress_callback=progress_callback)
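If you would rather not depend on a progress library, a plain callable object can serve as the callback. The sketch below is one possible approach, not part of the SDK: the UploadProgress class is hypothetical, and it assumes the callback receives the total file count followed by the number of newly completed files, as the signature above suggests.

```python
class UploadProgress:
    """Accumulates callback updates and reports the percentage complete."""

    def __init__(self) -> None:
        self.total = 0
        self.completed = 0

    @property
    def percent(self) -> float:
        # Guard against division by zero before the first update arrives.
        return 100.0 * self.completed / self.total if self.total else 0.0

    def __call__(self, total_file_count: int, file_advancement: int) -> None:
        # Assumption: file_advancement is the number of files finished
        # since the previous call.
        self.total = total_file_count
        self.completed += file_advancement
        print(f"{self.completed}/{self.total} files ({self.percent:.0f}%)")


if __name__ == "__main__":
    progress = UploadProgress()
    # Simulated updates; with the SDK you would instead call
    # upload_handler.upload(progress_callback=progress).
    for _ in range(4):
        progress(4, 1)
```

Keeping the counters on an object, rather than in a global variable, lets you read progress.completed and progress.percent after the upload finishes.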