Upload data to a dataset (Python)

Now that you know how to manage datasets and how to download data from them, this section covers the reverse operation: uploading data to a dataset.

By the end of this section you should understand:

  • How to upload an image to a dataset

1. def push(self, files_to_upload: Optional[List[Union[str, Path, LocalFile]]], *, blocking: bool = True, multi_threaded: bool = True, fps: int = 0, as_frames: bool = False, files_to_exclude: Optional[List[Union[str, Path]]] = None, path: Optional[str] = None, preserve_folders: bool = False, progress_callback: Optional[ProgressCallback] = None, file_upload_callback: Optional[FileUploadCallback] = None):

Uploads the given files to this dataset.

Parameters (all except files_to_upload are keyword-only and optional):

  • files_to_upload: List of files to upload. Entries can also be folders.
  • blocking: If False, the files are not uploaded and a generator function is returned instead.
  • multi_threaded: Uploads the files in parallel. If blocking is False, this has no effect.
  • files_to_exclude: Optional list of files to exclude from the file scan. Entries can also be folders.
  • fps: When uploading a video, specifies its frame rate.
  • as_frames: When uploading a video, specifies whether it is uploaded as a list of frames.
  • path: Optional path within the dataset in which to store the files.
  • preserve_folders: Whether to preserve the local folder structure when uploading.
  • progress_callback: Optional callback, called every time upload progress is reported.
  • file_upload_callback: Optional callback, called every time a file chunk is uploaded.
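files_to_upload and files_to_exclude both accept folders. The rough local illustration below makes that concrete. scan_files is a hypothetical helper written for this page, not part of darwin-py. The SDK's own scan may differ in details, for example in which file extensions it accepts.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def scan_files(files_to_upload: Iterable[PathLike],
               files_to_exclude: Iterable[PathLike] = ()) -> List[Path]:
    """Hypothetical helper: expand folders into the files they contain, then drop
    anything listed in files_to_exclude or located under an excluded folder."""
    excluded = [Path(p).resolve() for p in files_to_exclude]

    def is_excluded(path: Path) -> bool:
        # Excluded if it is an excluded path itself or sits inside an excluded folder.
        return any(path == ex or ex in path.parents for ex in excluded)

    found: List[Path] = []
    for entry in map(Path, files_to_upload):
        entry = entry.resolve()
        if entry.is_dir():
            # Folders are scanned recursively; only regular files are kept.
            found.extend(p for p in sorted(entry.rglob("*")) if p.is_file())
        else:
            found.append(entry)
    return [p for p in found if not is_excluded(p)]
```

For example, scan_files(["/home/user/new_images"], files_to_exclude=["/home/user/new_images/drafts"]) would list every file in new_images except those under drafts.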
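
The following example pushes a single image to a dataset:

from darwin.client import Client
from pathlib import Path

# Authenticate with the credentials stored in the local darwin-py configuration
client = Client.local()
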
# Get the dataset
dataset = client.get_remote_dataset('my-team-slug/my-dataset-slug')

files = [
    Path('/home/user/new_images/my-image.jpg')
]

# Push the files to the dataset
handler = dataset.push(files)

# Print handler information
print(handler.blocked_items, handler.pending_items, handler.errors)
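push also accepts the keyword arguments listed earlier. The sketch below wraps push to upload a whole local folder into a folder of the dataset, keeping its sub-folder layout. push_folder, the *.tmp filter and the "/batch-1" path are illustrative choices, not part of darwin-py. dataset is assumed to be the remote dataset returned by get_remote_dataset, as in the example above.

```python
from pathlib import Path
from typing import Any


def push_folder(dataset: Any, folder: Path, remote_path: str) -> Any:
    """Hypothetical wrapper: push every file under `folder` into `remote_path`
    in the dataset, keeping sub-folder paths and skipping *.tmp files."""
    # files_to_exclude takes explicit paths, so collect the unwanted ones first.
    excluded = [str(p) for p in folder.rglob("*.tmp")]
    return dataset.push(
        [folder],                   # folders are accepted in files_to_upload
        files_to_exclude=excluded,
        path=remote_path,           # where the files are stored in the dataset
        preserve_folders=True,      # keep sub-folder paths under remote_path
    )
```

Usage: handler = push_folder(dataset, Path("/home/user/new_images"), "/batch-1").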

push returns an upload handler, which exposes three important attributes:

  • blocked_items: items that are blocked, such as repeated items; these are not uploaded
  • pending_items: items that are queued but have not been uploaded yet
  • errors: any errors that occurred while uploading the files
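For a quick overview after a push, these three attributes can be reduced to counts. summarize_upload is a hypothetical helper. It assumes each attribute is a sized collection, such as a list, and uses only the three attribute names listed above.

```python
from typing import Any, Dict


def summarize_upload(handler: Any) -> Dict[str, int]:
    """Count the items in each of the handler's three collections."""
    return {
        "blocked": len(handler.blocked_items),  # repeated items, not uploaded
        "pending": len(handler.pending_items),  # queued, not yet uploaded
        "errors": len(handler.errors),          # failures during upload
    }
```

With the handler from the example above, print(summarize_upload(handler)) gives a compact summary. A non-zero errors count means handler.errors is worth inspecting.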

When a file fails to upload, it is retried every 2 seconds, up to 5 times. If it still fails after that, the file is not uploaded.
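The SDK applies this retry behaviour itself, so you do not need to write it. The sketch below only illustrates the policy. It assumes "5 times" means five retries after the first attempt. upload_with_retry is hypothetical, and sleep is injectable so the sketch can be exercised without actually waiting.

```python
import time
from typing import Callable


def upload_with_retry(upload: Callable[[], None], *, retries: int = 5,
                      delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `upload`; on failure wait `delay` seconds and retry, up to `retries` times.
    Returns True on success, False if every attempt failed (the file is skipped)."""
    for attempt in range(retries + 1):  # 1 initial attempt + `retries` retries
        try:
            upload()
            return True
        except Exception:
            if attempt == retries:
                break  # out of retries: give up on this file
            sleep(delay)
    return False
```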