Now that you know how to manage datasets and how to download data from them, we are going to cover the reverse operation, which is uploading data to a dataset.
By the end of this section, you should know:
- How to upload an image to a dataset
```python
def push(
    self,
    files_to_upload: Optional[List[Union[str, Path, LocalFile]]],
    *,
    blocking: bool = True,
    multi_threaded: bool = True,
    fps: int = 0,
    as_frames: bool = False,
    files_to_exclude: Optional[List[Union[str, Path]]] = None,
    path: Optional[str] = None,
    preserve_folders: bool = False,
    progress_callback: Optional[ProgressCallback] = None,
    file_upload_callback: Optional[FileUploadCallback] = None,
):
```
Uploads the given files to the dataset.
It takes the following parameters (all except files_to_upload are optional, keyword-only arguments):
- files_to_upload: List of files to upload. Entries can also be folders.
- blocking: If False, the dataset is not uploaded and a generator function is returned instead.
- multi_threaded: Uses multiprocessing to upload the dataset in parallel. If blocking is False, this has no effect.
- files_to_exclude: Optional list of files to exclude from the file scan. Entries can also be folders.
- fps: When uploading a video file, specifies its frame rate.
- as_frames: When uploading a video file, specifies whether it should be uploaded as a list of frames.
- path: Optional path within the dataset in which to store the files.
- preserve_folders: Specifies whether to preserve local folder paths when uploading.
- progress_callback: Optional callback, called every time the progress of the uploading files is reported.
- file_upload_callback: Optional callback, called every time a file chunk is uploaded.
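Both files_to_upload and files_to_exclude accept folders as well as individual files. The sketch below illustrates one plausible way such a file scan could expand folders into files and drop excluded entries. It is a hypothetical helper: `collect_files` is not part of darwin-py, and it assumes that excluding a folder excludes everything beneath it, which may not match darwin-py's exact matching rules.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def collect_files(files_to_upload: Iterable[PathLike],
                  files_to_exclude: Iterable[PathLike] = ()) -> List[Path]:
    """Hypothetical helper: expand folders into files and drop excluded paths.

    Not darwin-py's implementation; it only illustrates that both
    arguments may contain folders.
    """
    excluded = [Path(p).resolve() for p in files_to_exclude]

    def is_excluded(path: Path) -> bool:
        # Excluded if it matches an excluded path exactly
        # or lives anywhere underneath an excluded folder.
        return any(path == ex or ex in path.parents for ex in excluded)

    collected: List[Path] = []
    for entry in files_to_upload:
        entry = Path(entry).resolve()
        if entry.is_dir():
            # Folders are scanned recursively for files.
            candidates = sorted(p for p in entry.rglob("*") if p.is_file())
        else:
            candidates = [entry]
        for candidate in candidates:
            # Skip excluded files and avoid listing the same file twice.
            if not is_excluded(candidate) and candidate not in collected:
                collected.append(candidate)
    return collected


# Example: everything in new_images/ except the drafts/ subfolder
# files = collect_files(["/home/user/new_images"], ["/home/user/new_images/drafts"])
```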
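For example, the following snippet uploads a single image:

```python
from pathlib import Path

from darwin.client import Client

# Authenticate using the local Darwin configuration
client = Client.local()

# Get the dataset, identified as "<team-slug>/<dataset-slug>"
dataset = client.get_remote_dataset("my-team-slug/my-dataset-slug")

# Files (or folders) to upload
files = [
    Path("/home/user/new_images/my-image.jpg"),
]

# Push the files to the dataset (blocking=True by default)
handler = dataset.push(files)

# Print handler information
print(handler.blocked_items, handler.pending_items, handler.errors)
```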
The push function returns a handler. This handler has three important attributes:
- blocked_items: repeated items are marked as blocked and are not uploaded
- pending_items: items that have not yet been uploaded but are in the queue
- errors: any errors that may have occurred when uploading the files
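If you want to act on these attributes in a script rather than just print them, a small helper can count them. This is a hypothetical sketch: `summarize_upload` is not part of darwin-py, and it assumes only that the three attributes are sized collections such as lists. A `SimpleNamespace` stands in for the real handler so the example runs without a Darwin account.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_upload(handler: Any) -> Dict[str, int]:
    """Hypothetical helper: count the blocked, pending and failed items.

    Assumes only that `handler` exposes `blocked_items`, `pending_items`
    and `errors` as sized collections (for example, lists).
    """
    return {
        "blocked": len(handler.blocked_items),
        "pending": len(handler.pending_items),
        "errors": len(handler.errors),
    }


# Stand-in for the handler returned by dataset.push(files), for illustration only.
fake_handler = SimpleNamespace(blocked_items=["my-image.jpg"], pending_items=[], errors=[])
summary = summarize_upload(fake_handler)
print(summary)  # {'blocked': 1, 'pending': 0, 'errors': 0}

# Only report failures when there actually are some.
if summary["errors"]:
    print("Some files failed to upload:", fake_handler.errors)
```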
When an upload fails, it is retried up to 5 times, at 2-second intervals. If it still fails after that, the file won't be uploaded.
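The retry policy described above can be sketched as a loop that makes one attempt and then retries up to 5 times, waiting 2 seconds before each retry. This illustrates the stated behaviour and is not darwin-py's implementation: `upload_with_retries` and the `upload_once` callable are hypothetical, and counting the 5 retries after an initial attempt is an assumption. The `sleep` function is injectable so the example runs instantly.

```python
import time
from typing import Callable, Optional


def upload_with_retries(upload_once: Callable[[], None],
                        retries: int = 5,
                        delay_seconds: float = 2.0,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[Exception]:
    """Hypothetical sketch: one attempt plus up to `retries` retries.

    Returns None on success, or the last exception if every attempt
    failed (in which case the file is simply not uploaded).
    """
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            upload_once()
            return None  # success, stop retrying
        except Exception as error:  # a real client would catch only network/HTTP errors
            last_error = error
            if attempt < retries:
                sleep(delay_seconds)  # wait before the next retry
    return last_error


# Usage: an uploader that fails twice, then succeeds on the third attempt.
calls = {"n": 0}


def flaky_upload() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")


print(upload_with_retries(flaky_upload, sleep=lambda s: None))  # None
print(calls["n"])  # 3
```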