As mentioned before, datasets are collections of items (videos or images) that you can act upon.
To work with them, you first need to be able to identify and access these collections.
Datasets are identified by a string composed of your team slug and dataset slug: team-slug/dataset-slug.
The team slug is the team's name in lower case, with special characters removed and spaces replaced by dashes, e.g., bird-species. This string is unique across V7. The dataset slug is built the same way from the dataset's name, except that it is only unique within the team it belongs to.
For example, if my team is "v7" and I created a dataset called "my first dataset", its final identifier would be: v7/my-first-dataset.
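The slug and identifier rules above can be sketched in plain Python. This is a minimal illustration under the stated rules only (lower-case, special characters removed, spaces replaced by dashes); slugify, make_identifier and split_identifier are hypothetical helpers, not part of the SDK, and V7's own slug logic may handle edge cases (such as non-ASCII names) differently.

```python
import re
from typing import Tuple


def slugify(name: str) -> str:
    """Approximate the slug rule described above (hypothetical helper)."""
    lowered = name.lower()
    # Remove special characters, keeping letters, digits, whitespace and dashes
    cleaned = re.sub(r"[^a-z0-9\s-]", "", lowered)
    # Replace each run of whitespace with a single dash
    return re.sub(r"\s+", "-", cleaned.strip())


def make_identifier(team_name: str, dataset_name: str) -> str:
    """Build a 'team-slug/dataset-slug' identifier from display names."""
    return f"{slugify(team_name)}/{slugify(dataset_name)}"


def split_identifier(identifier: str) -> Tuple[str, str]:
    """Split 'team-slug/dataset-slug' into its two slugs."""
    team_slug, sep, dataset_slug = identifier.partition("/")
    if not sep or not team_slug or not dataset_slug or "/" in dataset_slug:
        raise ValueError(f"Expected 'team-slug/dataset-slug', got {identifier!r}")
    return team_slug, dataset_slug


print(make_identifier("v7", "my first dataset"))  # v7/my-first-dataset
print(split_identifier("v7/my-first-dataset"))  # ('v7', 'my-first-dataset')
```

Because the dataset slug is only unique within its team, both halves are needed to point at exactly one dataset.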
With the SDK's Client class, you can access datasets and their information through the following methods:
- list_remote_datasets
- list_local_datasets
- get_remote_dataset
1. def list_remote_datasets(self, team: Optional[str] = None) -> Iterator[RemoteDataset]:
Returns an iterator over all the remote datasets of the team you are currently authenticated with. Pass team to list the datasets of a specific team instead.
from darwin.client import Client

# Authenticate
client = Client.local()

# Create a remote dataset
dataset = client.create_dataset('my-first-dataset')
print(dataset.slug)

# List remote datasets
remote_datasets = client.list_remote_datasets()
for dataset in remote_datasets:
    print(dataset.slug)
Keep in mind that created datasets are RemoteDatasets, so invoking list_local_datasets after creating one will not return it. Use list_remote_datasets instead.
2. def list_local_datasets(self, team: Optional[str] = None) -> Iterator[Path]:
Returns an iterator over the paths of the local datasets on your machine.
Local datasets are folders inside your ~/.darwin/datasets/ folder.
To be considered a local dataset, a folder must sit at the path given by the dataset's identifier (team-slug/dataset-slug) and must contain two subfolders:
- releases, which contains the dataset's releases
- images, which contains the data you want to work on
The following example is a valid local dataset folder:
~/.darwin/datasets/
└── andreas-team
    └── cats-dataset
        ├── images
        └── releases
from darwin.client import Client

# Authenticate
client = Client.local()

# List local datasets
local_datasets = client.list_local_datasets()
for dataset in local_datasets:
    print(dataset.name)
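To make the folder rules concrete, here is a minimal pathlib sketch of how such a scan could work. It assumes the team-slug/dataset-slug layout and the two required subfolders described above; find_local_datasets and REQUIRED_SUBFOLDERS are hypothetical names, and this is not the SDK's implementation. In practice, call client.list_local_datasets().

```python
from pathlib import Path
from typing import Iterator, Optional

# Subfolders a dataset folder must contain (per the rules above)
REQUIRED_SUBFOLDERS = ("images", "releases")


def find_local_datasets(root: Path, team: Optional[str] = None) -> Iterator[Path]:
    """Yield <root>/<team-slug>/<dataset-slug> folders holding images/ and releases/."""
    if not root.is_dir():
        return
    # Either look only at one team's folder, or at every team folder under root
    team_dirs = [root / team] if team else [p for p in root.iterdir() if p.is_dir()]
    for team_dir in sorted(team_dirs):
        if not team_dir.is_dir():
            continue
        for dataset_dir in sorted(p for p in team_dir.iterdir() if p.is_dir()):
            # Only folders that contain every required subfolder count as datasets
            if all((dataset_dir / name).is_dir() for name in REQUIRED_SUBFOLDERS):
                yield dataset_dir


if __name__ == "__main__":
    # Scan the default location shown in the tree above
    for path in find_local_datasets(Path.home() / ".darwin" / "datasets"):
        print(f"{path.parent.name}/{path.name}")
```

With the example tree, this would yield only andreas-team/cats-dataset; a folder missing either images or releases is skipped.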
3. def get_remote_dataset(self, dataset_identifier: Union[str, DatasetIdentifier]) -> RemoteDataset:
Returns the remote dataset with the given identifier.
from darwin.client import Client
# Authenticate
client = Client.local()
# Create a remote dataset
dataset = client.create_dataset('my-first-dataset')
print(dataset.name)
# Get a remote dataset by its identifier
dataset = client.get_remote_dataset('my-team-slug/my-first-dataset')
print(dataset.slug)
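Since a dataset slug is only unique within its team, a lookup must use the full team-slug/dataset-slug identifier. The in-memory sketch below illustrates that keying. DatasetRecord, index_by_identifier and lookup are hypothetical stand-ins, not darwin's RemoteDataset or the real get_remote_dataset, which queries V7 rather than a local dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical stand-in for a remote dataset (not darwin's RemoteDataset)."""
    team_slug: str
    slug: str
    name: str


def index_by_identifier(datasets: Iterable[DatasetRecord]) -> Dict[str, DatasetRecord]:
    # Key on 'team-slug/dataset-slug': the dataset slug alone may repeat across teams
    return {f"{d.team_slug}/{d.slug}": d for d in datasets}


def lookup(index: Dict[str, DatasetRecord], identifier: str) -> DatasetRecord:
    """Return the dataset for a full identifier, or raise LookupError."""
    try:
        return index[identifier]
    except KeyError:
        raise LookupError(f"No dataset with identifier {identifier!r}") from None


records = [
    DatasetRecord("v7", "my-first-dataset", "my first dataset"),
    DatasetRecord("bird-species", "my-first-dataset", "My First Dataset"),
]
index = index_by_identifier(records)
print(lookup(index, "bird-species/my-first-dataset").name)  # My First Dataset
```

Both records share the slug my-first-dataset, yet each identifier still resolves to exactly one dataset.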