Description of the "darwin dataset split" command from the CLI

Splits a LocalDataset using two strategies:

  • random
  • stratified

With the random strategy, DatasetItems are assigned at random to the Training, Validation and Test partitions. With the stratified strategy, the assignment takes the type of Annotations into account in order to create more balanced partitions.
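To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It is not the code the CLI runs: the function names are hypothetical, each item is assumed to carry exactly one annotation class, and partition sizes are assumed to be truncated to whole items.

```python
import random
from collections import defaultdict


def random_split(items, val_fraction, test_fraction, seed=0):
    """Shuffle items and cut them into (train, val, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def stratified_split(labelled_items, val_fraction, test_fraction, seed=0):
    """Split each annotation class separately, then merge the partitions.

    labelled_items is a list of (item, annotation_class) pairs.
    """
    by_class = defaultdict(list)
    for item, annotation_class in labelled_items:
        by_class[annotation_class].append(item)

    train, val, test = [], [], []
    for annotation_class in sorted(by_class):
        c_train, c_val, c_test = random_split(
            by_class[annotation_class], val_fraction, test_fraction, seed
        )
        train += c_train
        val += c_val
        test += c_test
    return train, val, test


# Hypothetical data: 80 "dog" items and 20 "cat" items.
items = [(f"img_{i}.jpg", "dog" if i < 80 else "cat") for i in range(100)]
train, val, test = stratified_split(items, val_fraction=0.1, test_fraction=0.2)
print(len(train), len(val), len(test))  # 70 10 20
```

Because each class is split on its own, every partition keeps the 80/20 class ratio of the hypothetical data, which a purely random split does not guarantee.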

> darwin dataset split dog-dataset -v 0.1 -t 0.2
Partition lists saved at /Users/john/.darwin/datasets/v7-john/dog-dataset/releases/latest/lists/145_20_41

When the operation finishes, the results for both strategies are saved under the displayed folder.
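The folder name in the example can be reproduced with a little arithmetic. The sketch below assumes a dataset of 206 items (a size chosen only because it matches the example) and assumes the validation and test sizes are truncated to whole items, with the remainder going to training; the function name is hypothetical.

```python
def partition_sizes(total_items, val_fraction, test_fraction):
    """Return (train, val, test) counts, truncating val and test."""
    n_val = int(total_items * val_fraction)
    n_test = int(total_items * test_fraction)
    n_train = total_items - n_val - n_test
    return n_train, n_val, n_test


# 206 is an assumed dataset size chosen to match the example folder name.
sizes = partition_sizes(206, val_fraction=0.1, test_fraction=0.2)
folder_name = "_".join(str(n) for n in sizes)
print(folder_name)  # 145_20_41
```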

Positional arguments:

  • dataset: Local dataset name to split.

Optional arguments:

  • -v VAL_PERCENTAGE, --val-percentage VAL_PERCENTAGE: Validation percentage, passed as a fraction in the example above (0.1).
  • -t TEST_PERCENTAGE, --test-percentage TEST_PERCENTAGE: Test percentage, passed as a fraction in the example above (0.2).
  • -s SEED, --seed SEED: Split seed.


Mandatory arguments

Even though -v and -t are listed under "Optional arguments", both are mandatory: the command fails unless you provide them.
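If you call the command from a script, one way to avoid a failed run is to check for both values before building the command line. The helper below is a hypothetical sketch, not part of the CLI; it only uses the flags listed above.

```python
import shlex


def build_split_command(dataset, val_percentage, test_percentage, seed=None):
    """Build the argument list for "darwin dataset split".

    Raises ValueError early if -v or -t would be missing.
    """
    if val_percentage is None or test_percentage is None:
        raise ValueError("-v and -t must both be provided")
    command = [
        "darwin", "dataset", "split", dataset,
        "-v", str(val_percentage),
        "-t", str(test_percentage),
    ]
    if seed is not None:
        # The seed is the only argument here that is truly optional.
        command += ["-s", str(seed)]
    return command


command = build_split_command("dog-dataset", 0.1, 0.2, seed=42)
print(shlex.join(command))  # darwin dataset split dog-dataset -v 0.1 -t 0.2 -s 42
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where the darwin CLI is installed.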


Understanding Divisions

The name of the output folder shows how the items were divided, in the order Training_Validation_Testing. For example, a folder named 18_2_5 means that:

  • 18 DatasetItems were used in Training
  • 2 DatasetItems were used in Validation
  • 5 DatasetItems were used in Testing
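Reading the counts back from a folder name is easy to automate. The following sketch assumes the name always has the three-number form described above; the class and function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class SplitCounts(NamedTuple):
    training: int
    validation: int
    testing: int


def parse_split_folder(path):
    """Parse a folder name such as "18_2_5" into its three counts."""
    name = PurePosixPath(path).name
    parts = name.split("_")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a split folder name: {name!r}")
    return SplitCounts(*(int(p) for p in parts))


counts = parse_split_folder("18_2_5")
print(counts)  # SplitCounts(training=18, validation=2, testing=5)
```

The function also accepts a full path, such as the one printed by the command, and only looks at the last path component.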

