split

Description of the "darwin dataset split" option from the CLI

Splits a LocalDataset using two strategies:

  • random
  • stratified

With the random strategy, DatasetItems are assigned at random to the Training, Validation and Test partitions. The stratified strategy also takes the type of Annotations into account when assigning items, which produces more balanced partitions.
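
The difference between the two strategies can be illustrated with a minimal Python sketch. This is not the code the CLI runs: the function names `random_split` and `stratified_split`, the `label_of` callback, and the assumption that every item carries exactly one annotation type are all hypothetical simplifications made for illustration.

```python
import random
from collections import defaultdict


def random_split(items, val_fraction, test_fraction, seed=0):
    """Shuffle items and cut them into (train, val, test) lists."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    n_test = int(len(shuffled) * test_fraction)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


def stratified_split(items, label_of, val_fraction, test_fraction, seed=0):
    """Split each label group separately, then merge the partitions.

    Assumption: label_of(item) returns a single annotation type per item.
    """
    groups = defaultdict(list)
    for item in items:
        groups[label_of(item)].append(item)

    train, val, test = [], [], []
    for label in sorted(groups):
        g_train, g_val, g_test = random_split(
            groups[label], val_fraction, test_fraction, seed
        )
        train.extend(g_train)
        val.extend(g_val)
        test.extend(g_test)
    return train, val, test
```

Because the stratified version splits each group on its own, a rare annotation type appears in every partition in roughly the requested proportion, whereas a purely random split may leave it out of the smaller partitions by chance.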

$ darwin dataset split solar-panels -v 0.1 -t 0.2
Partition lists saved at /home/user/.darwin/datasets/v7-labs/solar-panels/releases/latest/lists/18_2_5

When the operation finishes, the partition lists for both strategies are saved under the displayed folder.

Positional arguments:

  • dataset: Local dataset name to split.

Optional arguments:

  • -v VAL_PERCENTAGE, --val-percentage VAL_PERCENTAGE: Validation percentage.
  • -t TEST_PERCENTAGE, --test-percentage TEST_PERCENTAGE: Test percentage.
  • -s SEED, --seed SEED: Split seed.

🚧

Mandatory arguments

Even though -v and -t are listed under "Optional arguments", they are mandatory: the command fails unless you provide both.
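
If you invoke the command from a script, one way to guard against a missing flag is to assemble the argument list in a small helper that refuses to continue without both values. The sketch below is only an illustration: `build_split_command` is a hypothetical name, and the helper uses nothing beyond the flags documented above.

```python
def build_split_command(dataset, val_percentage, test_percentage, seed=None):
    """Return the argument list for `darwin dataset split`.

    Raises ValueError when -v or -t is missing, since the CLI needs both.
    """
    if val_percentage is None or test_percentage is None:
        raise ValueError("both -v and -t must be provided")

    argv = [
        "darwin", "dataset", "split", dataset,
        "-v", str(val_percentage),
        "-t", str(test_percentage),
    ]
    if seed is not None:
        # -s is the only flag that is optional in practice
        argv += ["-s", str(seed)]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the Python standard library, assuming the `darwin` executable is on your PATH.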

📘

Understanding Divisions

The name of the output folder tells you how the dataset was divided. For example, if the folder is 18_2_5, this means that:

  • 18 DatasetItems were used in Training
  • 2 DatasetItems were used in Validation
  • 5 DatasetItems were used in Testing
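
The naming convention above can be read back programmatically. The following is a minimal sketch, assuming the folder name always has the form training_validation_test as in the example; `parse_split_folder` is a hypothetical helper, not part of the CLI.

```python
from pathlib import PurePosixPath


def parse_split_folder(path):
    """Read the partition sizes from a folder name such as 18_2_5."""
    name = PurePosixPath(path).name
    parts = name.split("_")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected split folder name: {name!r}")
    training, validation, test = (int(p) for p in parts)
    return {"training": training, "validation": validation, "test": test}


sizes = parse_split_folder(
    "/home/user/.darwin/datasets/v7-labs/solar-panels/releases/latest/lists/18_2_5"
)
total = sum(sizes.values())  # 18 + 2 + 5 = 25 DatasetItems in total
```

With 25 items in total, the requested validation share of 0.1 corresponds to 2 items and the test share of 0.2 to 5 items, which leaves 18 items for Training.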

Next up

Having trouble understanding the docs? Brush up on V7 Concepts!