split
A CLI how-to guide. SDK power users can refer to our full SDK docs, generated from our source code.
Splits a LocalDataset using two strategies:
- random
- stratified
With the random strategy, DatasetItems are randomly assigned to the Validation, Test and Training partitions. With the stratified strategy, the assignment takes the type of each item's Annotations into account to create more balanced partitions.
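The difference between the two strategies can be illustrated with a small, self-contained Python sketch. This is not the darwin implementation: the function names, the toy data, the single class label per item and the truncating size calculation are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_counts(n, val_fraction, test_fraction):
    """Return (train, val, test) sizes for n items. Truncation is an assumption."""
    n_val = int(n * val_fraction)
    n_test = int(n * test_fraction)
    return n - n_val - n_test, n_val, n_test


def random_split(items, val_fraction, test_fraction, seed=0):
    """Shuffle the items, then cut the shuffled list into three partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled), val_fraction, test_fraction)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def stratified_split(items, label_of, val_fraction, test_fraction, seed=0):
    """Split each label group on its own so every partition keeps a similar label mix."""
    groups = defaultdict(list)
    for item in items:
        groups[label_of(item)].append(item)
    result = {"train": [], "val": [], "test": []}
    for offset, label in enumerate(sorted(groups)):
        part = random_split(groups[label], val_fraction, test_fraction, seed + offset)
        for name in result:
            result[name].extend(part[name])
    return result


# Hypothetical toy data: (file name, annotation class) pairs, 10 cats and 40 dogs.
items = [(f"img_{i}.jpg", "dog" if i % 5 else "cat") for i in range(50)]
random_parts = random_split(items, 0.1, 0.2, seed=42)
stratified_parts = stratified_split(items, lambda item: item[1], 0.1, 0.2, seed=42)
```

In this sketch, the random split gives no guarantee about how many cats land in each partition, whereas the stratified split always places 7 cats in training, 1 in validation and 2 in testing. Real DatasetItems can carry several Annotations of different types, so an actual stratified split has to balance more than one label per item.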
> darwin dataset split dog-dataset -v 0.1 -t 0.2
Partition lists saved at /Users/john/.darwin/datasets/v7-john/dog-dataset/releases/latest/lists/145_20_41
When the operation finishes, the partition lists for both strategies are saved in the displayed folder.
Positional arguments:
dataset
: Local dataset name to split.
Optional arguments:
-v VAL_PERCENTAGE, --val-percentage VAL_PERCENTAGE
: Validation percentage.
-t TEST_PERCENTAGE, --test-percentage TEST_PERCENTAGE
: Test percentage.
-s SEED, --seed SEED
: Split seed.
Mandatory arguments
Even though -v and -t are listed under "Optional arguments", they are mandatory: the command fails unless you provide both.
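When scripting the split from Python, one way to avoid forgetting -v or -t is to assemble the command through a small helper. The sketch below is only an illustration: `build_split_command` is a hypothetical helper, not part of the darwin SDK, and it uses only the flags listed above. It builds the argument list without running anything.

```python
import shlex


def build_split_command(dataset, val_percentage, test_percentage, seed=None):
    """Assemble the argument list for `darwin dataset split` without running it."""
    # -v and -t are mandatory in practice, so reject a call that omits either.
    if val_percentage is None or test_percentage is None:
        raise ValueError("both -v and -t must be provided")
    command = [
        "darwin", "dataset", "split", dataset,
        "-v", str(val_percentage),
        "-t", str(test_percentage),
    ]
    # The seed is the only argument that is actually optional.
    if seed is not None:
        command += ["-s", str(seed)]
    return command


command = build_split_command("dog-dataset", 0.1, 0.2, seed=42)
print(shlex.join(command))
# prints: darwin dataset split dog-dataset -v 0.1 -t 0.2 -s 42
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the darwin CLI is installed and authenticated.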
Understanding Divisions
The name of the output folder tells you which division was made. For example, if the folder is 18_2_5, this means that:
- 18 DatasetItems were used in Training
- 2 DatasetItems were used in Validation
- 5 DatasetItems were used in Testing
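The folder name can also be read programmatically. The following sketch parses the Training_Validation_Testing counts from the path printed in the example at the top of this page and computes the share of items that ended up in each partition. The helper names are hypothetical and are not part of the darwin SDK.

```python
from pathlib import PurePosixPath


def parse_split_folder(path):
    """Read the Training, Validation and Testing counts from a split folder name."""
    name = PurePosixPath(path).name
    train, val, test = (int(part) for part in name.split("_"))
    return {"train": train, "val": val, "test": test}


def realised_fractions(counts):
    """Return the share of all items that each partition received."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


counts = parse_split_folder(
    "/Users/john/.darwin/datasets/v7-john/dog-dataset/releases/latest/lists/145_20_41"
)
fractions = realised_fractions(counts)
print(counts)
# prints: {'train': 145, 'val': 20, 'test': 41}
print({name: round(value, 3) for name, value in fractions.items()})
# prints: {'train': 0.704, 'val': 0.097, 'test': 0.199}
```

For that example, run with -v 0.1 -t 0.2 on 206 items, the realised shares are close to, but not exactly, 10% and 20%, because each partition must contain a whole number of DatasetItems.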
Updated 4 months ago