wildboar.model_selection
Methods for model selection.
Package Contents
Classes
RepeatedOutlierSplit | Repeated random outlier cross-validator.
Functions
outlier_train_test_split | Outlier training and testing split from classification dataset.
- class wildboar.model_selection.RepeatedOutlierSplit(n_splits=None, *, test_size=0.2, n_outlier=0.05, shuffle=True, random_state=None)
Repeated random outlier cross-validator.
- Parameters:
- n_splits : int, optional
The maximum number of splits. If None, the number of splits is determined by the number of outliers as total_n_outliers / (n_inliers * n_outliers). If an int, the value is an upper bound on the number of splits.
- test_size : float, optional
The size of the test set.
- n_outlier : float, optional
The fraction of outliers in the training and test sets.
- shuffle : bool, optional
Shuffle the training indices in each iteration.
- random_state : int or RandomState, optional
The pseudo-random number generator.
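As a worked example of the default for n_splits, the sketch below evaluates the formula total_n_outliers / (n_inliers * n_outliers) for a hypothetical dataset. The helper name default_n_splits is not part of wildboar. Reading n_outliers in the formula as the n_outlier fraction, rounding down to an integer, and treating an explicit n_splits as a cap on the computed value are all assumptions made for illustration; the library may behave differently.

```python
def default_n_splits(total_n_outliers, n_inliers, n_outlier, n_splits=None):
    """Hypothetical helper, not part of wildboar."""
    # Number of outliers consumed by each split, relative to the inliers.
    outliers_per_split = n_inliers * n_outlier
    # Assumption: round down and never report fewer than one split.
    bound = max(1, int(total_n_outliers / outliers_per_split))
    # Assumption: an explicit n_splits only caps the computed value.
    return bound if n_splits is None else min(n_splits, bound)


# 20 outliers, 100 inliers, n_outlier=0.05 gives 5 outliers per split.
print(default_n_splits(20, 100, 0.05))               # 4
print(default_n_splits(20, 100, 0.05, n_splits=2))   # 2
print(default_n_splits(20, 100, 0.05, n_splits=10))  # 4
```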
Notes
Contrary to other cross-validation strategies, the random outlier cross-validator does not ensure that all folds will be different. Instead, the inlier samples are shuffled and new outlier samples are inserted in the training and test sets repeatedly.
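The behaviour described in this note can be illustrated with a short, standard-library-only sketch. It is not wildboar's implementation: the function name repeated_outlier_splits, the index ranges, and the rule for how many outliers go into each side are assumptions chosen to show why folds may overlap when inliers are reshuffled and outliers are redrawn on every iteration.

```python
import random


def repeated_outlier_splits(inliers, outliers, n_splits, test_size=0.2,
                            n_outlier=0.05, seed=None):
    """Illustrative generator of (train, test) index lists; not wildboar code."""
    rng = random.Random(seed)
    inliers = list(inliers)
    outliers = list(outliers)
    n_test = round(len(inliers) * test_size)
    for _ in range(n_splits):
        # Reshuffle the inliers on every iteration, so the same inlier
        # can appear in the test set of several splits.
        rng.shuffle(inliers)
        test, train = inliers[:n_test], inliers[n_test:]
        # Assumption: each side receives outliers in proportion to its inliers.
        k_train = max(1, round(len(train) * n_outlier))
        k_test = max(1, round(len(test) * n_outlier))
        # Draw fresh outliers without replacement for this split.
        drawn = rng.sample(outliers, k_train + k_test)
        yield train + drawn[:k_train], test + drawn[k_train:]


inlier_ids = range(100)        # hypothetical inlier sample indices
outlier_ids = range(100, 120)  # hypothetical outlier sample indices
splits = list(
    repeated_outlier_splits(inlier_ids, outlier_ids, n_splits=4, seed=0)
)
for train, test in splits:
    print(len(train), len(test))
```

Within a single split the training and test indices are disjoint, but nothing prevents two different splits from sharing test samples, which is the difference from k-fold style strategies.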
- get_n_splits(X, y, groups=None)
Return the number of splitting iterations in the cross-validator.
- Parameters:
- X : object
The samples.
- y : object
The labels.
- groups : object, optional
Always ignored; exists for compatibility.
- Returns:
- int
The number of splitting iterations in the cross-validator.
- wildboar.model_selection.outlier_train_test_split(x, y, normal_class, test_size=0.2, anomalies_train_size=0.05, random_state=None)
Outlier training and testing split from classification dataset.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dim, n_timestep)
Input data samples.
- y : array-like of shape (n_samples,)
Input class labels.
- normal_class : int
Class label that should be considered as the normal class.
- test_size : float, optional
Size of the test set.
- anomalies_train_size : float, optional
Contamination of anomalies in the training dataset.
- random_state : int or RandomState, optional
Pseudo-random state used for reproducible results.
- Returns:
- x_train : array-like
Training samples.
- x_test : array-like
Test samples.
- y_train : array-like
Training labels (either 1 or -1, where 1 denotes normal and -1 anomalous).
- y_test : array-like
Test labels (either 1 or -1, where 1 denotes normal and -1 anomalous).
Examples
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
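To make the returned labels and the role of anomalies_train_size more concrete, here is a minimal, standard-library-only sketch that works on label lists and returns indices rather than arrays. It is not the library's implementation. The function name sketch_outlier_split, the rule that the number of training anomalies is the contamination fraction times the number of normal training samples, and the choice to send all remaining anomalies to the test set are assumptions made for illustration.

```python
import random


def sketch_outlier_split(labels, normal_class, test_size=0.2,
                         anomalies_train_size=0.05, seed=None):
    """Illustrative index-based split; not wildboar's implementation."""
    rng = random.Random(seed)
    normal = [i for i, label in enumerate(labels) if label == normal_class]
    anomalous = [i for i, label in enumerate(labels) if label != normal_class]
    rng.shuffle(normal)
    rng.shuffle(anomalous)

    # Hold out a fraction of the normal samples for testing.
    n_test = round(len(normal) * test_size)
    test_normal, train_normal = normal[:n_test], normal[n_test:]

    # Assumption: training anomalies = contamination * normal training samples.
    k = round(len(train_normal) * anomalies_train_size)
    train_idx = train_normal + anomalous[:k]
    # Assumption: every anomaly not used for training goes to the test set.
    test_idx = test_normal + anomalous[k:]

    def relabel(indices):
        # 1 for the normal class, -1 for every other class.
        return [1 if labels[i] == normal_class else -1 for i in indices]

    return train_idx, test_idx, relabel(train_idx), relabel(test_idx)


# Hypothetical labels: 100 samples of class 1 (normal), 20 of class 2.
labels = [1] * 100 + [2] * 20
train_idx, test_idx, y_train, y_test = sketch_outlier_split(
    labels, normal_class=1, seed=0
)
print(len(train_idx), y_train.count(-1))
print(len(test_idx), y_test.count(-1))
```

Whatever the original class labels are, the outputs contain only 1 and -1, matching the description of y_train and y_test above.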