wildboar.model_selection

Methods for model selection.

Package Contents

Classes

RepeatedOutlierSplit

Repeated random outlier cross-validator.

Functions

outlier_train_test_split(x, y, normal_class[, ...])

Outlier training and testing split from classification dataset.

class wildboar.model_selection.RepeatedOutlierSplit(n_splits=None, *, test_size=0.2, n_outlier=0.05, shuffle=True, random_state=None)

Repeated random outlier cross-validator.

Parameters:
n_splits : int, optional

The maximum number of splits.

- If None, the number of splits is determined by the number of outliers as total_n_outliers/(n_inliers * n_outliers).
- If int, the number of splits is an upper bound.

test_size : float, optional

The size of the test set.

n_outlier : float, optional

The fraction of outliers in the training and test sets.

shuffle : bool, optional

Shuffle the training indices in each iteration.

random_state : int or RandomState, optional

The pseudo-random number generator or seed.

Notes

Contrary to other cross-validation strategies, the random outlier cross-validator does not ensure that all folds will be different. Instead, the inlier samples are shuffled and new outlier samples are inserted in the training and test sets repeatedly.
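
The behaviour described in this note can be sketched in plain Python. This is a conceptual illustration only, not wildboar's implementation. The function name `repeated_outlier_split_sketch`, its rounding rules, and the assumption that labels are 1 for inliers and -1 for outliers are hypothetical choices made for the example.

```python
import math
import random


def repeated_outlier_split_sketch(y, n_splits=3, test_size=0.2, n_outlier=0.05, seed=0):
    """Yield (train_idx, test_idx) index lists.

    Hypothetical sketch, not wildboar's code. Assumes y uses 1 for
    inliers and -1 for outliers, and that enough outliers are available.
    """
    rng = random.Random(seed)
    inliers = [i for i, label in enumerate(y) if label == 1]
    outliers = [i for i, label in enumerate(y) if label == -1]

    n_test_inliers = math.ceil(len(inliers) * test_size)
    n_train_inliers = len(inliers) - n_test_inliers
    # Outliers mixed into each part (assumed rounding: at least one per part).
    n_train_outliers = max(1, math.ceil(n_train_inliers * n_outlier))
    n_test_outliers = max(1, math.ceil(n_test_inliers * n_outlier))

    for _ in range(n_splits):
        # Reshuffle the inliers and draw fresh outliers in every repetition,
        # so two folds are not guaranteed to be different or disjoint.
        shuffled = rng.sample(inliers, len(inliers))
        drawn = rng.sample(outliers, n_train_outliers + n_test_outliers)
        train_idx = shuffled[:n_train_inliers] + drawn[:n_train_outliers]
        test_idx = shuffled[n_train_inliers:] + drawn[n_train_outliers:]
        yield train_idx, test_idx


# 40 inliers followed by 10 outliers.
labels = [1] * 40 + [-1] * 10
folds = list(repeated_outlier_split_sketch(labels))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Within a single repetition the training and test indices do not overlap, but the same sample may appear in the test set of several repetitions, which is the difference from a k-fold style partition.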

get_n_splits(X, y, groups=None)

Return the number of splitting iterations in the cross-validator.

Parameters:
X : object

The samples.

y : object

The labels.

groups : object, optional

Always ignored, exists for compatibility.

Returns:
int

Returns the number of splitting iterations in the cross-validator.

split(x, y, groups=None)

Return training and test indices.

Parameters:
x : object

Always ignored, exists for compatibility.

y : object

The labels.

groups : object, optional

Always ignored, exists for compatibility.

Yields:
train_idx, test_idx : ndarray

The training and test indices.

wildboar.model_selection.outlier_train_test_split(x, y, normal_class, test_size=0.2, anomalies_train_size=0.05, random_state=None)

Outlier training and testing split from classification dataset.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dim, n_timestep)

Input data samples.

y : array-like of shape (n_samples,)

Input class label.

normal_class : int

Class label that should be considered as the normal class.

test_size : float, optional

Size of the test set.

anomalies_train_size : float, optional

Contamination of anomalies in the training dataset.

random_state : int or RandomState, optional

Pseudo-random state used for reproducible results.

Returns:
x_train : array-like

Training samples.

x_test : array-like

Test samples.

y_train : array-like

Training labels (either 1 or -1, where 1 denotes normal and -1 anomalous).

y_test : array-like

Test labels (either 1 or -1, where 1 denotes normal and -1 anomalous).

Examples

>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
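
The label convention of `y_train` and `y_test` (1 for normal, -1 for anomalous) can be illustrated without wildboar. The helper `to_outlier_labels` below is a hypothetical name introduced only for this sketch. It shows how class labels relate to `normal_class` and says nothing about how the library selects or samples the data.

```python
def to_outlier_labels(y, normal_class):
    """Map class labels to 1 (normal) and -1 (anomalous); hypothetical helper."""
    return [1 if label == normal_class else -1 for label in y]


# Three classes; class 1 is treated as the normal class.
y = [1, 2, 1, 3, 2, 1]
y_outlier = to_outlier_labels(y, normal_class=1)
print(y_outlier)  # [1, -1, 1, -1, -1, 1]
```

Every class other than `normal_class` collapses into the single anomalous label, so the original multi-class information is not recoverable from the returned labels.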