`wildboar.model_selection`

Methods for model selection.

Package Contents

Classes

RepeatedOutlierSplit

Repeated random outlier cross-validator.

Functions

outlier_train_test_split(x, y, normal_class[, ...])

Outlier training and testing split from classification dataset.

class wildboar.model_selection.RepeatedOutlierSplit(n_splits=None, *, test_size=0.2, n_outlier=0.05, shuffle=True, random_state=None)

Repeated random outlier cross-validator.

Parameters:

n_splits (int, optional): The maximum number of splits.
- If None, the number of splits is determined by the number of outliers as total_n_outliers / (n_inliers * n_outliers).
- If int, the number of splits is an upper bound.
test_size (float, optional): The size of the test set.
n_outlier (float, optional): The fraction of outliers in the training and test sets.
shuffle (bool, optional): Shuffle the training indices in each iteration.
random_state (int or RandomState, optional): The pseudo-random number generator.
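
As a rough illustration of the rule for n_splits, the arithmetic can be sketched in plain Python. The helper estimate_n_splits is hypothetical and not part of wildboar. It assumes that the divisor in the formula is the number of outliers drawn per split (the n_outlier fraction of the inlier count, rounded up) and that only whole splits count. The actual rounding used by the library may differ.

```python
import math


def estimate_n_splits(total_n_outliers, n_inliers, n_outlier=0.05, n_splits=None):
    """Hypothetical helper: estimate how many splits the outliers allow."""
    if total_n_outliers == 0:
        return 0  # no outliers, so no split can receive one
    # Assumed: each split consumes n_outlier * n_inliers outliers,
    # rounded up, at least one, and never more than are available.
    per_split = min(total_n_outliers, max(1, math.ceil(n_inliers * n_outlier)))
    # Assumed: only complete splits count, hence floor division.
    available = total_n_outliers // per_split
    # An integer n_splits acts as an upper bound only.
    return available if n_splits is None else min(available, n_splits)


print(estimate_n_splits(50, 40, n_outlier=0.25))               # 5
print(estimate_n_splits(50, 40, n_outlier=0.25, n_splits=3))   # 3
print(estimate_n_splits(50, 40, n_outlier=0.25, n_splits=20))  # 5
```

With 40 inliers and n_outlier=0.25, each split uses 10 outliers, so 50 outliers support at most 5 splits regardless of a larger requested n_splits.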

Notes

Contrary to other cross-validation strategies, the random outlier cross-validator does not ensure that all folds will be different. Instead, the inlier samples are shuffled and new outlier samples are inserted in the training and test sets repeatedly.
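
The behaviour described in this note can be sketched with the standard library alone. This is an illustration, not wildboar's implementation: the generator sketch_repeated_outlier_split is hypothetical. It assumes labels of 1 for inliers and -1 for outliers, and that each set receives outliers in proportion to its own inlier count.

```python
import random


def sketch_repeated_outlier_split(y, n_splits=3, test_size=0.2, n_outlier=0.05, seed=0):
    """Yield (train_idx, test_idx) lists; an illustration, not wildboar's code."""
    rng = random.Random(seed)
    # Assumed label convention: 1 marks inliers, -1 marks outliers.
    inliers = [i for i, label in enumerate(y) if label == 1]
    outliers = [i for i, label in enumerate(y) if label == -1]
    rng.shuffle(outliers)

    n_test = max(1, round(test_size * len(inliers)))
    n_train = len(inliers) - n_test
    # Assumed: each set receives n_outlier * (its inlier count) outliers.
    out_train = max(1, round(n_outlier * n_train))
    out_test = max(1, round(n_outlier * n_test))
    per_split = out_train + out_test

    for k in range(n_splits):
        # Every split takes a fresh, previously unused batch of outliers.
        batch = outliers[k * per_split:(k + 1) * per_split]
        if len(batch) < per_split:
            break  # not enough unused outliers for another split
        rng.shuffle(inliers)  # reshuffled each time, so inliers can repeat
        train_idx = inliers[n_test:] + batch[:out_train]
        test_idx = inliers[:n_test] + batch[out_train:]
        yield train_idx, test_idx


y = [1] * 20 + [-1] * 10
for train_idx, test_idx in sketch_repeated_outlier_split(y, n_outlier=0.25):
    print(len(train_idx), len(test_idx))
```

In this sketch the same inlier may appear in the test set of several splits, while the outliers differ from split to split. That is why the folds are not guaranteed to be distinct.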

get_n_splits(X, y, groups=None)

Return the number of splitting iterations in the cross-validator.

Parameters:

X (object): The samples.
y (object): The labels.
groups (object, optional): Always ignored, exists for compatibility.

Returns:

int: The number of splitting iterations in the cross-validator.

split(x, y, groups=None)

Return training and test indices.

Parameters:

x (object): Always ignored, exists for compatibility.
y (object): The labels.
groups (object, optional): Always ignored, exists for compatibility.

Yields:

train_idx, test_idx (ndarray): The training and test indices.

wildboar.model_selection.outlier_train_test_split(x, y, normal_class, test_size=0.2, anomalies_train_size=0.05, random_state=None)

Outlier training and testing split from classification dataset.

Parameters:

x (array-like of shape (n_samples, n_timestep) or (n_samples, n_dim, n_timestep)): Input data samples.
y (array-like of shape (n_samples,)): Input class labels.
normal_class (int): Class label that should be considered the normal class.
test_size (float, optional): Size of the test set.
anomalies_train_size (float, optional): Contamination of anomalies in the training dataset.
random_state (int or RandomState, optional): Pseudo-random state used for reproducible results.

Returns:

x_train (array-like): Training samples.
x_test (array-like): Test samples.
y_train (array-like): Training labels (either 1 or -1, where 1 denotes normal and -1 anomalous).
y_test (array-like): Test labels (either 1 or -1, where 1 denotes normal and -1 anomalous).

Examples

>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
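
The 1/-1 label convention of the returned y_train and y_test can be illustrated without the library. The two helpers below are hypothetical and not part of wildboar. Treating contamination as the share of -1 labels is an assumption about how anomalies_train_size is measured.

```python
def to_outlier_labels(y, normal_class):
    """Map class labels to 1 (normal) or -1 (anomalous)."""
    return [1 if label == normal_class else -1 for label in y]


def contamination(labels):
    """Fraction of anomalous (-1) entries in a list of 1/-1 labels."""
    return sum(1 for label in labels if label == -1) / len(labels)


y = [1, 2, 1, 1, 2, 1, 1, 1]
labels = to_outlier_labels(y, normal_class=1)
print(labels)                 # [1, -1, 1, 1, -1, 1, 1, 1]
print(contamination(labels))  # 0.25
```

Every class other than normal_class collapses into the single anomalous label -1, so the original class identities are not recoverable from the returned labels.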
