wildboar.model_selection._cv

Module Contents

Classes

RepeatedOutlierSplit

Repeated random outlier cross-validator

class wildboar.model_selection._cv.RepeatedOutlierSplit(n_splits=None, *, test_size=0.2, n_outlier=0.05, shuffle=True, random_state=None)

Repeated random outlier cross-validator

Yields indices that split the dataset into training and test sets.

Note

Contrary to other cross-validation strategies, the random outlier cross-validator does not ensure that all folds will be different. Instead, the inlier samples are shuffled and new outlier samples are inserted in the training and test sets repeatedly.

Parameters:
  • n_splits (int, optional) –

    The maximum number of splits.

    • if None, the number of splits is determined by the number of outliers as total_n_outliers / (n_inliers * n_outliers)

    • if int, the number of splits is an upper bound

  • test_size (float, optional) – The size of the test set.

  • n_outlier (float, optional) – The fraction of outliers in the training and test sets.

  • shuffle (bool, optional) – Shuffle the training indices in each iteration.

  • random_state (int or RandomState, optional) – The pseudo-random number generator
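As a worked illustration of the default for n_splits, the snippet below evaluates the formula given above for made-up counts. It assumes that n_outliers in the formula refers to the n_outlier fraction and that the quotient is truncated to a whole number; neither detail is confirmed by this page, and the helper name default_n_splits is hypothetical.

```python
def default_n_splits(total_n_outliers, n_inliers, n_outlier):
    """Hypothetical helper: total_n_outliers / (n_inliers * n_outlier)."""
    # Assumption: the quotient is truncated to a whole number of splits.
    return int(total_n_outliers / (n_inliers * n_outlier))


# Made-up counts: 65 outliers, 400 inliers, default n_outlier of 0.05.
# 400 * 0.05 is about 20 outliers per split, and 65 / 20 = 3.25.
n_splits = default_n_splits(total_n_outliers=65, n_inliers=400, n_outlier=0.05)
print(n_splits)
```

Under these assumptions, a dataset with few outliers relative to its inliers supports only a handful of splits, which is why an explicit n_splits acts as an upper bound.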

__repr__()

Return repr(self).

get_n_splits(X, y, groups=None)

Returns the number of splitting iterations in the cross-validator

Parameters:
  • X (object) – The samples

  • y (object) – The labels

  • groups (object) – Always ignored, exists for compatibility.

Returns:

n_splits – The number of splitting iterations in the cross-validator.

Return type:

int

split(x, y, groups=None)

Return training and test indices

Parameters:
  • x (object) – Always ignored, exists for compatibility.

  • y (object) – The labels

  • groups (object, optional) – Always ignored, exists for compatibility.

Yields:

train_idx, test_idx (ndarray) – The training and test indices
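The scheme described in the class note (reshuffle the inliers, then insert newly drawn outliers into the training and test sets) can be sketched in plain Python. This is an illustration only, not wildboar's implementation: the function name repeated_outlier_split, the label convention (1 for inliers, -1 for outliers), and the rule that sizes the outlier draw relative to the inlier count are all assumptions, and it yields lists rather than ndarrays.

```python
import random


def repeated_outlier_split(y, n_splits, test_size=0.2, n_outlier=0.05, seed=None):
    """Illustrative sketch only; assumes inliers are 1 and outliers are -1."""
    rng = random.Random(seed)
    inliers = [i for i, label in enumerate(y) if label == 1]
    outliers = [i for i, label in enumerate(y) if label == -1]

    n_test_in = max(1, round(len(inliers) * test_size))
    for _ in range(n_splits):
        # Reshuffle the inliers and cut them into test and training parts.
        rng.shuffle(inliers)
        test_in, train_in = inliers[:n_test_in], inliers[n_test_in:]

        # Draw disjoint outliers for each part (assumed sizing rule).
        n_train_out = max(1, round(len(train_in) * n_outlier))
        n_test_out = max(1, round(len(test_in) * n_outlier))
        drawn = rng.sample(outliers, n_train_out + n_test_out)

        train_idx = train_in + drawn[:n_train_out]
        test_idx = test_in + drawn[n_train_out:]
        rng.shuffle(train_idx)  # mirrors the shuffle=True behaviour
        yield train_idx, test_idx


# Toy labels: 100 inliers followed by 20 outliers.
y = [1] * 100 + [-1] * 20
splits = list(repeated_outlier_split(y, n_splits=3, seed=0))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Because outliers are redrawn on every iteration, the same outlier can appear in more than one split, which matches the note that folds are not guaranteed to differ.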