wildboar.datasets.outlier#
Module Contents#
Classes#
OutlierLabeler | Base class for outlier labelers
KMeansLabeler | KMeans labeler that assigns an outlier label to the most deviating cluster
DensityLabeler | Density based clustering labeler
MajorityLabeler | Labels the majority class as inliers
MinorityLabeler | Labels the minority class as the outlier
EmmottLabeler | Create a synthetic outlier detection dataset from a labeled classification dataset
- class wildboar.datasets.outlier.OutlierLabeler#
Base class for outlier labelers
- abstract fit(x, y=None)#
Fit the outlier labeler to the given samples
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- abstract transform(x, y=None)#
Transform the labels of (a subset of) the samples in x to inliers and outliers
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- Returns:
x_new (array-like of shape (n_samples_new, n_timestep)) – The outlier and inlier samples
y_new (array-like of shape (n_samples_new, )) – The labels
- fit_transform(x, y=None)#
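The contract above can be illustrated with a minimal, pure-Python sketch. It is not wildboar's code: `SketchLabeler` and `FirstLabelOutlier` are hypothetical names, the encoding of inliers as 1 and outliers as -1 is an assumption, and so is the reading of the undocumented `fit_transform` as `fit` followed by `transform` on the same data.

```python
from abc import ABC, abstractmethod


class SketchLabeler(ABC):
    """Hypothetical stand-in that mirrors the documented method names."""

    @abstractmethod
    def fit(self, x, y=None):
        """Learn whatever is needed to tell inliers from outliers."""

    @abstractmethod
    def transform(self, x, y=None):
        """Return (x_new, y_new) for (a subset of) the samples."""

    def fit_transform(self, x, y=None):
        # Assumption: fit followed by transform on the same data.
        self.fit(x, y)
        return self.transform(x, y)


class FirstLabelOutlier(SketchLabeler):
    """Toy subclass: treats the first label seen during fit as the outlier."""

    def fit(self, x, y=None):
        self.outlier_label_ = y[0]
        return self

    def transform(self, x, y=None):
        # Assumed encoding: 1 marks an inlier, -1 marks an outlier.
        y_new = [-1 if label == self.outlier_label_ else 1 for label in y]
        return list(x), y_new


x = [[0.0, 0.1], [1.0, 1.1], [0.2, 0.3]]
y = ["a", "b", "a"]
x_new, y_new = FirstLabelOutlier().fit_transform(x, y)
print(y_new)
```

The toy subclass keeps every sample; a real labeler may return fewer samples than it receives, which is why the documented return shapes use n_samples_new.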
- class wildboar.datasets.outlier.KMeansLabeler(*, n_clusters=None, n_outliers=None, random_state=None)#
Bases: OutlierLabeler
KMeans labeler that assigns an outlier label to the most deviating cluster
- k_means_#
The estimator for assigning points to the outlier class
- Type:
object
- outlier_cluster_#
The index of the cluster that is considered the outlier
- Type:
int
Warning
The implementation does not yet work as expected.
- fit(x, y=None)#
Fit the outlier labeler to the given samples
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- fit_transform(x, y=None)#
- transform(x, y=None)#
Transform the labels of (a subset of) the samples in x to inliers and outliers
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- Returns:
x_new (array-like of shape (n_samples_new, n_timestep)) – The outlier and inlier samples
y_new (array-like of shape (n_samples_new, )) – The labels
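The documentation does not define what makes a cluster the "most deviating" one. The sketch below shows one possible reading, assumed here purely for illustration: the cluster whose centroid lies farthest from the mean of all centroids. The function `most_deviating_cluster` is hypothetical, the cluster assignments are taken as given rather than computed by k-means, and the 1/-1 label encoding is an assumption.

```python
import math


def most_deviating_cluster(x, assignments):
    """Return the cluster index whose centroid is farthest from the mean centroid."""
    clusters = {}
    for sample, cluster in zip(x, assignments):
        clusters.setdefault(cluster, []).append(sample)
    # Centroid of each cluster: column-wise mean of its members.
    centroids = {
        cluster: [sum(col) / len(members) for col in zip(*members)]
        for cluster, members in clusters.items()
    }
    # Mean of all centroids, used as the reference point.
    center = [sum(col) / len(centroids) for col in zip(*centroids.values())]
    return max(centroids, key=lambda c: math.dist(centroids[c], center))


x = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11]]
assignments = [0, 0, 1, 1, 2, 2]  # assumed output of some clustering step
outlier_cluster = most_deviating_cluster(x, assignments)
# Assumed encoding: 1 marks an inlier, -1 marks an outlier.
y_new = [-1 if a == outlier_cluster else 1 for a in assignments]
print(outlier_cluster, y_new)
```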
- class wildboar.datasets.outlier.DensityLabeler(*, estimator=None, estimator_params=None)#
Bases: OutlierLabeler
Density based clustering labeler
Labels samples as outliers if a density clustering algorithm fails to assign them to a cluster
- fit(x, y=None)#
Fit the outlier labeler to the given samples
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- fit_transform(x, y=None)#
- transform(x, y=None)#
Transform the labels of (a subset of) the samples in x to inliers and outliers
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- Returns:
x_new (array-like of shape (n_samples_new, n_timestep)) – The outlier and inlier samples
y_new (array-like of shape (n_samples_new, )) – The labels
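To make the idea of "samples that a density clustering algorithm fails to assign to a cluster" concrete, here is a small pure-Python sketch of the DBSCAN noise rule. It is not the estimator wildboar uses: `density_outlier_labels`, `eps` and `min_samples` are names chosen for this illustration, and the 1/-1 label encoding is an assumption.

```python
import math


def density_outlier_labels(x, eps=0.5, min_samples=3):
    """Label samples 1 (inlier) or -1 (outlier) with a DBSCAN-style noise rule.

    A sample is a core sample if at least `min_samples` samples (itself
    included) lie within `eps` of it. A sample that is neither a core sample
    nor within `eps` of one belongs to no cluster and is labeled an outlier.
    """
    n = len(x)
    neighbors = [
        [j for j in range(n) if math.dist(x[i], x[j]) <= eps] for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = []
    for i in range(n):
        reachable = is_core[i] or any(is_core[j] for j in neighbors[i])
        labels.append(1 if reachable else -1)
    return labels


# Four tightly packed samples and one isolated sample.
x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
labels = density_outlier_labels(x, eps=0.5, min_samples=3)
print(labels)
```

The pairwise distance computation is quadratic in the number of samples, which is acceptable for a sketch but not for large datasets.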
- class wildboar.datasets.outlier.MajorityLabeler(n_outliers=None, random_state=None)#
Bases: OutlierLabeler
Labels the majority class as inliers
- outlier_labels_#
The outlier labels
- Type:
ndarray
- fit(x, y=None)#
Fit the outlier labeler to the given samples
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- transform(x, y=None)#
Transform the labels of (a subset of) the samples in x to inliers and outliers
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- Returns:
x_new (array-like of shape (n_samples_new, n_timestep)) – The outlier and inlier samples
y_new (array-like of shape (n_samples_new, )) – The labels
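The majority-class idea can be sketched in a few lines of plain Python. This is an illustration, not wildboar's implementation: `majority_relabel` is a hypothetical helper, `n_outliers` is treated here as an absolute count (the documentation does not say whether it is a count or a fraction), and the 1/-1 label encoding is an assumption.

```python
import random
from collections import Counter


def majority_relabel(x, y, n_outliers=None, random_state=None):
    """Keep the majority class as inliers and (a sample of) the rest as outliers."""
    majority = Counter(y).most_common(1)[0][0]
    inliers = [i for i, label in enumerate(y) if label == majority]
    outliers = [i for i, label in enumerate(y) if label != majority]
    if n_outliers is not None and n_outliers < len(outliers):
        # Subsample the outliers reproducibly.
        rng = random.Random(random_state)
        outliers = rng.sample(outliers, n_outliers)
    outlier_set = set(outliers)
    keep = sorted(inliers + outliers)
    x_new = [x[i] for i in keep]
    # Assumed encoding: 1 marks an inlier, -1 marks an outlier.
    y_new = [-1 if i in outlier_set else 1 for i in keep]
    return x_new, y_new


x = [[i] for i in range(7)]
y = [0, 0, 0, 0, 1, 2, 1]
x_new, y_new = majority_relabel(x, y, n_outliers=2, random_state=0)
print(len(x_new), y_new.count(-1))
```

Because outliers are subsampled, fewer samples come out than go in, which matches the n_samples_new shape in the documented return values.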
- class wildboar.datasets.outlier.MinorityLabeler(n_outliers=None, random_state=None)#
Bases: OutlierLabeler
Labels the minority class as the outlier
- outlier_label_#
The label of the outlier class
- Type:
object
- fit(x, y=None)#
Fit the outlier labeler to the given samples
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- transform(x, y=None)#
Transform the labels of (a subset of) the samples in x to inliers and outliers
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- Returns:
x_new (array-like of shape (n_samples_new, n_timestep)) – The outlier and inlier samples
y_new (array-like of shape (n_samples_new, )) – The labels
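A minimal sketch of the minority-class rule follows. It is an assumption-laden illustration rather than wildboar's code: `minority_relabel` is a hypothetical helper, every class other than the rarest one is assumed to count as an inlier, and the 1/-1 label encoding is an assumption.

```python
from collections import Counter


def minority_relabel(x, y):
    """Mark the rarest class as the outlier and everything else as inliers."""
    counts = Counter(y)
    outlier_label = min(counts, key=counts.get)
    # Assumed encoding: 1 marks an inlier, -1 marks an outlier.
    y_new = [-1 if label == outlier_label else 1 for label in y]
    return list(x), y_new, outlier_label


x = [[0.0], [0.1], [9.0], [0.2], [4.0], [4.1]]
y = ["a", "a", "b", "a", "c", "c"]
x_new, y_new, outlier_label = minority_relabel(x, y)
print(outlier_label, y_new)
```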
- class wildboar.datasets.outlier.EmmottLabeler(n_outliers=None, *, confusion_estimator=None, difficulty_estimator=None, difficulty='simplest', scale=None, variation='tight', random_state=None)#
Bases: OutlierLabeler
Create a synthetic outlier detection dataset from a labeled classification dataset using a method described by Emmott et al. (2013).
The Emmott labeler can reliably label both binary and multiclass datasets. For binary datasets, a random label is selected as the outlier class. For multiclass datasets, a set of classes with maximal confusion (as measured by confusion_estimator) is selected as the outlier label. For each outlier sample, the difficulty_estimator assigns a difficulty score, which is digitized into ranges and selected according to the difficulty parameter. Finally, a sample of approximately n_outliers is selected, either maximally dispersed or tight.
- outlier_label_#
The class or collection of classes used as outliers
- Type:
object
- difficulty_estimator_#
The estimator used to assess the difficulty of outlier samples
- Type:
object
- confusion_estimator_#
The estimator used to assess the class confusion (only if n_classes > 2)
- Type:
object
- n_classes_#
The number of classes
- Type:
int
Notes
For multiclass datasets, the Emmott labeler requires the package networkx
For dispersed outlier selection, the Emmott labeler requires the package scikit-learn-extra
The difficulty settings ‘simplest’ and ‘hardest’ are not described by Emmott et al. (2013)
Warning
- n_outliers
The number of outliers returned depends on the difficulty setting and the number of available samples of the minority class. If the minority class does not contain a sufficient number of samples of the desired difficulty, fewer than n_outliers may be returned.
References
- Emmott, A. F., Das, S., Dietterich, T., Fern, A., & Wong, W. K. (2013).
Systematic construction of anomaly detection benchmarks from real data. In Proceedings of the ACM SIGKDD workshop on outlier detection and description (pp. 16-21).
- fit(x, y=None)#
Fit the outlier labeler to the given samples
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- fit_transform(x, y=None)#
- transform(x, y=None)#
Transform the labels of (a subset of) the samples in x to inliers and outliers
- Parameters:
x (array-like of shape (n_samples, n_timestep)) – The time series samples
y (array-like of shape (n_samples, ), optional) – The optional original labels
- Returns:
x_new (array-like of shape (n_samples_new, n_timestep)) – The outlier and inlier samples
y_new (array-like of shape (n_samples_new, )) – The labels
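The digitization of difficulty scores described for EmmottLabeler, and the reason fewer than n_outliers samples may come back, can be sketched as follows. The bin edges, the range names and the helpers `digitize` and `select_by_difficulty` are all hypothetical and chosen only for illustration; they are not taken from wildboar or from Emmott et al. (2013).

```python
from bisect import bisect_right

# Hypothetical bin edges and range names, for illustration only.
EDGES = [0.25, 0.5, 0.75]
NAMES = ["easy", "medium", "hard", "very hard"]


def digitize(scores, edges=EDGES):
    """Map each difficulty score to the index of the range it falls in."""
    return [bisect_right(edges, score) for score in scores]


def select_by_difficulty(scores, wanted, n_outliers):
    """Return the indices of up to n_outliers samples in the wanted range."""
    bins = digitize(scores)
    candidates = [i for i, b in enumerate(bins) if NAMES[b] == wanted]
    return candidates[:n_outliers]


scores = [0.1, 0.3, 0.55, 0.9, 0.6, 0.2]
bins = digitize(scores)
selected = select_by_difficulty(scores, "hard", n_outliers=3)
print(bins, selected)
```

Three "hard" outliers are requested, but only two samples fall in that range, so only two indices are returned. This mirrors the warning above about n_outliers.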