wildboar.ensemble#

Package Contents#

Classes#

BaggingClassifier

A bagging classifier for time series.

BaggingRegressor

A bagging regressor for time series.

BaseBagging

Base class for bagging ensembles in wildboar.

ExtraShapeletTreesClassifier

An ensemble of extremely random shapelet trees for time series classification.

ExtraShapeletTreesRegressor

An ensemble of extremely random shapelet trees for time series regression.

IntervalForestClassifier

An ensemble of interval tree classifiers.

IntervalForestRegressor

An ensemble of interval tree regressors.

IsolationShapeletForest

An isolation shapelet forest.

PivotForestClassifier

An ensemble of pivot tree classifiers.

ProximityForestClassifier

A forest of proximity trees.

RocketForestClassifier

An ensemble of rocket tree classifiers.

RocketForestRegressor

An ensemble of rocket tree regressors.

ShapeletForestClassifier

An ensemble of random shapelet tree classifiers.

ShapeletForestEmbedding

An ensemble of random shapelet trees.

ShapeletForestRegressor

An ensemble of random shapelet regression trees.

class wildboar.ensemble.BaggingClassifier(base_estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, class_weight=None, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

Bases: BaseBagging, sklearn.ensemble.BaggingClassifier

A bagging classifier for time series.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object
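Conceptually, each estimator in the ensemble is fit on a bootstrap resample of the training set. The sampling step can be sketched with plain numpy (the helper name is illustrative, not wildboar's internal API):

```python
import numpy as np

def bootstrap_indices(n_samples, max_samples, rng):
    """Draw one bootstrap sample of row indices (sampling with replacement).

    max_samples mirrors the constructor parameter: the fraction of the
    training set drawn for each base estimator.
    """
    size = int(round(max_samples * n_samples))
    return rng.integers(0, n_samples, size=size)

rng = np.random.default_rng(0)
idx = bootstrap_indices(100, 1.0, rng)

# With replacement, roughly 63% of the rows are unique; the remaining
# out-of-bag rows are what oob_score-style estimates are computed on.
unique_fraction = np.unique(idx).size / idx.size
```

Each base estimator would then be fit on `x[idx]`, `y[idx]`.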

class wildboar.ensemble.BaggingRegressor(base_estimator=None, n_estimators=100, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

Bases: BaseBagging, sklearn.ensemble.BaggingRegressor

A bagging regressor for time series.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

class wildboar.ensemble.BaseBagging(base_estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

Bases: wildboar.base.BaseEstimator, sklearn.ensemble._bagging.BaseBagging

Base class for bagging ensembles in wildboar.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

class wildboar.ensemble.ExtraShapeletTreesClassifier(n_estimators=100, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestClassifier

An ensemble of extremely random shapelet trees for time series classification.

Examples

>>> from wildboar.ensemble import ExtraShapeletTreesClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Construct an extra shapelet trees classifier.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels

    • if dict, weights of the form {label: weight}

    • if “balanced”, each class weight is inversely proportional to the class frequency

    • if None, each class has equal weight

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
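min_shapelet_size and max_shapelet_size are fractions of the series length. A minimal sketch of how a random shapelet might be drawn under those bounds (sample_shapelet is a hypothetical helper, not the library's internal function):

```python
import numpy as np

def sample_shapelet(ts, min_size, max_size, rng):
    """Sample a random contiguous subsequence of ts whose length, as a
    fraction of the series length, lies in [min_size, max_size]."""
    n = len(ts)
    lo = max(2, int(min_size * n))          # a shapelet needs >= 2 points
    hi = max(lo, int(max_size * n))
    length = rng.integers(lo, hi + 1)       # random length within bounds
    start = rng.integers(0, n - length + 1) # random start position
    return ts[start:start + length]

rng = np.random.default_rng(42)
ts = np.sin(np.linspace(0, 4 * np.pi, 60))
shapelet = sample_shapelet(ts, min_size=0.0, max_size=1.0, rng=rng)
```

At each tree node, a candidate shapelet like this is compared against all training series under the configured metric.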

class wildboar.ensemble.ExtraShapeletTreesRegressor(n_estimators=100, *, max_depth=None, min_samples_split=2, min_shapelet_size=0, max_shapelet_size=1, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestRegressor

An ensemble of extremely random shapelet trees for time series regression.

Examples

>>> from wildboar.ensemble import ExtraShapeletTreesRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Construct an extra shapelet trees regressor.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.

class wildboar.ensemble.IntervalForestClassifier(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='auto', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

Bases: BaseForestClassifier

An ensemble of interval tree classifiers.

class wildboar.ensemble.IntervalForestRegressor(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='auto', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseForestRegressor

An ensemble of interval tree regressors.

class wildboar.ensemble.IsolationShapeletForest(*, n_shapelets=1, n_estimators=100, bootstrap=False, n_jobs=None, min_shapelet_size=0, max_shapelet_size=1, min_samples_split=2, max_samples='auto', contamination='auto', warm_start=False, metric='euclidean', metric_params=None, random_state=None)[source]#

Bases: sklearn.base.OutlierMixin, ForestMixin, BaseBagging

An isolation shapelet forest.

New in version 0.3.5.

offset_[source]#

The offset for computing the final decision

Type:

float

Examples

>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection.outlier import train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = train_test_split(
...    x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
>>> f = IsolationShapeletForest(
...     n_estimators=100, contamination=balanced_accuracy_score
... )
>>> f.fit(x_train, y_train)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)

Or using the default offset threshold:

>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection.outlier import train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> f = IsolationShapeletForest()
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
>>> f.fit(x_train)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)

Construct a shapelet isolation forest.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • max_samples ("auto", float or int, optional) – The number of samples to draw to train each base estimator

  • contamination ('auto' or float, optional) –

    The strategy for computing the offset (see offset_)

    • if ‘auto’, offset_ is set to -0.5

    • if a float c, offset_ is set to the c-th percentile of the training scores.

    If bootstrap=True, out-of-bag samples are used for computing the scores.

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
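The float contamination case can be sketched with synthetic scores (the numbers below are made up; this mirrors the scikit-learn decision_function convention where more negative means more anomalous, not wildboar internals):

```python
import numpy as np

# Hypothetical anomaly scores for 500 training samples.
rng = np.random.default_rng(1)
scores = rng.normal(loc=0.1, scale=0.2, size=500)

contamination = 0.05  # expect ~5% of training samples to be outliers

# offset_ is the contamination-th percentile of the scores, so roughly
# that fraction of the training samples falls below the threshold.
offset = np.percentile(scores, 100.0 * contamination)

predictions = np.where(scores < offset, -1, 1)  # -1 = outlier, 1 = inlier
outlier_fraction = np.mean(predictions == -1)
```

Lowering contamination moves the threshold down, flagging fewer samples as outliers.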

decision_function(x)[source]#
fit(x, y=None, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

predict(x)[source]#
score_samples(x)[source]#
class wildboar.ensemble.PivotForestClassifier(n_estimators=100, *, n_pivot='sqrt', metrics='all', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

Bases: BaseForestClassifier

An ensemble of pivot tree classifiers.

class wildboar.ensemble.ProximityForestClassifier(n_estimators=100, *, n_pivot=1, pivot_sample='label', metric_sample='weighted', metric_factories='default', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

Bases: BaseForestClassifier

A forest of proximity trees

References

Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.

class wildboar.ensemble.RocketForestClassifier(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='entropy', bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

Bases: BaseForestClassifier

An ensemble of rocket tree classifiers.

class wildboar.ensemble.RocketForestRegressor(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseForestRegressor

An ensemble of rocket tree regressors.

class wildboar.ensemble.ShapeletForestClassifier(n_estimators=100, *, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestClassifier

An ensemble of random shapelet tree classifiers.

Examples

>>> from wildboar.ensemble import ShapeletForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Shapelet forest classifier.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • n_shapelets (int, optional) – The number of shapelets to sample at each node

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • alpha (float, optional) –

    Dynamically decrease the number of sampled shapelets at each node according to the current depth.

    • if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.

    • if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.

    • if None, the number of sampled shapelets is the same regardless of depth.

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • oob_score (bool, optional) – Compute out-of-bag estimates of the ensembles performance.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels

    • if dict, weights of the form {label: weight}

    • if “balanced”, each class weight is inversely proportional to the class frequency

    • if None, each class has equal weight

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
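One plausible way to read the alpha parameter is as an exponential schedule over depth. The function below is purely illustrative; the library's exact schedule may differ:

```python
import math

def shapelets_at_depth(n_shapelets, alpha, depth):
    """Hypothetical schedule for the number of shapelets sampled at a node.

    alpha < 0: decays from n_shapelets towards 1 as depth grows.
    alpha > 0: grows from 1 towards n_shapelets as depth grows.
    alpha is None: constant, independent of depth.
    """
    if alpha is None:
        return n_shapelets
    if alpha < 0:
        return max(1, round(n_shapelets * math.exp(alpha * depth)))
    return min(n_shapelets, round(math.exp(alpha * depth)))

start = shapelets_at_depth(10, -0.5, 0)  # full budget at the root
deep = shapelets_at_depth(10, -0.5, 8)   # approaches 1 deep in the tree
```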

class wildboar.ensemble.ShapeletForestEmbedding(n_estimators=100, *, n_shapelets=1, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, sparse_output=True, random_state=None)[source]#

Bases: BaseShapeletForestRegressor

An ensemble of random shapelet trees

An unsupervised transformation of a time series dataset to a high-dimensional sparse representation. A time series is indexed by the leaf that it falls into. This leads to a binary coding of a time series with as many ones as there are trees in the forest.

The dimensionality of the resulting representation is at most n_estimators * 2^max_depth.
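The leaf-indexing idea can be sketched with plain numpy (the leaf indices below are made up; a dense array stands in for the sparse CSR output):

```python
import numpy as np

# Suppose each of 3 trees routes a sample to one of its leaves. The
# embedding one-hot encodes the leaf per tree, so exactly one coordinate
# per tree is set to 1.
n_estimators, max_depth = 3, 2
leaves_per_tree = 2 ** max_depth      # upper bound on leaves in one tree
leaf_index = np.array([1, 3, 0])      # hypothetical leaf hit in each tree

embedding = np.zeros(n_estimators * leaves_per_tree)
for tree, leaf in enumerate(leaf_index):
    embedding[tree * leaves_per_tree + leaf] = 1.0

# As many ones as trees; dimensionality <= n_estimators * 2**max_depth.
assert embedding.sum() == n_estimators
```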

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • sparse_output (bool, optional) – Return a sparse CSR-matrix.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.

fit(x, y=None, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

fit_transform(x, y=None, sample_weight=None)[source]#
transform(x)[source]#
class wildboar.ensemble.ShapeletForestRegressor(n_estimators=100, *, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestRegressor

An ensemble of random shapelet regression trees.

Examples

>>> from wildboar.ensemble import ShapeletForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Shapelet forest regressor.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • n_shapelets (int, optional) – The number of shapelets to sample at each node

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • alpha (float, optional) –

    Dynamically decrease the number of sampled shapelets at each node according to the current depth.

    • if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.

    • if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.

    • if None, the number of sampled shapelets is the same regardless of depth.

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • oob_score (bool, optional) – Compute out-of-bag estimates of the ensembles performance.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.