wildboar.ensemble#

Package Contents#

Classes#

BaggingClassifier

A bagging classifier for time series.

BaggingRegressor

A bagging regressor for time series.

BaseBagging

Base class for bagging ensembles in wildboar.

ExtraShapeletTreesClassifier

An ensemble of extremely random shapelet trees for time series classification.

ExtraShapeletTreesRegressor

An ensemble of extremely random shapelet trees for time series regression.

IntervalForestClassifier

An ensemble of interval tree classifiers.

IntervalForestRegressor

An ensemble of interval tree regressors.

IsolationShapeletForest

An isolation shapelet forest.

PivotForestClassifier

An ensemble of pivot tree classifiers.

ProximityForestClassifier

A forest of proximity trees.

RocketForestClassifier

An ensemble of rocket tree classifiers.

RocketForestRegressor

An ensemble of rocket tree regressors.

ShapeletForestClassifier

An ensemble of random shapelet tree classifiers.

ShapeletForestEmbedding

An ensemble of random shapelet trees.

ShapeletForestRegressor

An ensemble of random shapelet regression trees.

class wildboar.ensemble.BaggingClassifier(base_estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, class_weight=None, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

Bases: BaseBagging, sklearn.ensemble.BaggingClassifier

A bagging classifier for time series.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object
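Conceptually, each estimator in the ensemble is fit on a bootstrap resample of the training set. The sampling step can be sketched with plain numpy (the helper name is illustrative, not wildboar's internal API):

```python
import numpy as np

def bootstrap_indices(n_samples, max_samples, rng):
    """Draw one bootstrap sample of row indices (sampling with replacement).

    max_samples mirrors the constructor parameter: the fraction of the
    training set drawn for each base estimator.
    """
    size = int(round(max_samples * n_samples))
    return rng.integers(0, n_samples, size=size)

rng = np.random.default_rng(0)
idx = bootstrap_indices(100, 1.0, rng)

# With replacement, roughly 63% of the rows are unique; the remaining
# out-of-bag rows are what oob_score-style estimates are computed on.
unique_fraction = np.unique(idx).size / idx.size
```

Each base estimator would then be fit on `x[idx]`, `y[idx]`.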

class wildboar.ensemble.BaggingRegressor(base_estimator=None, n_estimators=100, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

Bases: BaseBagging, sklearn.ensemble.BaggingRegressor

A bagging regressor for time series.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

class wildboar.ensemble.BaseBagging(base_estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0)[source]#

Bases: wildboar.base.BaseEstimator, sklearn.ensemble._bagging.BaseBagging

Base class for bagging ensembles in wildboar.

Notes

All estimators should specify all the parameters that can be set at the class level in their __init__ as explicit keyword arguments (no *args or **kwargs).

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

class wildboar.ensemble.ExtraShapeletTreesClassifier(n_estimators=100, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestClassifier

An ensemble of extremely random shapelet trees for time series classification.

Examples

>>> from wildboar.ensemble import ExtraShapeletTreesClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Construct an extra shapelet trees classifier.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels

    • if dict, weights of the form {label: weight}

    • if “balanced”, each class weight is inversely proportional to the class frequency

    • if None, each class has equal weight

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
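min_shapelet_size and max_shapelet_size are fractions of the series length. A minimal sketch of how a random shapelet might be drawn under those bounds (sample_shapelet is a hypothetical helper, not the library's internal function):

```python
import numpy as np

def sample_shapelet(ts, min_size, max_size, rng):
    """Sample a random contiguous subsequence of ts whose length, as a
    fraction of the series length, lies in [min_size, max_size]."""
    n = len(ts)
    lo = max(2, int(min_size * n))          # a shapelet needs >= 2 points
    hi = max(lo, int(max_size * n))
    length = rng.integers(lo, hi + 1)       # random length within bounds
    start = rng.integers(0, n - length + 1) # random start position
    return ts[start:start + length]

rng = np.random.default_rng(42)
ts = np.sin(np.linspace(0, 4 * np.pi, 60))
shapelet = sample_shapelet(ts, min_size=0.0, max_size=1.0, rng=rng)
```

At each tree node, a candidate shapelet like this is compared against all training series under the configured metric.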

class wildboar.ensemble.ExtraShapeletTreesRegressor(n_estimators=100, *, max_depth=None, min_samples_split=2, min_shapelet_size=0, max_shapelet_size=1, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestRegressor

An ensemble of extremely random shapelet trees for time series regression.

Examples

>>> from wildboar.ensemble import ExtraShapeletTreesRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Construct an extra shapelet trees regressor.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.

class wildboar.ensemble.IntervalForestClassifier(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='auto', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

Bases: BaseForestClassifier

An ensemble of interval tree classifiers.

class wildboar.ensemble.IntervalForestRegressor(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='auto', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseForestRegressor

An ensemble of interval tree regressors.

class wildboar.ensemble.IsolationShapeletForest(*, n_shapelets=1, n_estimators=100, bootstrap=False, n_jobs=None, min_shapelet_size=0, max_shapelet_size=1, min_samples_split=2, max_samples='auto', contamination='auto', warm_start=False, metric='euclidean', metric_params=None, random_state=None)[source]#

Bases: sklearn.base.OutlierMixin, ForestMixin, BaseBagging

An isolation shapelet forest.

New in version 0.3.5.

offset_[source]#

The offset for computing the final decision

Type:

float

Examples

>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection.outlier import train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = train_test_split(
...    x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
>>> f = IsolationShapeletForest(
...     n_estimators=100, contamination=balanced_accuracy_score
... )
>>> f.fit(x_train, y_train)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)

Or using the default offset threshold:

>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection.outlier import train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> f = IsolationShapeletForest()
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05
... )
>>> f.fit(x_train)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)

Construct a shapelet isolation forest.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • max_samples ("auto", float or int, optional) – The number of samples to draw to train each base estimator

  • contamination ('auto' or float, optional) –

    The strategy for computing the offset (see offset_)

    • if ‘auto’, offset_ is set to -0.5

    • if a float c, offset_ is set to the c-th percentile of the training scores.

    If bootstrap=True, out-of-bag samples are used for computing the scores.

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
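The float contamination case can be sketched with synthetic scores (the numbers below are made up; this mirrors the scikit-learn decision_function convention where more negative means more anomalous, not wildboar internals):

```python
import numpy as np

# Hypothetical anomaly scores for 500 training samples.
rng = np.random.default_rng(1)
scores = rng.normal(loc=0.1, scale=0.2, size=500)

contamination = 0.05  # expect ~5% of training samples to be outliers

# offset_ is the contamination-th percentile of the scores, so roughly
# that fraction of the training samples falls below the threshold.
offset = np.percentile(scores, 100.0 * contamination)

predictions = np.where(scores < offset, -1, 1)  # -1 = outlier, 1 = inlier
outlier_fraction = np.mean(predictions == -1)
```

Lowering contamination moves the threshold down, flagging fewer samples as outliers.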

decision_function(x)[source]#
fit(x, y=None, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

predict(x)[source]#
score_samples(x)[source]#
class wildboar.ensemble.PivotForestClassifier(n_estimators=100, *, n_pivot='sqrt', metrics='all', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

Bases: BaseForestClassifier

An ensemble of pivot tree classifiers.

class wildboar.ensemble.ProximityForestClassifier(n_estimators=100, *, n_pivot=1, pivot_sample='label', metric_sample='weighted', metric_factories='default', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

Bases: BaseForestClassifier

A forest of proximity trees

References

Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.

class wildboar.ensemble.RocketForestClassifier(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='entropy', bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

Bases: BaseForestClassifier

An ensemble of rocket tree classifiers.

class wildboar.ensemble.RocketForestRegressor(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseForestRegressor

An ensemble of rocket tree regressors.

class wildboar.ensemble.ShapeletForestClassifier(n_estimators=100, *, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestClassifier

An ensemble of random shapelet tree classifiers.

Examples

>>> from wildboar.ensemble import ShapeletForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Shapelet forest classifier.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • n_shapelets (int, optional) – The number of shapelets to sample at each node

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • alpha (float, optional) –

    Dynamically decrease the number of sampled shapelets at each node according to the current depth.

    • if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.

    • if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.

    • if None, the number of sampled shapelets is the same regardless of depth.

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • oob_score (bool, optional) – Compute out-of-bag estimates of the ensembles performance.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels

    • if dict, weights of the form {label: weight}

    • if “balanced”, each class weight is inversely proportional to the class frequency

    • if None, each class has equal weight

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
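One plausible way to read the alpha parameter is as an exponential schedule over depth. The function below is purely illustrative; the library's exact schedule may differ:

```python
import math

def shapelets_at_depth(n_shapelets, alpha, depth):
    """Hypothetical schedule for the number of shapelets sampled at a node.

    alpha < 0: decays from n_shapelets towards 1 as depth grows.
    alpha > 0: grows from 1 towards n_shapelets as depth grows.
    alpha is None: constant, independent of depth.
    """
    if alpha is None:
        return n_shapelets
    if alpha < 0:
        return max(1, round(n_shapelets * math.exp(alpha * depth)))
    return min(n_shapelets, round(math.exp(alpha * depth)))

start = shapelets_at_depth(10, -0.5, 0)  # full budget at the root
deep = shapelets_at_depth(10, -0.5, 8)   # approaches 1 deep in the tree
```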

class wildboar.ensemble.ShapeletForestEmbedding(n_estimators=100, *, n_shapelets=1, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, sparse_output=True, random_state=None)[source]#

Bases: BaseShapeletForestRegressor

An ensemble of random shapelet trees

An unsupervised transformation of a time series dataset to a high-dimensional sparse representation. A time series is indexed by the leaf that it falls into. This leads to a binary coding of a time series with as many ones as there are trees in the forest.

The dimensionality of the resulting representation is at most n_estimators * 2^max_depth.
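The leaf-indexing idea can be sketched with plain numpy (the leaf indices below are made up; a dense array stands in for the sparse CSR output):

```python
import numpy as np

# Suppose each of 3 trees routes a sample to one of its leaves. The
# embedding one-hot encodes the leaf per tree, so exactly one coordinate
# per tree is set to 1.
n_estimators, max_depth = 3, 2
leaves_per_tree = 2 ** max_depth      # upper bound on leaves in one tree
leaf_index = np.array([1, 3, 0])      # hypothetical leaf hit in each tree

embedding = np.zeros(n_estimators * leaves_per_tree)
for tree, leaf in enumerate(leaf_index):
    embedding[tree * leaves_per_tree + leaf] = 1.0

# As many ones as trees; dimensionality <= n_estimators * 2**max_depth.
assert embedding.sum() == n_estimators
```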

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • sparse_output (bool, optional) – Return a sparse CSR-matrix.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.

fit(x, y=None, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

  • y (array-like of shape (n_samples,)) – The target values (class labels in classification, real numbers in regression).

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

Returns:

self – Fitted estimator.

Return type:

object

fit_transform(x, y=None, sample_weight=None)[source]#
transform(x)[source]#
class wildboar.ensemble.ShapeletForestRegressor(n_estimators=100, *, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

Bases: BaseShapeletForestRegressor

An ensemble of random shapelet regression trees.

Examples

>>> from wildboar.ensemble import ShapeletForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
>>> y_hat = f.predict(x)

Shapelet forest regressor.

Parameters:
  • n_estimators (int, optional) – The number of estimators

  • n_shapelets (int, optional) – The number of shapelets to sample at each node

  • bootstrap (bool, optional) – Use bootstrap sampling to fit the base estimators

  • n_jobs (int, optional) – The number of processor cores used for fitting the ensemble

  • min_shapelet_size (float, optional) – The minimum shapelet size to sample

  • max_shapelet_size (float, optional) – The maximum shapelet size to sample

  • alpha (float, optional) –

    Dynamically decrease the number of sampled shapelets at each node according to the current depth.

    • if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.

    • if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.

    • if None, the number of sampled shapelets is the same regardless of depth.

  • min_samples_split (int, optional) – The minimum samples required to split the decision trees

  • warm_start (bool, optional) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

  • metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Set the metric used to compute the distance between shapelet and time series

  • metric_params (dict, optional) – Parameters passed to the metric construction

  • oob_score (bool, optional) – Compute out-of-bag estimates of the ensembles performance.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.