wildboar.ensemble#

Ensemble methods for classification, regression and outlier detection.

Package Contents#

Classes#

BaggingClassifier

A bagging classifier.

BaggingRegressor

A bagging regressor.

BaseBagging

Base estimator for Wildboar ensemble estimators.

ElasticEnsembleClassifier

Ensemble of wildboar.distance.KNeighborsClassifier.

ExtraShapeletTreesClassifier

An ensemble of extremely random shapelet trees.

ExtraShapeletTreesRegressor

An ensemble of extremely random shapelet tree regressors.

IntervalForestClassifier

An ensemble of interval tree classifiers.

IntervalForestRegressor

An ensemble of interval tree regressors.

IsolationShapeletForest

An isolation shapelet forest.

PivotForestClassifier

An ensemble of pivot tree classifiers.

ProximityForestClassifier

A forest of proximity trees.

RocketForestClassifier

An ensemble of rocket tree classifiers.

RocketForestRegressor

An ensemble of rocket tree regressors.

ShapeletForestClassifier

An ensemble of random shapelet tree classifiers.

ShapeletForestEmbedding

An ensemble of random shapelet trees.

ShapeletForestRegressor

An ensemble of random shapelet tree regressors.

class wildboar.ensemble.BaggingClassifier(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, class_weight=None, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#

A bagging classifier.

A bagging classifier is a meta-estimator that fits base classifiers on random subsets of the original data.

Parameters:
estimator : object, optional

Base estimator of the ensemble. If None, the base estimator is a ShapeletTreeClassifier.

n_estimators : int, optional

The number of base estimators in the ensemble.

max_samples : int or float, optional

The number of samples to draw from X to train each base estimator.

  • if int, then draw max_samples samples.

  • if float, then draw max_samples * n_samples samples.

bootstrap : bool, optional

Whether samples are drawn with replacement.

oob_score : bool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

class_weight : dict or “balanced”, optional

Weights associated with the labels.

  • if dict, weights on the form {label: weight}.

  • if “balanced”, each class weight is inversely proportional to the class frequency.

  • if None, each class has equal weight.

warm_start : bool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : int or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

verbose : int, optional

Controls the output to standard error while fitting and predicting.

base_estimator : object, optional

Use estimator instead.

Deprecated since version 1.2: base_estimator has been deprecated and will be removed in 1.4. Use estimator instead.
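For instance, a minimal sketch (assuming wildboar.tree.ShapeletTreeClassifier as the base estimator and the gun_point dataset from wildboar.datasets):

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import BaggingClassifier
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> clf = BaggingClassifier(
...     estimator=ShapeletTreeClassifier(), n_estimators=10, random_state=1
... )
>>> _ = clf.fit(X, y)
>>> y_pred = clf.predict(X)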

decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
score : ndarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
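Continuing the sketch above, both probability variants can be queried (a usage illustration, not output from the original documentation):

>>> proba = clf.predict_proba(X)          # shape (n_samples, n_classes)
>>> log_proba = clf.predict_log_proba(X)  # element-wise log of the above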

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
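For instance, continuing the bagging sketch above, nested parameters use the double-underscore form (max_depth is assumed to be a parameter of the shapelet-tree base estimator):

>>> _ = clf.set_params(n_estimators=20, estimator__max_depth=5)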

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
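For example, the out-of-bag indices of the first ensemble member can be recovered from this property (continuing the sketch above):

>>> import numpy as np
>>> in_bag = clf.estimators_samples_[0]            # in-bag sample indices
>>> oob = np.setdiff1d(np.arange(len(X)), in_bag)  # out-of-bag indices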

class wildboar.ensemble.BaggingRegressor(estimator=None, n_estimators=100, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#

A bagging regressor.

A bagging regressor is a meta-estimator that fits base regressors on random subsets of the original data.

Parameters:
estimator : object, optional

Base estimator of the ensemble. If None, the base estimator is a ShapeletTreeRegressor.

n_estimators : int, optional

The number of base estimators in the ensemble.

max_samples : int or float, optional

The number of samples to draw from X to train each base estimator.

  • if int, then draw max_samples samples.

  • if float, then draw max_samples * n_samples samples.

bootstrap : bool, optional

Whether samples are drawn with replacement.

oob_score : bool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

warm_start : bool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : int or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

verbose : int, optional

Controls the output to standard error while fitting and predicting.

base_estimator : object, optional

Use estimator instead.

Deprecated since version 1.2: base_estimator has been deprecated and will be removed in 1.4. Use estimator instead.
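A minimal sketch (with a synthetic regression target, since the wildboar datasets used elsewhere on this page are classification problems; ShapeletTreeRegressor from wildboar.tree is assumed):

>>> import numpy as np
>>> from wildboar.ensemble import BaggingRegressor
>>> from wildboar.tree import ShapeletTreeRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(20, 50)       # 20 series with 50 timesteps
>>> y = X[:, :10].mean(axis=1)  # synthetic target
>>> reg = BaggingRegressor(estimator=ShapeletTreeRegressor(), n_estimators=10)
>>> _ = reg.fit(X, y)
>>> y_hat = reg.predict(X)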

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
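A small worked example of the definition above:

>>> import numpy as np
>>> y_true = np.array([3.0, 1.0, 2.0])
>>> y_pred = np.array([2.5, 1.0, 2.0])
>>> u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
>>> v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
>>> float(1 - u / v)
0.875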

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.BaseBagging(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#

Base estimator for Wildboar ensemble estimators.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.ElasticEnsembleClassifier(n_neighbors=1, *, metric='auto', n_jobs=None)[source]#

Ensemble of wildboar.distance.KNeighborsClassifier.

Each classifier is fitted with metric parameters optimized over a parameter grid.

Parameters:
n_neighbors : int, optional

The number of neighbors.

metric : {“auto”, “elastic”, “non_elastic”, “all”} or dict, optional

The metric specification.

  • if “auto” or “elastic”, fit one classifier for each elastic distance as described by Lines and Bagnall (2015). We use a slightly smaller parameter grid.

  • if “non_elastic”, fit one classifier for each non-elastic distance measure.

  • if “all”, fit one classifier for the metrics in both “elastic” and “non_elastic”.

  • if dict, a custom metric specification.

n_jobs : int, optional

The number of parallel jobs.

References

Jason Lines and Anthony Bagnall,

Time Series Classification with Ensembles of Elastic Distance Measures, Data Mining and Knowledge Discovery, 29(3), 2015.

Examples

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import ElasticEnsembleClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = ElasticEnsembleClassifier(
...     metric={
...         "dtw": {"min_r": 0.1, "max_r": 0.3},
...         "ddtw": {"min_r": 0.1, "max_r": 0.3},
...     },
... )
>>> clf.fit(X_train, y_train)
ElasticEnsembleClassifier(metric={'ddtw': {'max_r': 0.3, 'min_r': 0.1},
                                  'dtw': {'max_r': 0.3, 'min_r': 0.1}})
>>> clf.score(X_test, y_test)
0.9866666666666667
Attributes:
scores : tuple

A tuple of metric name and cross-validation score.

fit(x, y)[source]#

Fit the estimator.

Parameters:
x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)

The input samples.

y : array-like of shape (n_samples,)

The input labels.

Returns:
object

This estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(x)[source]#

Compute the class label for the samples in x.

Parameters:
x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)

The input samples.

Returns:
ndarray of shape (n_samples,)

The class label for each sample.

predict_proba(x)[source]#

Compute probability estimates for the samples in x.

Parameters:
x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)

The input time series.

Returns:
ndarray of shape (n_samples, n_classes)

The probabilities.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.ensemble.ExtraShapeletTreesClassifier(n_estimators=100, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

An ensemble of extremely random shapelet trees.

Parameters:
n_estimators : int, optional

The number of estimators.

max_depth : int, optional

The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int, optional

The minimum number of samples to split an internal node.

min_samples_leaf : int, optional

The minimum number of samples in a leaf.

min_impurity_decrease : float, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_shapelet_size : float, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_shapelet_size : float, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

metric : str or list, optional

The distance metric.

  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10); see the sketch after the examples below.

    Read more about metric specifications in the User guide.

Changed in version 1.2: Added support for multi-metric shapelet transform.

metric_params : dict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

criterion : {“entropy”, “gini”}, optional

The criterion used to evaluate the utility of a split.

oob_score : bool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

bootstrap : bool, optional

Whether samples are drawn with replacement.

warm_start : bool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble.

class_weight : dict or “balanced”, optional

Weights associated with the labels.

  • if dict, weights on the form {label: weight}.

  • if “balanced”, each class weight is inversely proportional to the class frequency.

  • if None, each class has equal weight.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : int or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

Examples

>>> from wildboar.ensemble import ExtraShapeletTreesClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
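A multi-metric specification might look as follows (a sketch: the grid keys follow the dict(min_r=..., max_r=..., num_r=...) pattern from the metric description above, and availability of the 'dtw' metric is assumed):

>>> f = ExtraShapeletTreesClassifier(
...     n_estimators=100,
...     metric=[("dtw", dict(min_r=0, max_r=0.25, num_r=5))],
... )
>>> _ = f.fit(x, y)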
decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
score : ndarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.ExtraShapeletTreesRegressor(n_estimators=100, *, max_depth=None, min_samples_split=2, min_shapelet_size=0, max_shapelet_size=1, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

An ensemble of extremely random shapelet tree regressors.

Parameters:
n_estimators : int, optional

The number of estimators.

max_depth : int, optional

The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split : int, optional

The minimum number of samples to split an internal node.

min_shapelet_size : float, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_shapelet_size : float, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

metric : str or list, optional

The distance metric.

  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

    Read more about metric specifications in the User guide.

Changed in version 1.2: Added support for multi-metric shapelet transform.

metric_params : dict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

criterion : {“squared_error”}, optional

The criterion used to evaluate the utility of a split.

Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.

oob_score : bool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

bootstrap : bool, optional

Whether samples are drawn with replacement.

warm_start : bool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : int or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

Examples

>>> from wildboar.ensemble import ExtraShapeletTreesRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.IntervalForestClassifier(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='mean_var_std', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

An ensemble of interval tree classifiers.
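A minimal sketch (reusing the synthetic_control dataset loaded elsewhere on this page; all other settings follow the signature defaults above):

>>> from wildboar.datasets import load_synthetic_control
>>> from wildboar.ensemble import IntervalForestClassifier
>>> x, y = load_synthetic_control()
>>> f = IntervalForestClassifier(n_estimators=100, random_state=1)
>>> _ = f.fit(x, y)
>>> y_hat = f.predict(x)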

decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
score : ndarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.IntervalForestRegressor(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='auto', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

An ensemble of interval tree regressors.
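A minimal sketch (with a synthetic numeric target, since the datasets used elsewhere on this page are classification problems):

>>> import numpy as np
>>> from wildboar.ensemble import IntervalForestRegressor
>>> rng = np.random.RandomState(1)
>>> x = rng.randn(20, 50)  # 20 series with 50 timesteps
>>> y = x.mean(axis=1)     # synthetic regression target
>>> f = IntervalForestRegressor(n_estimators=50, random_state=1)
>>> _ = f.fit(x, y)
>>> y_hat = f.predict(x)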

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.IsolationShapeletForest(n_estimators=100, *, n_shapelets=1, bootstrap=False, n_jobs=None, min_shapelet_size=0, max_shapelet_size=1, min_samples_split=2, max_samples='auto', contamination='auto', warm_start=False, metric='euclidean', metric_params=None, random_state=None)[source]#

An isolation shapelet forest.

Added in version 0.3.5.

Parameters:
n_estimators : int, optional

The number of estimators in the ensemble.

n_shapelets : int, optional

The number of shapelets to sample at each node.

bootstrap : bool, optional

Whether samples are drawn with replacement.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

min_shapelet_size : float, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_shapelet_size : float, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

min_samples_split : int, optional

The minimum number of samples to split an internal node.

max_samples : “auto”, float or int, optional

The number of samples to draw to train each base estimator.

contamination : “auto” or float, optional

The strategy for computing the offset.

  • if “auto”, offset_ is set to -0.5.

  • if a float, offset_ is computed as the percentile of the training scores given by contamination.

If bootstrap=True, out-of-bag samples are used for computing the scores.

warm_start : bool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, just fit a whole new ensemble.

metric : str or list, optional

The distance metric.

  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

    Read more about metric specifications in the User guide.

Changed in version 1.2: Added support for multi-metric shapelet transform.

metric_params : dict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

random_state : int or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

Examples

Using the default offset threshold

>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> f = IsolationShapeletForest(random_state=1)
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05, random_state=1
... )
>>> f.fit(x_train)
IsolationShapeletForest(random_state=1)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred) 
0.8674
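A variant with a fixed contamination fraction (a sketch reusing the split above; per the contamination parameter, the float determines offset_ as a percentile of the training scores):

>>> f = IsolationShapeletForest(contamination=0.05, random_state=1)
>>> _ = f.fit(x_train)
>>> y_pred = f.predict(x_test)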
Attributes:
offset_ : float

The offset for computing the final decision.

fit(x, y=None, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

fit_predict(X, y=None, **kwargs)[source]#

Perform fit on X and returns labels for X.

Returns -1 for outliers and 1 for inliers.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The input samples.

y : Ignored

Not used, present for API consistency by convention.

**kwargs : dict

Arguments to be passed to fit.

Added in version 1.4.

Returns:
y : ndarray of shape (n_samples,)

1 for inliers, -1 for outliers.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.PivotForestClassifier(n_estimators=100, *, n_pivot='sqrt', metrics='all', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

An ensemble of pivot tree classifiers.
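A minimal sketch (reusing the synthetic_control dataset loaded elsewhere on this page):

>>> from wildboar.datasets import load_synthetic_control
>>> from wildboar.ensemble import PivotForestClassifier
>>> x, y = load_synthetic_control()
>>> f = PivotForestClassifier(n_estimators=50, random_state=1)
>>> _ = f.fit(x, y)
>>> y_hat = f.predict(x)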

decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
score : ndarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

y : array-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weight : array-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_params : dict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
self : object

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
y : ndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
p : ndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
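
Continuing the usage sketch above, the out-of-bag indices of each member can be recovered from the in-bag indices with a set difference (a hedged sketch; clf and x are the fitted classifier and training data from that example):

>>> import numpy as np
>>> oob = [np.setdiff1d(np.arange(x.shape[0]), in_bag)
...        for in_bag in clf.estimators_samples_]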

class wildboar.ensemble.ProximityForestClassifier(n_estimators=100, *, n_pivot=1, pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#

A forest of proximity trees.

Parameters:
n_estimatorsint, optional

The number of estimators.

n_pivotint, optional

The number of pivots to sample at each node.

pivot_sample{“label”, “uniform”}, optional

The pivot sampling method.

metric_sample{“uniform”, “weighted”}, optional

The metric sampling method.

metric{“auto”, “default”}, str or list, optional

The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).

  • If “auto”, use the default metric specification suggested by Lucas et al. (2019).

  • If str, use a single metric or default metric specification.

  • If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

metric_factoriesdict, optional

A metric specification.

Deprecated since version 1.2: Use the combination of metric and metric_params.

oob_scorebool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

max_depthint, optional

The maximum tree depth.

min_samples_splitint, optional

The minimum number of samples to consider a split.

min_samples_leafint, optional

The minimum number of samples in a leaf.

min_impurity_decreasefloat, optional

The minimum impurity decrease to build a sub-tree.

criterion{“entropy”, “gini”}, optional

The impurity criterion.

bootstrapbool, optional

If the samples are drawn with replacement.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

class_weightdict or “balanced”, optional

Weights associated with the labels.

  • if dict, weights of the form {label: weight}.

  • if “balanced”, each class weight is inversely proportional to the class frequency.

  • if None, each class has equal weight.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the RandomState instance used by np.random.

References

Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019)

Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery
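
A hedged usage sketch; the dataset mirrors the other examples in this section, and the custom metric grid follows the tuple format described above (the metric name “dtw” and its grid over r are illustrative assumptions):

>>> from wildboar.ensemble import ProximityForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> clf = ProximityForestClassifier(n_estimators=50, random_state=1)
>>> clf.fit(x, y)
ProximityForestClassifier(n_estimators=50, random_state=1)
>>> clf = ProximityForestClassifier(
...     metric=[("dtw", dict(min_r=0, max_r=1, num_r=10))]
... )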

decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
scorendarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

yarray-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weightarray-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_paramsdict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfobject

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
yndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.RocketForestClassifier(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='entropy', bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

An ensemble of rocket tree classifiers.
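
A minimal usage sketch under the same conventions as the other examples in this section (dataset and settings are illustrative):

>>> from wildboar.ensemble import RocketForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> clf = RocketForestClassifier(n_estimators=50, random_state=1)
>>> clf.fit(x, y)
RocketForestClassifier(n_estimators=50, random_state=1)
>>> y_hat = clf.predict(x)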

decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
scorendarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

yarray-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weightarray-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_paramsdict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfobject

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
yndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.RocketForestRegressor(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

An ensemble of rocket tree regressors.

Parameters:
n_estimatorsint, optional

The number of estimators.

n_kernelsint, optional

The number of kernels to sample at each node.

oob_scorebool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

max_depthint, optional

The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_splitint, optional

The minimum number of samples to split an internal node.

min_samples_leafint, optional

The minimum number of samples in a leaf.

min_impurity_decreasefloat, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

sampling{“normal”, “uniform”, “shapelet”}, optional

The sampling of convolutional filters.

  • if “normal”, sample filters according to a normal distribution with mean and scale.

  • if “uniform”, sample filters according to a uniform distribution with lower and upper.

  • if “shapelet”, sample filters as subsequences in the training data.

sampling_paramsdict, optional

The parameters for the sampling.

  • if “normal”, {"mean": float, "scale": float}, defaults to

    {"mean": 0, "scale": 1}.

  • if “uniform”, {"lower": float, "upper": float}, defaults to

    {"lower": -1, "upper": 1}.

kernel_sizearray-like, optional

The kernel size, by default [7, 11, 13].

min_sizefloat, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_sizefloat, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

bias_probfloat, optional

The probability of using a bias term.

normalize_probfloat, optional

The probability of performing normalization.

padding_probfloat, optional

The probability of padding with zeros.

criterion{“squared_error”}, optional

The criterion used to evaluate the utility of a split.

Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.

bootstrapbool, optional

If the samples are drawn with replacement.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

n_jobsint, optional

The number of processor cores used for fitting the ensemble.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
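
A hedged usage sketch; the sampling and sampling_params values restate the defaults quoted above, and the dataset mirrors the other examples in this section:

>>> from wildboar.ensemble import RocketForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> reg = RocketForestRegressor(
...     n_estimators=50,
...     sampling="uniform",
...     sampling_params={"lower": -1, "upper": 1},
...     random_state=1,
... )
>>> y_hat = reg.fit(x, y).predict(x)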

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

yarray-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weightarray-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_paramsdict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfobject

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
yndarray of shape (n_samples,)

The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.ShapeletForestClassifier(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#

An ensemble of random shapelet tree classifiers.

A forest of randomized shapelet trees.

Parameters:
n_estimatorsint, optional

The number of estimators.

n_shapeletsint, optional

The number of shapelets to sample at each node.

max_depthint, optional

The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_splitint, optional

The minimum number of samples to split an internal node.

min_samples_leafint, optional

The minimum number of samples in a leaf.

min_impurity_decreasefloat, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_shapelet_sizefloat, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_shapelet_sizefloat, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

alphafloat, optional

Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.:

w = 1 - exp(-abs(alpha) * depth)

  • if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.

  • if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.

  • if None, the number of sampled shapelets is the same independent of depth.

metricstr or list, optional

The distance metric.

  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

    Read more about metric specifications in the User guide.

Changed in version 1.2: Added support for multi-metric shapelet transform

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

criterion{“entropy”, “gini”}, optional

The criterion used to evaluate the utility of a split.

oob_scorebool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

bootstrapbool, optional

If the samples are drawn with replacement.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

class_weightdict or “balanced”, optional

Weights associated with the labels.

  • if dict, weights of the form {label: weight}.

  • if “balanced”, each class weight is inversely proportional to the class frequency.

  • if None, each class has equal weight.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

Examples

>>> from wildboar.ensemble import ShapeletForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
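
To make the effect of alpha concrete, the weight from the formula in the parameter description can be tabulated by depth; this sketch evaluates the formula only (how the weight maps to an integer shapelet count is internal to the implementation):

>>> from math import exp
>>> alpha = -0.5
>>> for depth in (1, 3, 5):
...     print(depth, round(1 - exp(-abs(alpha) * depth), 2))
1 0.39
3 0.78
5 0.92
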
decision_function(X)[source]#

Average of the decision functions of the base classifiers.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
scorendarray of shape (n_samples, k)

The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.

fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

yarray-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weightarray-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_paramsdict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfobject

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X)[source]#

Predict class for X.

The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
yndarray of shape (n_samples,)

The predicted classes.

predict_log_proba(X)[source]#

Predict class log-probabilities for X.

The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

predict_proba(X)[source]#

Predict class probabilities for X.

The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
pndarray of shape (n_samples, n_classes)

The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.ShapeletForestEmbedding(n_estimators=100, *, n_shapelets=1, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, sparse_output=True, random_state=None)[source]#

An ensemble of random shapelet trees.

An unsupervised transformation of a time series dataset to a high-dimensional sparse representation. A time series is indexed by the leaf that it falls into. This leads to a binary coding of a time series with as many ones as trees in the forest.

The dimensionality of the resulting representation is <= n_estimators * 2^max_depth.

Parameters:
n_estimatorsint, optional

The number of estimators.

n_shapeletsint, optional

The number of shapelets to sample at each node.

max_depthint, optional

The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_splitint, optional

The minimum number of samples to split an internal node.

min_samples_leafint, optional

The minimum number of samples in a leaf.

min_impurity_decreasefloat, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_shapelet_sizefloat, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_shapelet_sizefloat, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

metricstr or list, optional

The distance metric.

  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

    Read more about metric specifications in the User guide.

Changed in version 1.2: Added support for multi-metric shapelet transform

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

criterion{“squared_error”}, optional

The criterion used to evaluate the utility of a split.

Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.

bootstrapbool, optional

If the samples are drawn with replacement.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

sparse_outputbool, optional

Return a sparse CSR-matrix.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
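
A hedged usage sketch: fit_transform is assumed to behave as for scikit-learn transformers (it is not listed in this reference section), and the final check restates the dimensionality bound stated above:

>>> from wildboar.ensemble import ShapeletForestEmbedding
>>> from wildboar.datasets import load_synthetic_control
>>> x, _ = load_synthetic_control()
>>> emb = ShapeletForestEmbedding(n_estimators=50, max_depth=5, random_state=1)
>>> x_emb = emb.fit_transform(x)  # sparse CSR matrix when sparse_output=True
>>> x_emb.shape[1] <= 50 * 2 ** 5
True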

fit(x, y=None, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

yarray-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weightarray-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_paramsdict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfobject

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
yndarray of shape (n_samples,)

The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.

class wildboar.ensemble.ShapeletForestRegressor(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#

An ensemble of random shapelet tree regressors.

Parameters:
n_estimatorsint, optional

The number of estimators.

n_shapeletsint, optional

The number of shapelets to sample at each node.

max_depthint, optional

The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_splitint, optional

The minimum number of samples to split an internal node.

min_samples_leafint, optional

The minimum number of samples in a leaf.

min_impurity_decreasefloat, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_shapelet_sizefloat, optional

The minimum length of a shapelet expressed as a fraction of n_timestep.

max_shapelet_sizefloat, optional

The maximum length of a shapelet expressed as a fraction of n_timestep.

alphafloat, optional

Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.:

w = 1 - exp(-abs(alpha) * depth)

  • if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.

  • if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.

  • if None, the number of sampled shapelets is the same independent of depth.

metricstr or list, optional

The distance metric.

  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

    Read more about metric specifications in the User guide.

Changed in version 1.2: Added support for multi-metric shapelet transform

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

criterion{“squared_error”}, optional

The criterion used to evaluate the utility of a split.

Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.

oob_scorebool, optional

Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.

bootstrapbool, optional

If the samples are drawn with replacement.

warm_startbool, optional

When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.

n_jobsint, optional

The number of processor cores used for fitting the ensemble.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

Examples

>>> from wildboar.ensemble import ShapeletForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
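
The metric parameter also accepts the multi-metric specification described above; a hedged sketch (the metric name “dtw” and its grid over r are illustrative assumptions):

>>> f = ShapeletForestRegressor(
...     metric=[("dtw", dict(min_r=0, max_r=1, num_r=10))]
... )
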
fit(x, y, sample_weight=None)[source]#

Build a Bagging ensemble of estimators from the training set (X, y).

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

yarray-like of shape (n_samples,)

The target values (class labels in classification, real numbers in regression).

sample_weightarray-like of shape (n_samples,), default=None

Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.

**fit_paramsdict

Parameters to pass to the underlying estimators.

Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.

Returns:
selfobject

Fitted estimator.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

predict(X)[source]#

Predict regression target for X.

The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.

Parameters:
X{array-like, sparse matrix} of shape (n_samples, n_features)

The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns:
yndarray of shape (n_samples,)

The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

property estimators_samples_[source]#

The subset of drawn samples for each base estimator.

Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.

Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.