wildboar.ensemble#
Ensemble methods for classification, regression and outlier detection.
Classes#
- BaggingClassifier: A bagging classifier.
- BaggingRegressor: A bagging regressor.
- BaseBagging: Base estimator for Wildboar ensemble estimators.
- ElasticEnsembleClassifier: Ensemble of wildboar.distance.KNeighborsClassifier.
- ExtraShapeletTreesClassifier: An ensemble of extremely random shapelet trees.
- ExtraShapeletTreesRegressor: An ensemble of extremely random shapelet tree regressors.
- IntervalForestClassifier: An ensemble of interval tree classifiers.
- IntervalForestRegressor: An ensemble of interval tree regressors.
- IsolationShapeletForest: An isolation shapelet forest.
- PivotForestClassifier: An ensemble of interval tree classifiers.
- ProximityForestClassifier: A forest of proximity trees.
- RocketForestClassifier: An ensemble of rocket tree classifiers.
- RocketForestRegressor: An ensemble of rocket tree regressors.
- ShapeletForestClassifier: An ensemble of random shapelet tree classifiers.
- ShapeletForestEmbedding: An ensemble of random shapelet trees.
- ShapeletForestRegressor: An ensemble of random shapelet tree regressors.
- class wildboar.ensemble.BaggingClassifier(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, class_weight=None, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
 A bagging classifier.
A bagging classifier is a meta-estimator that fits base classifiers on random subsets of the original data.
- Parameters:
  - estimator : object, optional
    The base estimator of the ensemble. If None, the base estimator is a ShapeletTreeClassifier.
  - n_estimators : int, optional
    The number of base estimators in the ensemble.
  - max_samples : int or float, optional
    The number of samples to draw from X to train each base estimator.
    If int, draw max_samples samples.
    If float, draw max_samples * n_samples samples.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - class_weight : dict or "balanced", optional
    Weights associated with the labels.
    If dict, weights of the form {label: weight}.
    If "balanced", each class weight is inversely proportional to the class frequency.
    If None, each class has equal weight.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
  - verbose : int, optional
    Controls the output to standard error while fitting and predicting.
  - base_estimator : object, optional
    Use estimator instead.
    Deprecated since version 1.2: base_estimator is deprecated and will be removed in 1.4. Use estimator instead.
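Examples

A minimal, hedged usage sketch, assuming the gun_point dataset from wildboar.datasets (used elsewhere in this reference) and the default shapelet-tree base estimator:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import BaggingClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = BaggingClassifier(n_estimators=10, random_state=1).fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)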
- decision_function(X, **params)[source]#
  Average of the decision functions of the base classifiers.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - score : ndarray of shape (n_samples, k)
    The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X, **params)[source]#
  Predict class for X.
  The predicted class of an input sample is computed as the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, it resorts to voting.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted classes.
- predict_log_proba(X, **params)[source]#
  Predict class log-probabilities for X.
  The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
  Predict class probabilities for X.
  The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If the base estimators do not implement a predict_proba method, it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class (see the sketch below).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
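To make the soft-voting rule concrete, the following hedged sketch reproduces the averaging with plain NumPy; it assumes every fitted member (clf.estimators_) implements predict_proba, and it illustrates the rule described above rather than the library's internal implementation.

import numpy as np

def soft_vote_proba(estimators, X):
    # Stack the (n_samples, n_classes) probabilities of each member and
    # average over the ensemble axis (soft voting).
    return np.stack([est.predict_proba(X) for est in estimators]).mean(axis=0)

# The predicted class is the column with the highest mean probability:
# y_pred = clf.classes_[np.argmax(soft_vote_proba(clf.estimators_, X), axis=1)]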
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
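As an illustration of how this property can be used, the hedged sketch below recovers the out-of-bag indices of each member by taking the complement of the in-bag indices; clf is assumed to be a fitted BaggingClassifier and X_train the data it was fitted on.

import numpy as np

n_samples = len(X_train)  # X_train: the training data of `clf` (assumed)
for i, in_bag in enumerate(clf.estimators_samples_):
    # Out-of-bag indices are those never drawn for this member.
    oob = np.setdiff1d(np.arange(n_samples), in_bag)
    print(f"estimator {i}: {len(np.unique(in_bag))} unique in-bag, {len(oob)} out-of-bag")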
- class wildboar.ensemble.BaggingRegressor(estimator=None, n_estimators=100, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
 A bagging regressor.
A bagging regressor is a meta-estimator that fits base regressors on random subsets of the original data.
- Parameters:
  - estimator : object, optional
    The base estimator of the ensemble. If None, the base estimator is a ShapeletTreeRegressor.
  - n_estimators : int, optional
    The number of base estimators in the ensemble.
  - max_samples : int or float, optional
    The number of samples to draw from X to train each base estimator.
    If int, draw max_samples samples.
    If float, draw max_samples * n_samples samples.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
  - verbose : int, optional
    Controls the output to standard error while fitting and predicting.
  - base_estimator : object, optional
    Use estimator instead.
    Deprecated since version 1.2: base_estimator is deprecated and will be removed in 1.4. Use estimator instead.
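Examples

A minimal, hedged sketch that merely exercises the API; casting the gun_point class labels to floats to obtain a numeric target is an assumption for illustration, not a recommended modelling choice.

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import BaggingRegressor
>>> X, y = load_gun_point()
>>> reg = BaggingRegressor(n_estimators=25, random_state=1).fit(X, y.astype(float))
>>> y_hat = reg.predict(X)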
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X)[source]#
  Predict regression target for X.
  The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted values.
- score(X, y, sample_weight=None)[source]#
  Return the coefficient of determination on test data.
  The coefficient of determination, \(R^2\), is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting of the estimator.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
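The definition above can be checked directly; the following hedged sketch computes \(R^2\) from its two components and verifies the result against sklearn.metrics.r2_score (the data is made up for illustration).

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
assert np.isclose(1 - u / v, r2_score(y_true, y_pred))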
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.BaseBagging(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
 Base estimator for Wildboar ensemble estimators.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ElasticEnsembleClassifier(n_neighbors=1, *, metric='auto', n_jobs=None)[source]#
 Ensemble of wildboar.distance.KNeighborsClassifier.
Each classifier is fitted with an optimized parameter grid over metric parameters.
- Parameters:
  - n_neighbors : int, optional
    The number of neighbors.
  - metric : {"auto", "elastic", "non_elastic", "all"} or dict, optional
    The metric specification.
    If "auto" or "elastic", fit one classifier for each elastic distance as described by Lines and Bagnall (2015). We use a slightly smaller parameter grid.
    If "non_elastic", fit one classifier for each non-elastic distance measure.
    If "all", fit one classifier for the metrics in both "elastic" and "non_elastic".
    If dict, a custom metric specification.
  - n_jobs : int, optional
    The number of parallel jobs.
- Attributes:
  - scores : tuple
    A tuple of metric name and cross-validation score.
References
- Jason Lines and Anthony Bagnall,
 Time Series Classification with Ensembles of Elastic Distance Measures, Data Mining and Knowledge Discovery, 29(3), 2015.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import ElasticEnsembleClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = ElasticEnsembleClassifier(
...     metric={
...         "dtw": {"min_r": 0.1, "max_r": 0.3},
...         "ddtw": {"min_r": 0.1, "max_r": 0.3},
...     },
... )
>>> clf.fit(X_train, y_train)
ElasticEnsembleClassifier(metric={'ddtw': {'max_r': 0.3, 'min_r': 0.1}, 'dtw': {'max_r': 0.3, 'min_r': 0.1}})
>>> clf.score(X_test, y_test)
0.9866666666666667
- fit(x, y)[source]#
  Fit the estimator.
- Parameters:
  - x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
    The input samples.
  - y : array-like of shape (n_samples,)
    The input labels.
- Returns:
  - object
    This estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(x)[source]#
  Compute the class label for the samples in x.
- Parameters:
  - x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
    The input samples.
- Returns:
  - ndarray of shape (n_samples,)
    The class label for each sample.
- predict_proba(x)[source]#
  Compute probability estimates for the samples in x.
- Parameters:
  - x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
    The input time series.
- Returns:
  - ndarray of shape (n_samples, n_classes)
    The probabilities.
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- class wildboar.ensemble.ExtraShapeletTreesClassifier(n_estimators=100, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
 An ensemble of extremely random shapelet trees.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - max_depth : int, optional
    The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
  - min_samples_split : int, optional
    The minimum number of samples required to split an internal node.
  - min_samples_leaf : int, optional
    The minimum number of samples in a leaf.
  - min_impurity_decrease : float, optional
    A split is introduced only if the impurity decrease is larger than or equal to this value.
  - min_shapelet_size : float, optional
    The minimum length of a shapelet, expressed as a fraction of n_timestep.
  - max_shapelet_size : float, optional
    The maximum length of a shapelet, expressed as a fraction of n_timestep.
  - coverage_probability : float, optional
    The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
    Larger coverage_probability yields longer shapelets.
    Smaller coverage_probability yields shorter shapelets.
  - variability : float, optional
    Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
    Higher variability creates more uniform interval sizes.
    Lower variability creates more variable interval sizes.
  - metric : str or list, optional
    The distance metric.
    If str, the distance metric used to identify the best shapelet.
    If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10). See also the sketch after the Examples section below.
    Read more about metric specifications in the User guide.
    Changed in version 1.2: Added support for multi-metric shapelet transform.
  - metric_params : dict, optional
    Parameters for the distance measure. Ignored unless metric is a string.
    Read more about the parameters in the User guide.
  - criterion : {"entropy", "gini"}, optional
    The criterion used to evaluate the utility of a split.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - class_weight : dict or "balanced", optional
    Weights associated with the labels.
    If dict, weights of the form {label: weight}.
    If "balanced", each class weight is inversely proportional to the class frequency.
    If None, each class has equal weight.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ExtraShapeletTreesClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
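For the list form of metric described above, the grid keys follow the min_/max_/num_ convention from the parameter description. The metric name and range below are illustrative assumptions, not a prescribed configuration:

>>> f = ExtraShapeletTreesClassifier(
...     n_estimators=100,
...     metric=[("dtw", {"min_r": 0, "max_r": 1, "num_r": 10})],
... )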
- decision_function(X, **params)[source]#
  Average of the decision functions of the base classifiers.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - score : ndarray of shape (n_samples, k)
    The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X, **params)[source]#
  Predict class for X.
  The predicted class of an input sample is computed as the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, it resorts to voting.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted classes.
- predict_log_proba(X, **params)[source]#
  Predict class log-probabilities for X.
  The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
  Predict class probabilities for X.
  The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If the base estimators do not implement a predict_proba method, it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ExtraShapeletTreesRegressor(n_estimators=100, *, max_depth=None, min_samples_split=2, min_shapelet_size=0, max_shapelet_size=1, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of extremely random shapelet tree regressors.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - max_depth : int, optional
    The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
  - min_samples_split : int, optional
    The minimum number of samples required to split an internal node.
  - min_shapelet_size : float, optional
    The minimum length of a shapelet, expressed as a fraction of n_timestep.
  - max_shapelet_size : float, optional
    The maximum length of a shapelet, expressed as a fraction of n_timestep.
  - coverage_probability : float, optional
    The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
    Larger coverage_probability yields longer shapelets.
    Smaller coverage_probability yields shorter shapelets.
  - variability : float, optional
    Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
    Higher variability creates more uniform interval sizes.
    Lower variability creates more variable interval sizes.
  - metric : str or list, optional
    The distance metric.
    If str, the distance metric used to identify the best shapelet.
    If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
    Read more about metric specifications in the User guide.
    Changed in version 1.2: Added support for multi-metric shapelet transform.
  - metric_params : dict, optional
    Parameters for the distance measure. Ignored unless metric is a string.
    Read more about the parameters in the User guide.
  - criterion : {"squared_error"}, optional
    The criterion used to evaluate the utility of a split.
    Deprecated since version 1.1: Criterion "mse" was deprecated in v1.1 and removed in version 1.2.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ExtraShapeletTreesRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X)[source]#
  Predict regression target for X.
  The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted values.
- score(X, y, sample_weight=None)[source]#
  Return the coefficient of determination on test data.
  The coefficient of determination, \(R^2\), is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting of the estimator.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IntervalForestClassifier(n_estimators=100, *, n_intervals='sqrt', intervals='random', summarizer='mean_var_slope', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
 An ensemble of interval tree classifiers.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - n_intervals : str, int or float, optional
    The number of intervals to use for the transform.
    If "log2", the number of intervals is log2(n_timestep).
    If "sqrt", the number of intervals is sqrt(n_timestep).
    If int, the number of intervals is n_intervals.
    If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
    Deprecated since version 1.2: The option "log" has been renamed to "log2".
  - intervals : str, optional
    The method for selecting intervals.
    If "fixed", n_intervals non-overlapping intervals.
    If "random", n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
    Deprecated since version 1.3: The option "sample" has been deprecated. Use "fixed" with sample_size.
  - summarizer : str or list, optional
    The method used to summarize each interval.
    If str, the summarizer is determined by _SUMMARIZERS.keys().
    If list, the summarizer is a list of functions f(x) -> float, where x is a numpy array (see the sketch after this parameter list).
    The default summarizer summarizes each interval by its mean, standard deviation and slope.
  - sample_size : float, optional
    The fraction of fixed intervals to sample.
  - min_size : float, optional
    The minimum interval size if intervals="random".
  - max_size : float, optional
    The maximum interval size if intervals="random".
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - max_depth : int, optional
    The maximum tree depth.
  - min_samples_split : int, optional
    The minimum number of samples required to consider a split.
  - min_samples_leaf : int, optional
    The minimum number of samples in a leaf.
  - min_impurity_decrease : float, optional
    The minimum impurity decrease required to build a sub-tree.
  - criterion : {"entropy", "gini"}, optional
    The impurity criterion.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - class_weight : dict or "balanced", optional
    Weights associated with the labels.
    If dict, weights of the form {label: weight}.
    If "balanced", each class weight is inversely proportional to the class frequency.
    If None, each class has equal weight.
  - random_state : int or RandomState, optional
    If int, random_state is the seed used by the random number generator.
    If RandomState instance, random_state is the random number generator.
    If None, the random number generator is the RandomState instance used by np.random.
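Examples

A hedged usage sketch on synthetic_control that also shows the list form of summarizer (custom functions f(x) -> float) referenced in the parameter list above; the particular summary functions are illustrative assumptions.

>>> import numpy as np
>>> from wildboar.datasets import load_synthetic_control
>>> from wildboar.ensemble import IntervalForestClassifier
>>> x, y = load_synthetic_control()
>>> f = IntervalForestClassifier(
...     n_estimators=100,
...     n_intervals="sqrt",
...     summarizer=[np.mean, np.std, lambda v: v.max() - v.min()],
... )
>>> _ = f.fit(x, y)
>>> y_hat = f.predict(x)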
- decision_function(X, **params)[source]#
  Average of the decision functions of the base classifiers.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - score : ndarray of shape (n_samples, k)
    The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X, **params)[source]#
  Predict class for X.
  The predicted class of an input sample is computed as the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, it resorts to voting.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted classes.
- predict_log_proba(X, **params)[source]#
  Predict class log-probabilities for X.
  The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
  Predict class probabilities for X.
  The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If the base estimators do not implement a predict_proba method, it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IntervalForestRegressor(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='mean_var_slope', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of interval tree regressors.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - n_intervals : str, int or float, optional
    The number of intervals to use for the transform.
    If "log2", the number of intervals is log2(n_timestep).
    If "sqrt", the number of intervals is sqrt(n_timestep).
    If int, the number of intervals is n_intervals.
    If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
    Deprecated since version 1.2: The option "log" has been renamed to "log2".
  - intervals : str, optional
    The method for selecting intervals.
    If "fixed", n_intervals non-overlapping intervals.
    If "sample", n_intervals * sample_size non-overlapping intervals.
    If "random", n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
  - summarizer : str or list, optional
    The method used to summarize each interval.
    If str, the summarizer is determined by _SUMMARIZERS.keys().
    If list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.
    The default summarizer summarizes each interval by its mean, variance and slope.
  - sample_size : float, optional
    The sample size of fixed intervals if intervals="sample".
  - min_size : float, optional
    The minimum interval size if intervals="random".
  - max_size : float, optional
    The maximum interval size if intervals="random".
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - max_depth : int, optional
    The maximum tree depth.
  - min_samples_split : int, optional
    The minimum number of samples required to consider a split.
  - min_samples_leaf : int, optional
    The minimum number of samples in a leaf.
  - min_impurity_decrease : float, optional
    The minimum impurity decrease required to build a sub-tree.
  - criterion : {"squared_error"}, optional
    The impurity criterion.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    If int, random_state is the seed used by the random number generator.
    If RandomState instance, random_state is the random number generator.
    If None, the random number generator is the RandomState instance used by np.random.
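Examples

A hedged sketch that merely exercises the API; casting the synthetic_control labels to floats to obtain a numeric target is an assumption for illustration.

>>> from wildboar.datasets import load_synthetic_control
>>> from wildboar.ensemble import IntervalForestRegressor
>>> x, y = load_synthetic_control()
>>> reg = IntervalForestRegressor(n_estimators=50, random_state=1)
>>> _ = reg.fit(x, y.astype(float))
>>> y_hat = reg.predict(x)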
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X)[source]#
  Predict regression target for X.
  The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares
((y_true - y_pred)** 2).sum()and \(v\) is the total sum of squares((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape
(n_samples, n_samples_fitted), wheren_samples_fittedis the number of samples used in the fitting for the estimator.- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
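The definition above can be checked numerically; a small sketch with made-up values (the result agrees with sklearn.metrics.r2_score):

>>> import numpy as np
>>> y_true = np.array([1.0, 2.0, 3.0, 4.0])
>>> y_pred = np.array([1.1, 1.9, 3.2, 3.8])
>>> u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
>>> v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
>>> print(round(float(1 - u / v), 2))
0.98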
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
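Since only the in-bag indices are exposed, the out-of-bag indices of each member can be recovered by set difference; a sketch, assuming a fitted ensemble est trained on n_samples series (both names are placeholders):

>>> import numpy as np
>>> in_bag = est.estimators_samples_   # one index array per member   # doctest: +SKIP
>>> oob = [np.setdiff1d(np.arange(n_samples), idx) for idx in in_bag]  # doctest: +SKIP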
- class wildboar.ensemble.IsolationShapeletForest(n_estimators=100, *, n_shapelets=1, bootstrap=False, n_jobs=None, min_shapelet_size=0, max_shapelet_size=1, min_samples_split=2, max_samples='auto', contamination='auto', warm_start=False, metric='euclidean', metric_params=None, random_state=None)[source]#
 An isolation shapelet forest.
Added in version 0.3.5.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators in the ensemble.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- max_samples“auto”, float or int, optional
 The number of samples to draw to train each base estimator.
- contamination‘auto’ or float, optional
 The strategy for computing the offset.
if “auto”, offset_ is set to -0.5.
if a float c, offset_ is computed as the c-th percentile of the scores (see the second example below).
If bootstrap=True, out-of-bag samples are used for computing the scores.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- Attributes:
 - offset_float
The offset for computing the final decision.
Examples
Using default offset threshold
>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> f = IsolationShapeletForest(random_state=1)
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05, random_state=1
... )
>>> f.fit(x_train)
IsolationShapeletForest(random_state=1)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)
0.8674
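Using a float contamination to derive the offset from the training scores; a sketch continuing the example above (the resulting offset_ depends on the data):

>>> f = IsolationShapeletForest(contamination=0.05, random_state=1)
>>> f_ = f.fit(x_train)   # offset_ becomes the 5th percentile of the scores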
- fit(x, y=None, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- fit_predict(X, y=None, **kwargs)[source]#
Perform fit on X and return labels for X.
Returns -1 for outliers and 1 for inliers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The input samples.
- yIgnored
 Not used, present for API consistency by convention.
- **kwargsdict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
 - yndarray of shape (n_samples,)
 1 for inliers, -1 for outliers.
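A short sketch of the labeling convention, reusing x_train from the examples above:

>>> import numpy as np
>>> labels = f.fit_predict(x_train)          # doctest: +SKIP
>>> n_outliers = int(np.sum(labels == -1))   # doctest: +SKIP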
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.PivotForestClassifier(n_estimators=100, *, n_pivot='sqrt', metrics='all', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
An ensemble of pivot tree classifiers.
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
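The averaging described above implies that each row of p sums to one and that predict agrees with the most probable class; a quick sanity check, assuming a fitted classifier clf and samples X (both placeholders):

>>> import numpy as np
>>> p = clf.predict_proba(X)                                         # doctest: +SKIP
>>> ok = np.allclose(p.sum(axis=1), 1.0)                             # doctest: +SKIP
>>> same = (clf.classes_[p.argmax(axis=1)] == clf.predict(X)).all()  # doctest: +SKIP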
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ProximityForestClassifier(n_estimators=100, *, n_pivot=1, pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
 A forest of proximity trees.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_pivotint, optional
 The number of pivots to sample at each node.
- pivot_sample{“label”, “uniform”}, optional
 The pivot sampling method.
- metric_sample{“uniform”, “weighted”}, optional
 The metric sampling method.
- metric{“auto”, “default”}, str or list, optional
The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).
If “auto”, use the default metric specification suggested by Lucas et al. (2019).
If str, use a single metric or default metric specification.
If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10). A usage sketch follows the references below.
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_factoriesdict, optional
 A metric specification.
Deprecated since version 1.2: Use the combination of metric and metric_params instead.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
 The maximum tree depth.
- min_samples_splitint, optional
 The minimum number of samples to consider a split.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 The minimum impurity decrease to build a sub-tree.
- criterion{“entropy”, “gini”}, optional
 The impurity criterion.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- class_weightdict or “balanced”, optional
 Weights associated with the labels.
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- random_stateint or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
References
- Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019)
 Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery
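A sketch of the custom metric specification described above, using “dtw” with a grid over the warping window r for illustration (which metrics accept r is described in the User guide):

>>> from wildboar.ensemble import ProximityForestClassifier
>>> metric = [("dtw", dict(min_r=0, max_r=1, num_r=10))]  # 10 values of r in [0, 1]
>>> clf = ProximityForestClassifier(n_estimators=100, metric=metric, random_state=1)
>>> # clf.fit(x, y) on any dataset of shape (n_samples, n_timestep)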
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.RocketForestClassifier(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='entropy', bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
 An ensemble of rocket tree classifiers.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_kernelsint, optional
The number of kernels to sample at each node.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- sampling{“normal”, “uniform”, “shapelet”}, optional
 The sampling of convolutional filters.
if “normal”, sample filters according to a normal distribution with mean and scale.
if “uniform”, sample filters according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
 The parameters for the sampling.
if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
A usage sketch follows this parameter list.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- bias_probfloat, optional
 The probability of using a bias term.
- normalize_probfloat, optional
 The probability of performing normalization.
- padding_probfloat, optional
 The probability of padding with zeros.
- criterion{“entropy”, “gini”}, optional
 The criterion used to evaluate the utility of a split.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- class_weightdict or “balanced”, optional
 Weights associated with the labels
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- n_jobsint, optional
 The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
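A sketch of the sampling configuration described above, spelling out the documented defaults for the uniform case:

>>> from wildboar.ensemble import RocketForestClassifier
>>> clf = RocketForestClassifier(
...     n_kernels=10,
...     sampling="uniform",
...     sampling_params={"lower": -1, "upper": 1},  # the documented defaults
...     random_state=1,
... )
>>> # clf.fit(x, y) on any dataset of shape (n_samples, n_timestep)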
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.RocketForestRegressor(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of rocket tree regressors.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_kernelsint, optional
The number of kernels to sample at each node.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- sampling{“normal”, “uniform”, “shapelet”}, optional
 The sampling of convolutional filters.
if “normal”, sample filters according to a normal distribution with mean and scale.
if “uniform”, sample filters according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
 The parameters for the sampling.
if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- bias_probfloat, optional
 The probability of using a bias term.
- normalize_probfloat, optional
 The probability of performing normalization.
- padding_probfloat, optional
 The probability of padding with zeros.
- criterion{“squared_error”}, optional
 The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X)[source]#
 Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares
((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape
(n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestClassifier(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
 An ensemble of random shapelet tree classifiers.
A forest of randomized shapelet trees.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
 Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
 The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
 Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphafloat, optional
 Dynamically decrease the number of sampled shapelets at each node according to the current depth, i.e.
w = 1 - exp(-abs(alpha) * depth)
if alpha < 0, the number of sampled shapelets decrease from n_shapelets towards 1 with increased depth.
if alpha > 0, the number of sampled shapelets increase from 1 towards n_shapelets with increased depth.
if None, the number of sampled shapelets is the same independent of depth (see the numeric sketch after the examples below).
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“entropy”, “gini”}, optional
 The criterion used to evaluate the utility of a split.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- class_weightdict or “balanced”, optional
 Weights associated with the labels
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ShapeletForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
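The depth weighting controlled by alpha can be written out directly; a numeric sketch of the weight w = 1 - exp(-abs(alpha) * depth) at increasing depth (how w scales n_shapelets is internal to the implementation):

>>> import math
>>> alpha = -0.5
>>> for depth in range(4):
...     w = 1 - math.exp(-abs(alpha) * depth)
...     print(depth, round(w, 3))
0 0.0
1 0.393
2 0.632
3 0.777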
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestEmbedding(n_estimators=100, *, n_shapelets=1, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, sparse_output=True, random_state=None)[source]#
 An ensemble of random shapelet trees.
An unsupervised transformation of a time series dataset to a high-dimensional sparse representation. A time series is indexed by the leaf that it falls into. This leads to a binary coding of a time series with as many ones as trees in the forest.
The dimensionality of the resulting representation is <= n_estimators * 2^max_depth (see the example after the parameters).
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
 The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
 Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
 The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- sparse_outputbool, optional
 If True, return a sparse CSR matrix.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
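To make the multi-metric specification for metric concrete, here is a hedged sketch following the dict(min_r=0, max_r=1, num_r=10) convention described above; the metric name "dtw" and its argument r are illustrative assumptions:
>>> # hypothetical grid over r: 10 values in the range 0 to 1
>>> metric = [("dtw", dict(min_r=0, max_r=1, num_r=10))]
>>> e = ShapeletForestEmbedding(n_estimators=10, metric=metric)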
- fit(x, y=None, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
 A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X)[source]#
 Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return the coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
 \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
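A small worked instance of the definition above, using plain Python and illustrative numbers (not output of this estimator):
>>> y_true = [3.0, -0.5, 2.0, 7.0]
>>> y_pred = [2.5, 0.0, 2.0, 8.0]
>>> u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
>>> mean = sum(y_true) / len(y_true)
>>> v = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
>>> round(1 - u / v, 3)
0.949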
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestRegressor(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of random shapelet tree regressors.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- max_depthint, optional
 The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
 Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- min_shapelet_sizefloat, optional
 The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
 The maximum length of a shapelet expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
 The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get longer shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
 Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphafloat, optional
 Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.
w = 1 - exp(-abs(alpha) * depth)
if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.
if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.
if None, the number of sampled shapelets is the same regardless of depth. A sketch of this schedule follows the parameter list below.
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs, defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
 The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
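To illustrate the alpha schedule, the sketch below evaluates the weight w defined above at a few depths; how w maps to the exact number of sampled shapelets is an implementation detail not documented here, so only the shape of the schedule is shown:
>>> from math import exp
>>> def depth_weight(alpha, depth):
...     # w = 1 - exp(-abs(alpha) * depth), as defined for alpha above
...     return 1 - exp(-abs(alpha) * depth)
>>> [round(depth_weight(0.5, d), 2) for d in (1, 2, 4, 8)]
[0.39, 0.63, 0.86, 0.98]
With alpha > 0 the number of sampled shapelets grows towards n_shapelets as w approaches 1; with alpha < 0 it shrinks towards 1.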
Examples
>>> from wildboar.ensemble import ShapeletForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
 A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X)[source]#
 Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return the coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
 \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
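A hedged sketch of one common use of estimators_samples_: recovering the out-of-bag indices for a single ensemble member (reusing f and x from the Examples section above, and assuming each entry is an array of in-bag sample indices as described):
>>> import numpy as np
>>> in_bag = f.estimators_samples_[0]  # in-bag indices of the first tree
>>> oob = np.setdiff1d(np.arange(x.shape[0]), in_bag)  # samples never drawn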