wildboar.ensemble#
Ensemble methods for classification, regression and outlier detection.
Package Contents#
Classes#
- BaggingClassifier: A bagging classifier.
- BaggingRegressor: A bagging regressor.
- BaseBagging: Base estimator for Wildboar ensemble estimators.
- ElasticEnsembleClassifier: Ensemble of wildboar.distance.KNeighborsClassifier.
- ExtraShapeletTreesClassifier: An ensemble of extremely random shapelet trees.
- ExtraShapeletTreesRegressor: An ensemble of extremely random shapelet tree regressors.
- IntervalForestClassifier: An ensemble of interval tree classifiers.
- IntervalForestRegressor: An ensemble of interval tree regressors.
- IsolationShapeletForest: An isolation shapelet forest.
- PivotForestClassifier: An ensemble of pivot tree classifiers.
- ProximityForestClassifier: A forest of proximity trees.
- RocketForestClassifier: An ensemble of rocket tree classifiers.
- RocketForestRegressor: An ensemble of rocket tree regressors.
- ShapeletForestClassifier: An ensemble of random shapelet tree classifiers.
- ShapeletForestEmbedding: An ensemble of random shapelet trees.
- ShapeletForestRegressor: An ensemble of random shapelet tree regressors.
- class wildboar.ensemble.BaggingClassifier(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, class_weight=None, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
A bagging classifier.
A bagging classifier is a meta-estimator that fits base classifiers on random subsets of the original data. A usage sketch follows the parameter list below.
- Parameters:
- estimatorobject, optional
Base estimator of the ensemble. If None, the base estimator is a ShapeletTreeClassifier.
- n_estimatorsint, optional
The number of base estimators in the ensemble.
- max_samplesint or float, optional
The number of samples to draw from X to train each base estimator.
If int, then draw max_samples samples.
If float, then draw max_samples * n_samples samples.
- bootstrapbool, optional
Whether samples are drawn with replacement.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
If dict, weights on the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- verboseint, optional
Controls the output to standard error while fitting and predicting.
- base_estimatorobject, optional
Use estimator instead.
Deprecated since version 1.2: base_estimator has been deprecated and will be removed in 1.4. Use estimator instead.
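A minimal usage sketch (it assumes the load_gun_point loader used elsewhere in these docs and the default base estimator):
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import BaggingClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = BaggingClassifier(n_estimators=10, random_state=1).fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)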
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details; a configuration sketch follows below.
- Returns:
- selfobject
Fitted estimator.
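Routing **fit_params to the underlying estimators requires enabling metadata routing first, using the scikit-learn configuration call named above:
>>> import sklearn
>>> sklearn.set_config(enable_metadata_routing=True)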
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object; a short sketch follows below.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
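For example (a sketch; it assumes the fitted classifier clf from the example above, and estimator__max_depth assumes the base estimator exposes a max_depth parameter):
>>> _ = clf.set_params(n_estimators=20)         # plain parameter
>>> _ = clf.set_params(estimator__max_depth=5)  # nested: parameter of the base estimator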
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
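For example, the in-bag indices can be converted to out-of-bag indices per ensemble member (a sketch assuming a fitted ensemble clf trained on n_samples series; estimators_ is the standard scikit-learn attribute holding the fitted members):
>>> import numpy as np
>>> in_bag = clf.estimators_samples_  # fetch once; the list is rebuilt on every access
>>> oob = [np.setdiff1d(np.arange(n_samples), s) for s in in_bag]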
- class wildboar.ensemble.BaggingRegressor(estimator=None, n_estimators=100, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
A bagging regressor.
A bagging regressor is a meta-estimator that fits base regressors on random subsets of the original data. A usage sketch follows the parameter list below.
- Parameters:
- estimatorobject, optional
Base estimator of the ensemble. If None, the base estimator is a ShapeletTreeRegressor.
- n_estimatorsint, optional
The number of base estimators in the ensemble.
- max_samplesint or float, optional
The number of samples to draw from X to train each base estimator.
If int, then draw max_samples samples.
If float, then draw max_samples * n_samples samples.
- bootstrapbool, optional
Whether samples are drawn with replacement.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- verboseint, optional
Controls the output to standard error while fitting and predicting.
- base_estimatorobject, optional
Use estimator instead.
Deprecated since version 1.2: base_estimator has been deprecated and will be removed in 1.4. Use estimator instead.
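A minimal sketch on synthetic data (purely illustrative input and targets):
>>> import numpy as np
>>> from wildboar.ensemble import BaggingRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(100, 50)      # 100 series with 50 timesteps
>>> y = X[:, :10].mean(axis=1)  # illustrative regression target
>>> reg = BaggingRegressor(n_estimators=10, random_state=1).fit(X, y)
>>> y_hat = reg.predict(X)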
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.BaseBagging(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
Base estimator for Wildboar ensemble estimators.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ElasticEnsembleClassifier(n_neighbors=1, *, metric='auto', n_jobs=None)[source]#
Ensemble of wildboar.distance.KNeighborsClassifier.
Each classifier is fitted with an optimized parameter grid over metric parameters.
- Parameters:
- n_neighborsint, optional
The number of neighbors.
- metric{“auto”, “elastic”, “non_elastic”, “all”} or dict, optional
The metric specification.
If “auto” or “elastic”, fit one classifier for each elastic distance as described by Lines and Bagnall (2015). We use a slightly smaller parameter grid.
If “non_elastic”, fit one classifier for each non-elastic distance measure.
If “all”, fit one classifier for the metrics in both “elastic” and “non_elastic”.
If dict, a custom metric specification.
- n_jobsint, optional
The number of parallel jobs.
References
- Jason Lines and Anthony Bagnall,
Time Series Classification with Ensembles of Elastic Distance Measures, Data Mining and Knowledge Discovery, 29(3), 2015.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import ElasticEnsembleClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = ElasticEnsembleClassifier(
...     metric={
...         "dtw": {"min_r": 0.1, "max_r": 0.3},
...         "ddtw": {"min_r": 0.1, "max_r": 0.3},
...     },
... )
>>> clf.fit(X_train, y_train)
ElasticEnsembleClassifier(metric={'ddtw': {'max_r': 0.3, 'min_r': 0.1},
                                  'dtw': {'max_r': 0.3, 'min_r': 0.1}})
>>> clf.score(X_test, y_test)
0.9866666666666667
- Attributes:
- scorestuple
A tuple of metric name and cross-validation score.
- fit(x, y)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
The input samples.
- yarray-like of shape (n_samples, )
The input labels.
- Returns:
- object
This estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x)[source]#
Compute the class label for the samples in x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
The input samples.
- Returns:
- ndarray of shape (n_samples, )
The class label for each sample.
- predict_proba(x)[source]#
Compute probability estimates for the samples in x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
The input time series.
- Returns:
- ndarray of shape (n_samples, n_classes)
The probabilities.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.ensemble.ExtraShapeletTreesClassifier(n_estimators=100, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
An ensemble of extremely random shapelet trees.
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- metricstr or list, optional
The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10). A sketch of this list form follows the example below.
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
Whether samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
If dict, weights on the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ExtraShapeletTreesClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
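The multi-metric list form described for the metric parameter above can be sketched like this (the min_r/max_r/num_r keys follow the grid pattern in that description):
>>> clf = ExtraShapeletTreesClassifier(
...     n_estimators=100,
...     metric=[("dtw", dict(min_r=0, max_r=1, num_r=10))],
... )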
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ExtraShapeletTreesRegressor(n_estimators=100, *, max_depth=None, min_samples_split=2, min_shapelet_size=0, max_shapelet_size=1, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
An ensemble of extremely random shapelet tree regressors.
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- metricstr or list, optional
The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
Whether samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ExtraShapeletTreesRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IntervalForestClassifier(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='mean_var_std', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
An ensemble of interval tree classifiers.
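A minimal usage sketch (assuming the load_gun_point loader used elsewhere in these docs):
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import IntervalForestClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = IntervalForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)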
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IntervalForestRegressor(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='auto', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
An ensemble of interval tree regressors.
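A minimal sketch on synthetic data (purely illustrative input and targets):
>>> import numpy as np
>>> from wildboar.ensemble import IntervalForestRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(100, 50)  # 100 series with 50 timesteps
>>> y = X.mean(axis=1)      # illustrative target: the series mean
>>> reg = IntervalForestRegressor(n_estimators=100, random_state=1).fit(X, y)
>>> y_hat = reg.predict(X)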
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IsolationShapeletForest(n_estimators=100, *, n_shapelets=1, bootstrap=False, n_jobs=None, min_shapelet_size=0, max_shapelet_size=1, min_samples_split=2, max_samples='auto', contamination='auto', warm_start=False, metric='euclidean', metric_params=None, random_state=None)[source]#
An isolation shapelet forest.
Added in version 0.3.5.
- Parameters:
- n_estimatorsint, optional
The number of estimators in the ensemble.
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- bootstrapbool, optional
Whether samples are drawn with replacement.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- max_samples“auto”, float or int, optional
The number of samples to draw to train each base estimator.
- contamination‘auto’ or float, optional
The strategy for computing the offset.
If “auto”, offset_ is set to -0.5.
If a float c, offset_ is computed as the c:th percentile of the scores (see the second example below).
If bootstrap=True, out-of-bag samples are used for computing the scores.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- metricstr or list, optional
The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
Using the default offset threshold:
>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> f = IsolationShapeletForest(random_state=1)
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05, random_state=1
... )
>>> f.fit(x_train)
IsolationShapeletForest(random_state=1)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)
0.8674
- Attributes:
- offset_float
The offset for computing the final decision.
- fit(x, y=None, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- fit_predict(X, y=None, **kwargs)[source]#
Perform fit on X and return labels for X.
Returns -1 for outliers and 1 for inliers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The input samples.
- yIgnored
Not used, present for API consistency by convention.
- **kwargsdict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
- yndarray of shape (n_samples,)
1 for inliers, -1 for outliers.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.PivotForestClassifier(n_estimators=100, *, n_pivot='sqrt', metrics='all', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
An ensemble of pivot tree classifiers.
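A minimal usage sketch (assuming the load_gun_point loader used elsewhere in these docs):
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import PivotForestClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = PivotForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)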
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
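For illustration, a short sketch (not from the original docs) that recovers each member's out-of-bag indices from the in-bag indices; f is assumed to be a fitted ensemble trained on an array x:
>>> import numpy as np
>>> oob = [np.setdiff1d(np.arange(x.shape[0]), ind)
...        for ind in f.estimators_samples_]  # indices not drawn for each tree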
- class wildboar.ensemble.ProximityForestClassifier(n_estimators=100, *, n_pivot=1, pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
A forest of proximity trees.
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- n_pivotint, optional
The number of pivots to sample at each node.
- pivot_sample{“label”, “uniform”}, optional
The pivot sampling method.
- metric_sample{“uniform”, “weighted”}, optional
The metric sampling method.
- metric{“auto”, “default”}, str or list, optional
The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).
If “auto”, use the default metric specification suggested by Lucas et al. (2019).
If str, use a single metric or a default metric specification.
If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bound on the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10). See the sketch following the references below.
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_factoriesdict, optional
A metric specification.
Deprecated since version 1.2: Use the combination of metric and metric_params instead.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
The maximum tree depth.
- min_samples_splitint, optional
The minimum number of samples to consider a split.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
The minimum impurity decrease to build a sub-tree.
- criterion{“entropy”, “gini”}, optional
The impurity criterion.
- bootstrapbool, optional
If the samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
if dict, weights on the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- random_stateint or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
References
- Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019)
Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery
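As referenced from the metric parameter above, a minimal usage sketch (not part of the original reference; the custom grid follows the tuple-and-grid specification described above, and the bounds are illustrative only):
>>> from wildboar.ensemble import ProximityForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ProximityForestClassifier(n_estimators=50, metric="auto")
>>> _ = f.fit(x, y)
>>> g = ProximityForestClassifier(
...     metric=[("dtw", dict(min_r=0, max_r=0.25, num_r=5))])  # illustrative grid
>>> _ = g.fit(x, y)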
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.RocketForestClassifier(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='entropy', bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
An ensemble of rocket tree classifiers.
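A minimal usage sketch (not part of the original reference; it mirrors the Examples of the shapelet forests in this module, and the parameter values are illustrative):
>>> from wildboar.ensemble import RocketForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = RocketForestClassifier(n_estimators=50, n_kernels=10)
>>> _ = f.fit(x, y)
>>> p = f.predict_proba(x)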
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.RocketForestRegressor(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
An ensemble of rocket tree regressors.
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- n_kernelsint, optional
The number of kernels to sample at each node.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- sampling{“normal”, “uniform”, “shapelet”}, optional
The sampling of convolutional filters.
if “normal”, sample filters according to a normal distribution with mean and scale.
if “uniform”, sample filters according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
The parameters for the sampling. A usage sketch follows this parameter list.
if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- bias_probfloat, optional
The probability of using a bias term.
- normalize_probfloat, optional
The probability of performing normalization.
- padding_probfloat, optional
The probability of padding with zeros.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- bootstrapbool, optional
If the samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
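As referenced from sampling_params above, a hedged usage sketch combining the sampling options (values are illustrative; the labels of the classification dataset are reused as a numeric target purely for demonstration, as in the ShapeletForestRegressor example later in this reference):
>>> from wildboar.ensemble import RocketForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = RocketForestRegressor(n_estimators=50, sampling="uniform",
...                           sampling_params={"lower": -1, "upper": 1})
>>> _ = f.fit(x, y)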
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
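For intuition, a worked numeric example of the definition above (plain NumPy, not tied to this estimator):
>>> import numpy as np
>>> y_true = np.array([3.0, -0.5, 2.0, 7.0])
>>> y_pred = np.array([2.5, 0.0, 2.0, 8.0])
>>> u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
>>> v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
>>> round(1 - u / v, 4)
0.9486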
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestClassifier(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
An ensemble of random shapelet tree classifiers.
A forest of randomized shapelet trees.
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- alphafloat, optional
Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.:
w = 1 - exp(-abs(alpha) * depth)
if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
if None, the number of sampled shapelets is the same independent of depth.
- metricstr or list, optional
The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
If the samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- class_weightdict or “balanced”, optional
Weights associated with the labels
if dict, weights on the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ShapeletForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
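A multi-metric specification can also be passed, as described under the metric parameter above; a hedged sketch (the grid bounds are illustrative and assume the metric accepts an argument r):
>>> g = ShapeletForestClassifier(
...     metric=[("dtw", dict(min_r=0, max_r=0.25, num_r=5))])
>>> _ = g.fit(x, y)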
- decision_function(X)[source]#
Average of the decision functions of the base classifiers.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted classes.
- predict_log_proba(X)[source]#
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- pndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestEmbedding(n_estimators=100, *, n_shapelets=1, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, sparse_output=True, random_state=None)[source]#
An ensemble of random shapelet trees.
An unsupervised transformation of a time series dataset to a high-dimensional sparse representation. A time series is indexed by the leaf that it falls into. This leads to a binary coding of a time series with as many ones as there are trees in the forest.
The dimensionality of the resulting representation is <= n_estimators * 2^max_depth.
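A minimal usage sketch (not from the original reference; it assumes the estimator exposes the usual scikit-learn transformer API in addition to the bagging interface listed below):
>>> from wildboar.ensemble import ShapeletForestEmbedding
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> e = ShapeletForestEmbedding(n_estimators=10, max_depth=5)
>>> emb = e.fit_transform(x)  # sparse CSR by default (sparse_output=True)
>>> emb.shape[1] <= 10 * 2 ** 5  # bounded by n_estimators * 2^max_depth
True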
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- metricstr or list, optional
The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- bootstrapbool, optional
If the samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- sparse_outputbool, optional
Return a sparse CSR-matrix.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- fit(x, y=None, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestRegressor(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
An ensemble of random shapelet tree regressors.
- Parameters:
- n_estimatorsint, optional
The number of estimators.
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- alphafloat, optional
Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.:
w = 1 - exp(-abs(alpha) * depth)
if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
if None, the number of sampled shapelets is the same independent of depth.
- metricstr or list, optional
The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pair defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- oob_scorebool, optional
Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
If the samples are drawn with replacement.
- warm_startbool, optional
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ShapeletForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
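A hedged sketch of the alpha parameter described above (values are illustrative; a negative alpha shrinks the shapelet sample towards 1 with increasing depth):
>>> g = ShapeletForestRegressor(n_shapelets=10, alpha=-0.5)
>>> _ = g.fit(x, y)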
- fit(x, y, sample_weight=None)[source]#
Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
- selfobject
Fitted estimator.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(X)[source]#
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
- X{array-like, sparse matrix} of shape (n_samples, n_features)
The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- Returns:
- yndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- property estimators_samples_[source]#
The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.