wildboar.ensemble#
Ensemble methods for classification, regression and outlier detection.
Classes#
- BaggingClassifier: A bagging classifier.
- BaggingRegressor: A bagging regressor.
- BaseBagging: Base estimator for Wildboar ensemble estimators.
- ElasticEnsembleClassifier: Ensemble of wildboar.distance.KNeighborsClassifier.
- ExtraShapeletTreesClassifier: An ensemble of extremely random shapelet trees.
- ExtraShapeletTreesRegressor: An ensemble of extremely random shapelet tree regressors.
- IntervalForestClassifier: An ensemble of interval tree classifiers.
- IntervalForestRegressor: An ensemble of interval tree regressors.
- IsolationShapeletForest: An isolation shapelet forest.
- PivotForestClassifier: An ensemble of interval tree classifiers.
- ProximityForestClassifier: A forest of proximity trees.
- RocketForestClassifier: An ensemble of rocket tree classifiers.
- RocketForestRegressor: An ensemble of rocket tree regressors.
- ShapeletForestClassifier: An ensemble of random shapelet tree classifiers.
- ShapeletForestEmbedding: An ensemble of random shapelet trees.
- ShapeletForestRegressor: An ensemble of random shapelet tree regressors.
- class wildboar.ensemble.BaggingClassifier(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, class_weight=None, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
 A bagging classifier.
A bagging classifier is a meta-estimator that fits base classifiers on random subsets of the original data.
- Parameters:
  - estimator : object, optional
    The base estimator of the ensemble. If None, the base estimator is a ShapeletTreeClassifier.
  - n_estimators : int, optional
    The number of base estimators in the ensemble.
  - max_samples : int or float, optional
    The number of samples to draw from X to train each base estimator.
    If int, draw max_samples samples.
    If float, draw max_samples * n_samples samples.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - class_weight : dict or "balanced", optional
    Weights associated with the labels.
    If dict, weights of the form {label: weight}.
    If "balanced", each class weight is inversely proportional to the class frequency.
    If None, each class has equal weight.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
  - verbose : int, optional
    Controls the output to standard error while fitting and predicting.
  - base_estimator : object, optional
    Use estimator instead.
    Deprecated since version 1.2: base_estimator is deprecated and will be removed in 1.4. Use estimator instead.
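Examples

A minimal, hedged usage sketch, assuming the gun_point dataset from wildboar.datasets (used elsewhere in this reference) and the default shapelet-tree base estimator:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import BaggingClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = BaggingClassifier(n_estimators=10, random_state=1).fit(X_train, y_train)
>>> y_pred = clf.predict(X_test)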
- decision_function(X, **params)[source]#
  Average of the decision functions of the base classifiers.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - score : ndarray of shape (n_samples, k)
    The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X, **params)[source]#
  Predict class for X.
  The predicted class of an input sample is computed as the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, it resorts to voting.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted classes.
- predict_log_proba(X, **params)[source]#
  Predict class log-probabilities for X.
  The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
  Predict class probabilities for X.
  The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If the base estimators do not implement a predict_proba method, it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class (see the sketch below).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
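To make the soft-voting rule concrete, the following hedged sketch reproduces the averaging with plain NumPy; it assumes every fitted member (clf.estimators_) implements predict_proba, and it illustrates the rule described above rather than the library's internal implementation.

import numpy as np

def soft_vote_proba(estimators, X):
    # Stack the (n_samples, n_classes) probabilities of each member and
    # average over the ensemble axis (soft voting).
    return np.stack([est.predict_proba(X) for est in estimators]).mean(axis=0)

# The predicted class is the column with the highest mean probability:
# y_pred = clf.classes_[np.argmax(soft_vote_proba(clf.estimators_, X), axis=1)]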
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
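As an illustration of how this property can be used, the hedged sketch below recovers the out-of-bag indices of each member by taking the complement of the in-bag indices; clf is assumed to be a fitted BaggingClassifier and X_train the data it was fitted on.

import numpy as np

n_samples = len(X_train)  # X_train: the training data of `clf` (assumed)
for i, in_bag in enumerate(clf.estimators_samples_):
    # Out-of-bag indices are those never drawn for this member.
    oob = np.setdiff1d(np.arange(n_samples), in_bag)
    print(f"estimator {i}: {len(np.unique(in_bag))} unique in-bag, {len(oob)} out-of-bag")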
- class wildboar.ensemble.BaggingRegressor(estimator=None, n_estimators=100, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
 A bagging regressor.
A bagging regressor is a meta-estimator that fits base regressors on random subsets of the original data.
- Parameters:
  - estimator : object, optional
    The base estimator of the ensemble. If None, the base estimator is a ShapeletTreeRegressor.
  - n_estimators : int, optional
    The number of base estimators in the ensemble.
  - max_samples : int or float, optional
    The number of samples to draw from X to train each base estimator.
    If int, draw max_samples samples.
    If float, draw max_samples * n_samples samples.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
  - verbose : int, optional
    Controls the output to standard error while fitting and predicting.
  - base_estimator : object, optional
    Use estimator instead.
    Deprecated since version 1.2: base_estimator is deprecated and will be removed in 1.4. Use estimator instead.
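Examples

A minimal, hedged sketch that merely exercises the API; casting the gun_point class labels to floats to obtain a numeric target is an assumption for illustration, not a recommended modelling choice.

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import BaggingRegressor
>>> X, y = load_gun_point()
>>> reg = BaggingRegressor(n_estimators=25, random_state=1).fit(X, y.astype(float))
>>> y_hat = reg.predict(X)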
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X)[source]#
  Predict regression target for X.
  The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted values.
- score(X, y, sample_weight=None)[source]#
  Return the coefficient of determination on test data.
  The coefficient of determination, \(R^2\), is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting of the estimator.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
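The definition above can be checked directly; the following hedged sketch computes \(R^2\) from its two components and verifies the result against sklearn.metrics.r2_score (the data is made up for illustration).

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
assert np.isclose(1 - u / v, r2_score(y_true, y_pred))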
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.BaseBagging(estimator=None, n_estimators=10, *, max_samples=1.0, bootstrap=True, oob_score=False, warm_start=False, n_jobs=None, random_state=None, verbose=0, base_estimator='deprecated')[source]#
 Base estimator for Wildboar ensemble estimators.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ElasticEnsembleClassifier(n_neighbors=1, *, metric='auto', n_jobs=None)[source]#
 Ensemble of wildboar.distance.KNeighborsClassifier.
Each classifier is fitted with an optimized parameter grid over metric parameters.
- Parameters:
  - n_neighbors : int, optional
    The number of neighbors.
  - metric : {"auto", "elastic", "non_elastic", "all"} or dict, optional
    The metric specification.
    If "auto" or "elastic", fit one classifier for each elastic distance as described by Lines and Bagnall (2015). We use a slightly smaller parameter grid.
    If "non_elastic", fit one classifier for each non-elastic distance measure.
    If "all", fit one classifier for the metrics in both "elastic" and "non_elastic".
    If dict, a custom metric specification.
  - n_jobs : int, optional
    The number of parallel jobs.
- Attributes:
  - scores : tuple
    A tuple of metric name and cross-validation score.
References
- Jason Lines and Anthony Bagnall,
 Time Series Classification with Ensembles of Elastic Distance Measures, Data Mining and Knowledge Discovery, 29(3), 2015.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.ensemble import ElasticEnsembleClassifier
>>> X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
>>> clf = ElasticEnsembleClassifier(
...     metric={
...         "dtw": {"min_r": 0.1, "max_r": 0.3},
...         "ddtw": {"min_r": 0.1, "max_r": 0.3},
...     },
... )
>>> clf.fit(X_train, y_train)
ElasticEnsembleClassifier(metric={'ddtw': {'max_r': 0.3, 'min_r': 0.1}, 'dtw': {'max_r': 0.3, 'min_r': 0.1}})
>>> clf.score(X_test, y_test)
0.9866666666666667
- fit(x, y)[source]#
  Fit the estimator.
- Parameters:
  - x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
    The input samples.
  - y : array-like of shape (n_samples,)
    The input labels.
- Returns:
  - object
    This estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(x)[source]#
  Compute the class label for the samples in x.
- Parameters:
  - x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
    The input samples.
- Returns:
  - ndarray of shape (n_samples,)
    The class label for each sample.
- predict_proba(x)[source]#
  Compute probability estimates for the samples in x.
- Parameters:
  - x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dim, n_timesteps)
    The input time series.
- Returns:
  - ndarray of shape (n_samples, n_classes)
    The probabilities.
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- class wildboar.ensemble.ExtraShapeletTreesClassifier(n_estimators=100, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
 An ensemble of extremely random shapelet trees.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - max_depth : int, optional
    The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
  - min_samples_split : int, optional
    The minimum number of samples required to split an internal node.
  - min_samples_leaf : int, optional
    The minimum number of samples in a leaf.
  - min_impurity_decrease : float, optional
    A split is introduced only if the impurity decrease is larger than or equal to this value.
  - min_shapelet_size : float, optional
    The minimum length of a shapelet, expressed as a fraction of n_timestep.
  - max_shapelet_size : float, optional
    The maximum length of a shapelet, expressed as a fraction of n_timestep.
  - coverage_probability : float, optional
    The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
    Larger coverage_probability yields longer shapelets.
    Smaller coverage_probability yields shorter shapelets.
  - variability : float, optional
    Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
    Higher variability creates more uniform interval sizes.
    Lower variability creates more variable interval sizes.
  - metric : str or list, optional
    The distance metric.
    If str, the distance metric used to identify the best shapelet.
    If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10). See also the sketch after the Examples section below.
    Read more about metric specifications in the User guide.
    Changed in version 1.2: Added support for multi-metric shapelet transform.
  - metric_params : dict, optional
    Parameters for the distance measure. Ignored unless metric is a string.
    Read more about the parameters in the User guide.
  - criterion : {"entropy", "gini"}, optional
    The criterion used to evaluate the utility of a split.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - class_weight : dict or "balanced", optional
    Weights associated with the labels.
    If dict, weights of the form {label: weight}.
    If "balanced", each class weight is inversely proportional to the class frequency.
    If None, each class has equal weight.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ExtraShapeletTreesClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
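For the list form of metric described above, the grid keys follow the min_/max_/num_ convention from the parameter description. The metric name and range below are illustrative assumptions, not a prescribed configuration:

>>> f = ExtraShapeletTreesClassifier(
...     n_estimators=100,
...     metric=[("dtw", {"min_r": 0, "max_r": 1, "num_r": 10})],
... )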
- decision_function(X, **params)[source]#
  Average of the decision functions of the base classifiers.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - score : ndarray of shape (n_samples, k)
    The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X, **params)[source]#
  Predict class for X.
  The predicted class of an input sample is computed as the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, it resorts to voting.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted classes.
- predict_log_proba(X, **params)[source]#
  Predict class log-probabilities for X.
  The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
  Predict class probabilities for X.
  The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If the base estimators do not implement a predict_proba method, it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ExtraShapeletTreesRegressor(n_estimators=100, *, max_depth=None, min_samples_split=2, min_shapelet_size=0, max_shapelet_size=1, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of extremely random shapelet tree regressors.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - max_depth : int, optional
    The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
  - min_samples_split : int, optional
    The minimum number of samples required to split an internal node.
  - min_shapelet_size : float, optional
    The minimum length of a shapelet, expressed as a fraction of n_timestep.
  - max_shapelet_size : float, optional
    The maximum length of a shapelet, expressed as a fraction of n_timestep.
  - coverage_probability : float, optional
    The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
    Larger coverage_probability yields longer shapelets.
    Smaller coverage_probability yields shorter shapelets.
  - variability : float, optional
    Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
    Higher variability creates more uniform interval sizes.
    Lower variability creates more variable interval sizes.
  - metric : str or list, optional
    The distance metric.
    If str, the distance metric used to identify the best shapelet.
    If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
    Read more about metric specifications in the User guide.
    Changed in version 1.2: Added support for multi-metric shapelet transform.
  - metric_params : dict, optional
    Parameters for the distance measure. Ignored unless metric is a string.
    Read more about the parameters in the User guide.
  - criterion : {"squared_error"}, optional
    The criterion used to evaluate the utility of a split.
    Deprecated since version 1.1: Criterion "mse" was deprecated in v1.1 and removed in version 1.2.
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    Controls the random resampling of the original dataset.
    If int, random_state is the seed used by the random number generator.
    If numpy.random.RandomState instance, random_state is the random number generator.
    If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ExtraShapeletTreesRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ExtraShapeletTreesRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ExtraShapeletTreesRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X)[source]#
  Predict regression target for X.
  The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted values.
- score(X, y, sample_weight=None)[source]#
  Return the coefficient of determination on test data.
  The coefficient of determination, \(R^2\), is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting of the estimator.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IntervalForestClassifier(n_estimators=100, *, n_intervals='sqrt', intervals='random', summarizer='mean_var_slope', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
 An ensemble of interval tree classifiers.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - n_intervals : str, int or float, optional
    The number of intervals to use for the transform.
    If "log2", the number of intervals is log2(n_timestep).
    If "sqrt", the number of intervals is sqrt(n_timestep).
    If int, the number of intervals is n_intervals.
    If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
    Deprecated since version 1.2: The option "log" has been renamed to "log2".
  - intervals : str, optional
    The method for selecting intervals.
    If "fixed", n_intervals non-overlapping intervals.
    If "random", n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
    Deprecated since version 1.3: The option "sample" has been deprecated. Use "fixed" with sample_size.
  - summarizer : str or list, optional
    The method used to summarize each interval.
    If str, the summarizer is determined by _SUMMARIZERS.keys().
    If list, the summarizer is a list of functions f(x) -> float, where x is a numpy array (see the sketch after this parameter list).
    The default summarizer summarizes each interval by its mean, standard deviation and slope.
  - sample_size : float, optional
    The fraction of fixed intervals to sample.
  - min_size : float, optional
    The minimum interval size if intervals="random".
  - max_size : float, optional
    The maximum interval size if intervals="random".
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - max_depth : int, optional
    The maximum tree depth.
  - min_samples_split : int, optional
    The minimum number of samples required to consider a split.
  - min_samples_leaf : int, optional
    The minimum number of samples in a leaf.
  - min_impurity_decrease : float, optional
    The minimum impurity decrease required to build a sub-tree.
  - criterion : {"entropy", "gini"}, optional
    The impurity criterion.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - class_weight : dict or "balanced", optional
    Weights associated with the labels.
    If dict, weights of the form {label: weight}.
    If "balanced", each class weight is inversely proportional to the class frequency.
    If None, each class has equal weight.
  - random_state : int or RandomState, optional
    If int, random_state is the seed used by the random number generator.
    If RandomState instance, random_state is the random number generator.
    If None, the random number generator is the RandomState instance used by np.random.
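Examples

A hedged usage sketch on synthetic_control that also shows the list form of summarizer (custom functions f(x) -> float) referenced in the parameter list above; the particular summary functions are illustrative assumptions.

>>> import numpy as np
>>> from wildboar.datasets import load_synthetic_control
>>> from wildboar.ensemble import IntervalForestClassifier
>>> x, y = load_synthetic_control()
>>> f = IntervalForestClassifier(
...     n_estimators=100,
...     n_intervals="sqrt",
...     summarizer=[np.mean, np.std, lambda v: v.max() - v.min()],
... )
>>> _ = f.fit(x, y)
>>> y_hat = f.predict(x)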
- decision_function(X, **params)[source]#
  Average of the decision functions of the base classifiers.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - score : ndarray of shape (n_samples, k)
    The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X, **params)[source]#
  Predict class for X.
  The predicted class of an input sample is computed as the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, it resorts to voting.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted classes.
- predict_log_proba(X, **params)[source]#
  Predict class log-probabilities for X.
  The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
  Predict class probabilities for X.
  The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If the base estimators do not implement a predict_proba method, it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - p : ndarray of shape (n_samples, n_classes)
    The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
  Return the accuracy on the provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires, for each sample, that each label set be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.IntervalForestRegressor(n_estimators=100, *, n_intervals='sqrt', intervals='fixed', summarizer='mean_var_slope', sample_size=0.5, min_size=0.0, max_size=1.0, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of interval tree regressors.
- Parameters:
  - n_estimators : int, optional
    The number of estimators.
  - n_intervals : str, int or float, optional
    The number of intervals to use for the transform.
    If "log2", the number of intervals is log2(n_timestep).
    If "sqrt", the number of intervals is sqrt(n_timestep).
    If int, the number of intervals is n_intervals.
    If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
    Deprecated since version 1.2: The option "log" has been renamed to "log2".
  - intervals : str, optional
    The method for selecting intervals.
    If "fixed", n_intervals non-overlapping intervals.
    If "sample", n_intervals * sample_size non-overlapping intervals.
    If "random", n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
  - summarizer : str or list, optional
    The method used to summarize each interval.
    If str, the summarizer is determined by _SUMMARIZERS.keys().
    If list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.
    The default summarizer summarizes each interval by its mean, variance and slope.
  - sample_size : float, optional
    The sample size of fixed intervals if intervals="sample".
  - min_size : float, optional
    The minimum interval size if intervals="random".
  - max_size : float, optional
    The maximum interval size if intervals="random".
  - oob_score : bool, optional
    Use out-of-bag samples to estimate the generalization performance. Requires bootstrap=True.
  - max_depth : int, optional
    The maximum tree depth.
  - min_samples_split : int, optional
    The minimum number of samples required to consider a split.
  - min_samples_leaf : int, optional
    The minimum number of samples in a leaf.
  - min_impurity_decrease : float, optional
    The minimum impurity decrease required to build a sub-tree.
  - criterion : {"squared_error"}, optional
    The impurity criterion.
  - bootstrap : bool, optional
    Whether samples are drawn with replacement.
  - warm_start : bool, optional
    When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new ensemble.
  - n_jobs : int, optional
    The number of jobs to run in parallel. None means a single core; -1 means all cores; a positive integer means that exact number of cores.
  - random_state : int or RandomState, optional
    If int, random_state is the seed used by the random number generator.
    If RandomState instance, random_state is the random number generator.
    If None, the random number generator is the RandomState instance used by np.random.
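Examples

A hedged sketch that merely exercises the API; casting the synthetic_control labels to floats to obtain a numeric target is an assumption for illustration.

>>> from wildboar.datasets import load_synthetic_control
>>> from wildboar.ensemble import IntervalForestRegressor
>>> x, y = load_synthetic_control()
>>> reg = IntervalForestRegressor(n_estimators=50, random_state=1)
>>> _ = reg.fit(x, y.astype(float))
>>> y_hat = reg.predict(x)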
- fit(x, y, sample_weight=None)[source]#
  Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - y : array-like of shape (n_samples,)
    The target values (class labels in classification, real numbers in regression).
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, the samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
  - **fit_params : dict
    Parameters to pass to the underlying estimators.
    Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using sklearn.set_config(enable_metadata_routing=True). See the Metadata Routing User Guide for more details.
- Returns:
  - self : object
    Fitted estimator.
- get_metadata_routing()[source]#
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(X)[source]#
  Predict regression target for X.
  The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
  - X : {array-like, sparse matrix} of shape (n_samples, n_features)
    The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
  - **params : dict
    Parameters routed to the predict method of the sub-estimators via the metadata routing API.
    Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See the Metadata Routing User Guide for more details.
- Returns:
  - y : ndarray of shape (n_samples,)
    The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares
((y_true - y_pred)** 2).sum()and \(v\) is the total sum of squares((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape
(n_samples, n_samples_fitted), wheren_samples_fittedis the number of samples used in the fitting for the estimator.- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
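The definition above can be checked numerically; a small sketch with made-up values (the result agrees with sklearn.metrics.r2_score):

>>> import numpy as np
>>> y_true = np.array([1.0, 2.0, 3.0, 4.0])
>>> y_pred = np.array([1.1, 1.9, 3.2, 3.8])
>>> u = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
>>> v = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
>>> print(round(float(1 - u / v), 2))
0.98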
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
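Since only the in-bag indices are exposed, the out-of-bag indices of each member can be recovered by set difference; a sketch, assuming a fitted ensemble est trained on n_samples series (both names are placeholders):

>>> import numpy as np
>>> in_bag = est.estimators_samples_   # one index array per member   # doctest: +SKIP
>>> oob = [np.setdiff1d(np.arange(n_samples), idx) for idx in in_bag]  # doctest: +SKIP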
- class wildboar.ensemble.IsolationShapeletForest(n_estimators=100, *, n_shapelets=1, bootstrap=False, n_jobs=None, min_shapelet_size=0, max_shapelet_size=1, min_samples_split=2, max_samples='auto', contamination='auto', warm_start=False, metric='euclidean', metric_params=None, random_state=None)[source]#
 An isolation shapelet forest.
Added in version 0.3.5.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators in the ensemble.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- max_samples“auto”, float or int, optional
 The number of samples to draw to train each base estimator.
- contamination‘auto’ or float, optional
 The strategy for computing the offset.
if “auto”, offset_ is set to -0.5.
if a float c, offset_ is computed as the c-th percentile of the scores (see the second example below).
If bootstrap=True, out-of-bag samples are used for computing the scores.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- Attributes:
 - offset_float
The offset for computing the final decision.
Examples
Using default offset threshold
>>> from wildboar.ensemble import IsolationShapeletForest
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.model_selection import outlier_train_test_split
>>> from sklearn.metrics import balanced_accuracy_score
>>> f = IsolationShapeletForest(random_state=1)
>>> x, y = load_two_lead_ecg()
>>> x_train, x_test, y_train, y_test = outlier_train_test_split(
...     x, y, 1, test_size=0.2, anomalies_train_size=0.05, random_state=1
... )
>>> f.fit(x_train)
IsolationShapeletForest(random_state=1)
>>> y_pred = f.predict(x_test)
>>> balanced_accuracy_score(y_test, y_pred)
0.8674
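Using a float contamination to derive the offset from the training scores; a sketch continuing the example above (the resulting offset_ depends on the data):

>>> f = IsolationShapeletForest(contamination=0.05, random_state=1)
>>> f_ = f.fit(x_train)   # offset_ becomes the 5th percentile of the scores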
- fit(x, y=None, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- fit_predict(X, y=None, **kwargs)[source]#
Perform fit on X and return labels for X.
Returns -1 for outliers and 1 for inliers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The input samples.
- yIgnored
 Not used, present for API consistency by convention.
- **kwargsdict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
 - yndarray of shape (n_samples,)
 1 for inliers, -1 for outliers.
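A short sketch of the labeling convention, reusing x_train from the examples above:

>>> import numpy as np
>>> labels = f.fit_predict(x_train)          # doctest: +SKIP
>>> n_outliers = int(np.sum(labels == -1))   # doctest: +SKIP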
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline). The latter have parameters of the form<component>__<parameter>so that it’s possible to update each component of a nested object.- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.PivotForestClassifier(n_estimators=100, *, n_pivot='sqrt', metrics='all', oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
An ensemble of pivot tree classifiers.
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
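The averaging described above implies that each row of p sums to one and that predict agrees with the most probable class; a quick sanity check, assuming a fitted classifier clf and samples X (both placeholders):

>>> import numpy as np
>>> p = clf.predict_proba(X)                                         # doctest: +SKIP
>>> ok = np.allclose(p.sum(axis=1), 1.0)                             # doctest: +SKIP
>>> same = (clf.classes_[p.argmax(axis=1)] == clf.predict(X)).all()  # doctest: +SKIP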
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ProximityForestClassifier(n_estimators=100, *, n_pivot=1, pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0, criterion='entropy', bootstrap=True, warm_start=False, n_jobs=None, class_weight=None, random_state=None)[source]#
 A forest of proximity trees.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_pivotint, optional
 The number of pivots to sample at each node.
- pivot_sample{“label”, “uniform”}, optional
 The pivot sampling method.
- metric_sample{“uniform”, “weighted”}, optional
 The metric sampling method.
- metric{“auto”, “default”}, str or list, optional
The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).
If “auto”, use the default metric specification suggested by Lucas et al. (2019).
If str, use a single metric or default metric specification.
If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10). A usage sketch follows the references below.
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_factoriesdict, optional
 A metric specification.
Deprecated since version 1.2: Use the combination of metric and metric_params instead.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
 The maximum tree depth.
- min_samples_splitint, optional
 The minimum number of samples to consider a split.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 The minimum impurity decrease to build a sub-tree.
- criterion{“entropy”, “gini”}, optional
 The impurity criterion.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- class_weightdict or “balanced”, optional
 Weights associated with the labels.
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- random_stateint or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
References
- Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019)
 Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery
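A sketch of the custom metric specification described above, using “dtw” with a grid over the warping window r for illustration (which metrics accept r is described in the User guide):

>>> from wildboar.ensemble import ProximityForestClassifier
>>> metric = [("dtw", dict(min_r=0, max_r=1, num_r=10))]  # 10 values of r in [0, 1]
>>> clf = ProximityForestClassifier(n_estimators=100, metric=metric, random_state=1)
>>> # clf.fit(x, y) on any dataset of shape (n_samples, n_timestep)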
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.RocketForestClassifier(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='entropy', bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
 An ensemble of rocket tree classifiers.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_kernelsint, optional
The number of kernels to sample at each node.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- sampling{“normal”, “uniform”, “shapelet”}, optional
 The sampling of convolutional filters.
if “normal”, sample filters according to a normal distribution with mean and scale.
if “uniform”, sample filters according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
 The parameters for the sampling.
if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
A usage sketch follows this parameter list.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- bias_probfloat, optional
 The probability of using a bias term.
- normalize_probfloat, optional
 The probability of performing normalization.
- padding_probfloat, optional
 The probability of padding with zeros.
- criterion{“entropy”, “gini”}, optional
 The criterion used to evaluate the utility of a split.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- class_weightdict or “balanced”, optional
 Weights associated with the labels
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- n_jobsint, optional
 The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
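A sketch of the sampling configuration described above, spelling out the documented defaults for the uniform case:

>>> from wildboar.ensemble import RocketForestClassifier
>>> clf = RocketForestClassifier(
...     n_kernels=10,
...     sampling="uniform",
...     sampling_params={"lower": -1, "upper": 1},  # the documented defaults
...     random_state=1,
... )
>>> # clf.fit(x, y) on any dataset of shape (n_samples, n_timestep)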
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.RocketForestRegressor(n_estimators=100, *, n_kernels=10, oob_score=False, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of rocket tree regressors.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_kernelsint, optional
The number of kernels to sample at each node.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- sampling{“normal”, “uniform”, “shapelet”}, optional
 The sampling of convolutional filters.
if “normal”, sample filters according to a normal distribution with mean and scale.
if “uniform”, sample filters according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
 The parameters for the sampling.
if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- bias_probfloat, optional
 The probability of using a bias term.
- normalize_probfloat, optional
 The probability of performing normalization.
- padding_probfloat, optional
 The probability of padding with zeros.
- criterion{“squared_error”}, optional
 The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X)[source]#
 Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares
((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape
(n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestClassifier(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='entropy', oob_score=False, bootstrap=True, warm_start=False, class_weight=None, n_jobs=None, random_state=None)[source]#
 An ensemble of random shapelet tree classifiers.
A forest of randomized shapelet trees.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
 Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
 The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
 Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphafloat, optional
 Dynamically decrease the number of sampled shapelets at each node according to the current depth, i.e.
w = 1 - exp(-abs(alpha) * depth)
if alpha < 0, the number of sampled shapelets decrease from n_shapelets towards 1 with increased depth.
if alpha > 0, the number of sampled shapelets increase from 1 towards n_shapelets with increased depth.
if None, the number of sampled shapelets is the same independent of depth (see the numeric sketch after the examples below).
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“entropy”, “gini”}, optional
 The criterion used to evaluate the utility of a split.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- class_weightdict or “balanced”, optional
 Weights associated with the labels
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
Examples
>>> from wildboar.ensemble import ShapeletForestClassifier
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestClassifier(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestClassifier(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
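The depth weighting controlled by alpha can be written out directly; a numeric sketch of the weight w = 1 - exp(-abs(alpha) * depth) at increasing depth (how w scales n_shapelets is internal to the implementation):

>>> import math
>>> alpha = -0.5
>>> for depth in range(4):
...     w = 1 - math.exp(-abs(alpha) * depth)
...     print(depth, round(w, 3))
0 0.0
1 0.393
2 0.632
3 0.777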
- decision_function(X, **params)[source]#
 Average of the decision functions of the base classifiers.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the decision_function method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - scorendarray of shape (n_samples, k)
The decision function of the input samples. The columns correspond to the classes in sorted order, as they appear in the attribute classes_. Regression and binary classification are special cases with k == 1, otherwise k == n_classes.
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X, **params)[source]#
 Predict class for X.
The predicted class of an input sample is computed as the class with the highest mean predicted probability. If base estimators do not implement a predict_proba method, then it resorts to voting.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted classes.
- predict_log_proba(X, **params)[source]#
 Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample are computed as the log of the mean predicted class probabilities of the base estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_log_proba, the predict_proba or the proba method of the sub-estimators via the metadata routing API. The routing is tried in the mentioned order depending on whether this method is available on the sub-estimator.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class log-probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- predict_proba(X)[source]#
 Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the base estimators in the ensemble. If base estimators do not implement a predict_proba method, then it resorts to voting and the predicted class probabilities of an input sample represent the proportion of estimators predicting each class.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict_proba (if available) or the predict method (otherwise) of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - pndarray of shape (n_samples, n_classes)
 The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
 Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestEmbedding(n_estimators=100, *, n_shapelets=1, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='squared_error', bootstrap=True, warm_start=False, n_jobs=None, sparse_output=True, random_state=None)[source]#
 An ensemble of random shapelet trees.
An unsupervised transformation of a time series dataset to a high-dimensional sparse representation. A time series is indexed by the leaf that it falls into. This leads to a binary coding of a time series with as many ones as trees in the forest.
The dimensionality of the resulting representation is <= n_estimators * 2^max_depth (see the example after the parameters).
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- max_depthint, optional
 The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
 The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
 Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
 The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- sparse_outputbool, optional
 If True, return a sparse CSR matrix.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
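To make the multi-metric specification for metric concrete, here is a hedged sketch following the dict(min_r=0, max_r=1, num_r=10) convention described above; the metric name "dtw" and its argument r are illustrative assumptions:
>>> # hypothetical grid over r: 10 values in the range 0 to 1
>>> metric = [("dtw", dict(min_r=0, max_r=1, num_r=10))]
>>> e = ShapeletForestEmbedding(n_estimators=10, metric=metric)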
- fit(x, y=None, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
 A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X)[source]#
 Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return the coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
 \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
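A small worked instance of the definition above, using plain Python and illustrative numbers (not output of this estimator):
>>> y_true = [3.0, -0.5, 2.0, 7.0]
>>> y_pred = [2.5, 0.0, 2.0, 8.0]
>>> u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
>>> mean = sum(y_true) / len(y_true)
>>> v = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
>>> round(1 - u / v, 3)
0.949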
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
- class wildboar.ensemble.ShapeletForestRegressor(n_estimators=100, *, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', oob_score=False, bootstrap=True, warm_start=False, n_jobs=None, random_state=None)[source]#
 An ensemble of random shapelet tree regressors.
- Parameters:
 - n_estimatorsint, optional
 The number of estimators.
- n_shapeletsint, optional
 The number of shapelets to sample at each node.
- max_depthint, optional
 The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
- min_samples_splitint, optional
 The minimum number of samples to split an internal node.
- min_samples_leafint, optional
 The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
 A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
 Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- min_shapelet_sizefloat, optional
 The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
 The maximum length of a shapelet expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
 The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get longer shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
 Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphafloat, optional
 Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.
w = 1 - exp(-abs(alpha) * depth)
if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.
if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.
if None, the number of sampled shapelets is the same regardless of depth. A sketch of this schedule follows the parameter list below.
- metricstr or list, optional
 The distance metric.
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs, defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
 Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
 The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- oob_scorebool, optional
 Use out-of-bag samples to estimate generalization performance. Requires bootstrap=True.
- bootstrapbool, optional
 If the samples are drawn with replacement.
- warm_startbool, optional
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new ensemble.
- n_jobsint, optional
 The number of processor cores used for fitting the ensemble.
- random_stateint or RandomState, optional
 Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
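To illustrate the alpha schedule, the sketch below evaluates the weight w defined above at a few depths; how w maps to the exact number of sampled shapelets is an implementation detail not documented here, so only the shape of the schedule is shown:
>>> from math import exp
>>> def depth_weight(alpha, depth):
...     # w = 1 - exp(-abs(alpha) * depth), as defined for alpha above
...     return 1 - exp(-abs(alpha) * depth)
>>> [round(depth_weight(0.5, d), 2) for d in (1, 2, 4, 8)]
[0.39, 0.63, 0.86, 0.98]
With alpha > 0 the number of sampled shapelets grows towards n_shapelets as w approaches 1; with alpha < 0 it shrinks towards 1.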
Examples
>>> from wildboar.ensemble import ShapeletForestRegressor
>>> from wildboar.datasets import load_synthetic_control
>>> x, y = load_synthetic_control()
>>> f = ShapeletForestRegressor(n_estimators=100, metric='scaled_euclidean')
>>> f.fit(x, y)
ShapeletForestRegressor(metric='scaled_euclidean')
>>> y_hat = f.predict(x)
- fit(x, y, sample_weight=None)[source]#
 Build a Bagging ensemble of estimators from the training set (X, y).
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- yarray-like of shape (n_samples,)
 The target values (class labels in classification, real numbers in regression).
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights. If None, then samples are equally weighted. Note that this is supported only if the base estimator supports sample weighting.
- **fit_paramsdict
 Parameters to pass to the underlying estimators.
Added in version 1.5: Only available if enable_metadata_routing=True, which can be set by using
sklearn.set_config(enable_metadata_routing=True). See Metadata Routing User Guide for more details.
- Returns:
 - selfobject
 Fitted estimator.
- get_metadata_routing()[source]#
 Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
 - routingMetadataRequest
 A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
 Get parameters for this estimator.
- Parameters:
 - deepbool, default=True
 If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
 - paramsdict
 Parameter names mapped to their values.
- predict(X)[source]#
 Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the estimators in the ensemble.
- Parameters:
 - X{array-like, sparse matrix} of shape (n_samples, n_features)
 The training input samples. Sparse matrices are accepted only if they are supported by the base estimator.
- **paramsdict
 Parameters routed to the predict method of the sub-estimators via the metadata routing API.
Added in version 1.7: Only available if sklearn.set_config(enable_metadata_routing=True) is set. See Metadata Routing User Guide for more details.
- Returns:
 - yndarray of shape (n_samples,)
 The predicted values.
- score(X, y, sample_weight=None)[source]#
 Return the coefficient of determination on test data.
The coefficient of determination, \(R^2\), is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.
- Parameters:
 - Xarray-like of shape (n_samples, n_features)
 Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
 True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
 Sample weights.
- Returns:
 - scorefloat
 \(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
 Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
 - **paramsdict
 Estimator parameters.
- Returns:
 - selfestimator instance
 Estimator instance.
- property estimators_samples_[source]#
 The subset of drawn samples for each base estimator.
Returns a dynamically generated list of indices identifying the samples used for fitting each member of the ensemble, i.e., the in-bag samples.
Note: the list is re-created at each call to the property in order to reduce the object memory footprint by not storing the sampling data. Thus fetching the property may be slower than expected.
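A hedged sketch of one common use of estimators_samples_: recovering the out-of-bag indices for a single ensemble member (reusing f and x from the Examples section above, and assuming each entry is an array of in-bag sample indices as described):
>>> import numpy as np
>>> in_bag = f.estimators_samples_[0]  # in-bag indices of the first tree
>>> oob = np.setdiff1d(np.arange(x.shape[0]), in_bag)  # samples never drawn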