wildboar.tree
Tree-based estimators for classification and regression.
Package Contents
Classes
- An extra shapelet tree classifier.
- An extra shapelet tree regressor.
- An interval based tree classifier.
- An interval based tree regressor.
- A tree classifier that uses pivot time series.
- A classifier that uses a k-branching tree based on pivot-time series.
- A tree classifier that uses random convolutions as features.
- A tree regressor that uses random convolutions as features.
- A shapelet tree classifier.
- A shapelet tree regressor.
- class wildboar.tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#
An extra shapelet tree classifier.
Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
- Parameters:
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_shapelet_sizefloat, optional
The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
- max_shapelet_sizefloat, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
- metric{“euclidean”, “scaled_euclidean”, “dtw”, “scaled_dtw”}, optional
Distance metric used to identify the best shapelet.
- metric_paramsdict, optional
Parameters for the distance measure.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
- If dict, weights of the form {label: weight}.
- If “balanced”, each class weight is inversely proportional to the class frequency.
- If None, each class has equal weight.
- random_stateint or RandomState, optional
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The tree representation
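As a rough illustration of the “balanced” option of class_weight, the sketch below computes weights that are inversely proportional to the class frequency. The formula n_samples / (n_classes * count) is the scikit-learn convention and is assumed here; the helper name balanced_class_weight is hypothetical and not part of wildboar.

```python
from collections import Counter


def balanced_class_weight(y):
    """Return {label: weight}, weights inversely proportional to class frequency.

    Assumption: weight = n_samples / (n_classes * count), as in scikit-learn.
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Three samples of class 1 and one sample of class 2:
# the rare class receives the larger weight.
weights = balanced_class_weight([1, 1, 1, 2])
print(weights)
```

A dict of this form could also be passed directly as class_weight, which corresponds to the {label: weight} option.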
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
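To make the shape of the returned indicator concrete, here is a minimal sketch on a toy tree. The parallel-list layout (left, right, threshold) and the single scalar feature per sample are assumptions for illustration only; they are not wildboar's Tree structure, and the real method returns a sparse matrix rather than nested lists.

```python
# Toy binary tree stored as parallel lists (hypothetical layout).
# Node i is a leaf when left[i] == -1.
left = [1, -1, 3, -1, -1]
right = [2, -1, 4, -1, -1]
threshold = [0.5, None, 1.5, None, None]


def decision_path(values):
    """Return a dense (n_samples, n_nodes) 0/1 indicator of visited nodes."""
    indicator = []
    for value in values:
        row = [0] * len(left)
        node = 0  # start at the root
        while True:
            row[node] = 1
            if left[node] == -1:  # reached a leaf
                break
            node = left[node] if value <= threshold[node] else right[node]
        indicator.append(row)
    return indicator


paths = decision_path([0.2, 1.0, 2.0])
# In this toy layout children always have larger indices than their parent,
# so the highest visited index in each row is the leaf.
leaves = [max(i for i, visited in enumerate(row) if visited) for row in paths]
print(paths, leaves)
```

Each row has a 1 for the root and for every node on the way down to the leaf, which is the same leaf index that apply would report for that sample in this toy setting.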
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of each input sample in x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples x.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
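The statement that a predicted probability is the fraction of samples of the same class in a leaf can be sketched as follows. The helper leaf_class_fractions is hypothetical, and the sketch ignores sample and class weights, which a fitted tree may take into account.

```python
from collections import Counter


def leaf_class_fractions(leaf_labels, classes):
    """Fraction of training samples of each class in one leaf, ordered as `classes`."""
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    return [counts[c] / total for c in classes]


classes = [1, 2]  # plays the role of the classes_ attribute
# A leaf that received three training samples of class 1 and one of class 2.
proba = leaf_class_fractions([1, 1, 2, 1], classes)
print(proba)
```

Every sample that ends up in this leaf would receive the same probability row, with columns ordered as in classes.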
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
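For reference, mean accuracy with optional sample weights can be sketched in a few lines. The function name mean_accuracy is hypothetical; the weighting shown (weighted share of correct predictions) is an assumption based on how scikit-learn averages accuracy.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [1, 2, 2, 1]
y_pred = [1, 2, 1, 1]  # the third prediction is wrong
acc = mean_accuracy(y_true, y_pred)
# Giving the misclassified sample twice the weight lowers the score.
weighted_acc = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
print(acc, weighted_acc)
```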
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
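The <component>__<parameter> naming convention can be illustrated with a small sketch that separates an estimator's own parameters from those addressed to a nested component. The helper split_nested_params and the component name "clf" are hypothetical; this shows the naming idea only, not the actual set_params implementation.

```python
def split_nested_params(params):
    """Group {"component__parameter": value} entries by component name."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"max_depth": 3, "clf__n_shapelets": 10, "clf__metric": "dtw"}
)
print(own, nested)
```

Here max_depth would be set on the estimator itself, while n_shapelets and metric would be forwarded to the component named "clf".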
- class wildboar.tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)[source]#
An extra shapelet tree regressor.
Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
- Parameters:
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
- max_shapelet_sizefloat, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
- metric{‘euclidean’, ‘scaled_euclidean’, ‘scaled_dtw’}, optional
Distance metric used to identify the best shapelet.
- metric_paramsdict, optional
Parameters for the distance measure.
- random_stateint or RandomState
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The internal tree representation
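The threshold sampling that distinguishes extra shapelet trees can be sketched as below, assuming dist holds the distances between one sampled shapelet and the samples at a node. The function sample_threshold and the distance values are made up for illustration and do not reflect wildboar's internal implementation.

```python
import random


def sample_threshold(distances, rng):
    """Draw a split threshold uniformly from [min(distances), max(distances)]."""
    return rng.uniform(min(distances), max(distances))


rng = random.Random(0)  # seeded for reproducibility, like random_state
distances = [0.8, 2.5, 1.1, 3.0, 0.4]  # hypothetical shapelet-to-sample distances
threshold = sample_threshold(distances, rng)

# Samples at or below the threshold go to one child, the rest to the other.
left_branch = [d for d in distances if d <= threshold]
right_branch = [d for d in distances if d > threshold]
print(threshold, left_branch, right_branch)
```

Drawing the threshold at random avoids scanning all candidate thresholds, trading some split quality for speed and extra randomization.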
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
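The definition of \(R^2\) given for score can be checked with a short, unweighted sketch. The function name r_squared is hypothetical and the sample values are made up; sample weights and multi-output targets are not handled here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - u / v."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return 1 - u / v


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
score = r_squared(y_true, y_pred)

# A constant model that always predicts the mean of y_true scores 0.0.
mean_value = sum(y_true) / len(y_true)
constant_score = r_squared(y_true, [mean_value] * len(y_true))
print(score, constant_score)
```

For these values u = 1.5 and v = 29.1875, so the score is roughly 0.949, while the constant predictor has u equal to v and therefore scores 0.0.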
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)[source]#
An interval based tree classifier.
- Attributes:
- tree_Tree
The internal tree structure.
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of each input sample in x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples x.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)[source]#
An interval based tree regressor.
- Attributes:
- tree_Tree
The internal tree structure.
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)[source]#
A tree classifier that uses pivot time series.
- Attributes:
- tree_Tree
The internal tree representation
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of each input sample in x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples x.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)[source]#
A classifier that uses a k-branching tree based on pivot-time series.
- Parameters:
- n_pivotint, optional
The number of pivots to sample at each node.
- criterion{“entropy”, “gini”}, optional
The impurity criterion.
- pivot_sample{“label”, “uniform”}, optional
The pivot sampling method.
- metric_sample{“uniform”, “weighted”}, optional
The metric sampling method.
- metric{“auto”, “default”}, str or list, optional
The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).
- If “auto”, use the default metric specification suggested by Lucas et al. (2019).
- If str, use a single metric or default metric specification.
- If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_factoriesdict, optional
A metric specification.
Deprecated since version 1.2: Use the combination of metric and metric_params.
- max_depthint, optional
The maximum tree depth.
- min_samples_splitint, optional
The minimum number of samples to consider a split.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
The minimum impurity decrease to build a sub-tree.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
- If dict, weights of the form {label: weight}.
- If “balanced”, each class weight is inversely proportional to the class frequency.
- If None, each class has equal weight.
- random_stateint or RandomState
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
References
- Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.
Examples
Fit a single proximity tree, with dynamic time warping and move-split-merge metrics.
>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
...     n_pivot=10,
...     metric=[
...         ("dtw", {"min_r": 0.1, "max_r": 0.25}),
...         ("msm", {"min_c": 0.1, "max_c": 100, "num_c": 20})
...     ],
...     criterion="gini"
... )
>>> f.fit(x, y)
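The parameter grid specification described for the metric parameter, such as dict(min_r=0, max_r=1, num_r=10), can be read as a request for a number of values between a lower and an upper bound. The sketch below expands such a specification under two assumptions that the documentation does not confirm: the values are evenly spaced with both endpoints included, and a missing num_ key falls back to 10. The helper expand_grid is hypothetical and not part of wildboar.

```python
def expand_grid(spec, name, default_num=10):
    """Expand {"min_<name>", "max_<name>", "num_<name>"} into a list of values.

    Assumptions: evenly spaced values including both bounds, and a
    hypothetical default of 10 values when "num_<name>" is omitted.
    """
    lo = spec[f"min_{name}"]
    hi = spec[f"max_{name}"]
    num = spec.get(f"num_{name}", default_num)
    if num == 1:
        return [lo]
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


grid = expand_grid(dict(min_r=0, max_r=1, num_r=10), "r")
print(grid)
```

Under these assumptions the specification yields 10 candidate values for r, from 0 up to 1.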
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Whether to validate the input array. Only set to False, which bypasses validation, if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
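To make the shape of the returned indicator concrete, here is a minimal sketch on a hand-built toy tree. It uses dense Python lists for readability, whereas decision_path returns a sparse matrix. The tree layout (LEFT, RIGHT) and the decision_path_row helper are hypothetical and only illustrate the idea.

```python
# Toy binary tree: node 0 is the root; -1 marks "no child" (a leaf).
LEFT = [1, -1, 3, -1, -1]
RIGHT = [2, -1, 4, -1, -1]


def decision_path_row(branches):
    """Return one indicator row for a sample.

    `branches` is the sequence of choices ("L" or "R") taken from the root;
    in a fitted tree these come from evaluating the split at each node.
    """
    row = [0] * len(LEFT)
    node = 0
    row[node] = 1
    for branch in branches:
        node = LEFT[node] if branch == "L" else RIGHT[node]
        row[node] = 1
    return row


# One row per sample, one column per node.
indicator = [decision_path_row("L"), decision_path_row("RL"), decision_path_row("RR")]
for row in indicator:
    print(row)
```

Every row has a 1 in column 0 because all samples pass through the root, and the last nonzero column of a row is the leaf that apply would report for that sample.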
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This instance.
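The sample_weight rule for candidate splits can be read as a simple admissibility check. The sketch below is one interpretation of that sentence, not the library's implementation; child_is_admissible and split_is_admissible are hypothetical helpers.

```python
from collections import defaultdict


def child_is_admissible(labels, weights):
    """Check one child node: positive net weight, no negatively weighted class."""
    per_class = defaultdict(float)
    for label, weight in zip(labels, weights):
        per_class[label] += weight
    net = sum(per_class.values())
    return net > 0 and all(w >= 0 for w in per_class.values())


def split_is_admissible(left, right):
    """`left` and `right` are (labels, weights) pairs for the two children."""
    return child_is_admissible(*left) and child_is_admissible(*right)


ok = split_is_admissible((["a", "b"], [1.0, 0.5]), (["a"], [2.0]))
zero_weight = split_is_admissible((["a", "b"], [1.0, -1.0]), (["a"], [2.0]))
negative_class = split_is_admissible((["a", "b"], [3.0, -1.0]), (["a"], [2.0]))
print(ok, zero_weight, negative_class)
```

The second split is rejected because the left child has net zero weight, and the third because class "b" carries a negative weight even though the net weight is positive.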
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
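The statement that the predicted probability is the fraction of samples of the same class in a leaf can be sketched as follows. This is an unweighted illustration with a hypothetical leaf_probabilities helper; it ignores sample and class weights, which a fitted tree may take into account.

```python
from collections import Counter


def leaf_probabilities(leaf_labels, classes):
    """Fraction of training samples of each class in one leaf.

    The result is ordered like `classes`, mirroring the role of `classes_`.
    """
    counts = Counter(leaf_labels)
    total = len(leaf_labels)
    return [counts[c] / total for c in classes]


classes = [1, 2]
# Labels of the training samples that ended up in one particular leaf.
proba = leaf_probabilities([1, 1, 2, 1], classes)
print(proba)
```

Every input sample routed to this leaf would receive the same probability row.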
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
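For single-label targets, mean accuracy is the (optionally weighted) fraction of correct predictions. A minimal sketch, assuming the weighted form is the sum of weights of correct predictions divided by the total weight; mean_accuracy is a hypothetical helper rather than the method itself.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [1, 2, 2, 1]
y_pred = [1, 2, 1, 1]
acc = mean_accuracy(y_true, y_pred)
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
print(acc, weighted)
```

Giving the misclassified third sample a larger weight lowers the score, which is the effect sample_weight is meant to have.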
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
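The <component>__<parameter> naming can be illustrated with a small stand-in class. This is only a sketch of the routing idea; Node is hypothetical and is not how scikit-learn or wildboar implement set_params.

```python
class Node:
    """Hypothetical stand-in for an estimator with set_params-style routing."""

    def __init__(self, **params):
        self.params = dict(params)

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, rest = key.partition("__")
            if sep:
                # "<component>__<parameter>": delegate to the nested object.
                self.params[name].set_params(**{rest: value})
            else:
                self.params[name] = value
        return self


pipeline = Node(tree=Node(max_depth=None, min_samples_leaf=1))
pipeline.set_params(tree__max_depth=3)
print(pipeline.params["tree"].params)
```

Only the nested max_depth changes; the other parameters of the inner object are left untouched.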
- class wildboar.tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)[source]#
A tree classifier that uses random convolutions as features.
- Attributes:
- tree_Tree
The internal tree representation.
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)[source]#
A tree regressor that uses random convolutions as features.
- Attributes:
- tree_Tree
The internal tree representation.
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
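The definition of \(R^2\) given for score can be checked with a few lines of plain Python. The sketch covers the unweighted, single-output case only, and r_squared is a hypothetical helper written from the formula above, not the method itself.

```python
def r_squared(y_true, y_pred):
    """Compute 1 - u / v for a single output, without sample weights."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)
constant = r_squared(y_true, [2.5] * 4)  # always predicts the mean of y_true
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])
print(perfect, constant, worse)
```

The three cases correspond to the statements in the description: a perfect prediction scores 1.0, the constant mean predictor scores 0.0, and a model that is worse than the mean predictor scores below zero.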
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.ShapeletTreeClassifier(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#
A shapelet tree classifier.
- Parameters:
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a sampled shapelet expressed as a fraction, computed as min(ceil(X.shape[-1] * min_shapelet_size), 2).
- max_shapelet_sizefloat, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
- alphafloat, optional
Dynamically adjust the number of sampled shapelets at each node according to the current depth.
- if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
- if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
- if None, the number of sampled shapelets is the same independent of depth.
- metric{“euclidean”, “scaled_euclidean”, “dtw”, “scaled_dtw”}, optional
Distance metric used to identify the best shapelet.
- metric_paramsdict, optional
Parameters for the distance measure.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
- If dict, weights of the form {label: weight}.
- If “balanced”, each class weight is inversely proportional to the class frequency.
- If None, each class has equal weight.
- random_stateint or RandomState
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
See also
ShapeletTreeRegressor
A shapelet tree regressor.
ExtraShapeletTreeClassifier
An extra random shapelet tree classifier.
- Attributes:
- tree_Tree
The tree data structure used internally
- classes_ndarray of shape (n_classes,)
The class labels
- n_classes_int
The number of class labels
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='log2', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)[source]#
A shapelet tree regressor.
- Parameters:
- max_depthint, optional
The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet expressed as a fraction of n_timestep.
- alphafloat, optional
Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.:
w = 1 - exp(-abs(alpha) * depth)
- if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
- if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
- if None, the number of sampled shapelets is the same independent of depth.
- metricstr or list, optional
- If str, the distance metric used to identify the best shapelet.
- If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- random_stateint or RandomState
- If int, random_state is the seed used by the random number generator.
- If numpy.random.RandomState instance, random_state is the random number generator.
- If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
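To give a feel for the alpha parameter described above, the sketch below applies the weight w = 1 - exp(-abs(alpha) * depth) to a fixed n_shapelets. How the library turns w into a shapelet count is not stated here, so the linear interpolation, the rounding and the shapelets_at_depth helper are all assumptions made for illustration.

```python
import math


def shapelets_at_depth(n_shapelets, alpha, depth):
    """Number of shapelets to sample at `depth` (illustrative only).

    Assumption: the documented weight w = 1 - exp(-abs(alpha) * depth) is
    used to interpolate linearly; the library may round or scale differently.
    """
    if alpha is None:
        return n_shapelets
    w = 1 - math.exp(-abs(alpha) * depth)
    # alpha < 0 shrinks the count with depth, alpha > 0 grows it.
    fraction = 1 - w if alpha < 0 else w
    return max(1, round(n_shapelets * fraction))


decreasing = [shapelets_at_depth(100, -0.5, d) for d in range(5)]
increasing = [shapelets_at_depth(100, 0.5, d) for d in range(5)]
constant = [shapelets_at_depth(100, None, d) for d in range(5)]
print(decreasing)
print(increasing)
print(constant)
```

Under these assumptions a negative alpha starts at n_shapelets at the root and decays towards 1, a positive alpha starts at 1 and grows towards n_shapelets, and None keeps the count fixed at every depth.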
- Attributes:
- tree_Tree
The internal tree representation
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
If False, bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
If False, bypass several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
If False, bypass several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.