wildboar.tree#
Tree-based estimators for classification and regression.
Classes#
- ExtraShapeletTreeClassifier: An extra shapelet tree classifier.
- ExtraShapeletTreeRegressor: An extra shapelet tree regressor.
- IntervalTreeClassifier: An interval based tree classifier.
- IntervalTreeRegressor: An interval based tree regressor.
- PivotTreeClassifier: A tree classifier that uses pivot time series.
- ProximityTreeClassifier: A classifier that uses a k-branching tree based on pivot time series.
- RocketTreeClassifier: A tree classifier that uses random convolutions as features.
- RocketTreeRegressor: A tree regressor that uses random convolutions as features.
- ShapeletTreeClassifier: A shapelet tree classifier.
- ShapeletTreeRegressor: A shapelet tree regressor.
Functions#
- plot_tree: Plot a tree.
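For orientation, a minimal sketch combining a classifier with the plotting function; the exact signature of plot_tree is not documented in this section, so the call below assumes it accepts a fitted tree-based estimator:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier, plot_tree
>>> X, y = load_gun_point()
>>> clf = ShapeletTreeClassifier(random_state=1).fit(X, y)
>>> ax = plot_tree(clf)  # assumed to return a matplotlib Axes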
- class wildboar.tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#
An extra shapelet tree classifier.
Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
- Parameters:
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_shapelet_sizefloat, optional
The minimum length of a sampled shapelet expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
- max_shapelet_sizefloat, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
- coverage_probabilityfloat, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get longer shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform shapelet sizes.
Lower variability creates more variable shapelet sizes.
- metric{“euclidean”, “scaled_euclidean”, “dtw”, “scaled_dtw”}, optional
Distance metric used to identify the best shapelet.
- metric_paramsdict, optional
Parameters for the distance measure.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
If dict, weights on the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.
- random_stateint or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The tree representation
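Examples
A minimal fitting sketch (the GunPoint loader also appears in the method examples below; results vary with random_state):
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ExtraShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> clf = ExtraShapeletTreeClassifier(n_shapelets=1, random_state=1)
>>> clf.fit(X, y)
>>> clf.tree_  # the fitted tree representation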
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
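Examples
Read off the nodes visited by a single sample (a sketch, continuing the fitted tree from the apply example above):
>>> paths = tree.decision_path(X)
>>> paths[0].nonzero()[1]  # indices of the nodes traversed by the first sample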
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
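Examples
Recover the predicted class from the probabilities (a sketch, assuming a fitted classifier tree as in the apply example, and that the estimator exposes classes_ as documented for ShapeletTreeClassifier):
>>> proba = tree.predict_proba(X)
>>> tree.classes_[proba.argmax(axis=1)]  # equivalent to tree.predict(X)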
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
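Examples
A short usage sketch; set_params is convenient for reconfiguring an estimator between fits:
>>> clf = ExtraShapeletTreeClassifier()
>>> clf.set_params(n_shapelets=10, metric="scaled_euclidean")
>>> clf.get_params()["n_shapelets"]
10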
- class wildboar.tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)[source]#
An extra shapelet tree regressor.
Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
- Parameters:
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- min_shapelet_sizefloat, optional
The minimum length of a sampled shapelet expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
- max_shapelet_sizefloat, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
- coverage_probabilityfloat, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get longer shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform shapelet sizes.
Lower variability creates more variable shapelet sizes.
- metric{“euclidean”, “scaled_euclidean”, “scaled_dtw”}, optional
Distance metric used to identify the best shapelet.
- metric_paramsdict, optional
Parameters for the distance measure.
- random_stateint or RandomState
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The internal tree representation
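Examples
A minimal regression sketch on synthetic data (the target is an arbitrary function of the series, purely for illustration):
>>> import numpy as np
>>> from wildboar.tree import ExtraShapeletTreeRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(50, 100)      # 50 univariate series of 100 timesteps
>>> y = X[:, :10].mean(axis=1)  # synthetic real-valued target
>>> reg = ExtraShapeletTreeRegressor(random_state=1).fit(X, y)
>>> y_hat = reg.predict(X)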
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=None, min_size=0.0, max_size=1.0, coverage_probability=None, variability=1, summarizer='mean_var_slope', class_weight=None, random_state=None)[source]#
An interval based tree classifier.
- Parameters:
- n_intervals{“log”, “sqrt”}, int or float, optional
The number of intervals to partition the time series into.
If “log”, the number of intervals is log2(n_timestep).
If “sqrt”, the number of intervals is sqrt(n_timestep).
If int, the number of intervals is n_intervals.
If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- intervals{“fixed”, “sample”, “random”}, optional
If “fixed”, n_intervals non-overlapping intervals.
If “sample”, n_intervals * sample_size non-overlapping intervals.
If “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
- sample_sizefloat, optional
The fraction of intervals to sample at each node. Ignored unless intervals=”sample”.
- min_sizefloat, optional
The minimum interval size if intervals=”random”. Ignored if coverage_probability is set.
- max_sizefloat, optional
The maximum interval size if intervals=”random”. Ignored if coverage_probability is set.
- coverage_probabilityfloat, optional
The probability that a time step is covered by an interval, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get longer intervals.
For smaller coverage_probability, we get shorter intervals.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample intervals. Defaults to 1.
Higher variability creates more uniform interval sizes.
Lower variability creates more variable interval sizes.
- summarizerstr or list, optional
The method to summarize each interval.
If str, the summarizer is selected by name from _SUMMARIZERS.keys().
If list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.
The default summarizer summarizes each interval as its mean, variance and slope.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
If dict, weights on the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.
- random_stateint or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The internal tree structure.
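Examples
A minimal sketch using randomly sized intervals; when coverage_probability is set, min_size and max_size are ignored, as noted above:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import IntervalTreeClassifier
>>> X, y = load_gun_point()
>>> clf = IntervalTreeClassifier(
...     n_intervals="sqrt", intervals="random",
...     coverage_probability=0.5, random_state=1,
... )
>>> clf.fit(X, y)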
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=None, min_size=0.0, max_size=1.0, coverage_probability=None, variability=1, summarizer='mean_var_slope', random_state=None)[source]#
An interval based tree regressor.
- Parameters:
- n_intervals{“log”, “sqrt”}, int or float, optional
The number of intervals to partition the time series into.
If “log”, the number of intervals is log2(n_timestep).
If “sqrt”, the number of intervals is sqrt(n_timestep).
If int, the number of intervals is n_intervals.
If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
- intervals{“fixed”, “sample”, “random”}, optional
If “fixed”, n_intervals non-overlapping intervals.
If “sample”, n_intervals * sample_size non-overlapping intervals.
If “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
- sample_sizefloat, optional
The fraction of intervals to sample at each node. Ignored unless intervals=”sample”.
- min_sizefloat, optional
The minimum interval size if intervals=”random”. Ignored if coverage_probability is set.
- max_sizefloat, optional
The maximum interval size if intervals=”random”. Ignored if coverage_probability is set.
- coverage_probabilityfloat, optional
The probability that a time step is covered by an interval, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get longer intervals.
For smaller coverage_probability, we get shorter intervals.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample intervals. Defaults to 1.
Higher variability creates more uniform interval sizes.
Lower variability creates more variable interval sizes.
- summarizerstr or list, optional
The method to summarize each interval.
If str, the summarizer is selected by name from _SUMMARIZERS.keys().
If list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.
The default summarizer summarizes each interval as its mean, variance and slope.
- random_stateint or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The internal tree structure.
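Examples
A minimal regression sketch passing a custom list of summary functions of the form f(x) -> float, as described above (the target is synthetic, for illustration only):
>>> import numpy as np
>>> from wildboar.tree import IntervalTreeRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(40, 128)
>>> y = X.std(axis=1)  # synthetic target
>>> reg = IntervalTreeRegressor(n_intervals=8, summarizer=[np.mean, np.std])
>>> reg.fit(X, y)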
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, criterion='entropy', class_weight=None, random_state=None)[source]#
A tree classifier that uses pivot time series.
- Parameters:
- n_pivotstr or int, optional
The number of pivot time series to sample at each node.
- metricsstr, optional
The metrics to sample from. Currently, we only support “all”.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
If dict, weights on the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.
- random_stateint or RandomState
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The internal tree representation
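Examples
A minimal sketch (metrics="all" is currently the only supported metric sampling option, per the parameter description above):
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import PivotTreeClassifier
>>> X, y = load_gun_point()
>>> clf = PivotTreeClassifier(n_pivot="sqrt", criterion="gini", random_state=1)
>>> clf.fit(X, y)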
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)[source]#
A classifier that uses a k-branching tree based on pivot time series.
- Parameters:
- n_pivotint, optional
The number of pivots to sample at each node.
- criterion{“entropy”, “gini”}, optional
The impurity criterion.
- pivot_sample{“label”, “uniform”}, optional
The pivot sampling method.
- metric_sample{“uniform”, “weighted”}, optional
The metric sampling method.
- metric{“auto”}, str or list, optional
The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).
If “auto”, use the default metric specification suggested by Lucas et al. (2019).
If str, use a single metric or default metric specification.
If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_factoriesdict, optional
A metric specification.
Deprecated since version 1.2: Use the combination of metric and metric_params.
- max_depthint, optional
The maximum tree depth.
- min_samples_splitint, optional
The minimum number of samples to consider a split.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
The minimum impurity decrease to build a sub-tree.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
If dict, weights on the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.
- random_stateint or RandomState
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
References
- Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.
Examples
Fit a single proximity tree, with dynamic time warping and move-split-merge metrics.
>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
...     n_pivot=10,
...     metric=[
...         ("dtw", {"min_r": 0.1, "max_r": 0.25}),
...         ("msm", {"min_c": 0.1, "max_c": 100, "num_c": 20}),
...     ],
...     criterion="gini",
... )
>>> f.fit(x, y)
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to True if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]])
This is equvivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to True if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero values indicate that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow to bypass several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the regression of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow to bypass several input checking. Don’t use this parameter unless you know what you do.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow to bypass several input checking. Don’t use this parameter unless you know what you do.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of
self.predict(X)
w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as
Pipeline
). The latter have parameters of the form<component>__<parameter>
so that it’s possible to update each component of a nested object.- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)[source]#
A tree classifier that uses random convolutions as features.
- Attributes:
- tree_Tree
The internal tree representation.
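Examples
The parameter list is not reproduced above, but the constructor signature shows the available options; a minimal sketch, where n_kernels sets how many random convolution kernels are sampled:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import RocketTreeClassifier
>>> X, y = load_gun_point()
>>> clf = RocketTreeClassifier(n_kernels=100, random_state=1)
>>> clf.fit(X, y)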
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)[source]#
A tree regressor that uses random convolutions as features.
- Attributes:
- tree_Tree
The internal tree representation.
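Examples
A minimal regression sketch on synthetic data (targets are illustrative only):
>>> import numpy as np
>>> from wildboar.tree import RocketTreeRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(30, 64)
>>> y = X.max(axis=1)  # synthetic target
>>> reg = RocketTreeRegressor(n_kernels=100, random_state=1).fit(X, y)
>>> y_hat = reg.predict(X)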
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to False if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where nonzero values indicate that the sample traverses the corresponding node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allow bypassing several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allow bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.ShapeletTreeClassifier(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, strategy='warn', shapelet_size=0.1, sample_size=1.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#
A shapelet tree classifier.
- Parameters:
- n_shapeletsint or {“log2”, “sqrt”, “auto”}, optional
The number of shapelets in the resulting transform.
If “auto”, the number of shapelets depends on the value of strategy: for “best” the number is 1, and for “random” it is 1000.
If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.
If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- strategy{“best”, “random”}, optional
The strategy for selecting shapelets.
If “random”, n_shapelets shapelets are randomly selected in the range defined by min_shapelet_size and max_shapelet_size.
If “best”, n_shapelets shapelets are selected per input sample of the size determined by shapelet_size.
Added in version 1.3: Add support for the “best” strategy. The default will change to “best” in 1.4.
- shapelet_sizeint, float or array-like, optional
The shapelet size if strategy=”best”.
If int, the exact shapelet size.
If float, a fraction of the number of input timesteps.
If array-like, a list of floats or ints.
Added in version 1.3.
- sample_sizefloat, optional
The size of the sample used to determine the shapelets when strategy=“best”.
Added in version 1.3.
- min_shapelet_sizefloat, optional
The minimum length of a sampled shapelet expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
- max_shapelet_sizefloat, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
- coverage_probabilityfloat, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphafloat, optional
Dynamically scale the number of sampled shapelets at each node according to the current depth.
if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.
if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.
if None, the number of sampled shapelets is the same independent of depth.
- metricstr or list, optional
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“entropy”, “gini”}, optional
The criterion used to evaluate the utility of a split.
- class_weightdict or “balanced”, optional
Weights associated with the labels.
if dict, weights on the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
- random_stateint or RandomState
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- tree_Tree
The tree data structure used internally
- classes_ndarray of shape (n_classes,)
The class labels
- n_classes_int
The number of class labels
See also
ShapeletTreeRegressor
A shapelet tree regressor.
ExtraShapeletTreeClassifier
An extra random shapelet tree classifier.
Notes
When strategy is set to “best”, the shapelet tree is constructed by selecting the top n_shapelets per sample. The initial construction of the matrix profile for each sample may be computationally intensive for large datasets. To balance accuracy and computational efficiency, the sample_size parameter can be adjusted to determine the number of samples utilized to compute the minimum distance annotation.
The significance of shapelets is determined by the difference between the ab-join of a label with any other label and the self-join of the label, selecting the shapelets with the greatest absolute values. This method is detailed in the work of Zhu et al. (2020).
When strategy is set to “random”, the shapelet tree is constructed by randomly sampling n_shapelets within the range defined by min_shapelet_size and max_shapelet_size. This method is detailed in the work of Karlsson et al. (2016). Alternatively, shapelets can be sampled with a specified coverage_probability and variability. By specifying a coverage probability, we define the probability of including a point in the extracted shapelet. If coverage_probability is set, min_shapelet_size and max_shapelet_size are ignored.
References
- Zhu, Y., et al. 2020.
The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code. Data Mining and Knowledge Discovery, 34, pp.949-979.
- Karlsson, I., Papapetrou, P. and Boström, H., 2016.
Generalized random shapelet forests. Data mining and knowledge discovery, 30, pp.1053-1085.
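As a usage sketch (not taken from the library's own examples; all parameter values here are illustrative), the two strategies and the multi-metric grid specification described above can be exercised on the bundled gun_point dataset as follows:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> # "random": sample 100 candidate shapelets per node, sized via coverage_probability
>>> random_tree = ShapeletTreeClassifier(
...     strategy="random", n_shapelets=100, coverage_probability=0.3, random_state=1
... )
>>> random_tree.fit(X, y)
>>> # "best": select the single best shapelet per sample, 20% of the series length
>>> best_tree = ShapeletTreeClassifier(
...     strategy="best", n_shapelets=1, shapelet_size=0.2, random_state=1
... )
>>> best_tree.fit(X, y)
>>> # a multi-metric specification: DTW with 10 warping windows r in [0, 1]
>>> grid_tree = ShapeletTreeClassifier(
...     strategy="random", metric=[("dtw", dict(min_r=0, max_r=1, num_r=10))]
... ).fit(X, y)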
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to True if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to True if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
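For example (a minimal sketch, reusing the fitted tree from the apply example above; the node indices depend on the fitted tree), the indicator matrix can be used to list the nodes a sample traverses:
>>> paths = tree.decision_path(X)
>>> paths.toarray()[0].nonzero()[0]  # indices of the nodes visited by the first sample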
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit a classification tree.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
The target values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allows bypassing several input checks.
- Returns:
- self
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the class of the input samples x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted classes.
- predict_proba(x, check_input=True)[source]#
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
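For instance (a minimal sketch, reusing the fitted tree from the apply example above), the most probable class per sample can be recovered from the probabilities and the classes_ attribute:
>>> proba = tree.predict_proba(X)
>>> tree.classes_[proba.argmax(axis=1)]  # equivalent to tree.predict(X)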
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.tree.ShapeletTreeRegressor(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, strategy='warn', shapelet_size=0.1, sample_size=1.0, min_shapelet_size=0, max_shapelet_size=1, coverage_probability=None, variability=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)[source]#
A shapelet tree regressor.
- Parameters:
- n_shapeletsint, optional
The number of shapelets to sample at each node.
- max_depthint, optional
The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
- min_samples_splitint, optional
The minimum number of samples to split an internal node.
- min_samples_leafint, optional
The minimum number of samples in a leaf.
- min_impurity_decreasefloat, optional
A split will be introduced only if the impurity decrease is larger than or equal to this value.
- impurity_equality_tolerancefloat, optional
Tolerance for considering two impurities as equal. If the impurity decrease is the same, we consider the split that maximizes the gap between the sum of distances.
If None, we never consider the separation gap.
Added in version 1.3.
- strategy{“best”, “random”}, optional
The strategy for selecting shapelets.
If “random”, n_shapelets shapelets are randomly selected in the range defined by min_shapelet_size and max_shapelet_size.
If “best”, n_shapelets shapelets are selected per input sample of the size determined by shapelet_size.
Added in version 1.3: Add support for the “best” strategy. The default will change to “best” in 1.4.
- shapelet_sizeint, float or array-like, optional
The shapelet size if strategy=”best”.
If int, the exact shapelet size.
If float, a fraction of the number of input timesteps.
If array-like, a list of float or int.
Added in version 1.3.
- sample_sizefloat, optional
The size of the sample used to determine the shapelets, if strategy=”best”.
Added in version 1.3.
- min_shapelet_sizefloat, optional
The minimum length of a shapelet, expressed as a fraction of n_timestep.
- max_shapelet_sizefloat, optional
The maximum length of a shapelet, expressed as a fraction of n_timestep.
- coverage_probabilityfloat, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphafloat, optional
Dynamically scale the number of sampled shapelets at each node according to the current depth, i.e., w = 1 - exp(-abs(alpha) * depth).
if alpha < 0, the number of sampled shapelets decreases from n_shapelets towards 1 with increasing depth.
if alpha > 0, the number of sampled shapelets increases from 1 towards n_shapelets with increasing depth.
if None, the number of sampled shapelets is the same independent of depth.
- metricstr or list, optional
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and the number of values in the grid. For example, to specify a grid over the argument r with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about metric specifications in the User guide.
Changed in version 1.2: Added support for multi-metric shapelet transform.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- criterion{“squared_error”}, optional
The criterion used to evaluate the utility of a split.
Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.
- random_stateint or RandomState
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- Attributes:
- tree_Tree
The internal tree representation
Notes
When strategy is set to “best”, the shapelet tree is constructed by selecting the top n_shapelets per sample. The initial construction of the matrix profile for each sample may be computationally intensive for large datasets. To balance accuracy and computational efficiency, the sample_size parameter can be adjusted to determine the number of samples utilized to compute the minimum distance annotation.
The significance of shapelets is determined by the difference between the ab-join of a label with any other label and the self-join of the label, selecting the shapelets with the greatest absolute values. This method is detailed in the work of Zhu et al. (2020).
When strategy is set to “random”, the shapelet tree is constructed by randomly sampling n_shapelets within the range defined by min_shapelet_size and max_shapelet_size. This method is detailed in the work of Karlsson et al. (2016). Alternatively, shapelets can be sampled with a specified coverage_probability and variability. By specifying a coverage probability, we define the probability of including a point in the extracted shapelet. If coverage_probability is set, min_shapelet_size and max_shapelet_size are ignored.
References
- Zhu, Y., et al. 2020.
The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code. Data Mining and Knowledge Discovery, 34, pp.949-979.
- Karlsson, I., Papapetrou, P. and Boström, H., 2016.
Generalized random shapelet forests. Data mining and knowledge discovery, 30, pp.1053-1085.
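As a usage sketch (illustrative only; the synthetic random-walk data and parameter values are assumptions, not taken from the library's documentation), a regressor with depth-dependent shapelet sampling can be fitted as follows:
>>> import numpy as np
>>> from wildboar.tree import ShapeletTreeRegressor
>>> rng = np.random.RandomState(1)
>>> X = rng.randn(100, 50).cumsum(axis=1)  # 100 synthetic random-walk series, 50 timesteps
>>> y = X[:, -1]                           # target: the final level of each series
>>> reg = ShapeletTreeRegressor(
...     strategy="random", n_shapelets=10, alpha=-0.1, random_state=1
... )  # alpha < 0: fewer shapelets are sampled at deeper nodes
>>> reg.fit(X, y)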
- apply(x, check_input=True)[source]#
Return the index of the leaf that each sample is predicted by.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to True if you are sure your data is valid.
- Returns:
- ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count].
Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])
This is equivalent to using tree.predict_proba.
- decision_path(x, check_input=True)[source]#
Compute the decision path of the tree.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
- check_inputbool, optional
Bypass array validation. Only set to True if you are sure your data is valid.
- Returns:
- sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample traverses a node.
- fit(x, y, sample_weight=None, check_input=True)[source]#
Fit the estimator.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The training time series.
- yarray-like of shape (n_samples,)
Target values as floating point values.
- sample_weightarray-like of shape (n_samples,), optional
If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
- check_inputbool, optional
Allows bypassing several input checks.
- Returns:
- self
This object.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x, check_input=True)[source]#
Predict the value of x.
- Parameters:
- xarray-like of shape (n_samples, n_timesteps)
The input time series.
- check_inputbool, optional
Allows bypassing several input checks. Don’t use this parameter unless you know what you are doing.
- Returns:
- ndarray of shape (n_samples,)
The predicted values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
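To make the definition above concrete, the score can be reproduced by hand (a sketch assuming the fitted regressor reg and data X, y from the example above):
>>> y_pred = reg.predict(X)
>>> u = ((y - y_pred) ** 2).sum()    # residual sum of squares
>>> v = ((y - y.mean()) ** 2).sum()  # total sum of squares
>>> 1 - u / v                        # identical to reg.score(X, y)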
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- wildboar.tree.plot_tree(clf, *, ax=None, bbox_args=dict(), arrow_args=dict(arrowstyle='<-'), max_depth=None, class_labels=True, fontsize=None, node_labeler=None)[source]#
Plot a tree
- Parameters:
- clftree-based estimator
A decision tree.
- axaxes, optional
The axes to plot the tree to.
- bbox_argsdict, optional
Arguments to the node box.
- arrow_argsdict, optional
Arguments to the arrow.
- max_depthint, optional
Only show the branches until max_depth.
- class_labelsbool or array-like, optional
Show the classes
if True, show classes from the classes_ attribute of the decision tree.
if False, show leaf probabilities.
if array-like, show classes from the array.
- fontsizeint, optional
The font size. If None, the font size is determined automatically.
- node_labelercallable, optional
A function returning the label for a node, on the form f(node) -> str.
If node.children is None, the node is a leaf. node._attr contains information about the node:
n_node_samples: the number of samples reaching the node.
if leaf, value is an array with the fractions of labels reaching the leaf (in case of classification), or the mean among the samples reaching the leaf (if regression). Determine whether it is a classification or regression tree by inspecting the shape of the value array.
if branch, threshold contains the threshold used to split the node.
if branch, dim contains the dimension from which the attribute was extracted.
if branch, attribute contains the attribute used for computing the feature value. The attribute depends on the estimator.
- Returns:
- axes
The axes.
Examples
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.tree import ShapeletTreeClassifier, plot_tree
>>> X, y = load_two_lead_ecg()
>>> clf = ShapeletTreeClassifier(strategy="random").fit(X, y)
>>> plot_tree(clf)
<Axes: >
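A custom node_labeler can replace the default labels. The sketch below follows the attribute description above; the node._attr access pattern and the label format are assumptions for illustration, not a documented recipe:
>>> def labeler(node):
...     attr = node._attr  # per the description above: n_node_samples, value/threshold/dim/attribute
...     if node.children is None:  # a leaf: label it with the sample count
...         return f"n={attr['n_node_samples']}"
...     return f"threshold={attr['threshold']:.2f}"  # a branch: show the split threshold
>>> plot_tree(clf, node_labeler=labeler)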