`wildboar.tree`

Tree-based estimators for classification and regression.

Package Contents

Classes

`ExtraShapeletTreeClassifier`	An extra shapelet tree classifier.
`ExtraShapeletTreeRegressor`	An extra shapelet tree regressor.
`IntervalTreeClassifier`	An interval based tree classifier.
`IntervalTreeRegressor`	An interval based tree regressor.
`PivotTreeClassifier`	A tree classifier that uses pivot time series.
`ProximityTreeClassifier`	A classifier that uses a k-branching tree based on pivot time series.
`RocketTreeClassifier`	A tree classifier that uses random convolutions as features.
`RocketTreeRegressor`	A tree regressor that uses random convolutions as features.
`ShapeletTreeClassifier`	A shapelet tree classifier.
`ShapeletTreeRegressor`	A shapelet tree regressor.

class wildboar.tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

An extra shapelet tree classifier.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
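The threshold sampling described above can be sketched in plain Python. This is a minimal illustration and not wildboar's implementation: the helper names `sample_threshold` and `split_by_threshold`, the toy distances, and the convention that samples at or below the threshold go to the left child are all assumptions.

```python
import random


def sample_threshold(distances, rng):
    """Draw a split threshold uniformly from [min(distances), max(distances)]."""
    return rng.uniform(min(distances), max(distances))


def split_by_threshold(distances, threshold):
    """Partition sample indices by comparing each distance to the threshold."""
    left = [i for i, d in enumerate(distances) if d <= threshold]
    right = [i for i, d in enumerate(distances) if d > threshold]
    return left, right


# Toy distances between one candidate shapelet and five time series.
rng = random.Random(0)
distances = [0.2, 1.5, 0.9, 3.1, 2.4]
threshold = sample_threshold(distances, rng)
left, right = split_by_threshold(distances, threshold)
```

Because the threshold is drawn at random instead of being optimized, each candidate shapelet needs only one pass over the distances.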

Parameters:

n_shapelets : int, optional

The number of shapelets to sample at each node.

max_depth : int, optional

The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_leaf : int, optional

The minimum number of samples in a leaf.

min_impurity_decrease : float, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_samples_split : int, optional

The minimum number of samples to split an internal node.

min_shapelet_size : float, optional

The minimum length of a sampled shapelet, expressed as a fraction of the series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

max_shapelet_size : float, optional

The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).

metric : {“euclidean”, “scaled_euclidean”, “dtw”, “scaled_dtw”}, optional

Distance metric used to identify the best shapelet.

metric_params : dict, optional

Parameters for the distance measure.

criterion : {“entropy”, “gini”}, optional

The criterion used to evaluate the utility of a split.

class_weight : dict or “balanced”, optional

Weights associated with the labels.

If dict, weights of the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.

random_state : int or RandomState, optional

If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
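To make the “balanced” option of class_weight concrete, the sketch below computes weights that are inversely proportional to class frequency. It assumes the scikit-learn convention n_samples / (n_classes * count); whether wildboar normalizes in exactly this way is an assumption, and `balanced_class_weight` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Three samples of class "a" and one of class "b": the rare class gets more weight.
weights = balanced_class_weight(["a", "a", "a", "b"])
```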

Attributes:

tree_ : Tree: The tree representation.
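The criterion and min_impurity_decrease parameters above interact when a split is evaluated. The following sketch shows the two impurity measures on class counts and a simplified impurity decrease. It assumes a base-2 logarithm for entropy and omits any scaling by the fraction of samples reaching the node, so it illustrates the idea and is not the library's exact formula.

```python
import math


def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def entropy(counts):
    """Shannon entropy (base 2) of a node given its per-class sample counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = sum(parent)
    weighted_children = (
        sum(left) / n * impurity(left) + sum(right) / n * impurity(right)
    )
    return impurity(parent) - weighted_children


# A perfectly separating split of a balanced two-class node.
decrease = impurity_decrease([5, 5], [5, 0], [0, 5], gini)
accept = decrease >= 0.0  # compare against min_impurity_decrease
```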

apply(x, check_input=True)

Return the index of the leaf that each sample ends up in.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)

Compute the decision path of the tree.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses a node.
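The relation between apply and decision_path can be sketched on a toy binary tree. The parallel-list layout below is hypothetical and is not the layout of wildboar's Tree object; `node_values[i]` stands in for whatever quantity node i tests (for a shapelet tree, a distance to that node's shapelet).

```python
# Hypothetical toy tree stored as parallel lists; -1 marks a leaf (no child).
LEFT = [1, -1, 3, -1, -1]
RIGHT = [2, -1, 4, -1, -1]
THRESHOLD = [0.5, 0.0, 1.5, 0.0, 0.0]


def visited_nodes(node_values):
    """Walk from the root to a leaf; node_values[i] is the value tested at node i."""
    node = 0
    path = [node]
    while LEFT[node] != -1:
        if node_values[node] <= THRESHOLD[node]:
            node = LEFT[node]
        else:
            node = RIGHT[node]
        path.append(node)
    return path


def indicator_row(path, n_nodes):
    """One row of the decision-path matrix: 1 for every node on the path."""
    visited = set(path)
    return [1 if i in visited else 0 for i in range(n_nodes)]


sample = [0.9, 0.0, 1.2, 0.0, 0.0]
path = visited_nodes(sample)             # nodes traversed, root first
row = indicator_row(path, len(LEFT))     # what decision_path reports per sample
leaf = path[-1]                          # what apply reports per sample
```

In this sketch the leaf index returned by apply is simply the last node on the decision path.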

fit(x, y, sample_weight=None, check_input=True)

Fit a classification tree.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The training time series.
y : array-like of shape (n_samples,): The target values.
sample_weight : array-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_input : bool, optional: If False, bypass several input checks.

Returns:

self: This instance.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

predict(x, check_input=True)

Predict the class of the input samples x.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted classes.

predict_proba(x, check_input=True)

Predict class probabilities of the input samples x.

The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples, n_classes): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
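As a small illustration of how leaf contents turn into probabilities, the sketch below normalizes the per-class sample counts of one leaf. The class labels and counts are made up, and `leaf_probabilities` is a hypothetical helper, not part of wildboar.

```python
def leaf_probabilities(class_counts):
    """Fraction of training samples of each class that fell into a leaf."""
    total = sum(class_counts)
    return [count / total for count in class_counts]


classes = ["gun", "point"]   # hypothetical ordering, playing the role of classes_
leaf_counts = [3, 1]         # training samples per class in one leaf
proba = leaf_probabilities(leaf_counts)
predicted = classes[proba.index(max(proba))]
```

Every sample routed to the same leaf receives the same probability vector, so the predicted class is the majority class of that leaf.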

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: Mean accuracy of self.predict(X) w.r.t. y.
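The accuracy described above can be written out in a few lines. This is an illustrative sketch with a hypothetical `accuracy` helper; representing multi-label targets as tuples is an assumption made so that a sample only counts as correct when its whole label set matches.

```python
def accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose prediction matches the true label."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


# Single-label case: three of four predictions are right.
single = accuracy([0, 1, 1, 0], [0, 1, 0, 0])

# Multi-label case: the second sample has one wrong label, so it counts as wrong.
subset = accuracy([(1, 0), (1, 1)], [(1, 0), (1, 0)])
```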

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

class wildboar.tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)

An extra shapelet tree regressor.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].

Parameters:

n_shapelets : int, optional

The number of shapelets to sample at each node.

max_depth : int, optional

The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_split : int, optional

The minimum number of samples to split an internal node.

min_samples_leaf : int, optional

The minimum number of samples in a leaf.

criterion : {“squared_error”}, optional

The criterion used to evaluate the utility of a split.

Deprecated since version 1.1: Criterion “mse” was deprecated in v1.1 and removed in version 1.2.

min_impurity_decrease : float, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_shapelet_size : float, optional

The minimum length of a sampled shapelet, expressed as a fraction of the series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

max_shapelet_size : float, optional

The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).

metric : {‘euclidean’, ‘scaled_euclidean’, ‘scaled_dtw’}, optional

Distance metric used to identify the best shapelet.

metric_params : dict, optional

Parameters for the distance measure.

random_state : int or RandomState, optional

If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
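For the regressor, the “squared_error” criterion and min_impurity_decrease can be illustrated with the sketch below. It treats the impurity of a node as the mean squared deviation from the node mean and ignores any scaling by the fraction of samples reaching the node; both simplifications are assumptions, and the helper names are hypothetical.

```python
def squared_error(values):
    """Mean squared deviation of the target values from their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * squared_error(left)
        + len(right) / n * squared_error(right)
    )
    return squared_error(parent) - weighted_children


# A split that separates low targets from high targets removes all impurity.
parent = [1.0, 1.0, 3.0, 3.0]
decrease = impurity_decrease(parent, [1.0, 1.0], [3.0, 3.0])
accept = decrease >= 0.0  # compare against min_impurity_decrease
```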

Attributes:

tree_ : Tree: The internal tree representation.

apply(x, check_input=True)

Return the index of the leaf that each sample ends up in.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)

Compute the decision path of the tree.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses a node.

fit(x, y, sample_weight=None, check_input=True)

Fit the estimator.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The training time series.
y : array-like of shape (n_samples,): Target values as floating point values.
sample_weight : array-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_input : bool, optional: If False, bypass several input checks.

Returns:

self: This object.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

predict(x, check_input=True)

Predict the value of x.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True values for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: \(R^2\) of self.predict(X) w.r.t. y.
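The definition of \(R^2\) given above can be checked with a short, unweighted sketch. The `r2` helper is hypothetical and ignores sample_weight; it only mirrors the formula 1 - u/v from the description.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS) / (total SS)."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2(y_true, y_true)                  # best possible score
constant = r2(y_true, [2.5, 2.5, 2.5, 2.5])   # always predicting the mean of y
reversed_order = r2(y_true, [4.0, 3.0, 2.0, 1.0])  # worse than the constant model
```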

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

class wildboar.tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)

An interval based tree classifier.
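The parameters are not documented here, so the following is only a guess at what an interval-based feature might look like given the default summarizer name 'mean_var_slope': a series is cut into equally sized intervals and each interval is summarized by its mean, variance, and least-squares slope. The helper names, the use of the population variance, and the handling of leftover time steps are all assumptions and may differ from wildboar's behavior.

```python
def fixed_intervals(series, n_intervals):
    """Cut a series into n_intervals equally sized chunks (leftover steps dropped)."""
    size = len(series) // n_intervals
    return [series[i * size:(i + 1) * size] for i in range(n_intervals)]


def mean_var_slope(values):
    """Summarize an interval by mean, population variance and least-squares slope."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        return mean, var, 0.0
    slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(values)) / denom
    return mean, var, slope


series = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]
features = [mean_var_slope(chunk) for chunk in fixed_intervals(series, 2)]
```

Under this reading, a tree would split on one of these summary values at each node instead of on a raw time step.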

Attributes:

tree_ : Tree: The internal tree structure.

apply(x, check_input=True)

Return the index of the leaf that each sample ends up in.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)

Compute the decision path of the tree.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses a node.

fit(x, y, sample_weight=None, check_input=True)

Fit a classification tree.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The training time series.
y : array-like of shape (n_samples,): The target values.
sample_weight : array-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_input : bool, optional: If False, bypass several input checks.

Returns:

self: This instance.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

predict(x, check_input=True)

Predict the class of the input samples x.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted classes.

predict_proba(x, check_input=True)

Predict class probabilities of the input samples x.

The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples, n_classes): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

class wildboar.tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)

An interval based tree regressor.

Attributes:

tree_ : Tree: The internal tree structure.

apply(x, check_input=True)

Return the index of the leaf that each sample ends up in.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)

Compute the decision path of the tree.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses a node.

fit(x, y, sample_weight=None, check_input=True)

Fit the estimator.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The training time series.
y : array-like of shape (n_samples,): Target values as floating point values.
sample_weight : array-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_input : bool, optional: If False, bypass several input checks.

Returns:

self: This object.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

predict(x, check_input=True)

Predict the value of x.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True values for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: \(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

class wildboar.tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)

A tree classifier that uses pivot time series.

Attributes:

tree_ : Tree: The internal tree representation.

apply(x, check_input=True)

Return the index of the leaf that each sample ends up in.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)

Compute the decision path of the tree.

Parameters:

x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_input : bool, optional: If False, bypass array validation. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses a node.

fit(x, y, sample_weight=None, check_input=True)

Fit a classification tree.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The training time series.
y : array-like of shape (n_samples,): The target values.
sample_weight : array-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_input : bool, optional: If False, bypass several input checks.

Returns:

self: This instance.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing : MetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict: Parameter names mapped to their values.

predict(x, check_input=True)

Predict the class of the input samples x.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted classes.

predict_proba(x, check_input=True)

Predict class probabilities of the input samples x.

The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:

x : array-like of shape (n_samples, n_timesteps): The input time series.
check_input : bool, optional: If False, bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples, n_classes): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the entire label set of each sample be correctly predicted.

Parameters:

X : array-like of shape (n_samples, n_features): Test samples.
y : array-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weight : array-like of shape (n_samples,), default=None: Sample weights.

Returns:

score : float: Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params : dict: Estimator parameters.

Returns:

self : estimator instance: Estimator instance.

class wildboar.tree.ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)

A classifier that uses a k-branching tree based on pivot time series.
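One way to picture a k-branching node is that it holds k pivot time series and sends each sample down the branch of its nearest pivot. The sketch below illustrates that routing step with a plain Euclidean distance. The helper names and toy data are hypothetical, and the actual estimator may choose pivots and metrics differently.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_pivot(sample, pivots, distance=euclidean):
    """Index of the branch whose pivot is closest to the sample."""
    distances = [distance(sample, pivot) for pivot in pivots]
    return min(range(len(pivots)), key=distances.__getitem__)


# A node with three pivots, that is, three branches.
pivots = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
branch_a = nearest_pivot([0.9, 1.2, 1.0], pivots)
branch_b = nearest_pivot([4.0, 5.0, 6.0], pivots)
```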

Parameters:

n_pivot : int, optional

The number of pivots to sample at each node.

criterion : {“entropy”, “gini”}, optional

The impurity criterion.

pivot_sample : {“label”, “uniform”}, optional

The pivot sampling method.

metric_sample : {“uniform”, “weighted”}, optional

The metric sampling method.

metric : {“auto”, “default”}, str or list, optional

The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019).

If “auto”, use the default metric specification suggested by Lucas et al. (2019).
If str, use a single metric or default metric specification.
If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element is a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower and upper bounds of the values as well as the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_params : dict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

metric_factories : dict, optional

A metric specification.

Deprecated since version 1.2: Use the combination of metric and metric_params.

max_depth : int, optional

The maximum tree depth.

min_samples_split : int, optional

The minimum number of samples to consider a split.

min_samples_leaf : int, optional

The minimum number of samples in a leaf.

min_impurity_decrease : float, optional

The minimum impurity decrease to build a sub-tree.

class_weight : dict or “balanced”, optional

Weights associated with the labels.

If dict, weights of the form {label: weight}.
If “balanced”, each class weight is inversely proportional to the class frequency.
If None, each class has equal weight.

random_state : int or RandomState, optional

If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
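To illustrate the parameter grid specification accepted by metric, the sketch below expands a dict such as dict(min_r=0, max_r=1, num_r=10) into candidate values. It assumes linearly spaced values with both bounds included and a fallback of 10 values when the num_ key is missing; both are assumptions about behavior that this page does not spell out, and `expand_grid` is a hypothetical helper.

```python
def expand_grid(spec, default_num=10):
    """Expand {min_<p>, max_<p>, num_<p>} entries into lists of candidate values."""
    names = sorted(key[len("min_"):] for key in spec if key.startswith("min_"))
    grid = {}
    for name in names:
        low = spec["min_" + name]
        high = spec["max_" + name]
        num = spec.get("num_" + name, default_num)  # assumed default
        if num == 1:
            grid[name] = [float(low)]
        else:
            step = (high - low) / (num - 1)
            grid[name] = [low + i * step for i in range(num)]
    return grid


r_grid = expand_grid(dict(min_r=0, max_r=1, num_r=5))
c_grid = expand_grid({"min_c": 0.1, "max_c": 100})
```

A tree node could then draw one value per parameter from such a grid when it samples a metric.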

References

Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.

Examples

Fit a single proximity tree, with dynamic time warping and move-split-merge metrics.

>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
...     n_pivot=10,
...     metric=[
...         ("dtw", {"min_r": 0.1, "max_r": 0.25}),
...         ("msm", {"min_c": 0.1, "max_c": 100, "num_c": 20})
...     ],
...     criterion="gini"
... )
>>> f.fit(x, y)

apply(x, check_input=True)[source]#

Return the index of the leaf that each sample is predicted by.

Parameters:

xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_inputbool, optional: Whether to validate the input array. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)[source]#

Compute the decision path of the tree.

Parameters:

xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_inputbool, optional: Whether to validate the input array. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses the corresponding node.
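
To make the shape of this result concrete, here is a minimal sketch of how such an indicator matrix can be built. It uses a hypothetical toy tree that splits on a single scalar value (standing in for whatever feature a real node computes) and returns a dense list of lists rather than a sparse matrix; none of the names below are part of wildboar.

```python
# Toy tree: node 0 is the root. Internal nodes are (threshold, left, right);
# leaves are None. Node indices run from 0 to n_nodes - 1.
TREE = {0: (0.5, 1, 2), 1: None, 2: (1.5, 3, 4), 3: None, 4: None}


def decision_path(values, tree):
    """Return a dense 0/1 indicator matrix of shape (n_samples, n_nodes)."""
    n_nodes = len(tree)
    indicator = []
    for value in values:
        row = [0] * n_nodes
        node = 0
        while True:
            row[node] = 1  # mark every node the sample passes through
            if tree[node] is None:  # reached a leaf
                break
            threshold, left, right = tree[node]
            node = left if value <= threshold else right
        indicator.append(row)
    return indicator


paths = decision_path([0.2, 1.0, 2.0], TREE)
```

Each row has a one for the root, a one for the leaf the sample ends up in, and a one for every internal node in between.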

fit(x, y, sample_weight=None, check_input=True)[source]#

Fit a classification tree.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The training time series.
yarray-like of shape (n_samples,): The target values.
sample_weightarray-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_inputbool, optional: Set to False to bypass several input checks.

Returns:

self: This instance.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(x, check_input=True)[source]#

Predict the class of the input samples x.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The input time series.
check_inputbool, optional: Set to False to bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted classes.

predict_proba(x, check_input=True)[source]#

Predict class probabilities of the input samples x.

The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The input time series.
check_inputbool, optional: Set to False to bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples, n_classes): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
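
The definition above (the fraction of samples of each class in a leaf) can be sketched in plain Python. The class counts are made up for illustration and the helper name leaf_class_probabilities is hypothetical; this is not how wildboar stores leaf values internally.

```python
def leaf_class_probabilities(class_counts):
    """Normalize the per-class sample counts of a leaf into probabilities."""
    total = sum(class_counts)
    return [count / total for count in class_counts]


# A leaf holding 3 training samples of the first class and 1 of the second.
proba = leaf_class_probabilities([3, 1])
```

Every sample that reaches this leaf would receive the same probability vector, ordered as in classes_.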

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

Xarray-like of shape (n_samples, n_features): Test samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weightarray-like of shape (n_samples,), default=None: Sample weights.

Returns:

scorefloat: Mean accuracy of self.predict(X) w.r.t. y.
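
As a small sketch of what mean accuracy means for single-label targets, with and without sample weights (the helper name mean_accuracy is hypothetical, and the weighted form assumes the usual weighted-average definition):

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of predictions that match the true labels."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]  # the third prediction is wrong

acc = mean_accuracy(y_true, y_pred)
# Doubling the weight of the misclassified sample lowers the score.
weighted_acc = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])
```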

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.
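
The <component>__<parameter> naming can be illustrated with a short sketch that splits a parameter name on the first double underscore. The helper split_nested_param and the component name "tree" are hypothetical, and this only mirrors the naming convention, not the full update logic.

```python
def split_nested_param(name):
    """Split '<component>__<parameter>' into (component, parameter).

    A plain name without '__' targets the estimator itself and is
    returned as (None, name).
    """
    component, sep, parameter = name.partition("__")
    if not sep:
        return None, name
    return component, parameter


top_level = split_nested_param("max_depth")
nested = split_nested_param("tree__max_depth")
```

Splitting on the first separator only means that deeper nesting is handled one level at a time, with the remainder passed on to the named component.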

class wildboar.tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)[source]#

A tree classifier that uses random convolutions as features.

Attributes:

tree_Tree: The internal tree representation.

apply(x, check_input=True)[source]#

Return the index of the leaf that each sample is predicted by.

Parameters:

xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_inputbool, optional: Whether to validate the input array. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)[source]#

Compute the decision path of the tree.

Parameters:

xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_inputbool, optional: Whether to validate the input array. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses the corresponding node.

fit(x, y, sample_weight=None, check_input=True)[source]#

Fit a classification tree.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The training time series.
yarray-like of shape (n_samples,): The target values.
sample_weightarray-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_inputbool, optional: Set to False to bypass several input checks.

Returns:

self: This instance.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(x, check_input=True)[source]#

Predict the class of the input samples x.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The input time series.
check_inputbool, optional: Set to False to bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted classes.

predict_proba(x, check_input=True)[source]#

Predict class probabilities of the input samples x.

The predicted class probability is the fraction of samples of the same class in a leaf.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The input time series.
check_inputbool, optional: Set to False to bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples, n_classes): The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

Xarray-like of shape (n_samples, n_features): Test samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weightarray-like of shape (n_samples,), default=None: Sample weights.

Returns:

scorefloat: Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

class wildboar.tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)[source]#

A tree regressor that uses random convolutions as features.

Attributes:

tree_Tree: The internal tree representation.

apply(x, check_input=True)[source]#

Return the index of the leaf that each sample is predicted by.

Parameters:

xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_inputbool, optional: Whether to validate the input array. Only set to False if you are sure your data is valid.

Returns:

ndarray of shape (n_samples,): For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0, node_count).

Examples

Get the leaf probability distribution of a prediction:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
       [0., 1.],
       [1., 0.]])

This is equivalent to using tree.predict_proba.

decision_path(x, check_input=True)[source]#

Compute the decision path of the tree.

Parameters:

xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep): The input samples.
check_inputbool, optional: Whether to validate the input array. Only set to False if you are sure your data is valid.

Returns:

sparse matrix of shape (n_samples, n_nodes): An indicator matrix where each nonzero value indicates that the sample traverses the corresponding node.

fit(x, y, sample_weight=None, check_input=True)[source]#

Fit the estimator.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The training time series.
yarray-like of shape (n_samples,): Target values as floating point values.
sample_weightarray-like of shape (n_samples,), optional: If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node.
check_inputbool, optional: Set to False to bypass several input checks.

Returns:

self: This object.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(x, check_input=True)[source]#

Predict the value of x.

Parameters:

xarray-like of shape (n_samples, n_timesteps): The input time series.
check_inputbool, optional: Set to False to bypass several input checks. Do not use this parameter unless you know what you are doing.

Returns:

ndarray of shape (n_samples,): The predicted values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:

Xarray-like of shape (n_samples, n_features): Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
yarray-like of shape (n_samples,) or (n_samples, n_outputs): True values for X.
sample_weightarray-like of shape (n_samples,), default=None: Sample weights.

Returns:

scorefloat: \(R^2\) of self.predict(X) w.r.t. y.
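
The definition of \(R^2\) given above can be checked by hand with a few lines of plain Python. This is a minimal sketch for a single unweighted output that assumes y_true is not constant (so \(v\) is nonzero); the helper name r_squared is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Compute 1 - u / v for a single, unweighted output."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)  # exact predictions
constant = r_squared(y_true, [2.5] * 4)  # always predict the mean of y_true
worse = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])  # reversed predictions
```

The three cases give 1.0, 0.0 and a negative score respectively, matching the description of the best score, the constant model and an arbitrarily worse model.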

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

class wildboar.tree.ShapeletTreeClassifier(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#

A shapelet tree classifier.

Parameters:

n_shapeletsint, optional

The number of shapelets to sample at each node.

max_depthint, optional

The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

min_samples_splitint, optional

The minimum number of samples to split an internal node.

min_samples_leafint, optional

The minimum number of samples in a leaf.

criterion{“entropy”, “gini”}, optional

The criterion used to evaluate the utility of a split.

min_impurity_decreasefloat, optional

A split will be introduced only if the impurity decrease is larger than or equal to this value.

min_shapelet_sizefloat, optional

The minimum length of a sampled shapelet expressed as a fraction, computed as min(ceil(X.shape[-1] * min_shapelet_size), 2).

max_shapelet_sizefloat, optional

The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
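
As a quick worked example of the formula for max_shapelet_size, the sketch below converts a fraction into a number of time steps. The series length of 150 is an arbitrary choice for illustration, and the helper name max_shapelet_length is hypothetical.

```python
import math


def max_shapelet_length(n_timestep, max_shapelet_size):
    """Maximum shapelet length in time steps: ceil(n_timestep * fraction)."""
    return math.ceil(n_timestep * max_shapelet_size)


# For series with 150 time steps:
full = max_shapelet_length(150, 1.0)  # the whole series may be sampled
quarter = max_shapelet_length(150, 0.25)  # 37.5 is rounded up
```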

alphafloat, optional

Dynamically decrease the number of sampled shapelets at each node according to the current depth.

if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
if None, the number of sampled shapelets is the same, independent of depth.

metric{“euclidean”, “scaled_euclidean”, “dtw”, “scaled_dtw”}, optional

Distance metric used to identify the best shapelet.

metric_paramsdict, optional

Parameters for the distance measure.

class_weightdict or “balanced”, optional

Weights associated with the labels.

if dict, weights in the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.

random_stateint or RandomState

If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.

Attributes:

tree_Tree: The tree data structure used internally.
classes_ndarray of shape (n_classes,): The class labels.
n_classes_int: The number of class labels.
