***********************
:py:mod:`wildboar.tree`
***********************
.. py:module:: wildboar.tree
.. autoapi-nested-parse::
Tree-based estimators for classification and regression.
Classes
-------
.. autoapisummary::
wildboar.tree.ExtraShapeletTreeClassifier
wildboar.tree.ExtraShapeletTreeRegressor
wildboar.tree.IntervalTreeClassifier
wildboar.tree.IntervalTreeRegressor
wildboar.tree.PivotTreeClassifier
wildboar.tree.ProximityTreeClassifier
wildboar.tree.RocketTreeClassifier
wildboar.tree.RocketTreeRegressor
wildboar.tree.ShapeletTreeClassifier
wildboar.tree.ShapeletTreeRegressor
Functions
---------
.. autoapisummary::
wildboar.tree.plot_tree
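As a quick orientation, a minimal sketch of plotting a fitted tree (assuming the bundled GunPoint dataset; see `plot_tree` for the exact signature and options):

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier, plot_tree
>>> X, y = load_gun_point()
>>> plot_tree(ShapeletTreeClassifier(random_state=1).fit(X, y))  # doctest: +SKIP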
.. py:class:: ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)
An extra shapelet tree classifier.
Extra shapelet trees are constructed by sampling a distance threshold
uniformly in the range `[min(dist), max(dist)]`.
:Parameters:
**n_shapelets** : int, optional
The number of shapelets to sample at each node.
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is expanded until all
leaves are pure or until all leaves contain less than `min_samples_split`
samples.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger than or
equal to this value.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_shapelet_size** : float, optional
The minimum length of a sampled shapelet, expressed as a fraction and computed
as `max(ceil(X.shape[-1] * min_shapelet_size), 2)`.
**max_shapelet_size** : float, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed
as `ceil(X.shape[-1] * max_shapelet_size)`.
**coverage_probability** : float, optional
The probability that a time step is covered by a
shapelet, in the range 0 < coverage_probability <= 1.
- For larger `coverage_probability`, we get longer shapelets.
- For smaller `coverage_probability`, we get shorter shapelets.
**variability** : float, optional
Controls the shape of the Beta distribution used to
sample shapelets. Defaults to 1.
- Higher `variability` creates more uniform interval sizes.
- Lower `variability` creates more variable interval sizes.
**metric** : {"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional
Distance metric used to identify the best shapelet.
**metric_params** : dict, optional
Parameters for the distance measure.
**criterion** : {"entropy", "gini"}, optional
The criterion used to evaluate the utility of a split.
**class_weight** : dict or "balanced", optional
Weights associated with the labels.
- if dict, weights of the form `{label: weight}`.
- if "balanced", each class weight is inversely proportional to the class
frequency.
- if None, each class has equal weight.
**random_state** : int or RandomState, optional
- If `int`, `random_state` is the seed used by the random number generator;
- If `RandomState` instance, `random_state` is the random number generator;
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
:Attributes:
**tree_** : Tree
The tree representation.
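.. rubric:: Examples

A minimal usage sketch, assuming the bundled GunPoint dataset:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ExtraShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> clf = ExtraShapeletTreeClassifier(random_state=1)
>>> clf.fit(X, y).score(X, y)  # doctest: +SKIP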
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
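.. rubric:: Examples

A minimal sketch that inspects the nodes visited by the first sample (same data as the `apply` example):

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier().fit(X, y)
>>> tree.decision_path(X)[0].indices  # doctest: +SKIP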
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit a classification tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
The target values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This instance.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the class for each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted classes.
.. py:method:: predict_proba(x, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes
corresponds to that in the attribute `classes_`.
.. py:method:: score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
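.. rubric:: Examples

A minimal sketch of updating a nested parameter; the pipeline and the step name ``tree`` are illustrative assumptions, not part of wildboar:

>>> from sklearn.pipeline import Pipeline
>>> from wildboar.tree import ExtraShapeletTreeClassifier
>>> pipe = Pipeline([("tree", ExtraShapeletTreeClassifier())])
>>> pipe.set_params(tree__max_depth=3)  # doctest: +SKIP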
.. py:class:: ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)
An extra shapelet tree regressor.
Extra shapelet trees are constructed by sampling a distance threshold
uniformly in the range [min(dist), max(dist)].
:Parameters:
**n_shapelets** : int, optional
The number of shapelets to sample at each node.
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is expanded until all
leaves are pure or until all leaves contain less than `min_samples_split`
samples.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**criterion** : {"squared_error"}, optional
The criterion used to evaluate the utility of a split.
.. deprecated:: 1.1
Criterion "mse" was deprecated in v1.1 and removed in version 1.2.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger than or
equal to this value.
**min_shapelet_size** : float, optional
The minimum length of a sampled shapelet, expressed as a fraction and computed
as `max(ceil(X.shape[-1] * min_shapelet_size), 2)`.
**max_shapelet_size** : float, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed
as `ceil(X.shape[-1] * max_shapelet_size)`.
**coverage_probability** : float, optional
The probability that a time step is covered by a
shapelet, in the range 0 < coverage_probability <= 1.
- For larger `coverage_probability`, we get longer shapelets.
- For smaller `coverage_probability`, we get shorter shapelets.
**variability** : float, optional
Controls the shape of the Beta distribution used to
sample shapelets. Defaults to 1.
- Higher `variability` creates more uniform interval sizes.
- Lower `variability` creates more variable interval sizes.
**metric** : {'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional
Distance metric used to identify the best shapelet.
**metric_params** : dict, optional
Parameters for the distance measure.
**random_state** : int or RandomState
- If `int`, `random_state` is the seed used by the random number generator;
- If `RandomState` instance, `random_state` is the random number generator;
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
:Attributes:
**tree_** : Tree
The internal tree representation.
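.. rubric:: Examples

A minimal sketch; GunPoint is a classification dataset, so the labels are cast to float purely for illustration:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ExtraShapeletTreeRegressor
>>> X, y = load_gun_point()
>>> reg = ExtraShapeletTreeRegressor(random_state=1)
>>> reg.fit(X, y.astype(float)).score(X, y.astype(float))  # doctest: +SKIP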
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit the estimator.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
Target values as floating point values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This object.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the value of each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted values.
.. py:method:: score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination :math:`R^2` is defined as
:math:`(1 - \frac{u}{v})`, where :math:`u` is the residual
sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v`
is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``.
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of `y`, disregarding the input features, would get
a :math:`R^2` score of 0.0.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
``(n_samples, n_samples_fitted)``, where ``n_samples_fitted``
is the number of samples used in the fitting for the estimator.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
:math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
.. rubric:: Notes
The :math:`R^2` score used when calling ``score`` on a regressor uses
``multioutput='uniform_average'`` from version 0.23 to keep consistent
with default value of :func:`~sklearn.metrics.r2_score`.
This influences the ``score`` method of all the multioutput
regressors (except for
:class:`~sklearn.multioutput.MultiOutputRegressor`).
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
.. py:class:: IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=None, min_size=0.0, max_size=1.0, coverage_probability=None, variability=1, summarizer='mean_var_slope', class_weight=None, random_state=None)
An interval based tree classifier.
:Parameters:
**n_intervals** : {"log", "sqrt"}, int or float, optional
The number of intervals to partition the time series into.
- if "log", the number of intervals is `log2(n_timestep)`.
- if "sqrt", the number of intervals is `sqrt(n_timestep)`.
- if int, the number of intervals is `n_intervals`.
- if float, the number of intervals is `n_intervals * n_timestep`, with
`0 < n_intervals < 1`.
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is expanded until all
leaves are pure or until all leaves contain less than `min_samples_split`
samples.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger than or
equal to this value.
**criterion** : {"entropy", "gini"}, optional
The criterion used to evaluate the utility of a split.
**intervals** : {"fixed", "sample", "random"}, optional
- if "fixed", `n_intervals` non-overlapping intervals.
- if "sample", `n_intervals * sample_size` non-overlapping intervals.
- if "random", `n_intervals` possibly overlapping intervals of randomly
sampled in `[min_size * n_timestep, max_size * n_timestep]`.
**sample_size** : float, optional
The fraction of intervals to sample at each node. Ignored unless
`intervals="sample"`.
**min_size** : float, optional
The minimum interval size if `intervals="random"`. Ignored if
`coverage_probability` is set.
**max_size** : float, optional
The maximum interval size if `intervals="random"`. Ignored if
`coverage_probability` is set.
**coverage_probability** : float, optional
The probability that a time step is covered by an interval, in the
range 0 < coverage_probability <= 1.
- For larger `coverage_probability`, we get longer intervals.
- For smaller `coverage_probability`, we get shorter intervals.
**variability** : float, optional
Controls the shape of the Beta distribution used to
sample intervals. Defaults to 1.
- Higher `variability` creates more uniform interval sizes.
- Lower `variability` creates more variable interval sizes.
**summarizer** : str or list, optional
The method to summarize each interval.
- if str, the summarizer is determined by `_SUMMARIZERS.keys()`.
- if list, the summarizer is a list of functions `f(x) -> float`, where
`x` is a numpy array.
The default summarizer summarizes each interval as its mean, variance
and slope.
**class_weight** : dict or "balanced", optional
Weights associated with the labels.
- if dict, weights of the form `{label: weight}`.
- if "balanced", each class weight is inversely proportional to the class
frequency.
- if None, each class has equal weight.
**random_state** : int or RandomState, optional
- If `int`, `random_state` is the seed used by the random number generator
- If `RandomState` instance, `random_state` is the random number generator
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
:Attributes:
**tree_** : Tree
The internal tree structure.
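.. rubric:: Examples

A minimal usage sketch (GunPoint as above; the parameter choices are illustrative):

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import IntervalTreeClassifier
>>> X, y = load_gun_point()
>>> clf = IntervalTreeClassifier(n_intervals="sqrt", intervals="random", random_state=1)
>>> clf.fit(X, y).predict(X[:2])  # doctest: +SKIP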
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit a classification tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
The target values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This instance.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the class for each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted classes.
.. py:method:: predict_proba(x, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes
corresponds to that in the attribute `classes_`.
.. py:method:: score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
.. py:class:: IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=None, min_size=0.0, max_size=1.0, coverage_probability=None, variability=1, summarizer='mean_var_slope', random_state=None)
An interval based tree regressor.
:Parameters:
**n_intervals** : {"log", "sqrt"}, int or float, optional
The number of intervals to partition the time series into.
- if "log", the number of intervals is `log2(n_timestep)`.
- if "sqrt", the number of intervals is `sqrt(n_timestep)`.
- if int, the number of intervals is `n_intervals`.
- if float, the number of intervals is `n_intervals * n_timestep`, with
`0 < n_intervals < 1`.
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is expanded until all
leaves are pure or until all leaves contain less than `min_samples_split`
samples.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger than or
equal to this value.
**criterion** : {"squared_error"}, optional
The criterion used to evaluate the utility of a split.
**intervals** : {"fixed", "sample", "random"}, optional
- if "fixed", `n_intervals` non-overlapping intervals.
- if "sample", `n_intervals * sample_size` non-overlapping intervals.
- if "random", `n_intervals` possibly overlapping intervals of randomly
sampled in `[min_size * n_timestep, max_size * n_timestep]`.
**sample_size** : float, optional
The fraction of intervals to sample at each node. Ignored unless
`intervals="sample"`.
**min_size** : float, optional
The minimum interval size if `intervals="random"`. Ignored if
`coverage_probability` is set.
**max_size** : float, optional
The maximum interval size if `intervals="random"`. Ignored if
`coverage_probability` is set.
**coverage_probability** : float, optional
The probability that a time step is covered by an interval, in the range
0 < coverage_probability <= 1.
- For larger `coverage_probability`, we get longer intervals.
- For smaller `coverage_probability`, we get shorter intervals.
**variability** : float, optional
Controls the shape of the Beta distribution used to
sample intervals. Defaults to 1.
- Higher `variability` creates more uniform interval sizes.
- Lower `variability` creates more variable interval sizes.
**summarizer** : str or list, optional
The method to summarize each interval.
- if str, the summarizer is determined by `_SUMMARIZERS.keys()`.
- if list, the summarizer is a list of functions `f(x) -> float`, where
`x` is a numpy array.
The default summarizer summarizes each interval as its mean, variance
and slope.
**random_state** : int or RandomState, optional
- If `int`, `random_state` is the seed used by the random number generator
- If `RandomState` instance, `random_state` is the random number generator
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
:Attributes:
**tree_** : Tree
The internal tree structure.
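.. rubric:: Examples

A minimal sketch; GunPoint is a classification dataset, so the labels are cast to float purely for illustration:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import IntervalTreeRegressor
>>> X, y = load_gun_point()
>>> reg = IntervalTreeRegressor(n_intervals="sqrt", random_state=1)
>>> reg.fit(X, y.astype(float)).predict(X[:2])  # doctest: +SKIP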
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit the estimator.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
Target values as floating point values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This object.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the value of each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted values.
.. py:method:: score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination :math:`R^2` is defined as
:math:`(1 - \frac{u}{v})`, where :math:`u` is the residual
sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v`
is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``.
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of `y`, disregarding the input features, would get
a :math:`R^2` score of 0.0.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
``(n_samples, n_samples_fitted)``, where ``n_samples_fitted``
is the number of samples used in the fitting for the estimator.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
:math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
.. rubric:: Notes
The :math:`R^2` score used when calling ``score`` on a regressor uses
``multioutput='uniform_average'`` from version 0.23 to keep consistent
with default value of :func:`~sklearn.metrics.r2_score`.
This influences the ``score`` method of all the multioutput
regressors (except for
:class:`~sklearn.multioutput.MultiOutputRegressor`).
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
.. py:class:: PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, criterion='entropy', class_weight=None, random_state=None)
A tree classifier that uses pivot time series.
:Parameters:
**n_pivot** : str or int, optional
The number of pivot time series to sample at each node.
**metrics** : str, optional
The metrics to sample from. Currently, we only support "all".
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is expanded until all
leaves are pure or until all leaves contain less than `min_samples_split`
samples.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger than or
equal to this value.
**impurity_equality_tolerance** : float, optional
Tolerance for considering two impurities as equal. If the impurity decrease
is the same, we consider the split that maximizes the gap between the sum
of distances.
- If None, we never consider the separation gap.
.. versionadded:: 1.3
**criterion** : {"entropy", "gini"}, optional
The criterion used to evaluate the utility of a split.
**class_weight** : dict or "balanced", optional
Weights associated with the labels.
- if dict, weights of the form `{label: weight}`.
- if "balanced", each class weight is inversely proportional to the class
frequency.
- if None, each class has equal weight.
**random_state** : int or RandomState
- If `int`, `random_state` is the seed used by the random number generator
- If `RandomState` instance, `random_state` is the random number generator
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
:Attributes:
**tree_** : Tree
The internal tree representation.
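.. rubric:: Examples

A minimal usage sketch:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import PivotTreeClassifier
>>> X, y = load_gun_point()
>>> clf = PivotTreeClassifier(random_state=1)
>>> clf.fit(X, y).score(X, y)  # doctest: +SKIP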
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit a classification tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
The target values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This instance.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the class for each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted classes.
.. py:method:: predict_proba(x, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes
corresponds to that in the attribute `classes_`.
.. py:method:: score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
.. py:class:: ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)
A classifier that uses a k-branching tree based on pivot time series.
:Parameters:
**n_pivot** : int, optional
The number of pivots to sample at each node.
**criterion** : {"entropy", "gini"}, optional
The impurity criterion.
**pivot_sample** : {"label", "uniform"}, optional
The pivot sampling method.
**metric_sample** : {"uniform", "weighted"}, optional
The metric sampling method.
**metric** : {"auto"}, str or list, optional
The distance metrics. By default, we use the parameterization suggested by
Lucas et al. (2019).
- If "auto", use the default metric specification suggested by
Lucas et al. (2019).
- If str, use a single metric or default metric specification.
- If list, custom metric specification can be given as a list of
tuples, where the first element of the tuple is a metric name and the
second element a dictionary with a parameter grid specification. A
parameter grid specification is a `dict` with two mandatory and one
optional key-value pairs defining the lower and upper bound on the
values as well as the number of values in the grid. For example, to
specify a grid over the argument 'r' with 10 values in the range 0
to 1, we would give the following specification:
`dict(min_r=0, max_r=1, num_r=10)`.
Read more about the metrics and their parameters in the
:ref:`User guide `.
**metric_params** : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the :ref:`User guide
`.
**metric_factories** : dict, optional
A metric specification.
.. deprecated:: 1.2
Use the combination of metric and metric params.
**max_depth** : int, optional
The maximum tree depth.
**min_samples_split** : int, optional
The minimum number of samples to consider a split.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
The minimum impurity decrease to build a sub-tree.
**class_weight** : dict or "balanced", optional
Weights associated with the labels.
- if dict, weights on the form {label: weight}.
- if "balanced" each class weight inversely proportional to the class
frequency.
- if None, each class has equal weight.
**random_state** : int or RandomState
- If `int`, `random_state` is the seed used by the random number generator
- If `RandomState` instance, `random_state` is the random number generator
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
.. rubric:: References
Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O'Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019).
Proximity forest: an effective and scalable distance-based classifier for time
series. Data Mining and Knowledge Discovery.
.. rubric:: Examples
Fit a single proximity tree, with dynamic time warping and move-split-merge metrics.
>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
... n_pivot=10,
... metric=[
... ("dtw", {"min_r": 0.1, "max_r": 0.25}),
... ("msm", {"min_c": 0.1, "max_c": 100, "num_c": 20})
... ],
... criterion="gini"
... )
>>> f.fit(x, y)
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit a classification tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
The target values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This instance.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the class for each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted classes.
.. py:method:: predict_proba(x, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes
corresponds to that in the attribute `classes_`.
.. py:method:: score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
.. py:class:: RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)
A tree classifier that uses random convolutions as features.
:Attributes:
**tree_** : Tree
The internal tree representation.
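.. rubric:: Examples

A minimal usage sketch; `n_kernels` follows the default shown in the signature:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import RocketTreeClassifier
>>> X, y = load_gun_point()
>>> clf = RocketTreeClassifier(n_kernels=10, random_state=1)
>>> clf.fit(X, y).predict_proba(X[:2])  # doctest: +SKIP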
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit a classification tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
The target values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This instance.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the class for each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted classes.
.. py:method:: predict_proba(x, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes
corresponds to that in the attribute `classes_`.
.. py:method:: score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
.. py:class:: RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)
A tree regressor that uses random convolutions as features.
:Attributes:
**tree_** : Tree
The internal tree representation.
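.. rubric:: Examples

A minimal sketch; GunPoint is a classification dataset, so the labels are cast to float purely for illustration:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import RocketTreeRegressor
>>> X, y = load_gun_point()
>>> reg = RocketTreeRegressor(n_kernels=10, random_state=1)
>>> reg.fit(X, y.astype(float)).predict(X[:2])  # doctest: +SKIP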
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample is predicted by.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range [0; node_count].
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to False if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit the estimator.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
Target values as floating point values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This object.
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
.. py:method:: predict(x, check_input=True)
Predict the value of each sample in x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted values.
..
!! processed by numpydoc !!
.. py:method:: score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination :math:`R^2` is defined as
:math:`(1 - \frac{u}{v})`, where :math:`u` is the residual
sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v`
is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``.
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of `y`, disregarding the input features, would get
a :math:`R^2` score of 0.0.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
``(n_samples, n_samples_fitted)``, where ``n_samples_fitted``
is the number of samples used in the fitting for the estimator.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
:math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
.. rubric:: Notes
The :math:`R^2` score used when calling ``score`` on a regressor uses
``multioutput='uniform_average'`` from version 0.23 to keep consistent
with default value of :func:`~sklearn.metrics.r2_score`.
This influences the ``score`` method of all the multioutput
regressors (except for
:class:`~sklearn.multioutput.MultiOutputRegressor`).
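.. rubric:: Examples
As a sketch, the returned score matches calling :func:`~sklearn.metrics.r2_score` directly (assuming a fitted regressor `reg` and data `X`, `y`):
>>> from sklearn.metrics import r2_score
>>> reg.score(X, y) == r2_score(y, reg.predict(X))
True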
..
!! processed by numpydoc !!
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
..
!! processed by numpydoc !!
.. py:class:: ShapeletTreeClassifier(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, strategy='warn', shapelet_size=0.1, sample_size=1.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=1, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)
A shapelet tree classifier.
:Parameters:
**n_shapelets** : int or {"log2", "sqrt", "auto"}, optional
The number of shapelets in the resulting transform.
- if, "auto" the number of shapelets depend on the value of `strategy`.
For "best" the number is 1; and for "random" it is 1000.
- if, "log2", the number of shaplets is the log2 of the total possible
number of shapelets.
- if, "sqrt", the number of shaplets is the square root of the total
possible number of shapelets.
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is expanded until all
leaves are pure or until all leaves contain less than `min_samples_split`
samples.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger than or
equal to this value.
**impurity_equality_tolerance** : float, optional
Tolerance for considering two impurities as equal. If the impurity decrease
is the same, we consider the split that maximizes the gap between the sum
of distances.
- If None, we never consider the separation gap.
.. versionadded:: 1.3
**strategy** : {"best", "random"}, optional
The strategy for selecting shapelets.
- If "random", `n_shapelets` shapelets are randomly selected in the
range defined by `min_shapelet_size` and `max_shapelet_size`
- If "best", `n_shapelets` shapelets are selected per input sample
of the size determined by `shapelet_size`.
.. versionadded:: 1.3
Add support for the "best" strategy. The default will change to
"best" in 1.4.
**shapelet_size** : int, float or array-like, optional
The shapelet size if `strategy="best"`.
- If int, the exact shapelet size.
- If float, a fraction of the number of input timesteps.
- If array-like, a list of float or int.
.. versionadded:: 1.3
**sample_size** : float, optional
The size of the sample used to determine the shapelets, if `strategy="best"`.
.. versionadded:: 1.3
**min_shapelet_size** : float, optional
The minimum length of a sampled shapelet expressed as a fraction, computed
as `min(ceil(X.shape[-1] * min_shapelet_size), 2)`.
**max_shapelet_size** : float, optional
The maximum length of a sampled shapelet, expressed as a fraction, computed
as `ceil(X.shape[-1] * max_shapelet_size)`.
**coverage_probability** : float, optional
The probability that a time step is covered by a
shapelet, in the range 0 < coverage_probability <= 1.
- For larger `coverage_probability`, we get larger shapelets.
- For smaller `coverage_probability`, we get shorter shapelets.
**variability** : float, optional
Controls the shape of the Beta distribution used to
sample shapelets. Defaults to 1.
- Higher `variability` creates more uniform intervals.
- Lower `variability` creates more variable interval sizes.
**alpha** : float, optional
Dynamically adjust the number of sampled shapelets at each node according
to the current depth, :math:`w = 1 - e^{-|\alpha| \cdot depth}`.
- if :math:`\alpha < 0`, the number of sampled shapelets decreases from
`n_shapelets` towards 1 with increased depth.
- if :math:`\alpha > 0`, the number of sampled shapelets increases from
`1` towards `n_shapelets` with increased depth.
- if `None`, the number of sampled shapelets is the same independent
of depth.
**metric** : str or list, optional
- If `str`, the distance metric used to identify the best
shapelet.
- If `list`, multiple metrics specified as a list of
tuples, where the first element of the tuple is a metric name and
the second element a dictionary with a parameter grid
specification. A parameter grid specification is a dict with two
mandatory and one optional key-value pair, defining the lower and
upper bounds on the values and the number of values in the grid. For
example, to specify a grid over the argument `r` with 10
values in the range 0 to 1, we would give the following
specification: `dict(min_r=0, max_r=1, num_r=10)`.
Read more about metric specifications in the User guide.
.. versionchanged:: 1.2
Added support for multi-metric shapelet transform
**metric_params** : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
**criterion** : {"entropy", "gini"}, optional
The criterion used to evaluate the utility of a split.
**class_weight** : dict or "balanced", optional
Weights associated with the labels
- if dict, weights of the form `{label: weight}`
- if "balanced", each class weight is inversely proportional to the class
frequency
- if None, each class has equal weight.
**random_state** : int or RandomState, optional
- If `int`, `random_state` is the seed used by the random number generator;
- If `RandomState` instance, `random_state` is the random number generator;
- If `None`, the random number generator is the `RandomState` instance used
by `np.random`.
:Attributes:
**tree_** : Tree
The tree data structure used internally
**classes_** : ndarray of shape (n_classes,)
The class labels
**n_classes_** : int
The number of class labels
.. seealso::
:obj:`ShapeletTreeRegressor`
A shapelet tree regressor.
:obj:`ExtraShapeletTreeClassifier`
An extra random shapelet tree classifier.
.. rubric:: Notes
When `strategy` is set to `"best"`, the shapelet tree is constructed by
selecting the top `n_shapelets` per sample. The initial construction of the
matrix profile for each sample may be computationally intensive for large
datasets. To balance accuracy and computational efficiency, the
`sample_size` parameter can be adjusted to determine the number of samples
utilized to compute the minimum distance annotation.
The significance of shapelets is determined by the difference between the
ab-join of a label with any other label and the self-join of the label,
selecting the shapelets with the greatest absolute values. This method is
detailed in the work of Zhu et al. (2020).
When `strategy` is set to `"random"`, the shapelet tree is constructed by
randomly sampling `n_shapelets` within the range defined by
`min_shapelet_size` and `max_shapelet_size`. This method is detailed in the
work of Karlsson et al. (2016). Alternatively, shapelets can be sampled with
a specified `coverage_probability` and `variability`. By specifying a coverage
probability, we define the probability of including a point in the extracted
shapelet. If `coverage_probability` is set,
`min_shapelet_size` and `max_shapelet_size` are ignored.
.. rubric:: References
Zhu, Y., et al. 2020.
The Swiss army knife of time series data mining: ten useful things you
can do with the matrix profile and ten lines of code. Data Mining and
Knowledge Discovery, 34, pp.949-979.
Karlsson, I., Papapetrou, P. and Boström, H., 2016.
Generalized random shapelet forests. Data mining and knowledge
discovery, 30, pp.1053-1085.
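.. rubric:: Examples
A minimal usage sketch; the multi-metric grid below follows the specification format described under `metric`:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> clf = ShapeletTreeClassifier(
...     strategy="random",
...     metric=[("dtw", dict(min_r=0, max_r=1, num_r=10))],
...     random_state=1,
... )
>>> clf.fit(X, y).n_classes_
2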
..
!! processed by numpydoc !!
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample ends up in.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to True if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range `[0, node_count)`.
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
..
!! processed by numpydoc !!
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to True if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
..
!! processed by numpydoc !!
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit a classification tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
The target values.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This instance.
..
!! processed by numpydoc !!
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
..
!! processed by numpydoc !!
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
..
!! processed by numpydoc !!
.. py:method:: predict(x, check_input=True)
Predict the class labels of the input samples x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted classes.
..
!! processed by numpydoc !!
.. py:method:: predict_proba(x, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same
class in a leaf.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples, n_classes)
The class probabilities of the input samples. The order of the classes
corresponds to that in the attribute `classes_`.
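.. rubric:: Examples
A sketch, assuming the fitted classifier from the class example above; each row sums to one:
>>> import numpy as np
>>> proba = clf.predict_proba(X)
>>> proba.shape == (X.shape[0], clf.n_classes_)
True
>>> bool(np.isclose(proba.sum(axis=1), 1).all())
True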
..
!! processed by numpydoc !!
.. py:method:: score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy
which is a harsh metric since you require for each sample that
each label set be correctly predicted.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
Mean accuracy of ``self.predict(X)`` w.r.t. `y`.
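.. rubric:: Examples
A sketch of the equivalence with manually computed accuracy, assuming the fitted classifier from the class example above:
>>> import numpy as np
>>> bool(np.isclose(clf.score(X, y), np.mean(clf.predict(X) == y)))
True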
..
!! processed by numpydoc !!
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
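.. rubric:: Examples
A short sketch of the ``<component>__<parameter>`` form, assuming the tree is wrapped in a :class:`~sklearn.pipeline.Pipeline` with a step named ``"tree"``:
>>> from sklearn.pipeline import Pipeline
>>> from wildboar.tree import ShapeletTreeClassifier
>>> pipe = Pipeline([("tree", ShapeletTreeClassifier(strategy="random"))])
>>> pipe = pipe.set_params(tree__max_depth=3)
>>> pipe.get_params()["tree__max_depth"]
3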
..
!! processed by numpydoc !!
.. py:class:: ShapeletTreeRegressor(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, impurity_equality_tolerance=None, strategy='warn', shapelet_size=0.1, sample_size=1.0, min_shapelet_size=0, max_shapelet_size=1, coverage_probability=None, variability=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)
A shapelet tree regressor.
:Parameters:
**n_shapelets** : int or {"log2", "sqrt", "auto"}, optional
The number of shapelets to sample at each node.
**max_depth** : int, optional
The maximum depth of the tree. If `None` the tree is
expanded until all leaves are pure or until all leaves contain less
than `min_samples_split` samples.
**min_samples_split** : int, optional
The minimum number of samples to split an internal node.
**min_samples_leaf** : int, optional
The minimum number of samples in a leaf.
**min_impurity_decrease** : float, optional
A split will be introduced only if the impurity decrease is larger
than or equal to this value.
**impurity_equality_tolerance** : float, optional
Tolerance for considering two impurities as equal. If the impurity decrease
is the same, we consider the split that maximizes the gap between the sum
of distances.
- If None, we never consider the separation gap.
.. versionadded:: 1.3
**strategy** : {"best", "random"}, optional
The strategy for selecting shapelets.
- If "random", `n_shapelets` shapelets are randomly selected in the
range defined by `min_shapelet_size` and `max_shapelet_size`
- If "best", `n_shapelets` shapelets are selected per input sample
of the size determined by `shapelet_size`.
.. versionadded:: 1.3
Add support for the "best" strategy. The default will change to
"best" in 1.4.
**shapelet_size** : int, float or array-like, optional
The shapelet size if `strategy="best"`.
- If int, the exact shapelet size.
- If float, a fraction of the number of input timesteps.
- If array-like, a list of float or int.
.. versionadded:: 1.3
**sample_size** : float, optional
The size of the sample used to determine the shapelets, if `strategy="best"`.
.. versionadded:: 1.3
**min_shapelet_size** : float, optional
The minimum length of a shapelet, expressed as a fraction of
*n_timestep*.
**max_shapelet_size** : float, optional
The maximum length of a shapelet, expressed as a fraction of
*n_timestep*.
**coverage_probability** : float, optional
The probability that a time step is covered by a
shapelet, in the range 0 < coverage_probability <= 1.
- For larger `coverage_probability`, we get larger shapelets.
- For smaller `coverage_probability`, we get shorter shapelets.
**variability** : float, optional
Controls the shape of the Beta distribution used to
sample shapelets. Defaults to 1.
- Higher `variability` creates more uniform intervals.
- Lower `variability` creates more variable interval sizes.
**alpha** : float, optional
Dynamically adjust the number of sampled shapelets at each node according
to the current depth, i.e.:
::
w = 1 - exp(-abs(alpha) * depth)
- if `alpha < 0`, the number of sampled shapelets decrease from
`n_shapelets` towards 1 with increased depth.
- if `alpha > 0`, the number of sampled shapelets increase from `1`
towards `n_shapelets` with increased depth.
- if `None`, the number of sampled shapelets is the same
independent of depth.
**metric** : str or list, optional
- If `str`, the distance metric used to identify the best
shapelet.
- If `list`, multiple metrics specified as a list of
tuples, where the first element of the tuple is a metric name and
the second element a dictionary with a parameter grid
specification. A parameter grid specification is a dict with two
mandatory and one optional key-value pair, defining the lower and
upper bounds on the values and the number of values in the grid. For
example, to specify a grid over the argument `r` with 10
values in the range 0 to 1, we would give the following
specification: `dict(min_r=0, max_r=1, num_r=10)`.
Read more about metric specifications in the User guide.
.. versionchanged:: 1.2
Added support for multi-metric shapelet transform
**metric_params** : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
**criterion** : {"squared_error"}, optional
The criterion used to evaluate the utility of a split.
.. deprecated:: 1.1
Criterion "mse" was deprecated in v1.1 and removed in version 1.2.
**random_state** : int or RandomState, optional
- If `int`, `random_state` is the seed used by the
random number generator
- If :class:`numpy.random.RandomState` instance, `random_state`
is the random number generator
- If `None`, the random number generator is the
:class:`numpy.random.RandomState` instance used by
:func:`numpy.random`.
:Attributes:
**tree_** : Tree
The internal tree representation
.. rubric:: Notes
When `strategy` is set to `"best"`, the shapelet tree is constructed by
selecting the top `n_shapelets` per sample. The initial construction of the
matrix profile for each sample may be computationally intensive for large
datasets. To balance accuracy and computational efficiency, the
`sample_size` parameter can be adjusted to determine the number of samples
utilized to compute the minimum distance annotation.
The significance of shapelets is determined by the difference between the
ab-join of a label with any other label and the self-join of the label,
selecting the shapelets with the greatest absolute values. This method is
detailed in the work of Zhu et al. (2020).
When `strategy` is set to `"random"`, the shapelet tree is constructed by
randomly sampling `n_shapelets` within the range defined by
`min_shapelet_size` and `max_shapelet_size`. This method is detailed in the
work of Karlsson et al. (2016). Alternatively, shapelets can be sampled with
a specified `coverage_probability` and `variability`. By specifying a coverage
probability, we define the probability of including a point in the extracted
shapelet. If `coverage_probability` is set,
`min_shapelet_size` and `max_shapelet_size` are ignored.
.. rubric:: References
Zhu, Y., et al. 2020.
The Swiss army knife of time series data mining: ten useful things you
can do with the matrix profile and ten lines of code. Data Mining and
Knowledge Discovery, 34, pp.949-979.
Karlsson, I., Papapetrou, P. and Boström, H., 2016.
Generalized random shapelet forests. Data mining and knowledge
discovery, 30, pp.1053-1085.
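.. rubric:: Examples
A minimal usage sketch (not one of the library's documented examples), fitting on synthetic sine waves whose frequency is the regression target:
>>> import numpy as np
>>> from wildboar.tree import ShapeletTreeRegressor
>>> rng = np.random.default_rng(1)
>>> y = rng.uniform(1, 5, size=100)
>>> t = np.linspace(0, 1, 50)
>>> X = np.sin(2 * np.pi * y[:, None] * t) + rng.normal(scale=0.1, size=(100, 50))
>>> reg = ShapeletTreeRegressor(strategy="random", random_state=1).fit(X, y)
>>> reg.predict(X[:2]).shape
(2,)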
..
!! processed by numpydoc !!
.. py:method:: apply(x, check_input=True)
Return the index of the leaf that each sample ends up in.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to True if you are sure your data
is valid.
:Returns:
ndarray of shape (n_samples, )
For every sample, return the index of the leaf that the sample
ends up in. The index is in the range `[0, node_count)`.
.. rubric:: Examples
Get the leaf probability distribution of a prediction:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> tree = ShapeletTreeClassifier()
>>> tree.fit(X, y)
>>> leaves = tree.apply(X)
>>> tree.tree_.value.take(leaves, axis=0)
array([[0., 1.],
[0., 1.],
[1., 0.]])
This is equivalent to using `tree.predict_proba`.
..
!! processed by numpydoc !!
.. py:method:: decision_path(x, check_input=True)
Compute the decision path of the tree.
:Parameters:
**x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input samples.
**check_input** : bool, optional
Bypass array validation. Only set to True if you are sure your data
is valid.
:Returns:
sparse matrix of shape (n_samples, n_nodes)
An indicator array where each nonzero value indicates that the sample
traverses the corresponding node.
..
!! processed by numpydoc !!
.. py:method:: fit(x, y, sample_weight=None, check_input=True)
Fit the estimator.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The training time series.
**y** : array-like of shape (n_samples,)
Target values as floating point numbers.
**sample_weight** : array-like of shape (n_samples,), optional
If `None`, then samples are equally weighted. Splits that would create child
nodes with net zero or negative weight are ignored while searching for a
split in each node. Splits are also ignored if they would result in any
single class carrying a negative weight in either child node.
**check_input** : bool, optional
Allows bypassing several input checks.
:Returns:
self
This object.
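.. rubric:: Examples
A sketch of weighting samples during fitting, reusing the hypothetical `X` and `y` from the class example above; zero-weight samples are effectively ignored when searching for splits:
>>> import numpy as np
>>> w = np.ones(X.shape[0])
>>> w[:10] = 0.0  # ignore the first ten series
>>> reg = ShapeletTreeRegressor(strategy="random", random_state=1).fit(X, y, sample_weight=w)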
..
!! processed by numpydoc !!
.. py:method:: get_metadata_routing()
Get metadata routing of this object.
Please check :ref:`User Guide <metadata_routing>` on how the routing
mechanism works.
:Returns:
**routing** : MetadataRequest
A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating
routing information.
..
!! processed by numpydoc !!
.. py:method:: get_params(deep=True)
Get parameters for this estimator.
:Parameters:
**deep** : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
:Returns:
**params** : dict
Parameter names mapped to their values.
..
!! processed by numpydoc !!
.. py:method:: predict(x, check_input=True)
Predict the value of x.
:Parameters:
**x** : array-like of shape (n_samples, n_timesteps)
The input time series.
**check_input** : bool, optional
Allows bypassing several input checks. Don't use this parameter unless you
know what you are doing.
:Returns:
ndarray of shape (n_samples,)
The predicted values.
..
!! processed by numpydoc !!
.. py:method:: score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination :math:`R^2` is defined as
:math:`(1 - \frac{u}{v})`, where :math:`u` is the residual
sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v`
is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``.
The best possible score is 1.0 and it can be negative (because the
model can be arbitrarily worse). A constant model that always predicts
the expected value of `y`, disregarding the input features, would get
a :math:`R^2` score of 0.0.
:Parameters:
**X** : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed
kernel matrix or a list of generic objects instead with shape
``(n_samples, n_samples_fitted)``, where ``n_samples_fitted``
is the number of samples used in the fitting for the estimator.
**y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for `X`.
**sample_weight** : array-like of shape (n_samples,), default=None
Sample weights.
:Returns:
**score** : float
:math:`R^2` of ``self.predict(X)`` w.r.t. `y`.
.. rubric:: Notes
The :math:`R^2` score used when calling ``score`` on a regressor uses
``multioutput='uniform_average'`` from version 0.23 to keep consistent
with default value of :func:`~sklearn.metrics.r2_score`.
This influences the ``score`` method of all the multioutput
regressors (except for
:class:`~sklearn.multioutput.MultiOutputRegressor`).
..
!! processed by numpydoc !!
.. py:method:: set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as :class:`~sklearn.pipeline.Pipeline`). The latter have
parameters of the form ``<component>__<parameter>`` so that it's
possible to update each component of a nested object.
:Parameters:
**\*\*params** : dict
Estimator parameters.
:Returns:
**self** : estimator instance
Estimator instance.
..
!! processed by numpydoc !!
.. py:function:: plot_tree(clf, *, ax=None, bbox_args=dict(), arrow_args=dict(arrowstyle='<-'), max_depth=None, class_labels=True, fontsize=None, node_labeler=None)
Plot a tree.
:Parameters:
**clf** : tree-based estimator
A decision tree.
**ax** : axes, optional
The axes on which to plot the tree.
**bbox_args** : dict, optional
Arguments to the node box.
**arrow_args** : dict, optional
Arguments to the arrow.
**max_depth** : int, optional
Only show the branches until `max_depth`.
**class_labels** : bool or array-like, optional
Show the classes.
- if True, show classes from the `classes_` attribute of the decision
tree.
- if False, show leaf probabilities.
- if array-like, show classes from the array.
**fontsize** : int, optional
The font size. If `None`, the font size is determined automatically.
**node_labeler** : callable, optional
A function returning the label for a node, on the form
`f(node) -> str`.
- If ``node.children is None``, the node is a leaf.
- ``node._attr`` contains information about the node:
- ``n_node_samples``: the number of samples reaching the node
- if leaf, ``value`` is an array with the fractions of labels
reaching the leaf (in case of classification); or the mean among
the samples reaching the leaf (if regression). Determine if it is a
classification or regression tree by inspecting the shape of the
value array.
- if branch, ``threshold`` contains the threshold used to split the
node.
- if branch, ``dim`` contains the dimension from which the attribute
was extracted.
- if branch, ``attribute`` contains the attribute used for computing
the feature value. The attribute depends on the estimator.
:Returns:
axes
The axes.
.. rubric:: Examples
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.tree import ShapeletTreeClassifier, plot_tree
>>> X, y = load_two_lead_ecg()
>>> clf = ShapeletTreeClassifier(strategy="random").fit(X, y)
>>> plot_tree(clf)
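A sketch of a custom `node_labeler`, using only the node attributes described above (a leaf has ``node.children is None``); the exact rendering depends on the estimator:
>>> def labeler(node):
...     if node.children is None:  # leaf: show the per-class fractions
...         return str(node._attr["value"])
...     return f"threshold: {node._attr['threshold']:.2f}"
>>> plot_tree(clf, node_labeler=labeler)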
..
!! processed by numpydoc !!