*********************** :py:mod:`wildboar.tree` *********************** .. py:module:: wildboar.tree .. autoapi-nested-parse:: Tree-based estimators for classification and regression. .. !! processed by numpydoc !! Package Contents ---------------- Classes ------- .. autoapisummary:: wildboar.tree.ExtraShapeletTreeClassifier wildboar.tree.ExtraShapeletTreeRegressor wildboar.tree.IntervalTreeClassifier wildboar.tree.IntervalTreeRegressor wildboar.tree.PivotTreeClassifier wildboar.tree.ProximityTreeClassifier wildboar.tree.RocketTreeClassifier wildboar.tree.RocketTreeRegressor wildboar.tree.ShapeletTreeClassifier wildboar.tree.ShapeletTreeRegressor .. py:class:: ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None) An extra shapelet tree classifier. Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range ``[min(dist), max(dist)]``. :Parameters: **n_shapelets** : int, optional The number of shapelets to sample at each node. **max_depth** : int, optional The maximum depth of the tree. If `None` the tree is expanded until all leaves are pure or until all leaves contain fewer than `min_samples_split` samples. **min_samples_leaf** : int, optional The minimum number of samples in a leaf. **min_impurity_decrease** : float, optional A split will be introduced only if the impurity decrease is larger than or equal to this value. **min_samples_split** : int, optional The minimum number of samples to split an internal node. **min_shapelet_size** : float, optional The minimum length of a sampled shapelet, expressed as a fraction and computed as `max(2, ceil(X.shape[-1] * min_shapelet_size))`. **max_shapelet_size** : float, optional The maximum length of a sampled shapelet, expressed as a fraction and computed as `ceil(X.shape[-1] * max_shapelet_size)`. **metric** : {"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional Distance metric used to identify the best shapelet. **metric_params** : dict, optional Parameters for the distance measure. **criterion** : {"entropy", "gini"}, optional The criterion used to evaluate the utility of a split. **class_weight** : dict or "balanced", optional Weights associated with the labels - if dict, weights of the form {label: weight} - if "balanced", each class weight is inversely proportional to the class frequency - if None, each class has equal weight. **random_state** : int or RandomState, optional - If `int`, `random_state` is the seed used by the random number generator; - If `RandomState` instance, `random_state` is the random number generator; - If `None`, the random number generator is the `RandomState` instance used by `np.random`. :Attributes: **tree_** : Tree The tree representation.
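.. rubric:: Examples

A minimal usage sketch; the GunPoint data and the parameter values here are illustrative, not recommended settings:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ExtraShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> clf = ExtraShapeletTreeClassifier(n_shapelets=1, random_state=1)
>>> clf.fit(X, y)
>>> y_pred = clf.predict(X)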
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit a classification tree. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) The target values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This instance. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the class of the input samples x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted classes. .. !! processed by numpydoc !! .. py:method:: predict_proba(x, check_input=True) Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float Mean accuracy of ``self.predict(X)`` w.r.t. `y`. .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None) An extra shapelet tree regressor. Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range ``[min(dist), max(dist)]``. :Parameters: **n_shapelets** : int, optional The number of shapelets to sample at each node. **max_depth** : int, optional The maximum depth of the tree. If `None` the tree is expanded until all leaves are pure or until all leaves contain fewer than `min_samples_split` samples. **min_samples_split** : int, optional The minimum number of samples to split an internal node. **min_samples_leaf** : int, optional The minimum number of samples in a leaf. **criterion** : {"squared_error"}, optional The criterion used to evaluate the utility of a split. .. deprecated:: 1.1 Criterion "mse" was deprecated in v1.1 and removed in version 1.2. **min_impurity_decrease** : float, optional A split will be introduced only if the impurity decrease is larger than or equal to this value. **min_shapelet_size** : float, optional The minimum length of a sampled shapelet, expressed as a fraction and computed as `max(2, ceil(X.shape[-1] * min_shapelet_size))`. **max_shapelet_size** : float, optional The maximum length of a sampled shapelet, expressed as a fraction and computed as `ceil(X.shape[-1] * max_shapelet_size)`. **metric** : {'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional Distance metric used to identify the best shapelet. **metric_params** : dict, optional Parameters for the distance measure. **random_state** : int or RandomState - If `int`, `random_state` is the seed used by the random number generator; - If `RandomState` instance, `random_state` is the random number generator; - If `None`, the random number generator is the `RandomState` instance used by `np.random`. :Attributes: **tree_** : Tree The internal tree representation.
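.. rubric:: Examples

A minimal sketch; purely for illustration, the binary GunPoint labels are cast to floats and used as regression targets:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ExtraShapeletTreeRegressor
>>> X, y = load_gun_point()
>>> reg = ExtraShapeletTreeRegressor(random_state=1)
>>> reg.fit(X, y.astype(float))
>>> y_hat = reg.predict(X)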
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit the estimator. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) Target values as floating point values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This object. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the value of x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted values. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the coefficient of determination of the prediction. The coefficient of determination :math:`R^2` is defined as :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``. The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of `y`, disregarding the input features, would get a :math:`R^2` score of 0.0. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples.
For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted`` is the number of samples used in the fitting for the estimator. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True values for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float :math:`R^2` of ``self.predict(X)`` w.r.t. `y`. .. rubric:: Notes The :math:`R^2` score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to keep consistent with default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`). .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None) An interval-based tree classifier. :Attributes: **tree_** : Tree The internal tree structure.
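.. rubric:: Examples

A minimal usage sketch; the GunPoint data and the parameter values here are illustrative:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import IntervalTreeClassifier
>>> X, y = load_gun_point()
>>> clf = IntervalTreeClassifier(n_intervals="sqrt", random_state=1)
>>> clf.fit(X, y)
>>> proba = clf.predict_proba(X)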
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit a classification tree. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) The target values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This instance. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the class of the input samples x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted classes. .. !! processed by numpydoc !! .. py:method:: predict_proba(x, check_input=True) Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float Mean accuracy of ``self.predict(X)`` w.r.t. `y`. .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None) An interval-based tree regressor. :Attributes: **tree_** : Tree The internal tree structure.
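.. rubric:: Examples

A minimal sketch; purely for illustration, the binary GunPoint labels are cast to floats and used as regression targets:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import IntervalTreeRegressor
>>> X, y = load_gun_point()
>>> reg = IntervalTreeRegressor(n_intervals="sqrt", random_state=1)
>>> reg.fit(X, y.astype(float))
>>> y_hat = reg.predict(X)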
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit the estimator. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) Target values as floating point values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This object. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the value of x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted values. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the coefficient of determination of the prediction. The coefficient of determination :math:`R^2` is defined as :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``.
The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of `y`, disregarding the input features, would get a :math:`R^2` score of 0.0. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted`` is the number of samples used in the fitting for the estimator. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True values for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float :math:`R^2` of ``self.predict(X)`` w.r.t. `y`. .. rubric:: Notes The :math:`R^2` score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to keep consistent with default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`). .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None) A tree classifier that uses pivot time series. :Attributes: **tree_** : Tree The internal tree representation.
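.. rubric:: Examples

A minimal usage sketch; the GunPoint data and the parameter values here are illustrative:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import PivotTreeClassifier
>>> X, y = load_gun_point()
>>> clf = PivotTreeClassifier(n_pivot="sqrt", metrics="all", random_state=1)
>>> clf.fit(X, y)
>>> y_pred = clf.predict(X)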
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit a classification tree. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) The target values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This instance. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the class of the input samples x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted classes. .. !! processed by numpydoc !! .. py:method:: predict_proba(x, check_input=True) Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float Mean accuracy of ``self.predict(X)`` w.r.t. `y`. .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! ..
py:class:: ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric='auto', metric_params=None, metric_factories=None, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None) A classifier that uses a k-branching tree based on pivot time series. :Parameters: **n_pivot** : int, optional The number of pivots to sample at each node. **criterion** : {"entropy", "gini"}, optional The impurity criterion. **pivot_sample** : {"label", "uniform"}, optional The pivot sampling method. **metric_sample** : {"uniform", "weighted"}, optional The metric sampling method. **metric** : {"auto", "default"}, str or list, optional The distance metrics. By default, we use the parameterization suggested by Lucas et al. (2019). - If "auto", use the default metric specification suggested by Lucas et al. (2019). - If str, use a single metric or default metric specification. - If list, a custom metric specification can be given as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a `dict` with two mandatory and one optional key-value pair, defining the lower and upper bounds on the values as well as the number of values in the grid. For example, to specify a grid over the argument 'r' with 10 values in the range 0 to 1, we would give the following specification: `dict(min_r=0, max_r=1, num_r=10)`. Read more about the metrics and their parameters in the :ref:`User guide `. **metric_params** : dict, optional Parameters for the distance measure. Ignored unless metric is a string. Read more about the parameters in the :ref:`User guide `. **metric_factories** : dict, optional A metric specification. .. deprecated:: 1.2 Use the combination of metric and metric_params. **max_depth** : int, optional The maximum tree depth. **min_samples_split** : int, optional The minimum number of samples to consider a split. **min_samples_leaf** : int, optional The minimum number of samples in a leaf. **min_impurity_decrease** : float, optional The minimum impurity decrease to build a sub-tree. **class_weight** : dict or "balanced", optional Weights associated with the labels. - if dict, weights of the form {label: weight}. - if "balanced", each class weight is inversely proportional to the class frequency. - if None, each class has equal weight. **random_state** : int or RandomState - If `int`, `random_state` is the seed used by the random number generator - If `RandomState` instance, `random_state` is the random number generator - If `None`, the random number generator is the `RandomState` instance used by `np.random`. .. rubric:: References Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O'Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019) Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery. .. rubric:: Examples Fit a single proximity tree with dynamic time warping and move-split-merge metrics: >>> from wildboar.datasets import load_dataset >>> from wildboar.tree import ProximityTreeClassifier >>> x, y = load_dataset("GunPoint") >>> f = ProximityTreeClassifier( ... n_pivot=10, ... metric=[ ... ("dtw", {"min_r": 0.1, "max_r": 0.25}), ... ("msm", {"min_c": 0.1, "max_c": 100, "num_c": 20}) ... ], ... criterion="gini" ... ) >>> f.fit(x, y)
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit a classification tree. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) The target values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This instance. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the class of the input samples x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted classes. .. !! processed by numpydoc !! .. py:method:: predict_proba(x, check_input=True) Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series.
**check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float Mean accuracy of ``self.predict(X)`` w.r.t. `y`. .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None) A tree classifier that uses random convolutions as features. :Attributes: **tree_** : Tree The internal tree representation.
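.. rubric:: Examples

A minimal usage sketch; the GunPoint data and the parameter values here are illustrative:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import RocketTreeClassifier
>>> X, y = load_gun_point()
>>> clf = RocketTreeClassifier(n_kernels=10, random_state=1)
>>> clf.fit(X, y)
>>> proba = clf.predict_proba(X)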
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit a classification tree. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) The target values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This instance. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the class of the input samples x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted classes. .. !! processed by numpydoc !! .. py:method:: predict_proba(x, check_input=True) Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float Mean accuracy of ``self.predict(X)`` w.r.t. `y`. .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! ..
py:class:: RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None) A tree regressor that uses random convolutions as features. :Attributes: **tree_** : Tree The internal tree representation.
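.. rubric:: Examples

A minimal sketch; purely for illustration, the binary GunPoint labels are cast to floats and used as regression targets:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import RocketTreeRegressor
>>> X, y = load_gun_point()
>>> reg = RocketTreeRegressor(n_kernels=10, random_state=1)
>>> reg.fit(X, y.astype(float))
>>> y_hat = reg.predict(X)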
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit the estimator. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) Target values as floating point values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This object. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the value of x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted values. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the coefficient of determination of the prediction. The coefficient of determination :math:`R^2` is defined as :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``. The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of `y`, disregarding the input features, would get a :math:`R^2` score of 0.0. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted`` is the number of samples used in the fitting for the estimator. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True values for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float :math:`R^2` of ``self.predict(X)`` w.r.t. `y`. .. rubric:: Notes The :math:`R^2` score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to keep consistent with default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`). .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: ShapeletTreeClassifier(*, n_shapelets='log2', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None) A shapelet tree classifier. :Parameters: **n_shapelets** : int, optional The number of shapelets to sample at each node. **max_depth** : int, optional The maximum depth of the tree. If `None` the tree is expanded until all leaves are pure or until all leaves contain fewer than `min_samples_split` samples. **min_samples_split** : int, optional The minimum number of samples to split an internal node. **min_samples_leaf** : int, optional The minimum number of samples in a leaf. **criterion** : {"entropy", "gini"}, optional The criterion used to evaluate the utility of a split. **min_impurity_decrease** : float, optional A split will be introduced only if the impurity decrease is larger than or equal to this value. **min_shapelet_size** : float, optional The minimum length of a sampled shapelet, expressed as a fraction and computed as ``max(2, ceil(X.shape[-1] * min_shapelet_size))``. **max_shapelet_size** : float, optional The maximum length of a sampled shapelet, expressed as a fraction and computed as ``ceil(X.shape[-1] * max_shapelet_size)``.
**alpha** : float, optional Dynamically decrease the number of sampled shapelets at each node according to the current depth, i.e., :math:`w = 1 - e^{-|\alpha| \cdot depth}`: - if :math:`\alpha < 0`, the number of sampled shapelets decreases from ``n_shapelets`` towards 1 with increased depth, i.e., :math:`n\_shapelets \cdot (1 - w)`; - if :math:`\alpha > 0`, the number of sampled shapelets increases from 1 towards ``n_shapelets`` with increased depth, i.e., :math:`n\_shapelets \cdot w`; - if ``None``, the number of sampled shapelets is the same independent of depth. **metric** : {"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional Distance metric used to identify the best shapelet. **metric_params** : dict, optional Parameters for the distance measure. **class_weight** : dict or "balanced", optional Weights associated with the labels - if dict, weights of the form {label: weight} - if "balanced", each class weight is inversely proportional to the class frequency - if None, each class has equal weight. **random_state** : int or RandomState - If `int`, `random_state` is the seed used by the random number generator; - If `RandomState` instance, `random_state` is the random number generator; - If `None`, the random number generator is the `RandomState` instance used by `np.random`. .. seealso:: :obj:`ShapeletTreeRegressor` A shapelet tree regressor. :obj:`ExtraShapeletTreeClassifier` An extra random shapelet tree classifier. :Attributes: **tree_** : Tree The tree data structure used internally **classes_** : ndarray of shape (n_classes,) The class labels **n_classes_** : int The number of class labels
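.. rubric:: Examples

A minimal usage sketch; the GunPoint data and the parameter values here are illustrative:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.tree import ShapeletTreeClassifier
>>> X, y = load_gun_point()
>>> clf = ShapeletTreeClassifier(n_shapelets="log2", metric="scaled_euclidean", random_state=1)
>>> clf.fit(X, y)
>>> y_pred = clf.predict(X)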
.. !! processed by numpydoc !! .. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: ndarray of shape (n_samples,) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to False if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit a classification tree. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) The target values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allow to bypass several input checks. :Returns: self This instance. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide <metadata_routing>` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the class of the input samples x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted classes. .. !! processed by numpydoc !! .. py:method:: predict_proba(x, check_input=True) Predict class probabilities of the input samples X. The predicted class probability is the fraction of samples of the same class in a leaf. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allow to bypass several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples, n_classes) The class probabilities of the input samples. The order of the classes corresponds to that in the attribute `classes_`. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float Mean accuracy of ``self.predict(X)`` w.r.t. `y`. .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !! .. py:class:: ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='log2', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None) A shapelet tree regressor. :Parameters: **max_depth** : int, optional The maximum depth of the tree. If ``None`` the tree is expanded until all leaves are pure or until all leaves contain fewer than ``min_samples_split`` samples.
.. py:class:: ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='log2', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None) A shapelet tree regressor. :Parameters: **max_depth** : int, optional The maximum depth of the tree. If ``None`` the tree is expanded until all leaves are pure or until all leaves contain less than ``min_samples_split`` samples. **min_samples_split** : int, optional The minimum number of samples to split an internal node. **min_samples_leaf** : int, optional The minimum number of samples in a leaf. **min_impurity_decrease** : float, optional A split will be introduced only if the impurity decrease is larger than or equal to this value. **n_shapelets** : int, optional The number of shapelets to sample at each node. **min_shapelet_size** : float, optional The minimum length of a shapelet, expressed as a fraction of *n_timestep*. **max_shapelet_size** : float, optional The maximum length of a shapelet, expressed as a fraction of *n_timestep*. **alpha** : float, optional Dynamically adjust the number of sampled shapelets at each node according to the current depth, i.e.: :: w = 1 - exp(-abs(alpha) * depth) - if ``alpha < 0``, the number of sampled shapelets decreases from ``n_shapelets`` towards 1 with increased depth. - if ``alpha > 0``, the number of sampled shapelets increases from ``1`` towards ``n_shapelets`` with increased depth. - if ``None``, the number of sampled shapelets is the same regardless of depth. **metric** : str or list, optional - If ``str``, the distance metric used to identify the best shapelet. - If ``list``, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs, defining the lower and upper bounds on the values and the number of values in the grid. For example, to specify a grid over the argument ``r`` with 10 values in the range 0 to 1, we would give the following specification: ``dict(min_r=0, max_r=1, num_r=10)``; see the sketch after this parameter list. Read more about metric specifications in the `User guide `__ .. versionchanged:: 1.2 Added support for multi-metric shapelet transform **metric_params** : dict, optional Parameters for the distance measure. Ignored unless metric is a string. Read more about the parameters in the `User guide `__. **criterion** : {"squared_error"}, optional The criterion used to evaluate the utility of a split. .. deprecated:: 1.1 Criterion "mse" was deprecated in v1.1 and removed in version 1.2. **random_state** : int or RandomState, optional - If ``int``, ``random_state`` is the seed used by the random number generator - If :class:`numpy.random.RandomState` instance, ``random_state`` is the random number generator - If ``None``, the random number generator is the :class:`numpy.random.RandomState` instance used by :mod:`numpy.random`. :Attributes: **tree_** : Tree The internal tree representation. .. !! processed by numpydoc !!
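A minimal sketch of the multi-metric specification described above (the choice of ``dtw`` and the grid bounds are illustrative assumptions)::

    from wildboar.tree import ShapeletTreeRegressor

    # Sample shapelets under DTW, searching a grid of 10 values of the
    # warping-window argument ``r`` between 0 and 1.
    reg = ShapeletTreeRegressor(
        metric=[("dtw", dict(min_r=0, max_r=1, num_r=10))],
    )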
.. py:method:: apply(x, check_input=True) Return the index of the leaf that each sample is predicted by. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to True if you are sure your data is valid. :Returns: ndarray of shape (n_samples, ) For every sample, return the index of the leaf that the sample ends up in. The index is in the range [0; node_count]. .. rubric:: Examples Get the leaf probability distribution of a prediction: >>> from wildboar.datasets import load_gun_point >>> from wildboar.tree import ShapeletTreeClassifier >>> X, y = load_gun_point() >>> tree = ShapeletTreeClassifier() >>> tree.fit(X, y) >>> leaves = tree.apply(X) >>> tree.tree_.value.take(leaves, axis=0) array([[0., 1.], [0., 1.], [1., 0.]]) This is equivalent to using `tree.predict_proba`. .. !! processed by numpydoc !! .. py:method:: decision_path(x, check_input=True) Compute the decision path of the tree. :Parameters: **x** : array-like of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep) The input samples. **check_input** : bool, optional Bypass array validation. Only set to True if you are sure your data is valid. :Returns: sparse matrix of shape (n_samples, n_nodes) An indicator array where each nonzero value indicates that the sample traverses a node. .. !! processed by numpydoc !! .. py:method:: fit(x, y, sample_weight=None, check_input=True) Fit the estimator. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The training time series. **y** : array-like of shape (n_samples,) Target values as floating point values. **sample_weight** : array-like of shape (n_samples,), optional If `None`, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. Splits are also ignored if they would result in any single class carrying a negative weight in either child node. **check_input** : bool, optional Allows bypassing several input checks. :Returns: self This object. .. !! processed by numpydoc !! .. py:method:: get_metadata_routing() Get metadata routing of this object. Please check :ref:`User Guide ` on how the routing mechanism works. :Returns: **routing** : MetadataRequest A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information. .. !! processed by numpydoc !! .. py:method:: get_params(deep=True) Get parameters for this estimator. :Parameters: **deep** : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators. :Returns: **params** : dict Parameter names mapped to their values. .. !! processed by numpydoc !! .. py:method:: predict(x, check_input=True) Predict the value of x. :Parameters: **x** : array-like of shape (n_samples, n_timesteps) The input time series. **check_input** : bool, optional Allows bypassing several input checks. Don't use this parameter unless you know what you are doing. :Returns: ndarray of shape (n_samples,) The predicted values. .. !! processed by numpydoc !! .. py:method:: score(X, y, sample_weight=None) Return the coefficient of determination of the prediction. The coefficient of determination :math:`R^2` is defined as :math:`(1 - \frac{u}{v})`, where :math:`u` is the residual sum of squares ``((y_true - y_pred) ** 2).sum()`` and :math:`v` is the total sum of squares ``((y_true - y_true.mean()) ** 2).sum()``. The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of `y`, disregarding the input features, would get a :math:`R^2` score of 0.0. :Parameters: **X** : array-like of shape (n_samples, n_features) Test samples.
For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape ``(n_samples, n_samples_fitted)``, where ``n_samples_fitted`` is the number of samples used in the fitting for the estimator. **y** : array-like of shape (n_samples,) or (n_samples, n_outputs) True values for `X`. **sample_weight** : array-like of shape (n_samples,), default=None Sample weights. :Returns: **score** : float :math:`R^2` of ``self.predict(X)`` w.r.t. `y`. .. rubric:: Notes The :math:`R^2` score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to keep consistent with the default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`). .. !! processed by numpydoc !! .. py:method:: set_params(**params) Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object. :Parameters: **\*\*params** : dict Estimator parameters. :Returns: **self** : estimator instance Estimator instance. .. !! processed by numpydoc !!
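As a closing sketch, the regressor's ``fit``/``predict``/``score`` round trip, with :math:`R^2` recomputed by hand from the definition above (the synthetic data, shapes, and seed are arbitrary assumptions)::

    import numpy as np
    from wildboar.tree import ShapeletTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.randn(50, 100)        # 50 time series with 100 timesteps each
    y = X[:, :10].mean(axis=1)    # a target derived from the series

    reg = ShapeletTreeRegressor(random_state=0).fit(X, y)
    y_pred = reg.predict(X)

    u = ((y - y_pred) ** 2).sum()      # residual sum of squares
    v = ((y - y.mean()) ** 2).sum()    # total sum of squares
    assert np.isclose(reg.score(X, y), 1 - u / v)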