wildboar.tree

Submodules

Package Contents

Classes

ExtraShapeletTreeClassifier

An extra shapelet tree classifier.

ExtraShapeletTreeRegressor

An extra shapelet tree regressor.

IntervalTreeClassifier

An interval based tree classifier.

IntervalTreeRegressor

An interval based tree regressor.

PivotTreeClassifier

A tree classifier that uses pivot time series.

ProximityTreeClassifier

A classifier that uses a k-branching tree based on pivot time series.

RocketTreeClassifier

A tree classifier that uses random convolutions as features.

RocketTreeRegressor

A tree regressor that uses random convolutions as features.

ShapeletTreeClassifier

A shapelet tree classifier.

ShapeletTreeRegressor

A shapelet tree regressor.

class wildboar.tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

Bases: ShapeletTreeClassifier

An extra shapelet tree classifier.

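Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)], where dist holds the distances between the sampled shapelet and the time series at the current node.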
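The threshold sampling can be sketched in plain Python as follows. This is a minimal illustration, not wildboar's implementation: it assumes an unscaled Euclidean subsequence distance and sends samples whose distance is at or below the threshold to the left branch. The helper names min_subsequence_distance and extra_split are hypothetical.

```python
import math
import random


def min_subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any equally long
    window of the series (hypothetical helper, no scaling)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        best = min(best, math.dist(shapelet, series[start:start + m]))
    return best


def extra_split(shapelet, X, rng):
    """Draw the threshold uniformly from [min(dist), max(dist)] and split X.

    Hypothetical helper: samples with distance <= threshold go left.
    """
    dist = [min_subsequence_distance(shapelet, x) for x in X]
    threshold = rng.uniform(min(dist), max(dist))
    left = [i for i, d in enumerate(dist) if d <= threshold]
    right = [i for i, d in enumerate(dist) if d > threshold]
    return dist, threshold, left, right


X = [
    [0, 1, 2, 3, 2, 1],
    [5, 5, 5, 5, 5, 5],
    [0, 0, 1, 2, 3, 2],
]
dist, threshold, left, right = extra_split([1, 2, 3], X, random.Random(0))
print(dist, threshold, left, right)
```

No search over candidate thresholds takes place, which is the usual motivation for the extremely randomized variant: each node is cheaper to fit.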

tree_

The tree representation

Type:

Tree

Parameters:
  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the time series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction of the time series length and computed as ceil(X.shape[-1] * max_shapelet_size).

  • metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – The distance metric used to identify the best shapelet.

  • metric_params (dict, optional) – Parameters for the distance measure.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator;

    • If RandomState instance, random_state is the random number generator;

    • If None, the random number generator is the RandomState instance used by np.random.
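To make the size parameters concrete, the sketch below converts the two fractions into lengths and draws one random candidate shapelet. It assumes that the lower bound is clamped so a shapelet always spans at least two time steps, and that the length and start position are drawn uniformly. shapelet_length_bounds and sample_shapelet are hypothetical names, not part of wildboar.

```python
import math
import random


def shapelet_length_bounds(n_timestep, min_shapelet_size=0.0, max_shapelet_size=1.0):
    """Convert fractional sizes into an inclusive (lo, hi) range of lengths."""
    lo = max(math.ceil(n_timestep * min_shapelet_size), 2)  # at least two time steps
    hi = math.ceil(n_timestep * max_shapelet_size)
    return lo, hi


def sample_shapelet(series, min_shapelet_size, max_shapelet_size, rng):
    """Draw a random length in [lo, hi], then a random start position."""
    lo, hi = shapelet_length_bounds(len(series), min_shapelet_size, max_shapelet_size)
    length = rng.randint(lo, hi)
    start = rng.randint(0, len(series) - length)
    return series[start:start + length]


print(shapelet_length_bounds(100, 0.25, 0.5))  # (25, 50)
print(shapelet_length_bounds(100))  # (2, 100)
```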

class wildboar.tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)

Bases: ShapeletTreeRegressor

An extra shapelet tree regressor.

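Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)], where dist holds the distances between the sampled shapelet and the time series at the current node.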

tree_

The internal tree representation

Type:

Tree

Parameters:
  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"squared_error"}, optional) –

    The criterion used to evaluate the utility of a split.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the time series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction of the time series length and computed as ceil(X.shape[-1] * max_shapelet_size).

  • metric ({"euclidean", "scaled_euclidean", "scaled_dtw"}, optional) – The distance metric used to identify the best shapelet.

  • metric_params (dict, optional) – Parameters for the distance measure.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator;

    • If RandomState instance, random_state is the random number generator;

    • If None, the random number generator is the RandomState instance used by np.random.
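The squared_error criterion scores a split by how much it reduces the spread of the target within each child. Below is a minimal, node-local sketch. It assumes that impurity is the mean squared deviation from the node mean and that children are weighted by their share of the node's samples. The function names are hypothetical.

```python
def squared_error(y):
    """Node impurity: mean squared deviation from the node mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def impurity_decrease(y, left, right):
    """Impurity of the node minus the size-weighted impurity of its children."""
    n = len(y)
    y_left = [y[i] for i in left]
    y_right = [y[i] for i in right]
    w_left, w_right = len(y_left) / n, len(y_right) / n
    children = w_left * squared_error(y_left) + w_right * squared_error(y_right)
    return squared_error(y) - children


y = [1.0, 1.0, 3.0, 3.0]
print(impurity_decrease(y, [0, 1], [2, 3]))  # 1.0, both children are pure
print(impurity_decrease(y, [0, 2], [1, 3]))  # 0.0, no improvement
```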

class wildboar.tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BaseIntervalTree

An interval based tree classifier.

tree_

The internal tree structure.

Type:

Tree

Parameters:
  • n_intervals ({"log", "sqrt"}, int or float, optional) –

    The number of intervals to partition the time series into.

    • if “log”, the number of intervals is log2(n_timestep).

    • if “sqrt”, the number of intervals is sqrt(n_timestep).

    • if int, the number of intervals is n_intervals.

    • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • intervals ({"fixed", "sample", "random"}, optional) –

    • if “fixed”, n_intervals non-overlapping intervals.

    • if “sample”, n_intervals * sample_size non-overlapping intervals.

    • if “random”, n_intervals possibly overlapping intervals, each with a length sampled at random from [min_size * n_timestep, max_size * n_timestep].

  • sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".

  • min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".

  • max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".

  • summarizer (list or str, optional) –

    The summarization of each interval.

    • if list, a list of callables, each accepting a NumPy array and returning a float.

    • if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
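The n_intervals rules and the "fixed" partitioning can be sketched as below. Rounding the count down (to at least one interval) and giving the first n_timestep % n_intervals intervals one extra time step are assumptions of this sketch, not documented behavior. The helper names are hypothetical.

```python
import math


def resolve_n_intervals(n_intervals, n_timestep):
    """Turn the n_intervals parameter into a count (rounded down, at least 1)."""
    if n_intervals == "log":
        n = math.log2(n_timestep)
    elif n_intervals == "sqrt":
        n = math.sqrt(n_timestep)
    elif isinstance(n_intervals, int):
        n = n_intervals
    elif isinstance(n_intervals, float) and 0 < n_intervals < 1:
        n = n_intervals * n_timestep
    else:
        raise ValueError(f"unsupported n_intervals: {n_intervals!r}")
    return max(1, int(n))


def fixed_intervals(n_timestep, n_intervals):
    """Split range(n_timestep) into contiguous, non-overlapping (start, end) pairs.

    The first n_timestep % n_intervals intervals get one extra time step.
    """
    base, extra = divmod(n_timestep, n_intervals)
    intervals, start = [], 0
    for i in range(n_intervals):
        end = start + base + (1 if i < extra else 0)
        intervals.append((start, end))
        start = end
    return intervals


n = resolve_n_intervals("sqrt", 150)  # int(sqrt(150)) == 12
print(n, fixed_intervals(150, n)[:3])  # 12 [(0, 13), (13, 26), (26, 39)]
```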

class wildboar.tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)

Bases: FeatureTreeRegressorMixin, BaseIntervalTree

An interval based tree regressor.

tree_

The internal tree structure.

Type:

Tree

Parameters:
  • n_intervals ({"log", "sqrt"}, int or float, optional) –

    The number of intervals to partition the time series into.

    • if “log”, the number of intervals is log2(n_timestep).

    • if “sqrt”, the number of intervals is sqrt(n_timestep).

    • if int, the number of intervals is n_intervals.

    • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"squared_error"}, optional) –

    The criterion used to evaluate the utility of a split.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

  • intervals ({"fixed", "sample", "random"}, optional) –

    • if “fixed”, n_intervals non-overlapping intervals.

    • if “sample”, n_intervals * sample_size non-overlapping intervals.

    • if “random”, n_intervals possibly overlapping intervals, each with a length sampled at random from [min_size * n_timestep, max_size * n_timestep].

  • sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".

  • min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".

  • max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".

  • summarizer (list or str, optional) –

    The summarization of each interval.

    • if list, a list of callables, each accepting a NumPy array and returning a float.

    • if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
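Because summarizer also accepts a list of callables, the default "mean_var_slope" summarization can be approximated with three plain functions, each mapping an interval to a float. This sketch uses Python lists rather than NumPy arrays. The population variance and the least-squares slope against the time index are assumptions about what the built-in summarizer computes, and interval_features is a hypothetical helper.

```python
def mean(x):
    return sum(x) / len(x)


def variance(x):
    # Population variance (divides by n); an assumption of this sketch.
    mu = mean(x)
    return sum((v - mu) ** 2 for v in x) / len(x)


def slope(x):
    # Least-squares slope of x against the time index 0, 1, ..., n - 1.
    n = len(x)
    if n < 2:
        return 0.0
    t_mu, x_mu = (n - 1) / 2, mean(x)
    num = sum((t - t_mu) * (v - x_mu) for t, v in enumerate(x))
    den = sum((t - t_mu) ** 2 for t in range(n))
    return num / den


def interval_features(series, intervals, summarizer):
    """Apply every summarizer to every (start, end) interval, in order."""
    return [f(series[start:end]) for start, end in intervals for f in summarizer]


series = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]
features = interval_features(series, [(0, 4), (4, 8)], [mean, variance, slope])
print(features)  # [1.5, 1.25, 1.0, 4.0, 0.0, 0.0]
```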

class wildboar.tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BasePivotTree

A tree classifier that uses pivot time series.

tree_

The internal tree representation

Type:

Tree

Parameters:
  • n_pivot (str or int, optional) – The number of pivot time series to sample at each node.

  • metrics (str, optional) – The metrics to sample from. Currently, we only support “all”.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
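For class_weight="balanced", a common convention (the one used by scikit-learn) sets the weight of each class to n_samples / (n_classes * count(class)). The sketch below assumes that wildboar follows the same formula. balanced_class_weight is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weight(y):
    """weight(c) = n_samples / (n_classes * count(c))."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


w = balanced_class_weight(["a", "a", "a", "b"])
print(w)  # {'a': 0.6666666666666666, 'b': 2.0}
```

With these weights, each class contributes the same total weight (3 * 2/3 = 1 * 2 = 2), so the minority class is not swamped when impurities are computed.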

class wildboar.tree.ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric_factories='default', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)

Bases: wildboar.tree.base.TreeClassifierMixin, wildboar.tree.base.BaseTree

A classifier that uses a k-branching tree based on pivot time series.

Examples

>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
...     n_pivot=10,
...     metric_factories={
...         "rdtw": {"min_r": 0.1, "max_r": 0.25},
...         "msm": {"min_c": 0.1, "max_c": 100, "n": 20}
...     },
...     criterion="gini"
... )
>>> f.fit(x, y)

References

Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.

Parameters:
  • n_pivot (int, optional) – The number of pivots to sample at each node.

  • criterion ({"entropy", "gini"}, optional) – The impurity criterion.

  • pivot_sample ({"label", "uniform"}, optional) – The pivot sampling method.

  • metric_sample ({"uniform", "weighted"}, optional) – The metric sampling method.

  • metric_factories ("default", list or dict, optional) –

    The distance metrics.

    If dict, a dictionary where each key is:

    • if str, a named distance factory (see _DISTANCE_FACTORIES.keys()).

    • if callable, a function returning a list of DistanceMeasure objects.

    Each value is a dict of parameters passed to the corresponding factory.

    If list, a list of named factories or callables.

    If “default”, use the parameterization of Lucas et al. (2019).

  • max_depth (int, optional) – The maximum tree depth.

  • min_samples_split (int, optional) – The minimum number of samples to consider a split.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – The minimum impurity decrease to build a sub-tree.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
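The k-branching split follows the idea of Lucas et al. (2019): a node picks one exemplar (pivot) per class and routes each time series to the branch of its closest exemplar. The sketch below is a simplification. It reads pivot_sample="label" as "one exemplar per label" and uses a fixed Euclidean distance instead of a sampled metric. n_pivot and metric_sample are not modeled, and proximity_split is a hypothetical name.

```python
import math
import random


def proximity_split(X, y, distance, rng):
    """Pick one exemplar per label, then route each series to its nearest exemplar."""
    exemplars = {}
    for label in sorted(set(y)):
        members = [i for i, lab in enumerate(y) if lab == label]
        exemplars[label] = rng.choice(members)

    branches = {label: [] for label in exemplars}
    for i, x in enumerate(X):
        # Ties are broken by the sorted label order.
        nearest = min(exemplars, key=lambda lab: distance(x, X[exemplars[lab]]))
        branches[nearest].append(i)
    return exemplars, branches


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
y = [0, 0, 1, 1]
exemplars, branches = proximity_split(X, y, math.dist, random.Random(0))
print(branches)  # {0: [0, 1], 1: [2, 3]}
```

Each branch becomes a child node, so a node with k labels has up to k children rather than two.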

class wildboar.tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BaseRocketTree

A tree classifier that uses random convolutions as features.

tree_

The internal tree representation.

Type:

Tree

Parameters:
  • n_kernels (int, optional) – The number of kernels to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • sampling ({"normal", "uniform", "shapelet"}, optional) –

    The sampling of convolutional filters.

    • if “normal”, sample filter weights from a normal distribution with the given mean and scale.

    • if “uniform”, sample filter weights from a uniform distribution with the given lower and upper bounds.

    • if “shapelet”, sample filters as subsequences of the training data.

  • sampling_params (dict, optional) –

    The parameters for the sampling.

    • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

    • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.

  • kernel_size ((min_size, max_size) or array-like, optional) –

    The kernel size.

    • if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep.

    • if array-like, all defined kernel sizes.

  • bias_prob (float, optional) – The probability of using a bias term.

  • normalize_prob (float, optional) – The probability of performing normalization.

  • padding_prob (float, optional) – The probability of padding with zeros.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
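The bias_prob, normalize_prob and padding_prob parameters each switch one kernel property on at random. The following sketch draws a "normal" kernel and turns it into two features in the style of ROCKET: the maximum response and the proportion of positive values. Several details are assumptions borrowed from ROCKET rather than documented wildboar behavior. These include the omission of dilation, reading "normalization" as mean-centering the weights, the bias range [-1, 1], the output-preserving padding, and the choice of features. The function names are hypothetical.

```python
import random


def sample_kernel(rng, length, mean=0.0, scale=1.0,
                  bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5):
    """Draw weights, bias and padding for one random kernel."""
    weights = [rng.gauss(mean, scale) for _ in range(length)]
    if rng.random() < normalize_prob:
        mu = sum(weights) / length
        weights = [w - mu for w in weights]  # mean-center the weights
    bias = rng.uniform(-1.0, 1.0) if rng.random() < bias_prob else 0.0
    padding = (length - 1) // 2 if rng.random() < padding_prob else 0
    return weights, bias, padding


def apply_kernel(series, weights, bias, padding):
    """Slide the kernel over the zero-padded series (a dot product at each
    offset) and return (max response, proportion of positive responses)."""
    padded = [0.0] * padding + list(series) + [0.0] * padding
    m = len(weights)
    out = [
        sum(w * padded[i + j] for j, w in enumerate(weights)) + bias
        for i in range(len(padded) - m + 1)
    ]
    return max(out), sum(v > 0 for v in out) / len(out)


weights, bias, padding = sample_kernel(random.Random(42), length=9)
print(apply_kernel([float(t % 5) for t in range(30)], weights, bias, padding))
```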

class wildboar.tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)

Bases: FeatureTreeRegressorMixin, BaseRocketTree

A tree regressor that uses random convolutions as features.

tree_

The internal tree representation.

Type:

Tree

Parameters:
  • n_kernels (int, optional) – The number of kernels to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"squared_error"}, optional) – The criterion used to evaluate the utility of a split.

  • sampling ({"normal", "uniform", "shapelet"}, optional) –

    The sampling of convolutional filters.

    • if “normal”, sample filter weights from a normal distribution with the given mean and scale.

    • if “uniform”, sample filter weights from a uniform distribution with the given lower and upper bounds.

    • if “shapelet”, sample filters as subsequences of the training data.

  • sampling_params (dict, optional) –

    The parameters for the sampling.

    • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

    • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.

  • kernel_size ((min_size, max_size) or array-like, optional) –

    The kernel size.

    • if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep.

    • if array-like, all defined kernel sizes.

  • bias_prob (float, optional) – The probability of using a bias term.

  • normalize_prob (float, optional) – The probability of performing normalization.

  • padding_prob (float, optional) – The probability of padding with zeros.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
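The three sampling strategies and the default sampling_params listed above can be sketched as follows. For "shapelet" sampling, this sketch assumes that the source series and start position are chosen uniformly. sample_weights is a hypothetical helper rather than part of wildboar's API.

```python
import random


def sample_weights(rng, length, sampling="normal", sampling_params=None, X=None):
    """Draw `length` kernel weights using one of the documented strategies."""
    params = sampling_params or {}
    if sampling == "normal":
        mean, scale = params.get("mean", 0.0), params.get("scale", 1.0)
        return [rng.gauss(mean, scale) for _ in range(length)]
    if sampling == "uniform":
        lower, upper = params.get("lower", -1.0), params.get("upper", 1.0)
        return [rng.uniform(lower, upper) for _ in range(length)]
    if sampling == "shapelet":
        # Copy a random subsequence of a random training series.
        series = rng.choice(X)
        start = rng.randint(0, len(series) - length)
        return list(series[start:start + length])
    raise ValueError(f"unknown sampling: {sampling!r}")


rng = random.Random(7)
X = [[float(t) for t in range(20)], [float(-t) for t in range(20)]]
print(sample_weights(rng, 5, "uniform"))
print(sample_weights(rng, 5, "shapelet", X=X))
```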

class wildboar.tree.ShapeletTreeClassifier(*, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

Bases: DynamicTreeMixin, FeatureTreeClassifierMixin, BaseShapeletTree

A shapelet tree classifier.

tree_

The tree data structure used internally

Type:

Tree

classes_

The class labels

Type:

ndarray of shape (n_classes,)

n_classes_

The number of class labels

Type:

int

See also

ShapeletTreeRegressor

A shapelet tree regressor.

ExtraShapeletTreeClassifier

An extra random shapelet tree classifier.

Parameters:
  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the time series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction of the time series length and computed as ceil(X.shape[-1] * max_shapelet_size).

  • alpha (float, optional) –

    Dynamically adjust the number of sampled shapelets at each node according to the current depth.

    • if \(\alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.

    • if \(\alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.

    • if None, the number of sampled shapelets is the same regardless of depth.

  • metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.

  • metric_params (dict, optional) – Parameters for the distance measure.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator;

    • If RandomState instance, random_state is the random number generator;

    • If None, the random number generator is the RandomState instance used by np.random.
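The two criterion options measure how mixed the labels in a node are. Below is a minimal sketch with hypothetical function names. It assumes a base-2 logarithm for the entropy; the base only rescales the value and does not change which of two splits scores better.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (base 2) of the class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


labels = ["a", "a", "b", "b"]
print(gini(labels), entropy(labels))  # 0.5 1.0
```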

class wildboar.tree.ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='warn', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)

Bases: DynamicTreeMixin, FeatureTreeRegressorMixin, BaseShapeletTree

A shapelet tree regressor.

tree_

The internal tree representation

Type:

Tree

Parameters:
  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"squared_error"}, optional) –

    The criterion used to evaluate the utility of a split.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the time series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction of the time series length and computed as ceil(X.shape[-1] * max_shapelet_size).

  • alpha (float, optional) –

    Dynamically adjust the number of sampled shapelets at each node according to the current depth, using the weight

    \[w = 1 - e^{-|\alpha| \cdot \mathrm{depth}}\]

    • if \(\alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth:

      \[\mathrm{n\_shapelets} \cdot (1 - w)\]

    • if \(\alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth:

      \[\mathrm{n\_shapelets} \cdot w\]

    • if None, the number of sampled shapelets is the same regardless of depth.

  • metric (str, optional) –

    Distance metric used to identify the best shapelet.

    See distance._SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters for the distance measure.

    Read more about the parameters in the User guide.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
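The alpha depth schedule can be written directly from the formulas for w given under alpha. This sketch assumes that the result is truncated to an integer and never drops below one shapelet. n_shapelets_at_depth is a hypothetical name.

```python
import math


def n_shapelets_at_depth(n_shapelets, depth, alpha=None):
    """Number of shapelets to sample at a node of the given depth."""
    if alpha is None:
        return n_shapelets
    w = 1.0 - math.exp(-abs(alpha) * depth)
    fraction = (1.0 - w) if alpha < 0 else w
    return max(1, int(n_shapelets * fraction))


for depth in range(4):
    print(depth,
          n_shapelets_at_depth(100, depth, alpha=-0.5),
          n_shapelets_at_depth(100, depth, alpha=0.5))
```

A negative alpha spends most of the sampling budget near the root, where the nodes hold the most samples. A positive alpha does the opposite.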