wildboar.tree._tree#

Module Contents#

Classes#

BaseFeatureTree

Base class for trees using feature engineering.

BaseIntervalTree

Base class for trees using feature engineering.

BasePivotTree

Base class for trees using feature engineering.

BaseRocketTree

Base class for trees using feature engineering.

BaseShapeletTree

Base class for trees using feature engineering.

DynamicTreeMixin

ExtraShapeletTreeClassifier

An extra shapelet tree classifier.

ExtraShapeletTreeRegressor

An extra shapelet tree regressor.

FeatureTreeClassifierMixin

Mixin for classification trees.

FeatureTreeRegressorMixin

Mixin for regression trees.

IntervalTreeClassifier

An interval based tree classifier.

IntervalTreeRegressor

An interval based tree regressor.

PivotTreeClassifier

A tree classifier that uses pivot time series.

RocketTreeClassifier

A tree classifier that uses random convolutions as features.

RocketTreeRegressor

A tree regressor that uses random convolutions as features.

ShapeletTreeClassifier

A shapelet tree classifier.

ShapeletTreeRegressor

A shapelet tree regressor.

Attributes#

class wildboar.tree._tree.BaseFeatureTree(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0)[source]#

Bases: wildboar.tree.base.BaseTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BaseIntervalTree(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)[source]#

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BasePivotTree(n_pivot='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, metrics='all', random_state=None)[source]#

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BaseRocketTree(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)[source]#

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BaseShapeletTree(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='warn', min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, random_state=None)[source]#

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.DynamicTreeMixin[source]#

class wildboar.tree._tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#

Bases: ShapeletTreeClassifier

An extra shapelet tree classifier.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
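
The threshold sampling described above can be sketched in plain Python. This is an illustration only, not wildboar's implementation: the helper name `sample_threshold` and the example distances are hypothetical.

```python
import random


def sample_threshold(distances, rng):
    """Draw a split threshold uniformly from [min(dist), max(dist)]."""
    return rng.uniform(min(distances), max(distances))


rng = random.Random(0)
# Hypothetical distances between one shapelet and four time series.
distances = [0.8, 2.5, 1.1, 3.9]
threshold = sample_threshold(distances, rng)

# Samples at or below the threshold go left, the rest go right.
left = [d for d in distances if d <= threshold]
right = [d for d in distances if d > threshold]
```

Because the threshold is drawn at random rather than searched for, each candidate shapelet costs a single draw instead of an exhaustive scan over split points.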

tree_[source]#

The tree representation

Type:

Tree

Parameters:
  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).

  • metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.

  • metric_params (dict, optional) – Parameters for the distance measure.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator;

    • If RandomState instance, random_state is the random number generator;

    • If None, the random number generator is the RandomState instance used by np.random.
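
As an illustration of the “balanced” option for class_weight, the sketch below computes weights inversely proportional to class frequency. The exact normalization, n_samples / (n_classes * count), is an assumption borrowed from the common scikit-learn convention; the helper name `balanced_class_weight` is hypothetical.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weight each class inversely to its frequency (assumed normalization)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Three samples of class "a" and one of class "b": the rare class weighs more.
weights = balanced_class_weight(["a", "a", "a", "b"])
```

A dict of the same {label: weight} form can be passed directly as class_weight when custom weights are preferred.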

class wildboar.tree._tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)[source]#

Bases: ShapeletTreeRegressor

An extra shapelet tree regressor.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].

tree_[source]#

The internal tree representation

Type:

Tree

Parameters:
  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"squared_error"}, optional) –

    The criterion used to evaluate the utility of a split.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).

  • metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.

  • metric_params (dict, optional) – Parameters for the distance measure.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator;

    • If RandomState instance, random_state is the random number generator;

    • If None, the random number generator is the RandomState instance used by np.random.
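
The sketch below shows how the fractional min_shapelet_size and max_shapelet_size could translate into absolute shapelet lengths for a series with n_timestep points. It assumes the lower bound is floored at two timesteps; the helper name `shapelet_length_bounds` is hypothetical and not part of wildboar.

```python
import math


def shapelet_length_bounds(n_timestep, min_shapelet_size=0.0, max_shapelet_size=1.0):
    """Convert fractional shapelet sizes to absolute lengths (assumed floor of 2)."""
    min_len = max(math.ceil(n_timestep * min_shapelet_size), 2)
    max_len = math.ceil(n_timestep * max_shapelet_size)
    return min_len, max_len


# Defaults allow any shapelet from 2 timesteps up to the full series.
bounds_default = shapelet_length_bounds(100)
# Restrict shapelets to between a quarter and half of the series.
bounds_custom = shapelet_length_bounds(100, 0.25, 0.5)
```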

class wildboar.tree._tree.FeatureTreeClassifierMixin[source]#

Bases: wildboar.tree.base.TreeClassifierMixin

Mixin for classification trees.

class wildboar.tree._tree.FeatureTreeRegressorMixin[source]#

Bases: wildboar.tree.base.TreeRegressorMixin

Mixin for regression trees.

class wildboar.tree._tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)[source]#

Bases: FeatureTreeClassifierMixin, BaseIntervalTree

An interval based tree classifier.

tree_[source]#

The internal tree structure.

Type:

Tree

Parameters:
  • n_intervals ({"log", "sqrt"}, int or float, optional) –

    The number of intervals to partition the time series into.

    • if “log”, the number of intervals is log2(n_timestep).

    • if “sqrt”, the number of intervals is sqrt(n_timestep).

    • if int, the number of intervals is n_intervals.

    • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • intervals ({"fixed", "sample", "random"}, optional) –

    • if “fixed”, n_intervals non-overlapping intervals.

    • if “sample”, n_intervals * sample_size non-overlapping intervals.

    • if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].

  • sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".

  • min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".

  • max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".

  • summarizer (list or str, optional) –

    The summarization of each interval.

    • if list, a list of callables accepting a numpy array and returning a float.

    • if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
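
The rules for n_intervals listed above can be sketched as a small resolver. Rounding up and enforcing at least one interval are assumptions, and `resolve_n_intervals` is a hypothetical helper rather than a wildboar function.

```python
import math


def resolve_n_intervals(n_intervals, n_timestep):
    """Map the n_intervals option to a concrete interval count (assumed rounding)."""
    if n_intervals == "log":
        n = math.log2(n_timestep)
    elif n_intervals == "sqrt":
        n = math.sqrt(n_timestep)
    elif isinstance(n_intervals, int):
        n = n_intervals
    elif isinstance(n_intervals, float):
        if not 0.0 < n_intervals < 1.0:
            raise ValueError("a float n_intervals must satisfy 0 < n_intervals < 1")
        n = n_intervals * n_timestep
    else:
        raise ValueError(f"unsupported n_intervals: {n_intervals!r}")
    # Assumption: round up and never return fewer than one interval.
    return max(1, math.ceil(n))


n_sqrt = resolve_n_intervals("sqrt", 100)
n_log = resolve_n_intervals("log", 128)
n_frac = resolve_n_intervals(0.25, 100)
```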

class wildboar.tree._tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)[source]#

Bases: FeatureTreeRegressorMixin, BaseIntervalTree

An interval based tree regressor.

tree_[source]#

The internal tree structure.

Type:

Tree

Parameters:
  • n_intervals ({"log", "sqrt"}, int or float, optional) –

    The number of intervals to partition the time series into.

    • if “log”, the number of intervals is log2(n_timestep).

    • if “sqrt”, the number of intervals is sqrt(n_timestep).

    • if int, the number of intervals is n_intervals.

    • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"squared_error"}, optional) –

    The criterion used to evaluate the utility of a split.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

  • intervals ({"fixed", "sample", "random"}, optional) –

    • if “fixed”, n_intervals non-overlapping intervals.

    • if “sample”, n_intervals * sample_size non-overlapping intervals.

    • if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].

  • sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".

  • min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".

  • max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".

  • summarizer (list or str, optional) –

    The summarization of each interval.

    • if list, a list of callables accepting a numpy array and returning a float.

    • if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
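
To make intervals="fixed" and summarizer="mean_var_slope" concrete, the sketch below splits a series into non-overlapping intervals and summarizes each by its mean, variance and least-squares slope. Reading the summarizer name as exactly these three statistics, and using the population variance, are assumptions; all helper names are hypothetical.

```python
from statistics import fmean, pvariance


def slope(values):
    """Least-squares slope of values against their index."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def fixed_intervals(n_timestep, n_intervals):
    """(start, end) bounds of n_intervals non-overlapping, near-equal intervals."""
    return [
        (i * n_timestep // n_intervals, (i + 1) * n_timestep // n_intervals)
        for i in range(n_intervals)
    ]


def mean_var_slope(series, n_intervals):
    """Concatenate mean, variance and slope of every interval into one feature list."""
    features = []
    for start, end in fixed_intervals(len(series), n_intervals):
        chunk = series[start:end]
        features.extend([fmean(chunk), pvariance(chunk), slope(chunk)])
    return features


series = [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 5.0]
features = mean_var_slope(series, 2)
```

The first interval rises linearly and the second is flat, so the slope feature alone separates them; a tree node can then threshold on any one of these features.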

class wildboar.tree._tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)[source]#

Bases: FeatureTreeClassifierMixin, BasePivotTree

A tree classifier that uses pivot time series.

tree_[source]#

The internal tree representation

Type:

Tree

Parameters:
  • n_pivot (str or int, optional) – The number of pivot time series to sample at each node.

  • metrics (str, optional) – The metrics to sample from. Currently, we only support “all”.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
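
The criterion and min_impurity_decrease parameters interact as sketched below: a candidate split is scored by how much it reduces impurity, and kept only if the reduction reaches the threshold. This is a simplified, unweighted illustration using base-2 entropy; the helper names are hypothetical and the exact scaling used by wildboar may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - children


parent = ["a", "a", "b", "b"]
gain = impurity_decrease(parent, ["a", "a"], ["b", "b"])
# The split is accepted when the decrease reaches the configured threshold.
accept = gain >= 0.0  # 0.0 mirrors the default min_impurity_decrease
```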

class wildboar.tree._tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)[source]#

Bases: FeatureTreeClassifierMixin, BaseRocketTree

A tree classifier that uses random convolutions as features.

tree_[source]#

The internal tree representation.

Type:

Tree

Parameters:
  • n_kernels (int, optional) – The number of kernels to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • sampling ({"normal", "uniform", "shapelet"}, optional) –

    The sampling of convolutional filters.

    • if “normal”, sample filter according to a normal distribution with mean and scale.

    • if “uniform”, sample filter according to a uniform distribution with lower and upper.

    • if “shapelet”, sample filters as subsequences in the training data.

  • sampling_params (dict, optional) –

    The parameters for the sampling.

    • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

    • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.

  • kernel_size ((min_size, max_size) or array-like, optional) –

    The kernel size.

    • if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep.

    • if array-like, all defined kernel sizes.

  • bias_prob (float, optional) – The probability of using a bias term.

  • normalize_prob (float, optional) – The probability of performing normalization.

  • padding_prob (float, optional) – The probability of padding with zeros.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
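
The sampling and sampling_params options can be illustrated with a small sketch that draws kernel weights from either distribution, falling back to the documented defaults. The helper `sample_kernel` is hypothetical, and the “shapelet” option is left out because it depends on the training data.

```python
import random

# Defaults as documented for sampling_params.
DEFAULT_PARAMS = {
    "normal": {"mean": 0.0, "scale": 1.0},
    "uniform": {"lower": -1.0, "upper": 1.0},
}


def sample_kernel(size, sampling="normal", sampling_params=None, rng=None):
    """Draw `size` kernel weights from the chosen distribution."""
    if sampling not in DEFAULT_PARAMS:
        raise ValueError(f"unsupported sampling: {sampling!r}")
    rng = rng if rng is not None else random.Random()
    # User-supplied parameters override the defaults key by key.
    params = {**DEFAULT_PARAMS[sampling], **(sampling_params or {})}
    if sampling == "normal":
        return [rng.gauss(params["mean"], params["scale"]) for _ in range(size)]
    return [rng.uniform(params["lower"], params["upper"]) for _ in range(size)]


rng = random.Random(42)
uniform_kernel = sample_kernel(9, "uniform", rng=rng)
normal_kernel = sample_kernel(9, "normal", {"mean": 0.0, "scale": 0.5}, rng=rng)
```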

class wildboar.tree._tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)[source]#

Bases: FeatureTreeRegressorMixin, BaseRocketTree

A tree regressor that uses random convolutions as features.

tree_[source]#

The internal tree representation.

Type:

Tree

Parameters:
  • n_kernels (int, optional) – The number of kernels to sample at each node.

  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • criterion ({"squared_error"}, optional) – The criterion used to evaluate the utility of a split.

  • sampling ({"normal", "uniform", "shapelet"}, optional) –

    The sampling of convolutional filters.

    • if “normal”, sample filter according to a normal distribution with mean and scale.

    • if “uniform”, sample filter according to a uniform distribution with lower and upper.

    • if “shapelet”, sample filters as subsequences in the training data.

  • sampling_params (dict, optional) –

    The parameters for the sampling.

    • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

    • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.

  • kernel_size ((min_size, max_size) or array-like, optional) –

    The kernel size.

    • if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep.

    • if array-like, all defined kernel sizes.

  • bias_prob (float, optional) – The probability of using a bias term.

  • normalize_prob (float, optional) – The probability of performing normalization.

  • padding_prob (float, optional) – The probability of padding with zeros.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
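
The sketch below illustrates two of the parameters above: how a fractional (min_size, max_size) kernel_size could expand into candidate kernel lengths, and how bias_prob, normalize_prob and padding_prob act as independent per-kernel probabilities. Truncating to integers, the floor of one timestep and both helper names are assumptions for illustration.

```python
import random


def kernel_size_candidates(min_size, max_size, n_timestep):
    """All integer kernel lengths between the two fractions of n_timestep."""
    lo = max(1, int(min_size * n_timestep))
    hi = max(lo, int(max_size * n_timestep))
    return list(range(lo, hi + 1))


def sample_kernel_options(bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, rng=None):
    """Decide per kernel whether to use a bias term, normalization and zero padding."""
    rng = rng if rng is not None else random.Random()
    return {
        "use_bias": rng.random() < bias_prob,
        "normalize": rng.random() < normalize_prob,
        "pad": rng.random() < padding_prob,
    }


sizes = kernel_size_candidates(0.125, 0.25, 64)
options = sample_kernel_options(rng=random.Random(0))
```

With the default probabilities of 1.0, every kernel uses a bias term and normalization, while padding is applied to roughly half of the kernels.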

class wildboar.tree._tree.ShapeletTreeClassifier(*, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)[source]#

Bases: DynamicTreeMixin, FeatureTreeClassifierMixin, BaseShapeletTree

A shapelet tree classifier.

tree_[source]#

The tree data structure used internally

Type:

Tree

classes_[source]#

The class labels

Type:

ndarray of shape (n_classes,)

n_classes_[source]#

The number of class labels

Type:

int

See also

ShapeletTreeRegressor

A shapelet tree regressor.

ExtraShapeletTreeClassifier

An extra random shapelet tree classifier.

Parameters:
  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).

  • alpha (float, optional) –

    Dynamically adjust the number of sampled shapelets at each node according to the current depth.

    • if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.

    • if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.

    • if None, the number of sampled shapelets is the same independent of depth.

  • metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.

  • metric_params (dict, optional) – Parameters for the distance measure.

  • class_weight (dict or "balanced", optional) –

    Weights associated with the labels.

    • if dict, weights of the form {label: weight}.

    • if “balanced”, each class weight is inversely proportional to the class frequency.

    • if None, each class has equal weight.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator;

    • If RandomState instance, random_state is the random number generator;

    • If None, the random number generator is the RandomState instance used by np.random.
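
For orientation, the feature a shapelet tree splits on is commonly defined as the smallest distance between a shapelet and any equally long subsequence of a time series. The sketch below implements that definition for metric="euclidean" in plain Python; it is an unoptimized illustration with a hypothetical helper name, not wildboar's implementation.

```python
import math


def shapelet_distance(shapelet, series):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    if m > len(series):
        raise ValueError("the shapelet must not be longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((s - w) ** 2 for s, w in zip(shapelet, window)))
        best = min(best, dist)
    return best


series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
# The peak [1, 2, 1] occurs verbatim in the series, so its distance is zero.
exact = shapelet_distance([1.0, 2.0, 1.0], series)
# A flat, high shapelet matches nowhere exactly.
far = shapelet_distance([3.0, 3.0], series)
```

Each tree node then compares this distance against a threshold to route a sample left or right.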

class wildboar.tree._tree.ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='warn', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)[source]#

Bases: DynamicTreeMixin, FeatureTreeRegressorMixin, BaseShapeletTree

A shapelet tree regressor.

tree_[source]#

The internal tree representation

Type:

Tree

Parameters:
  • max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.

  • min_samples_split (int, optional) – The minimum number of samples required to split an internal node.

  • min_samples_leaf (int, optional) – The minimum number of samples in a leaf.

  • criterion ({"squared_error"}, optional) –

    The criterion used to evaluate the utility of a split.

    Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.

  • min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.

  • n_shapelets (int, optional) – The number of shapelets to sample at each node.

  • min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).

  • max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).

  • alpha (float, optional) –

    Dynamically adjust the number of sampled shapelets at each node according to the current depth, using the weight

    \[w = 1 - e^{-|alpha| * depth}\]

    • if \(alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth:

      \[\text{n\_shapelets} * (1 - w)\]

    • if \(alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth:

      \[\text{n\_shapelets} * w\]

    • if None, the number of sampled shapelets is the same independent of depth.

  • metric (str, optional) –

    Distance metric used to identify the best shapelet.

    See distance._SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters for the distance measure.

    Read more about the parameters in the User guide.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
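
The alpha schedule given above can be traced with a short sketch. The weight and the two scaling rules follow the formulas in the parameter description; truncating to an integer and clamping the result to at least 1 are assumptions, and `n_shapelets_at_depth` is a hypothetical helper.

```python
import math


def n_shapelets_at_depth(n_shapelets, alpha, depth):
    """Number of shapelets to sample at a node of the given depth."""
    if alpha is None:
        return n_shapelets
    w = 1.0 - math.exp(-abs(alpha) * depth)
    if alpha < 0:
        n = n_shapelets * (1.0 - w)  # shrinks from n_shapelets towards 1
    else:
        n = n_shapelets * w  # grows from 1 towards n_shapelets
    # Assumption: truncate and always sample at least one shapelet.
    return max(1, int(n))


decreasing = [n_shapelets_at_depth(100, -0.5, d) for d in range(6)]
increasing = [n_shapelets_at_depth(100, 0.5, d) for d in range(6)]
constant = [n_shapelets_at_depth(100, None, d) for d in range(6)]
```

A negative alpha spends most of the sampling budget near the root, where each split affects the most samples, while a positive alpha saves it for the deeper, more specialized nodes.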

wildboar.tree._tree.CLF_CRITERION[source]#
wildboar.tree._tree.REG_CRITERION[source]#