`wildboar.tree._tree`

Module Contents

Classes

`BaseFeatureTree`	Base class for trees using feature engineering.
`BaseIntervalTree`	Base class for trees using feature engineering.
`BasePivotTree`	Base class for trees using feature engineering.
`BaseRocketTree`	Base class for trees using feature engineering.
`BaseShapeletTree`	Base class for trees using feature engineering.
`DynamicTreeMixin`
`ExtraShapeletTreeClassifier`	An extra shapelet tree classifier.
`ExtraShapeletTreeRegressor`	An extra shapelet tree regressor.
`FeatureTreeClassifierMixin`	Mixin for classification trees.
`FeatureTreeRegressorMixin`	Mixin for regression trees.
`IntervalTreeClassifier`	An interval based tree classifier.
`IntervalTreeRegressor`	An interval based tree regressor.
`PivotTreeClassifier`	A tree classifier that uses pivot time series.
`RocketTreeClassifier`	A tree classifier that uses random convolutions as features.
`RocketTreeRegressor`	A tree regressor that uses random convolutions as features.
`ShapeletTreeClassifier`	A shapelet tree classifier.
`ShapeletTreeRegressor`	A shapelet tree regressor.

Attributes

`CLF_CRITERION`
`REG_CRITERION`

class wildboar.tree._tree.BaseFeatureTree(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0)

Bases: wildboar.tree.base.BaseTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BaseIntervalTree(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BasePivotTree(n_pivot='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, metrics='all', random_state=None)

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BaseRocketTree(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.BaseShapeletTree(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='warn', min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, random_state=None)

Bases: BaseFeatureTree

Base class for trees using feature engineering.

class wildboar.tree._tree.DynamicTreeMixin

class wildboar.tree._tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

Bases: ShapeletTreeClassifier

An extra shapelet tree classifier.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
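
The threshold sampling described above can be illustrated with a minimal, standard-library sketch. The helper names `sample_threshold` and `split_by_threshold` and the example distances are hypothetical and are not part of wildboar; the sketch only shows the idea of drawing a split point uniformly between the smallest and largest distance instead of searching for the best one.

```python
import random


def sample_threshold(distances, rng):
    """Draw one split threshold uniformly in [min(distances), max(distances)]."""
    lo, hi = min(distances), max(distances)
    return rng.uniform(lo, hi)


def split_by_threshold(distances, threshold):
    """Return sample indices going left (distance <= threshold) and right."""
    left = [i for i, d in enumerate(distances) if d <= threshold]
    right = [i for i, d in enumerate(distances) if d > threshold]
    return left, right


rng = random.Random(0)
# Hypothetical distances from one sampled shapelet to five training series.
distances = [0.8, 2.5, 1.1, 3.9, 0.3]
threshold = sample_threshold(distances, rng)
left, right = split_by_threshold(distances, threshold)
print(threshold, left, right)
```

Because the threshold is random rather than optimized, each candidate shapelet costs a single split evaluation, which is the usual motivation for "extra" (extremely randomized) trees.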

tree_

The tree representation.

Type: Tree

Parameters:

n_shapelets (int, optional) – The number of shapelets to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator;
- If RandomState instance, random_state is the random number generator;
- If None, the random number generator is the RandomState instance used by np.random.
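
As an illustration of the “balanced” option for class_weight, the sketch below computes weights that are inversely proportional to class frequency. The normalization `n_samples / (n_classes * count)` is an assumption borrowed from the scikit-learn convention; this page does not state which normalization wildboar applies, and `balanced_class_weight` is a hypothetical helper.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weight each class inversely to its frequency.

    Assumption: weights are normalized as n_samples / (n_classes * count),
    following the scikit-learn convention.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


labels = ["a", "a", "a", "b"]
weights = balanced_class_weight(labels)
print(weights)
```

The rare class "b" receives a larger weight than the frequent class "a", so misclassifying it costs more when splits are evaluated.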

class wildboar.tree._tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)

Bases: ShapeletTreeRegressor

An extra shapelet tree regressor.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].

tree_

The internal tree representation.

Type: Tree

Parameters:

n_shapelets (int, optional) – The number of shapelets to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split.

Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator;
- If RandomState instance, random_state is the random number generator;
- If None, the random number generator is the RandomState instance used by np.random.

class wildboar.tree._tree.FeatureTreeClassifierMixin

Bases: wildboar.tree.base.TreeClassifierMixin

Mixin for classification trees.

class wildboar.tree._tree.FeatureTreeRegressorMixin

Bases: wildboar.tree.base.TreeRegressorMixin

Mixin for regression trees.

class wildboar.tree._tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BaseIntervalTree

An interval based tree classifier.

tree_

The internal tree structure.

Type: Tree

Parameters:

n_intervals ({"log", "sqrt"}, int or float, optional) –
The number of intervals to partition the time series into.
- if “log”, the number of intervals is log2(n_timestep).
- if “sqrt”, the number of intervals is sqrt(n_timestep).
- if int, the number of intervals is n_intervals.
- if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
intervals ({"fixed", "sample", "random"}, optional) –
- if “fixed”, n_intervals non-overlapping intervals.
- if “sample”, n_intervals * sample_size non-overlapping intervals.
- if “random”, n_intervals possibly overlapping intervals, each of a size randomly sampled in [min_size * n_timestep, max_size * n_timestep].
sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".
min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".
max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".
summarizer (list or str, optional) –
The summarization of each interval.
- if list, a list of callables, each accepting a numpy array and returning a float.
- if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator
- If RandomState instance, random_state is the random number generator
- If None, the random number generator is the RandomState instance used by np.random.
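
The four ways of specifying n_intervals can be summarized in a short sketch. `resolve_n_intervals` is a hypothetical helper, and rounding up with `ceil` is an assumption; the page does not say how fractional results are rounded.

```python
import math


def resolve_n_intervals(n_intervals, n_timestep):
    """Translate the n_intervals argument into a concrete interval count.

    Assumption: fractional results are rounded up.
    """
    if n_intervals == "log":
        return math.ceil(math.log2(n_timestep))
    if n_intervals == "sqrt":
        return math.ceil(math.sqrt(n_timestep))
    if isinstance(n_intervals, int):
        return n_intervals
    if isinstance(n_intervals, float):
        if not 0.0 < n_intervals < 1.0:
            raise ValueError("a float n_intervals must satisfy 0 < n_intervals < 1")
        return math.ceil(n_intervals * n_timestep)
    raise ValueError(f"unsupported n_intervals: {n_intervals!r}")


for spec, length in [("log", 128), ("sqrt", 100), (5, 100), (0.25, 100)]:
    print(spec, "->", resolve_n_intervals(spec, length))
```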

class wildboar.tree._tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)

Bases: FeatureTreeRegressorMixin, BaseIntervalTree

An interval based tree regressor.

tree_

The internal tree structure.

Type: Tree

Parameters:

n_intervals ({"log", "sqrt"}, int or float, optional) –
The number of intervals to partition the time series into.
- if “log”, the number of intervals is log2(n_timestep).
- if “sqrt”, the number of intervals is sqrt(n_timestep).
- if int, the number of intervals is n_intervals.
- if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split.

Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
intervals ({"fixed", "sample", "random"}, optional) –
- if “fixed”, n_intervals non-overlapping intervals.
- if “sample”, n_intervals * sample_size non-overlapping intervals.
- if “random”, n_intervals possibly overlapping intervals, each of a size randomly sampled in [min_size * n_timestep, max_size * n_timestep].
sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".
min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".
max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".
summarizer (list or str, optional) –
The summarization of each interval.
- if list, a list of callables, each accepting a numpy array and returning a float.
- if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator
- If RandomState instance, random_state is the random number generator
- If None, the random number generator is the RandomState instance used by np.random.
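
To make the summarizer parameter concrete, the sketch below summarizes fixed, non-overlapping intervals with three callables (mean, variance and slope), in the spirit of the default 'mean_var_slope'. It is an illustration only: plain Python lists stand in for numpy arrays, the helpers `slope`, `fixed_intervals` and `summarize` are hypothetical, and the use of the population variance is an assumption.

```python
from statistics import fmean, pvariance


def slope(values):
    """Least-squares slope of the values against their positions 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def fixed_intervals(n_timestep, n_intervals):
    """Split range(n_timestep) into n_intervals non-overlapping (start, end) pairs."""
    bounds = [round(k * n_timestep / n_intervals) for k in range(n_intervals + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def summarize(series, n_intervals, summarizers):
    """Apply every summarizer to every interval and concatenate the results."""
    features = []
    for start, end in fixed_intervals(len(series), n_intervals):
        window = series[start:end]
        features.extend(f(window) for f in summarizers)
    return features


series = [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 5.0]
features = summarize(series, 2, [fmean, pvariance, slope])
print(features)
```

Each interval contributes one feature per summarizer, so a tree node can split on, for example, the slope of the first half of the series.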

class wildboar.tree._tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BasePivotTree

A tree classifier that uses pivot time series.

tree_

The internal tree representation.

Type: Tree

Parameters:

n_pivot (str or int, optional) – The number of pivot time series to sample at each node.
metrics (str, optional) – The metrics to sample from. Currently, we only support “all”.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator
- If RandomState instance, random_state is the random number generator
- If None, the random number generator is the RandomState instance used by np.random.
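
The interplay between criterion and min_impurity_decrease can be sketched as follows. The code computes the entropy and Gini impurity of a node and accepts a split only if the impurity decrease is at least min_impurity_decrease. The helper names are hypothetical, and weighting the children by their share of the parent's samples is an assumption about how the decrease is measured.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of the label distribution."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, criterion):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)
    return criterion(parent) - children


parent = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]
gain_entropy = impurity_decrease(parent, left, right, entropy)
gain_gini = impurity_decrease(parent, left, right, gini)

min_impurity_decrease = 0.0
accept_split = gain_entropy >= min_impurity_decrease
print(gain_entropy, gain_gini, accept_split)
```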

class wildboar.tree._tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BaseRocketTree

A tree classifier that uses random convolutions as features.

tree_

The internal tree representation.

Type: Tree

Parameters:

n_kernels (int, optional) – The number of kernels to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
sampling ({"normal", "uniform", "shapelet"}, optional) –
The sampling of convolutional filters.
- if “normal”, sample filter according to a normal distribution with mean and scale.
- if “uniform”, sample filter according to a uniform distribution with lower and upper.
- if “shapelet”, sample filters as subsequences in the training data.
sampling_params (dict, optional) –
The parameters for the sampling.
- if “normal”, {"mean": float, "scale": float}, defaults to
  {"mean": 0, "scale": 1}.
- if “uniform”, {"lower": float, "upper": float}, defaults to
  {"lower": -1, "upper": 1}.
kernel_size ((min_size, max_size) or array-like, optional) –
The kernel size.
- if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep
- if array-like, all defined kernel sizes.
bias_prob (float, optional) – The probability of using a bias term.
normalize_prob (float, optional) – The probability of performing normalization.
padding_prob (float, optional) – The probability of padding with zeros.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator
- If RandomState instance, random_state is the random number generator
- If None, the random number generator is the RandomState instance used by np.random.
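
The sampling, sampling_params, kernel_size, bias_prob and padding_prob parameters can be pictured with a small sketch of how a single random kernel might be drawn. This is not wildboar's implementation: `sample_kernel` is a hypothetical helper, the bias range of [-1, 1] and the lower bound of 2 on the kernel length are assumptions, and the “shapelet” sampling mode and normalize_prob are left out.

```python
import random


def sample_kernel(n_timestep, rng, kernel_size, sampling="normal",
                  sampling_params=None, bias_prob=1.0, padding_prob=0.5):
    """Draw one random convolution kernel (illustrative only)."""
    params = sampling_params or {}
    min_size, max_size = kernel_size  # fractions of n_timestep
    lo = max(2, int(min_size * n_timestep))  # assumption: at least 2 weights
    hi = max(lo, int(max_size * n_timestep))
    length = rng.randint(lo, hi)

    if sampling == "normal":
        mean, scale = params.get("mean", 0.0), params.get("scale", 1.0)
        weights = [rng.gauss(mean, scale) for _ in range(length)]
    elif sampling == "uniform":
        lower, upper = params.get("lower", -1.0), params.get("upper", 1.0)
        weights = [rng.uniform(lower, upper) for _ in range(length)]
    else:
        raise ValueError(f"sampling mode not covered by this sketch: {sampling!r}")

    # Assumption: the bias, when used, is drawn uniformly from [-1, 1].
    bias = rng.uniform(-1.0, 1.0) if rng.random() < bias_prob else 0.0
    use_padding = rng.random() < padding_prob
    return {"weights": weights, "bias": bias, "padding": use_padding}


rng = random.Random(0)
kernel = sample_kernel(100, rng, kernel_size=(0.1, 0.3), sampling="uniform")
print(len(kernel["weights"]), kernel["bias"], kernel["padding"])
```

With n_kernels=10, a node would draw ten such kernels and keep the one whose convolution output gives the best split.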

class wildboar.tree._tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)

Bases: FeatureTreeRegressorMixin, BaseRocketTree

A tree regressor that uses random convolutions as features.

tree_

The internal tree representation.

Type: Tree

Parameters:

n_kernels (int, optional) – The number of kernels to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"squared_error"}, optional) – The criterion used to evaluate the utility of a split.
sampling ({"normal", "uniform", "shapelet"}, optional) –
The sampling of convolutional filters.
- if “normal”, sample filter according to a normal distribution with mean and scale.
- if “uniform”, sample filter according to a uniform distribution with lower and upper.
- if “shapelet”, sample filters as subsequences in the training data.
sampling_params (dict, optional) –
The parameters for the sampling.
- if “normal”, {"mean": float, "scale": float}, defaults to
  {"mean": 0, "scale": 1}.
- if “uniform”, {"lower": float, "upper": float}, defaults to
  {"lower": -1, "upper": 1}.
kernel_size ((min_size, max_size) or array-like, optional) –
The kernel size.
- if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep
- if array-like, all defined kernel sizes.
bias_prob (float, optional) – The probability of using a bias term.
normalize_prob (float, optional) – The probability of performing normalization.
padding_prob (float, optional) – The probability of padding with zeros.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator
- If RandomState instance, random_state is the random number generator
- If None, the random number generator is the RandomState instance used by np.random.
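
For the regressors, the signature default is criterion='squared_error'. The sketch below shows one common reading of that criterion: a node's impurity is the mean squared deviation of its targets from their mean, and a split is scored by how much it reduces that quantity. The helpers are hypothetical, and the size-weighted combination of the children is an assumption.

```python
from statistics import fmean


def squared_error(values):
    """Mean squared deviation from the node mean, used here as node impurity."""
    mu = fmean(values)
    return fmean([(v - mu) ** 2 for v in values])


def split_improvement(targets, left_idx, right_idx):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [targets[i] for i in left_idx]
    right = [targets[i] for i in right_idx]
    n = len(targets)
    weighted = (len(left) / n) * squared_error(left) + (len(right) / n) * squared_error(right)
    return squared_error(targets) - weighted


targets = [1.0, 1.0, 3.0, 3.0]
improvement = split_improvement(targets, [0, 1], [2, 3])
print(improvement)
```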

class wildboar.tree._tree.ShapeletTreeClassifier(*, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

Bases: DynamicTreeMixin, FeatureTreeClassifierMixin, BaseShapeletTree

A shapelet tree classifier.

tree_

The tree data structure used internally.

Type: Tree

classes_

The class labels.

Type: ndarray of shape (n_classes,)

n_classes_

The number of class labels.

Type: int

See also

ShapeletTreeRegressor: A shapelet tree regressor.
ExtraShapeletTreeClassifier: An extra random shapelet tree classifier.

Parameters:

max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
n_shapelets (int, optional) – The number of shapelets to sample at each node.
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
alpha (float, optional) –
Dynamically adjust the number of sampled shapelets at each node according to the current depth.
- if \(\alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
- if \(\alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
- if None, the number of sampled shapelets is the same independent of depth.
metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator;
- If RandomState instance, random_state is the random number generator;
- If None, the random number generator is the RandomState instance used by np.random.
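
The shapelet size fractions can be turned into concrete lengths as sketched below. The sketch assumes that the lower bound is clamped to at least 2 time steps, which is an interpretation of the formula for min_shapelet_size rather than confirmed library behaviour. Both helper functions are hypothetical.

```python
import math
import random


def shapelet_length_bounds(n_timestep, min_shapelet_size=0.0, max_shapelet_size=1.0):
    """Convert size fractions into (min_len, max_len) in time steps.

    Assumption: a shapelet is never shorter than 2 time steps.
    """
    min_len = max(math.ceil(n_timestep * min_shapelet_size), 2)
    max_len = math.ceil(n_timestep * max_shapelet_size)
    return min_len, max(max_len, min_len)


def sample_shapelet(series, rng, min_len, max_len):
    """Cut a random subsequence whose length lies in [min_len, max_len]."""
    length = rng.randint(min_len, max_len)
    start = rng.randint(0, len(series) - length)
    return series[start:start + length]


rng = random.Random(0)
series = [float(i) for i in range(100)]
min_len, max_len = shapelet_length_bounds(len(series), 0.25, 0.5)
shapelet = sample_shapelet(series, rng, min_len, max_len)
print(min_len, max_len, len(shapelet))
```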

class wildboar.tree._tree.ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='warn', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)

Bases: DynamicTreeMixin, FeatureTreeRegressorMixin, BaseShapeletTree

A shapelet tree regressor.

tree_

The internal tree representation.

Type: Tree

Parameters:

max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split

Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
n_shapelets (int, optional) – The number of shapelets to sample at each node.
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction of the series length and computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
alpha (float, optional) –
Dynamically adjust the number of sampled shapelets at each node according to the current depth, using the weight

\[w = 1 - e^{-|\alpha| \cdot \text{depth}}\]
- if \(\alpha < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
  
  \[\text{n\_shapelets} \cdot (1 - w)\]
- if \(\alpha > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
  
  \[\text{n\_shapelets} \cdot w\]
- if None, the number of sampled shapelets is the same independent of depth.
metric (str, optional) –
Distance metric used to identify the best shapelet.

See distance._SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters for the distance measure.

Read more about the parameters in the User guide.
random_state (int or RandomState) –
- If int, random_state is the seed used by the random number generator
- If RandomState instance, random_state is the random number generator
- If None, the random number generator is the RandomState instance used by np.random.
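
The depth-dependent schedule controlled by alpha can be written out directly from the weight w = 1 - e^(-|alpha| * depth) given for this parameter. `n_shapelets_at_depth` is a hypothetical helper; truncating to an integer and keeping at least one shapelet are assumptions made so the result matches the stated range between 1 and n_shapelets.

```python
import math


def n_shapelets_at_depth(n_shapelets, alpha, depth):
    """Number of shapelets to sample at a node of the given depth."""
    if alpha is None:
        return n_shapelets  # same number at every depth
    w = 1.0 - math.exp(-abs(alpha) * depth)
    if alpha < 0:
        n = n_shapelets * (1.0 - w)  # shrinks with depth
    else:
        n = n_shapelets * w  # grows with depth
    # Assumption: truncate to an integer and never sample fewer than one.
    return max(1, int(n))


for depth in range(6):
    print(depth,
          n_shapelets_at_depth(100, -0.5, depth),
          n_shapelets_at_depth(100, 0.5, depth))
```

A negative alpha spends most of the sampling budget near the root, where nodes hold many samples, while a positive alpha shifts it towards deeper nodes.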

wildboar.tree._tree.CLF_CRITERION

wildboar.tree._tree.REG_CRITERION