`wildboar.tree`

Submodules

wildboar.tree.base

Package Contents

Classes

`ExtraShapeletTreeClassifier`	An extra shapelet tree classifier.
`ExtraShapeletTreeRegressor`	An extra shapelet tree regressor.
`IntervalTreeClassifier`	An interval based tree classifier.
`IntervalTreeRegressor`	An interval based tree regressor.
`PivotTreeClassifier`	A tree classifier that uses pivot time series.
`ProximityTreeClassifier`	A classifier that uses a k-branching tree based on pivot time series.
`RocketTreeClassifier`	A tree classifier that uses random convolutions as features.
`RocketTreeRegressor`	A tree regressor that uses random convolutions as features.
`ShapeletTreeClassifier`	A shapelet tree classifier.
`ShapeletTreeRegressor`	A shapelet tree regressor.

class wildboar.tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

Bases: ShapeletTreeClassifier

An extra shapelet tree classifier.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
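To make the threshold rule concrete, here is a minimal NumPy sketch. It assumes `dist` already holds the distances between one sampled shapelet and every training sample at the node, and that samples at or below the threshold go to the left child. `sample_extra_threshold` and `split_by_threshold` are hypothetical helpers, not wildboar functions.

```python
import numpy as np


def sample_extra_threshold(dist, rng):
    """Draw a split threshold uniformly in [min(dist), max(dist)] (hypothetical helper)."""
    dist = np.asarray(dist, dtype=float)
    return rng.uniform(dist.min(), dist.max())


def split_by_threshold(dist, threshold):
    """Assumed convention: distance <= threshold goes left, the rest goes right."""
    dist = np.asarray(dist, dtype=float)
    left = np.flatnonzero(dist <= threshold)
    right = np.flatnonzero(dist > threshold)
    return left, right


rng = np.random.default_rng(0)
# Illustrative distances between one shapelet and five training series.
dist = np.array([0.2, 1.5, 0.7, 3.1, 2.4])
threshold = sample_extra_threshold(dist, rng)
left, right = split_by_threshold(dist, threshold)
print(threshold, left, right)
```

Drawing the threshold at random, instead of scanning every candidate cut point for the best impurity decrease, is the usual motivation for extremely randomized trees: each candidate shapelet costs a single draw.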

tree_

The tree representation.

Type:: Tree

Parameters:

n_shapelets (int, optional) – The number of shapelets to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
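The fractional size parameters can be turned into concrete shapelet lengths as follows. This is a sketch assuming the lower bound is clamped to at least two time steps and both bounds are rounded up; `shapelet_size_bounds` is a hypothetical helper, not part of wildboar.

```python
import math


def shapelet_size_bounds(n_timestep, min_shapelet_size=0.0, max_shapelet_size=1.0):
    """Translate fractional size parameters into (min_length, max_length)."""
    # Assumed: a shapelet is never shorter than 2 time steps.
    lower = max(math.ceil(n_timestep * min_shapelet_size), 2)
    upper = math.ceil(n_timestep * max_shapelet_size)
    return lower, upper


# Defaults on a series of 150 time steps (the length of GunPoint).
print(shapelet_size_bounds(150))              # (2, 150)
print(shapelet_size_bounds(150, 0.25, 0.5))   # (38, 75)
```

With the default min_shapelet_size=0.0, the product is 0, so the clamp to 2 is what keeps the shortest sampled shapelet meaningful.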

class wildboar.tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)

Bases: ShapeletTreeRegressor

An extra shapelet tree regressor.

Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].

tree_

The internal tree representation.

Type:: Tree

Parameters:

n_shapelets (int, optional) – The number of shapelets to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split.

Deprecated since version 1.0: Criterion "mse" was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet, expressed as a fraction, computed as max(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
metric ({"euclidean", "scaled_euclidean", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
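For regression, the squared-error criterion scores a split by how much it reduces the variance of the target. The sketch below computes a per-node, size-weighted impurity decrease and compares it with min_impurity_decrease. It is an illustration under the assumption that impurity is the mean squared deviation from the node mean; whether wildboar additionally scales the decrease by the node's share of the whole training set (as scikit-learn does) is not stated here. All helper names are hypothetical.

```python
import numpy as np


def squared_error(y):
    """Mean squared deviation from the node mean (0 for an empty node)."""
    y = np.asarray(y, dtype=float)
    return float(np.mean((y - y.mean()) ** 2)) if y.size else 0.0


def impurity_decrease(y, left_mask):
    """Parent impurity minus the size-weighted impurity of the two children."""
    y = np.asarray(y, dtype=float)
    y_left, y_right = y[left_mask], y[~left_mask]
    n = y.size
    child = (y_left.size / n) * squared_error(y_left) + (y_right.size / n) * squared_error(y_right)
    return squared_error(y) - child


y = np.array([1.0, 1.2, 0.9, 5.0, 5.3, 4.8])
left_mask = np.array([True, True, True, False, False, False])
gain = impurity_decrease(y, left_mask)
min_impurity_decrease = 0.0
print(gain >= min_impurity_decrease)  # True: the split would be kept
```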

class wildboar.tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BaseIntervalTree

An interval based tree classifier.

tree_

The internal tree structure.

Type:: Tree

Parameters:

n_intervals ({"log", "sqrt"}, int or float, optional) –
The number of intervals to partition the time series into.
- if “log”, the number of intervals is log2(n_timestep).
- if “sqrt”, the number of intervals is sqrt(n_timestep).
- if int, the number of intervals is n_intervals.
- if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
intervals ({"fixed", "sample", "random"}, optional) –
- if “fixed”, n_intervals non-overlapping intervals.
- if “sample”, n_intervals * sample_size non-overlapping intervals.
- if “random”, n_intervals possibly overlapping intervals, each with a length randomly sampled in [min_size * n_timestep, max_size * n_timestep].
sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".
min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".
max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".
summarizer (list or str, optional) –
The summarization of each interval.
- if list, a list of callables, each accepting a numpy array and returning a float.
- if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
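The n_intervals rules and the "fixed" partitioning can be sketched as below. The rules follow the parameter description; rounding fractional counts down (with at least one interval) and placing boundaries with `np.linspace` are assumptions, and both helpers are hypothetical.

```python
import math

import numpy as np


def resolve_n_intervals(n_intervals, n_timestep):
    """Turn an n_intervals setting into a concrete interval count (hypothetical helper)."""
    if n_intervals == "log":
        n = math.log2(n_timestep)
    elif n_intervals == "sqrt":
        n = math.sqrt(n_timestep)
    elif isinstance(n_intervals, float):
        if not 0.0 < n_intervals < 1.0:
            raise ValueError("a float n_intervals must satisfy 0 < n_intervals < 1")
        n = n_intervals * n_timestep
    else:
        n = n_intervals
    # Assumption: fractional counts are truncated, keeping at least one interval.
    return max(1, int(n))


def fixed_intervals(n_intervals, n_timestep):
    """Split [0, n_timestep) into n_intervals contiguous, non-overlapping intervals."""
    bounds = np.linspace(0, n_timestep, n_intervals + 1).astype(int)
    return [(int(start), int(end)) for start, end in zip(bounds[:-1], bounds[1:])]


n = resolve_n_intervals("sqrt", 150)  # sqrt(150) is about 12.25, truncated to 12
print(n, fixed_intervals(n, 150)[:3])  # 12 [(0, 12), (12, 25), (25, 37)]
```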

class wildboar.tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)

Bases: FeatureTreeRegressorMixin, BaseIntervalTree

An interval based tree regressor.

tree_

The internal tree structure.

Type:: Tree

Parameters:

n_intervals ({"log", "sqrt"}, int or float, optional) –
The number of intervals to partition the time series into.
- if “log”, the number of intervals is log2(n_timestep).
- if “sqrt”, the number of intervals is sqrt(n_timestep).
- if int, the number of intervals is n_intervals.
- if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split.

Deprecated since version 1.0: Criterion "mse" was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
intervals ({"fixed", "sample", "random"}, optional) –
- if “fixed”, n_intervals non-overlapping intervals.
- if “sample”, n_intervals * sample_size non-overlapping intervals.
- if “random”, n_intervals possibly overlapping intervals, each with a length randomly sampled in [min_size * n_timestep, max_size * n_timestep].
sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".
min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".
max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".
summarizer (list or str, optional) –
The summarization of each interval.
- if list, a list of callables, each accepting a numpy array and returning a float.
- if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
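A summarizer list in the documented form (callables mapping an array to a float) might look like the sketch below. It assumes that "mean_var_slope" stands for the interval mean, variance and slope, and uses a least-squares slope against the time index; wildboar's exact definitions (for example population versus sample variance) are not given here. `slope` and `summarize` are hypothetical helpers.

```python
import numpy as np


def slope(x):
    """Least-squares slope of x against its time index."""
    t = np.arange(len(x))
    return float(np.polyfit(t, x, 1)[0])


# A summarizer list: each callable maps one interval to a single float.
summarizer = [np.mean, np.var, slope]


def summarize(x, intervals, summarizer):
    """One feature per (interval, summarizer) pair, interval by interval."""
    return np.array([f(x[start:end]) for start, end in intervals for f in summarizer])


x = np.arange(10, dtype=float)  # a straight line with slope 1
features = summarize(x, [(0, 5), (5, 10)], summarizer)
print(features)  # approximately [2. 2. 1. 7. 2. 1.]
```

Per the parameter description above, a list such as `summarizer` could be passed directly as the `summarizer` argument instead of a predefined name.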

class wildboar.tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BasePivotTree

A tree classifier that uses pivot time series.

tree_

The internal tree representation.

Type:: Tree

Parameters:

n_pivot (str or int, optional) – The number of pivot time series to sample at each node.
metrics (str, optional) – The metrics to sample from. Currently, we only support “all”.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
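The "balanced" option can be illustrated with a short sketch. The docstring only says that weights are inversely proportional to class frequency; the exact scaling below, n_samples / (n_classes * count), is scikit-learn's convention and is an assumption here. `balanced_class_weight` is a hypothetical helper.

```python
import numpy as np


def balanced_class_weight(y):
    """Weights inversely proportional to class frequency (scikit-learn's scaling, assumed)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


y = np.array(["a", "a", "a", "b"])
print(balanced_class_weight(y))  # {'a': 0.666..., 'b': 2.0}
```

The dict form documented above skips this computation and states the weights directly, for example `class_weight={"a": 1.0, "b": 3.0}`.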

class wildboar.tree.ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric_factories='default', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)

Bases: wildboar.tree.base.TreeClassifierMixin, wildboar.tree.base.BaseTree

A classifier that uses a k-branching tree based on pivot time series.
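In a k-branching proximity split, each pivot defines one branch and every series is routed to its closest pivot. The sketch below fixes the pivots and uses Euclidean distance for illustration only; in Lucas et al. (2019) one exemplar is drawn per class (which pivot_sample="label" likely mirrors) and the distance measure and its parameters are sampled at each node. The helper names are hypothetical.

```python
import numpy as np


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def proximity_split(X, pivots, distance):
    """Route each series to the branch of its closest pivot (one branch per pivot)."""
    branches = [[] for _ in pivots]
    for i, x in enumerate(X):
        d = [distance(x, p) for p in pivots]
        branches[int(np.argmin(d))].append(i)
    return branches


X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [4.9, 5.0, 5.0]])
pivots = X[[0, 2]]  # illustrative: one pivot per class
print(proximity_split(X, pivots, euclidean))  # [[0, 1], [2, 3]]
```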

Examples

>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
...     n_pivot=10,
...     metric_factories={
...         "rdtw": {"min_r": 0.1, "max_r": 0.25},
...         "msm": {"min_c": 0.1, "max_c": 100, "n": 20}
...     },
...     criterion="gini"
... )
>>> f.fit(x, y)

References

Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb. (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.

Parameters:

n_pivot (int, optional) – The number of pivots to sample at each node.
criterion ({"entropy", "gini"}, optional) – The impurity criterion.
pivot_sample ({"label", "uniform"}, optional) – The pivot sampling method.
metric_sample ({"uniform", "weighted"}, optional) – The metric sampling method.
metric_factories ("default", list or dict, optional) –
The distance metrics.

If dict, a mapping from a distance factory to a dict of parameters for that factory, where each key is either:
- a str, naming a distance factory (see _DISTANCE_FACTORIES.keys()), or
- a callable, returning a list of DistanceMeasure objects.

If list, a list of named factories or callables.

If “default”, use the parameterization of Lucas et al. (2019).
max_depth (int, optional) – The maximum tree depth.
min_samples_split (int, optional) – The minimum number of samples to consider a split.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – The minimum impurity decrease to build a sub-tree.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
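The two impurity criteria used by the classifiers in this module, "entropy" and "gini", can be compared on a small node. This sketch uses base-2 logarithms for entropy; the base wildboar uses is an assumption, and since it only rescales entropy by a constant it does not change which split is preferred. The helpers are hypothetical.

```python
import numpy as np


def class_proportions(y):
    """Fraction of samples in each class present at the node."""
    _, counts = np.unique(y, return_counts=True)
    return counts / counts.sum()


def entropy(y):
    """Shannon entropy (base 2) of the class distribution."""
    p = class_proportions(y)
    return float(-np.sum(p * np.log2(p)))


def gini(y):
    """Gini impurity: probability that two random draws disagree on the label."""
    p = class_proportions(y)
    return float(1.0 - np.sum(p ** 2))


y = np.array([0, 0, 1, 1])
print(entropy(y), gini(y))  # 1.0 0.5
```

Both criteria are zero for a pure node and maximal for a uniform class distribution.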

class wildboar.tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)

Bases: FeatureTreeClassifierMixin, BaseRocketTree

A tree classifier that uses random convolutions as features.

tree_

The internal tree representation.

Type:: Tree

Parameters:

n_kernels (int, optional) – The number of kernels to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
sampling ({"normal", "uniform", "shapelet"}, optional) –
The sampling of convolutional filters.
- if “normal”, sample filters from a normal distribution with the given mean and scale.
- if “uniform”, sample filters from a uniform distribution with the given lower and upper bounds.
- if “shapelet”, sample filters as subsequences of the training data.
sampling_params (dict, optional) –
The parameters for the sampling.
- if “normal”, {"mean": float, "scale": float}, defaults to
  {"mean": 0, "scale": 1}.
- if “uniform”, {"lower": float, "upper": float}, defaults to
  {"lower": -1, "upper": 1}.
kernel_size ((min_size, max_size) or array-like, optional) –
The kernel size.
- if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep.
- if array-like, all defined kernel sizes.
bias_prob (float, optional) – The probability of using a bias term.
normalize_prob (float, optional) – The probability of performing normalization.
padding_prob (float, optional) – The probability of padding with zeros.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
- if dict, weights in the form {label: weight}.
- if “balanced”, each class weight is inversely proportional to the class frequency.
- if None, each class has equal weight.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
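How bias_prob, normalize_prob and padding_prob might shape a single random kernel is sketched below, in the style of ROCKET (without dilation). Several details are assumptions: "normalization" is read as mean-centring the weights, the bias is drawn from U(-1, 1), and the kernel is summarized by its maximum and proportion of positive values (PPV); which features wildboar's trees actually split on is not stated here. The helpers are hypothetical.

```python
import numpy as np


def sample_kernel(rng, kernel_size, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5):
    """Sample one random convolution kernel, ROCKET style, without dilation."""
    weights = rng.normal(0.0, 1.0, size=kernel_size)
    if rng.uniform() < normalize_prob:
        # Assumption: "normalization" means mean-centring the weights.
        weights = weights - weights.mean()
    bias = rng.uniform(-1.0, 1.0) if rng.uniform() < bias_prob else 0.0
    # For odd kernel sizes, this padding keeps the output as long as the input.
    padding = (kernel_size - 1) // 2 if rng.uniform() < padding_prob else 0
    return weights, bias, padding


def apply_kernel(x, weights, bias, padding):
    """Return the maximum and the proportion of positive values of the response."""
    x = np.pad(np.asarray(x, dtype=float), padding)  # zero padding on both ends
    out = np.correlate(x, weights, mode="valid") + bias
    return float(out.max()), float(np.mean(out > 0))


rng = np.random.default_rng(42)
x = np.sin(np.linspace(0, 6 * np.pi, 150))
weights, bias, padding = sample_kernel(rng, kernel_size=9)
max_value, ppv = apply_kernel(x, weights, bias, padding)
print(max_value, ppv)
```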

class wildboar.tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)

Bases: FeatureTreeRegressorMixin, BaseRocketTree

A tree regressor that uses random convolutions as features.

tree_

The internal tree representation.

Type:: Tree

Parameters:

n_kernels (int, optional) – The number of kernels to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None, the tree is expanded until all leaves are pure or contain fewer than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples required to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"squared_error"}, optional) – The criterion used to evaluate the utility of a split.
sampling ({"normal", "uniform", "shapelet"}, optional) –
The sampling of convolutional filters.
- if “normal”, sample filters from a normal distribution with the given mean and scale.
- if “uniform”, sample filters from a uniform distribution with the given lower and upper bounds.
- if “shapelet”, sample filters as subsequences of the training data.
sampling_params (dict, optional) –
The parameters for the sampling.
- if “normal”, {"mean": float, "scale": float}, defaults to
  {"mean": 0, "scale": 1}.
- if “uniform”, {"lower": float, "upper": float}, defaults to
  {"lower": -1, "upper": 1}.
kernel_size ((min_size, max_size) or array-like, optional) –
The kernel size.
- if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep.
- if array-like, all defined kernel sizes.
bias_prob (float, optional) – The probability of using a bias term.
normalize_prob (float, optional) – The probability of performing normalization.
padding_prob (float, optional) – The probability of padding with zeros.
random_state (int, RandomState or None, optional) –
- If int, random_state is the seed used by the random number generator.
- If RandomState instance, random_state is the random number generator.
- If None, the random number generator is the RandomState instance used by np.random.
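The sketch below shows one way sampling_params could override the documented defaults, and how kernel_size might expand into candidate lengths. Treating a pair of floats as fractions of n_timestep (rounded up for the lower bound, down for the upper) is an assumption, and the "shapelet" sampling mode is not covered. Both helpers are hypothetical.

```python
import math

import numpy as np

# Defaults taken from the sampling_params description.
DEFAULT_SAMPLING_PARAMS = {
    "normal": {"mean": 0.0, "scale": 1.0},
    "uniform": {"lower": -1.0, "upper": 1.0},
}


def sample_weights(rng, size, sampling="normal", sampling_params=None):
    """Draw kernel weights, letting sampling_params override the defaults."""
    if sampling not in DEFAULT_SAMPLING_PARAMS:
        # "shapelet" draws subsequences from the training data; not sketched here.
        raise ValueError(f"unsupported sampling: {sampling!r}")
    params = {**DEFAULT_SAMPLING_PARAMS[sampling], **(sampling_params or {})}
    if sampling == "normal":
        return rng.normal(params["mean"], params["scale"], size=size)
    return rng.uniform(params["lower"], params["upper"], size=size)


def candidate_kernel_sizes(kernel_size, n_timestep):
    """Expand kernel_size into the list of allowed kernel lengths."""
    if isinstance(kernel_size, tuple) and all(isinstance(v, float) for v in kernel_size):
        # Assumption: a pair of floats is read as fractions of n_timestep.
        min_size, max_size = kernel_size
        low = max(1, math.ceil(min_size * n_timestep))
        high = max(low, math.floor(max_size * n_timestep))
        return list(range(low, high + 1))
    return [int(k) for k in kernel_size]


rng = np.random.default_rng(0)
weights = sample_weights(rng, 9, "uniform", {"lower": 0.0, "upper": 2.0})
print(candidate_kernel_sizes((0.25, 0.5), 20))  # [5, 6, 7, 8, 9, 10]
print(candidate_kernel_sizes([7, 9, 11], 150))  # [7, 9, 11]
```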

class wildboar.tree.ShapeletTreeClassifier(*, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)

Bases: DynamicTreeMixin, FeatureTreeClassifierMixin, BaseShapeletTree

A shapelet tree classifier.

tree_

The tree data structure used internally.

Type:: Tree

classes_

The class labels.

Type:: ndarray of shape (n_classes,)

n_classes_

The number of class labels.

Type:: int
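A shapelet tree splits each node by comparing the distance between a sampled shapelet and every time series against a threshold. The sketch below assumes metric="euclidean" means the minimum Euclidean distance over all same-length sliding windows and that series at or below the threshold go left. The helper names are hypothetical, and wildboar's own implementation (for example the scaled variants or early abandoning) may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def shapelet_distance(shapelet, x):
    """Minimum Euclidean distance between a shapelet and all same-length windows of x."""
    shapelet = np.asarray(shapelet, dtype=float)
    windows = sliding_window_view(np.asarray(x, dtype=float), shapelet.size)
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())


def shapelet_split(X, shapelet, threshold):
    """Indices of series routed left (distance <= threshold) and right."""
    dist = np.array([shapelet_distance(shapelet, x) for x in X])
    return np.flatnonzero(dist <= threshold), np.flatnonzero(dist > threshold)


X = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]])
left, right = shapelet_split(X, shapelet=[2.0, 3.0, 4.0], threshold=0.5)
print(left, right)  # [0] [1]
```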
