wildboar.tree
Submodules
Package Contents
Classes
ExtraShapeletTreeClassifier | An extra shapelet tree classifier.
ExtraShapeletTreeRegressor | An extra shapelet tree regressor.
IntervalTreeClassifier | An interval based tree classifier.
IntervalTreeRegressor | An interval based tree regressor.
PivotTreeClassifier | A tree classifier that uses pivot time series.
ProximityTreeClassifier | A classifier that uses a k-branching tree based on pivot-time series.
RocketTreeClassifier | A tree classifier that uses random convolutions as features.
RocketTreeRegressor | A tree regressor that uses random convolutions as features.
ShapeletTreeClassifier | A shapelet tree classifier.
ShapeletTreeRegressor | A shapelet tree regressor.
- class wildboar.tree.ExtraShapeletTreeClassifier(*, n_shapelets=1, max_depth=None, min_samples_leaf=1, min_impurity_decrease=0.0, min_samples_split=2, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)
Bases: ShapeletTreeClassifier
An extra shapelet tree classifier.
Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
- Parameters:
n_shapelets (int, optional) – The number of shapelets to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet expressed as a fraction, computed as min(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure
class_weight (dict or "balanced", optional) –
Weights associated with the labels
if dict, weights of the form {label: weight}
if “balanced”, each class weight is inversely proportional to the class frequency
if None, each class has equal weight
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
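The threshold sampling described for extra shapelet trees can be sketched in a few lines. This is an illustration only, not wildboar's implementation: the helper name `sample_extra_threshold`, the example distances, and the convention that samples at or below the threshold go to the left branch are all assumptions.

```python
import random


def sample_extra_threshold(distances, rng):
    """Draw a split threshold uniformly from [min(dist), max(dist)]."""
    lo, hi = min(distances), max(distances)
    return rng.uniform(lo, hi)


# Hypothetical distances from one candidate shapelet to five training series.
dist = [0.8, 2.5, 1.1, 3.9, 0.3]
rng = random.Random(0)
threshold = sample_extra_threshold(dist, rng)

# Assumed convention: samples at or below the threshold go left, the rest right.
left = [i for i, d in enumerate(dist) if d <= threshold]
right = [i for i, d in enumerate(dist) if d > threshold]
```

Because the threshold is drawn at random instead of being searched for, each candidate shapelet costs one distance pass and no sorting, which is the usual motivation for extremely randomized trees.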
- class wildboar.tree.ExtraShapeletTreeRegressor(*, n_shapelets=1, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)
Bases: ShapeletTreeRegressor
An extra shapelet tree regressor.
Extra shapelet trees are constructed by sampling a distance threshold uniformly in the range [min(dist), max(dist)].
- Parameters:
n_shapelets (int, optional) – The number of shapelets to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split
Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet expressed as a fraction, computed as min(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
metric ({'euclidean', 'scaled_euclidean', 'scaled_dtw'}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
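To make the interaction between criterion="squared_error" and min_impurity_decrease concrete, the following sketch computes the impurity decrease of a candidate split as the parent's mean squared error minus the size-weighted error of the children. The function names are hypothetical, and wildboar may additionally scale the decrease by the node's share of the training set; treat this as an assumption-laden illustration.

```python
def squared_error(values):
    """Mean squared deviation from the mean (the node impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    children = (len(left) / n) * squared_error(left) + (len(right) / n) * squared_error(right)
    return squared_error(parent) - children


# Hypothetical regression targets reaching one node.
parent = [1.0, 1.0, 5.0, 5.0]
good = impurity_decrease(parent, [1.0, 1.0], [5.0, 5.0])
bad = impurity_decrease(parent, [1.0], [1.0, 5.0, 5.0])

# A split is kept only if its decrease reaches the configured minimum.
min_impurity_decrease = 2.0
accepted = [d >= min_impurity_decrease for d in (good, bad)]
```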
- class wildboar.tree.IntervalTreeClassifier(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', class_weight=None, random_state=None)
Bases: FeatureTreeClassifierMixin, BaseIntervalTree
An interval based tree classifier.
- Parameters:
n_intervals ({"log", "sqrt"}, int or float, optional) –
The number of intervals to partition the time series into.
if “log”, the number of intervals is log2(n_timestep).
if “sqrt”, the number of intervals is sqrt(n_timestep).
if int, the number of intervals is n_intervals.
if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
intervals ({"fixed", "sample", "random"}, optional) –
if “fixed”, n_intervals non-overlapping intervals.
if “sample”, n_intervals * sample_size non-overlapping intervals.
if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".
min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".
max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".
summarizer (list or str, optional) –
The summarization of each interval.
if list, a list of callables accepting a numpy array and returning a float.
if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.
class_weight (dict or "balanced", optional) –
Weights associated with the labels
if dict, weights of the form {label: weight}
if “balanced”, each class weight is inversely proportional to the class frequency
if None, each class has equal weight
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
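The rules for n_intervals can be read as a small dispatch on the argument type. The sketch below is one possible reading, not the library's code: `resolve_n_intervals` is a hypothetical helper, and rounding up with `ceil` is an assumption because the documentation does not state how fractional results are rounded.

```python
import math


def resolve_n_intervals(n_intervals, n_timestep):
    """Translate the n_intervals argument into a concrete interval count."""
    if n_intervals == "log":
        return math.ceil(math.log2(n_timestep))
    if n_intervals == "sqrt":
        return math.ceil(math.sqrt(n_timestep))
    if isinstance(n_intervals, int):
        return n_intervals
    if isinstance(n_intervals, float):
        if not 0.0 < n_intervals < 1.0:
            raise ValueError("a float n_intervals must satisfy 0 < n_intervals < 1")
        return math.ceil(n_intervals * n_timestep)
    raise ValueError(f"unsupported n_intervals: {n_intervals!r}")


# Interval counts for a series with 128 time steps.
counts = {
    "log": resolve_n_intervals("log", 128),
    "sqrt": resolve_n_intervals("sqrt", 128),
    "int": resolve_n_intervals(10, 128),
    "float": resolve_n_intervals(0.25, 128),
}
```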
- class wildboar.tree.IntervalTreeRegressor(n_intervals='sqrt', *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', random_state=None)
Bases: FeatureTreeRegressorMixin, BaseIntervalTree
An interval based tree regressor.
- Parameters:
n_intervals ({"log", "sqrt"}, int or float, optional) –
The number of intervals to partition the time series into.
if “log”, the number of intervals is log2(n_timestep).
if “sqrt”, the number of intervals is sqrt(n_timestep).
if int, the number of intervals is n_intervals.
if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split.
Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
intervals ({"fixed", "sample", "random"}, optional) –
if “fixed”, n_intervals non-overlapping intervals.
if “sample”, n_intervals * sample_size non-overlapping intervals.
if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
sample_size (float, optional) – The fraction of intervals to sample at each node. Ignored unless intervals="sample".
min_size (float, optional) – The minimum interval size. Ignored unless intervals="random".
max_size (float, optional) – The maximum interval size. Ignored unless intervals="random".
summarizer (list or str, optional) –
The summarization of each interval.
if list, a list of callables accepting a numpy array and returning a float.
if str, a predefined summarizer. See wildboar.transform._interval._INTERVALS.keys() for all supported summarizers.
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
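As an illustration of intervals="fixed" combined with a list-style summarizer, the sketch below cuts a series into contiguous, non-overlapping intervals and summarizes each by mean, variance and slope. All names are hypothetical; the use of the population variance and of a least-squares slope over the sample index are assumptions about what a "mean_var_slope" summary might compute.

```python
from statistics import fmean, pvariance


def fixed_intervals(n_timestep, n_intervals):
    """Return n_intervals contiguous, non-overlapping (start, end) pairs."""
    bounds = [i * n_timestep // n_intervals for i in range(n_intervals + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def slope(values):
    """Least-squares slope of the values against their index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den if den else 0.0


def mean_var_slope(values):
    """Summarize one interval as (mean, population variance, slope)."""
    return fmean(values), pvariance(values), slope(values)


# A hypothetical series that rises and then stays flat.
series = [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 5.0]
features = [mean_var_slope(series[s:e]) for s, e in fixed_intervals(len(series), 2)]
```

Each interval contributes three candidate features, and the tree then chooses a threshold on one of them at every node.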
- class wildboar.tree.PivotTreeClassifier(n_pivot='sqrt', *, metrics='all', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', class_weight=None, random_state=None)
Bases: FeatureTreeClassifierMixin, BasePivotTree
A tree classifier that uses pivot time series.
- Parameters:
n_pivot (str or int, optional) – The number of pivot time series to sample at each node.
metrics (str, optional) – The metrics to sample from. Currently, we only support “all”.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
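The “balanced” option of class_weight can be illustrated with the weighting rule used by scikit-learn, n_samples / (n_classes * count). That wildboar applies exactly this formula is an assumption here; the documentation only states that weights are inversely proportional to the class frequency. The helper name is hypothetical.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weight each class inversely proportional to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical labels: class "a" is three times as frequent as class "b".
y = ["a", "a", "a", "b"]
weights = balanced_class_weight(y)
```

With these labels the rare class receives three times the weight of the frequent one, so both classes contribute equally to the impurity computation in total.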
- class wildboar.tree.ProximityTreeClassifier(n_pivot=1, *, criterion='entropy', pivot_sample='label', metric_sample='weighted', metric_factories='default', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, class_weight=None, random_state=None)
Bases: wildboar.tree.base.TreeClassifierMixin, wildboar.tree.base.BaseTree
A classifier that uses a k-branching tree based on pivot-time series.
Examples
>>> from wildboar.datasets import load_dataset
>>> from wildboar.tree import ProximityTreeClassifier
>>> x, y = load_dataset("GunPoint")
>>> f = ProximityTreeClassifier(
...     n_pivot=10,
...     metric_factories={
...         "rdtw": {"min_r": 0.1, "max_r": 0.25},
...         "msm": {"min_c": 0.1, "max_c": 100, "n": 20},
...     },
...     criterion="gini",
... )
>>> f.fit(x, y)
References
- Lucas, Benjamin, Ahmed Shifaz, Charlotte Pelletier, Lachlan O’Neill, Nayyar Zaidi, Bart Goethals, François Petitjean, and Geoffrey I. Webb (2019). Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining and Knowledge Discovery.
- Parameters:
n_pivot (int, optional) – The number of pivots to sample at each node.
criterion ({"entropy", "gini"}, optional) – The impurity criterion.
pivot_sample ({"label", "uniform"}, optional) – The pivot sampling method.
metric_sample ({"uniform", "weighted"}, optional) – The metric sampling method.
metric_factories ("default", list or dict, optional) –
The distance metrics.
If dict, a dictionary where key is:
if str, a named distance factory (See _DISTANCE_FACTORIES.keys())
if callable, a function returning a list of DistanceMeasure-objects
and where value is a dict of parameters to the factory.
If list, a list of named factories or callables.
If “default”, use the parameterization of (Lucas et al., 2019).
max_depth (int, optional) – The maximum tree depth.
min_samples_split (int, optional) – The minimum number of samples to consider a split.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – The minimum impurity decrease to build a sub-tree.
class_weight (dict or "balanced", optional) –
Weights associated with the labels.
if dict, weights of the form {label: weight}.
if “balanced”, each class weight is inversely proportional to the class frequency.
if None, each class has equal weight.
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
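The k-branching idea behind a proximity tree can be sketched as follows: each node holds a set of pivot series and a distance measure, and every sample is routed to the branch of its nearest pivot. This is a simplified illustration with hypothetical names and a plain Euclidean distance; it does not reproduce how wildboar samples pivots or metrics.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def proximity_split(X, pivots, distance=euclidean):
    """Route each sample index to the branch of its nearest pivot."""
    branches = [[] for _ in pivots]
    for i, x in enumerate(X):
        nearest = min(range(len(pivots)), key=lambda k: distance(x, pivots[k]))
        branches[nearest].append(i)
    return branches


# Four hypothetical series; three of them act as pivots (n_pivot=3).
X = [[0.0, 0.1, 0.0], [1.0, 1.1, 0.9], [0.1, 0.0, 0.1], [5.0, 5.2, 4.9]]
pivots = [X[0], X[1], X[3]]
branches = proximity_split(X, pivots)
```

With n_pivot pivots a node has up to n_pivot children, which is what distinguishes this tree from the binary threshold splits used by the other classes on this page.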
- class wildboar.tree.RocketTreeClassifier(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='entropy', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, class_weight=None, random_state=None)
Bases: FeatureTreeClassifierMixin, BaseRocketTree
A tree classifier that uses random convolutions as features.
- Parameters:
n_kernels (int, optional) – The number of kernels to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split.
sampling ({"normal", "uniform", "shapelet"}, optional) –
The sampling of convolutional filters.
if “normal”, sample filter according to a normal distribution with mean and scale.
if “uniform”, sample filter according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
sampling_params (dict, optional) –
The parameters for the sampling.
- if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
- if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
kernel_size ((min_size, max_size) or array-like, optional) –
The kernel size.
if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep
if array-like, all defined kernel sizes.
bias_prob (float, optional) – The probability of using a bias term.
normalize_prob (float, optional) – The probability of performing normalization.
padding_prob (float, optional) – The probability of padding with zeros.
class_weight (dict or "balanced", optional) –
Weights associated with the labels
if dict, weights of the form {label: weight}
if “balanced”, each class weight is inversely proportional to the class frequency
if None, each class has equal weight
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
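The sampling and sampling_params options can be pictured with a small sketch that draws kernel weights and slides the kernel over a series. The helper names are hypothetical, the “shapelet” option is left out, and the zero-padding width of half the kernel length is an assumption; the sketch only mirrors the documented defaults for “normal” and “uniform”.

```python
import random


def sample_kernel(size, sampling="normal", sampling_params=None, rng=random):
    """Draw kernel weights using the documented default parameters."""
    params = sampling_params or {}
    if sampling == "normal":
        mean, scale = params.get("mean", 0.0), params.get("scale", 1.0)
        return [rng.gauss(mean, scale) for _ in range(size)]
    if sampling == "uniform":
        lower, upper = params.get("lower", -1.0), params.get("upper", 1.0)
        return [rng.uniform(lower, upper) for _ in range(size)]
    raise ValueError(f"unsupported sampling: {sampling!r}")


def convolve(series, kernel, bias=0.0, pad=False):
    """Slide the kernel over the series, optionally zero-padding both ends."""
    if pad:
        p = len(kernel) // 2  # assumed padding width
        series = [0.0] * p + list(series) + [0.0] * p
    n_out = len(series) - len(kernel) + 1
    return [bias + sum(w * series[i + j] for j, w in enumerate(kernel)) for i in range(n_out)]


rng = random.Random(1)
kernel = sample_kernel(3, "uniform", {"lower": -1.0, "upper": 1.0}, rng)

# A fixed difference kernel makes the effect of padding easy to follow.
valid = convolve([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
padded = convolve([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0], pad=True)
feature = max(padded)  # a pooled value such as the maximum can serve as a split feature
```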
- class wildboar.tree.RocketTreeRegressor(n_kernels=10, *, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, criterion='squared_error', sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, random_state=None)
Bases: FeatureTreeRegressorMixin, BaseRocketTree
A tree regressor that uses random convolutions as features.
- Parameters:
n_kernels (int, optional) – The number of kernels to sample at each node.
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split (int, optional) – The minimum number of samples to split an internal node.
min_samples_leaf (int, optional) – The minimum number of samples in a leaf.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value.
criterion ({"squared_error"}, optional) – The criterion used to evaluate the utility of a split.
sampling ({"normal", "uniform", "shapelet"}, optional) –
The sampling of convolutional filters.
if “normal”, sample filter according to a normal distribution with mean and scale.
if “uniform”, sample filter according to a uniform distribution with lower and upper.
if “shapelet”, sample filters as subsequences in the training data.
sampling_params (dict, optional) –
The parameters for the sampling.
- if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
- if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
kernel_size ((min_size, max_size) or array-like, optional) –
The kernel size.
if (min_size, max_size), all kernel sizes between min_size * n_timestep and max_size * n_timestep
if array-like, all defined kernel sizes.
bias_prob (float, optional) – The probability of using a bias term.
normalize_prob (float, optional) – The probability of performing normalization.
padding_prob (float, optional) – The probability of padding with zeros.
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
- class wildboar.tree.ShapeletTreeClassifier(*, n_shapelets='warn', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, min_shapelet_size=0.0, max_shapelet_size=1.0, alpha=None, metric='euclidean', metric_params=None, criterion='entropy', class_weight=None, random_state=None)
Bases: DynamicTreeMixin, FeatureTreeClassifierMixin, BaseShapeletTree
A shapelet tree classifier.
See also
ShapeletTreeRegressor : A shapelet tree regressor.
ExtraShapeletTreeClassifier : An extra random shapelet tree classifier.
- Parameters:
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"entropy", "gini"}, optional) – The criterion used to evaluate the utility of a split
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
n_shapelets (int, optional) – The number of shapelets to sample at each node.
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet expressed as a fraction, computed as min(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
alpha (float, optional) –
Dynamically adjust the number of sampled shapelets at each node according to the current depth.
if \(\text{alpha} < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
if \(\text{alpha} > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
if None, the number of sampled shapelets is the same independent of depth.
metric ({"euclidean", "scaled_euclidean", "dtw", "scaled_dtw"}, optional) – Distance metric used to identify the best shapelet.
metric_params (dict, optional) – Parameters for the distance measure
class_weight (dict or "balanced", optional) –
Weights associated with the labels
if dict, weights of the form {label: weight}
if “balanced”, each class weight is inversely proportional to the class frequency
if None, each class has equal weight
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used by np.random.
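The two classification criteria and their role in min_impurity_decrease can be made concrete with a short sketch. The function names are hypothetical, and the decrease is computed here as parent impurity minus the size-weighted impurity of the two children; whether wildboar further scales this value by the node's share of the training set is not stated above and is left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of the label distribution."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right, criterion=entropy):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    children = (len(left) / n) * criterion(left) + (len(right) / n) * criterion(right)
    return criterion(parent) - children


# A hypothetical node with two classes that one split separates perfectly.
parent = [0, 0, 1, 1]
gain_entropy = impurity_decrease(parent, [0, 0], [1, 1], criterion=entropy)
gain_gini = impurity_decrease(parent, [0, 0], [1, 1], criterion=gini)
```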
- class wildboar.tree.ShapeletTreeRegressor(*, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_impurity_decrease=0.0, n_shapelets='warn', min_shapelet_size=0, max_shapelet_size=1, alpha=None, metric='euclidean', metric_params=None, criterion='squared_error', random_state=None)
Bases: DynamicTreeMixin, FeatureTreeRegressorMixin, BaseShapeletTree
A shapelet tree regressor.
- Parameters:
max_depth (int, optional) – The maximum depth of the tree. If None the tree is expanded until all leaves are pure or until all leaves contain less than min_samples_split samples
min_samples_split (int, optional) – The minimum number of samples to split an internal node
min_samples_leaf (int, optional) – The minimum number of samples in a leaf
criterion ({"squared_error"}, optional) –
The criterion used to evaluate the utility of a split
Deprecated since version 1.0: Criterion “mse” was deprecated in v1.1 and will be removed in version 1.2. Use criterion="squared_error", which is equivalent.
min_impurity_decrease (float, optional) – A split will be introduced only if the impurity decrease is larger than or equal to this value
n_shapelets (int, optional) – The number of shapelets to sample at each node.
min_shapelet_size (float, optional) – The minimum length of a sampled shapelet expressed as a fraction, computed as min(ceil(X.shape[-1] * min_shapelet_size), 2).
max_shapelet_size (float, optional) – The maximum length of a sampled shapelet, expressed as a fraction, computed as ceil(X.shape[-1] * max_shapelet_size).
alpha (float, optional) –
Dynamically adjust the number of sampled shapelets at each node according to the current depth.
\[w = 1 - e^{-|\text{alpha}| \cdot \text{depth}}\]
if \(\text{alpha} < 0\), the number of sampled shapelets decreases from n_shapelets towards 1 with increased depth.
\[\text{n\_shapelets} \cdot (1 - w)\]
if \(\text{alpha} > 0\), the number of sampled shapelets increases from 1 towards n_shapelets with increased depth.
\[\text{n\_shapelets} \cdot w\]
if None, the number of sampled shapelets is the same independent of depth.
metric (str, optional) –
Distance metric used to identify the best shapelet.
See distance._SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters for the distance measure.
Read more about the parameters in the User guide.
random_state (int or RandomState) –
If int, random_state is the seed used by the random number generator
If RandomState instance, random_state is the random number generator
If None, the random number generator is the RandomState instance used by np.random.
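The depth-dependent schedule controlled by alpha follows directly from the weight w = 1 - e^{-|alpha| * depth} given in the parameter description. The sketch below evaluates it for a few depths; the helper name, the rounding to the nearest integer, the lower bound of one shapelet, and the handling of alpha = 0 (treated like a positive value) are assumptions, since the documentation does not specify them.

```python
import math


def n_shapelets_at_depth(n_shapelets, alpha, depth):
    """Number of shapelets to sample at a node of the given depth."""
    if alpha is None:
        return n_shapelets
    w = 1.0 - math.exp(-abs(alpha) * depth)
    scaled = n_shapelets * (1.0 - w) if alpha < 0 else n_shapelets * w
    # Assumed: round to an integer and always sample at least one shapelet.
    return max(1, round(scaled))


# alpha < 0: start at n_shapelets and shrink towards 1 with depth.
decreasing = [n_shapelets_at_depth(100, -0.5, d) for d in range(6)]
# alpha > 0: start at 1 and grow towards n_shapelets with depth.
increasing = [n_shapelets_at_depth(100, 0.5, d) for d in range(6)]
constant = [n_shapelets_at_depth(100, None, d) for d in range(6)]
```

A negative alpha spends most of the sampling budget near the root, where each split affects many samples, while a positive alpha saves it for the deeper, smaller nodes.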