Ensemble estimators#
Shapelet forests#
Shapelet forests, implemented in ensemble.ShapeletForestClassifier and
ensemble.ShapeletForestRegressor, construct ensembles of shapelet tree
classifiers or regressors, respectively. For a wide variety of tasks, these
estimators are excellent baseline methods.
from wildboar.datasets import load_gun_point
from wildboar.ensemble import ShapeletForestClassifier
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
clf = ShapeletForestClassifier(random_state=1)
clf.fit(X_train, y_train)
ShapeletForestClassifier(random_state=1)
The ShapeletForestClassifier class accepts the n_jobs parameter, which sets
the number of processor cores used for model fitting and prediction. It is
advisable to set n_jobs to -1 to use all available cores.
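For example, the following fit uses every available core (a minimal sketch; the speed-up depends on n_estimators and the size of the data):
# Same train/test split as above; n_jobs=-1 uses all cores.
clf = ShapeletForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)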
We can get predictions using the predict method (or class probabilities using the predict_proba method):
clf.predict(X_test)
array([1., 2., 2., ..., 2., 2., 1.], shape=(150,), dtype=float32)
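The predict_proba method instead returns one column of class probabilities per label; for this two-class dataset, an array of shape (150, 2):
clf.predict_proba(X_test)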
The accuracy of the model is given by the score method.
clf.score(X_test, y_test)
0.9866666666666667
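ShapeletForestRegressor exposes the same fit/predict interface. A minimal sketch on synthetic data (the data below is fabricated purely for illustration):
import numpy as np
from wildboar.ensemble import ShapeletForestRegressor

# Fabricated example: 100 time series of length 50 with a synthetic target.
rng = np.random.RandomState(1)
X = rng.randn(100, 50)
y = X.mean(axis=1)

reg = ShapeletForestRegressor(random_state=1)
reg.fit(X, y)
reg.predict(X[:5])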
Proximity forests#
The ensemble.ProximityForestClassifier is an ensemble of highly
randomized Proximity Trees. Whereas conventional decision trees branch on
attribute values, and shapelet trees on distance thresholds, a Proximity Tree
is a k-branching tree that branches on the proximity of a time series to one
of k pivot time series.
from wildboar.datasets import load_gun_point
from wildboar.ensemble import ProximityForestClassifier
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
clf = ProximityForestClassifier(random_state=1)
clf.fit(X_train, y_train)
ProximityForestClassifier(random_state=1)
By default, ProximityForestClassifier uses the
distance measures suggested in the original paper [1]. Using these
distance measures, we get the following accuracy:
clf.score(X_test, y_test)
0.9666666666666667
We can specify only a single metric:
clf = ProximityForestClassifier(metric="euclidean", random_state=1)
clf.fit(X_train, y_train)
ProximityForestClassifier(metric='euclidean', random_state=1)
This configuration gives the following accuracy:
clf.score(X_test, y_test)
0.8533333333333334
We can also specify more complex configurations by passing a dict or
list to the metric parameter. You can read more about metric
specification in the corresponding section.
clf = ProximityForestClassifier(
metric={
"ddtw": {"min_r": 0.01, "max_r": 0.1},
"msm": {"min_c": 0.1, "max_c": 100},
},
random_state=1,
)
clf.fit(X_train, y_train)
ProximityForestClassifier(metric={'ddtw': {'max_r': 0.1, 'min_r': 0.01},
                                  'msm': {'max_c': 100, 'min_c': 0.1}},
                          random_state=1)
This configuration gives the following accuracy:
clf.score(X_test, y_test)
0.9733333333333334
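The metric parameter also accepts a list. The sketch below assumes the list holds metric names or (name, parameter-dict) tuples; consult the metric specification section for the exact format:
clf = ProximityForestClassifier(
    metric=["euclidean", ("dtw", {"min_r": 0.0, "max_r": 0.25})],
    random_state=1,
)
clf.fit(X_train, y_train)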
Elastic ensemble#
The Elastic ensemble is a classifier first described by Lines and Bagnall (2015) [2]. The ensemble consists of one k-nearest neighbors classifier per distance metric, with the parameters of each metric optimized through leave-one-out cross-validation.
from wildboar.datasets import load_gun_point
from wildboar.ensemble import ElasticEnsembleClassifier
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
clf = ElasticEnsembleClassifier()
clf.fit(X_train, y_train)
ElasticEnsembleClassifier()
The default configuration uses all elastic distance measures available in Wildboar, a superset of the elastic metrics used by Lines and Bagnall (2015) [2], but with a smaller grid of metric parameters.
The result of the default configuration is:
clf.score(X_test, y_test)
0.9866666666666667
Similar to the Proximity Forest, we can specify a custom metric:
clf = ElasticEnsembleClassifier(
metric={
"ddtw": {"min_r": 0.01, "max_r": 0.1},
"msm": {"min_c": 0.1, "max_c": 100},
},
)
clf.fit(X_train, y_train)
ElasticEnsembleClassifier(metric={'ddtw': {'max_r': 0.1, 'min_r': 0.01},
                                  'msm': {'max_c': 100, 'min_c': 0.1}})
This smaller configuration has an accuracy of:
clf.score(X_test, y_test)
1.0
Interval forest#
The interval forest was first introduced by Deng et al. [4] and is
implemented in the class IntervalForestClassifier. It constructs a forest of
interval-based decision trees in which each node splits on a value aggregated
over a (possibly overlapping) interval. In the default formulation, a node
uses the mean, the variance, or the slope of the interval, but other
aggregation functions can be used (in Wildboar we call these summarization
functions).
from wildboar.datasets import load_gun_point
from wildboar.ensemble import IntervalForestClassifier
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
clf = IntervalForestClassifier(min_size=0.1, max_size=0.3, random_state=1)
clf.fit(X_train, y_train)
IntervalForestClassifier(max_size=0.3, min_size=0.1, random_state=1)
The interval forest uses the default summarization functions mentioned above and sqrt(n_timestep) intervals. By default, the intervals are selected at random and may overlap; min_size and max_size bound the interval length as fractions of the number of timesteps. The accuracy is:
clf.score(X_test, y_test)
0.9733333333333334
We can also use non-overlapping intervals by setting the intervals parameter to "fixed". We can sample a smaller set of intervals by setting the sample_size parameter to a float.
Warning
intervals="sample" was deprecated in version 1.3 and will be removed in
version 1.4. The equivalent functionality can be achieved by setting
intervals="fixed" and specifying sample_size as a float.
from wildboar.datasets import load_gun_point
from wildboar.ensemble import IntervalForestClassifier
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
clf = IntervalForestClassifier(
intervals="fixed", n_intervals=30, sample_size=0.2, random_state=1
)
clf.fit(X_train, y_train)
IntervalForestClassifier(intervals='fixed', n_intervals=30, random_state=1,
                         sample_size=0.2)
At each node in each tree, we sample 20% of the intervals. The accuracy is:
clf.score(X_test, y_test)
0.9466666666666667
We can also change the summarizer. By setting the summarizer parameter to
"catch22" we can sample from the full set of Catch22 [3] features.
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
clf = IntervalForestClassifier(
summarizer="catch22",
intervals="random",
n_intervals=30,
random_state=1,
)
clf.fit(X_train, y_train)
IntervalForestClassifier(n_intervals=30, random_state=1, summarizer='catch22')
Here, we sample 30 possibly overlapping intervals at each node and randomly select one of the Catch22 features to split the node. The accuracy for this configuration is:
clf.score(X_test, y_test)
0.9733333333333334
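Assuming the summarizer parameter also accepts a list of plain functions, each reducing an interval to a single float (an assumption worth verifying against the API documentation), a custom summarizer could look like this:
import numpy as np

# Assumed API: each function maps an interval (1-d array) to one float.
clf = IntervalForestClassifier(
    summarizer=[np.mean, np.std, lambda x: x.max() - x.min()],
    n_intervals=30,
    random_state=1,
)
clf.fit(X_train, y_train)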