Ensemble estimators#
Shapelet forests#
Shapelet forests, implemented in ensemble.ShapeletForestClassifier and ensemble.ShapeletForestRegressor, construct ensembles of shapelet tree classifiers or regressors, respectively. For a wide variety of tasks, these estimators are excellent baseline methods.
The ShapeletForestClassifier class accepts the n_jobs parameter, which determines the number of processor cores used for fitting and prediction. Set n_jobs=-1 to use all available cores.
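A minimal fitting sketch; the dataset loader and split below are assumptions (GunPoint is consistent with the 150-sample test output shown later):

from wildboar.datasets import load_gun_point
from wildboar.ensemble import ShapeletForestClassifier

# Load a train/test split (assumption: GunPoint, whose 150-example
# test set matches the predictions shown below).
X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)

# Fit a shapelet forest using all available cores.
clf = ShapeletForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)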
We can get the predictions using the predict function (or class probabilities using the predict_proba function):
clf.predict(X_test)
array([1., 2., 2., ..., 2., 2., 1.], shape=(150,), dtype=float32)
The accuracy of the model is given by the score function.
clf.score(X_test, y_test)
0.9866666666666667
Proximity forests#
ensemble.ProximityForestClassifier is an ensemble of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values and shapelet trees on distance thresholds, a Proximity Tree is a k-branching tree that branches on the proximity of time series to one of k pivot time series.
By default, ProximityForestClassifier uses the distance measures suggested in the original paper [1].
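A fitting sketch under the same assumed dataset as above:

from wildboar.ensemble import ProximityForestClassifier

# Fit a proximity forest with the default distance measures.
clf = ProximityForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)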
Using these distance measures, we get the following accuracy:
clf.score(X_test, y_test)
0.9666666666666667
We can specify only a single metric:
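For example, using only dynamic time warping (the choice of dtw here is illustrative):

clf = ProximityForestClassifier(metric="dtw", n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)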
This configuration gives the following accuracy:
clf.score(X_test, y_test)
0.8533333333333334
We can also specify more complex configurations by passing a dict or a list to the metric parameter. You can read more about metric specification in the corresponding section.
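A sketch of a dict-based specification; the metric names and the min/max parameter-range keys below are assumptions in the style of the metric specification section:

clf = ProximityForestClassifier(
    metric={
        "dtw": {"min_r": 0.1, "max_r": 0.25},  # assumed range keys for the warping window
        "msm": {"min_c": 0.01, "max_c": 100.0},  # assumed range keys for the msm cost
    },
    n_jobs=-1,
    random_state=1,
)
clf.fit(X_train, y_train)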
This configuration gives the following accuracy:
clf.score(X_test, y_test)
0.9733333333333334
Elastic Ensemble#
The Elastic Ensemble is a classifier first described by Lines and Bagnall (2015) [2]. The ensemble consists of one k-nearest neighbors classifier per distance metric, with the parameters of each metric optimized through leave-one-out cross-validation.
The default configuration uses all elastic distance measures available in Wildboar, which corresponds to a superset of the elastic metrics used by Lines and Bagnall (2015) [2], but with a smaller grid of metric parameters.
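A minimal fitting sketch, assuming ensemble.ElasticEnsembleClassifier with its defaults:

from wildboar.ensemble import ElasticEnsembleClassifier

# Fit the elastic ensemble with its default set of elastic metrics.
clf = ElasticEnsembleClassifier()
clf.fit(X_train, y_train)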
The result of the default configuration is:
clf.score(X_test, y_test)
0.9866666666666667
Similar to the Proximity Forest, we can specify a custom metric:
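For example, restricting the ensemble to two metrics (the list form and the chosen metrics are illustrative assumptions):

clf = ElasticEnsembleClassifier(metric=["dtw", "msm"])
clf.fit(X_train, y_train)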
This smaller configuration has an accuracy of:
clf.score(X_test, y_test)
1.0
Interval Forest#
The interval forest was first introduced by Deng et al. [4] and is implemented in the class IntervalForestClassifier. It constructs a forest of interval-based decision trees, where each node is constructed using a value aggregated over a (possibly overlapping) interval. In the default formulation, a node uses either the mean, the variance, or the slope of the interval, but it is possible to use other aggregation functions (in Wildboar, we call these summarization functions).
By default, the interval forest uses the summarization functions mentioned above and sqrt(n_timestep) randomly selected, possibly overlapping intervals.
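A fitting sketch with the defaults:

from wildboar.ensemble import IntervalForestClassifier

# Fit an interval forest with the default summarizer and intervals.
clf = IntervalForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)

The accuracy is: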
clf.score(X_test, y_test)
0.9733333333333334
We can also use non-overlapping intervals by setting the intervals parameter to "fixed". We can sample a smaller set of intervals by setting the sample_size parameter to a float.
Warning
intervals="sample" was deprecated in version 1.3 and will be removed in version 1.4. The equivalent functionality can be achieved by setting intervals="fixed" and specifying sample_size as a float.
At each node in each tree, we sample 20% of the intervals.
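A sketch of this configuration:

clf = IntervalForestClassifier(
    intervals="fixed",
    sample_size=0.2,  # sample 20% of the non-overlapping intervals at each node
    n_jobs=-1,
    random_state=1,
)
clf.fit(X_train, y_train)

The accuracy is: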
clf.score(X_test, y_test)
0.9466666666666667
We can also change the summarizer. By setting the summarizer parameter to "catch22", we can sample from the full set of Catch22 [3] features.
Here, we sample 30 possibly overlapping intervals at each node and randomly select one of the Catch22 features to split the node.
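A sketch of this configuration (the n_intervals parameter name is assumed to control the number of sampled intervals):

clf = IntervalForestClassifier(
    n_intervals=30,  # assumed parameter for the number of sampled intervals
    summarizer="catch22",
    n_jobs=-1,
    random_state=1,
)
clf.fit(X_train, y_train)

The accuracy for this configuration is: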
clf.score(X_test, y_test)
0.9733333333333334