Ensemble estimators#

Shapelet forests#

Shapelet forests, implemented in ensemble.ShapeletForestClassifier and ensemble.ShapeletForestRegressor, construct ensembles of shapelet tree classifiers and regressors, respectively. For a large variety of tasks, these estimators are excellent baseline methods.

The ShapeletForestClassifier class accepts the n_jobs parameter, which sets the number of processor cores used for fitting and prediction. Setting n_jobs to -1 uses all available cores.
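For example, a minimal sketch of loading a dataset and fitting the forest; the GunPoint dataset and the random_state value are assumptions, chosen to be consistent with the two-class predictions shown below:

from wildboar.datasets import load_dataset
from wildboar.ensemble import ShapeletForestClassifier

# Load a dataset with a predefined train/test split
# (GunPoint is an assumed example dataset).
X_train, X_test, y_train, y_test = load_dataset(
    "GunPoint", merge_train_test=False
)

# Fit the forest using all available cores.
clf = ShapeletForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)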

We can get predictions using the predict method (or the predict_proba method):

clf.predict(X_test)
array([1., 2., 2., ..., 2., 2., 1.], shape=(150,), dtype=float32)

The accuracy of the model is given by the score method.

clf.score(X_test, y_test)
0.9866666666666667

Proximity forests#

The ensemble.ProximityForestClassifier is an ensemble of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values and shapelet trees on distance thresholds, a Proximity Tree is a k-branching tree that branches on the proximity of time series to one of k pivot time series.

By default, ProximityForestClassifier uses the distance measures suggested in the original paper [1].
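A minimal sketch of fitting the forest with its defaults, reusing the train/test split from above (the random_state value is an assumption):

from wildboar.ensemble import ProximityForestClassifier

# Fit a proximity forest with the default distance measures.
clf = ProximityForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)

Using these distance measures, we get the following accuracy: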

clf.score(X_test, y_test)
0.9666666666666667

We can specify only a single metric:
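For example, a sketch restricting the forest to the Euclidean distance (the choice of metric and the random_state value are assumptions):

# Branch only on the Euclidean distance to the pivots
# (an assumed example metric).
clf = ProximityForestClassifier(metric="euclidean", n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)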

This configuration gives the following accuracy:

clf.score(X_test, y_test)
0.8533333333333334

We can also specify more complex configurations by passing a dict or list to the metric parameter. You can read more about metric specification in the corresponding section.
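For example, a sketch passing a dict that maps metric names to parameter ranges; the exact metrics, parameter ranges, and key format below are assumptions, see the metric specification section for the precise format:

# Sample metric parameters from the given ranges at each node
# (the ranges are assumed example values).
clf = ProximityForestClassifier(
    metric={
        "euclidean": None,
        "dtw": {"min_r": 0.0, "max_r": 0.25},
    },
    n_jobs=-1,
    random_state=1,
)
clf.fit(X_train, y_train)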

This configuration gives the following accuracy:

clf.score(X_test, y_test)
0.9733333333333334

Elastic Ensemble#

The Elastic Ensemble is a classifier first described by Lines and Bagnall (2015) [2]. The ensemble consists of one k-nearest neighbors classifier per distance measure, with the parameters of each measure optimized through leave-one-out cross-validation.

The default configuration uses all elastic distance measures available in Wildboar, which corresponds to a superset of the elastic measures used by Lines and Bagnall (2015) [2], but with a smaller grid of metric parameters.
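A minimal sketch of fitting the ensemble with its default configuration, reusing the train/test split from above:

from wildboar.ensemble import ElasticEnsembleClassifier

# One k-nearest neighbors classifier per elastic distance measure;
# metric parameters are tuned with leave-one-out cross-validation.
clf = ElasticEnsembleClassifier()
clf.fit(X_train, y_train)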

The result of the default configuration is:

clf.score(X_test, y_test)
0.9866666666666667

Similar to the Proximity Forest, we can specify a custom metric:
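For example, a sketch restricted to a single elastic metric with a bounded warping window (the metric and its parameter range are assumptions):

# Use only DTW, with the warping window searched in an assumed range.
clf = ElasticEnsembleClassifier(metric={"dtw": {"min_r": 0.0, "max_r": 0.25}})
clf.fit(X_train, y_train)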

This smaller configuration has an accuracy of:

clf.score(X_test, y_test)
1.0

Interval Forest#

The interval forest was first introduced by Deng et al. [4] and is implemented in the class IntervalForestClassifier. It constructs a forest of interval-based decision trees, where each node is constructed using a value aggregated over a (possibly overlapping) interval. In the default formulation, a node uses either the mean, the variance, or the slope of the interval, but it is possible to use other aggregation functions (in Wildboar, we call these summarization functions).

By default, the interval forest uses the summarization functions mentioned above and sqrt(n_timestep) intervals, selected at random and possibly overlapping.
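A minimal sketch of the default configuration (the random_state value is an assumption):

from wildboar.ensemble import IntervalForestClassifier

# Randomly selected, possibly overlapping intervals summarized by
# the mean, variance and slope.
clf = IntervalForestClassifier(n_jobs=-1, random_state=1)
clf.fit(X_train, y_train)

The accuracy is: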

clf.score(X_test, y_test)
0.9733333333333334

We can also use non-overlapping intervals by setting the intervals parameter to "fixed". We can sample a smaller set of intervals by setting the sample_size parameter to a float.

Warning

intervals="sample" was deprecated in version 1.3 and will be removed in version 1.4. The equivalent functionality can be achieved by setting intervals="fixed" and specifying sample_size as a float.

At each node in each tree, we sample 20% of the intervals. The accuracy is:

clf.score(X_test, y_test)
0.9466666666666667

We can also change the summarizer. By setting the summarizer parameter to "catch22", we can sample from the full set of Catch22 [3] features.
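For example, a sketch sampling 30 intervals per node (the parameter name n_intervals and the random_state value are assumptions):

# Summarize each interval with a randomly selected Catch22 feature
# (n_intervals is an assumed parameter name).
clf = IntervalForestClassifier(
    summarizer="catch22", n_intervals=30, n_jobs=-1, random_state=1
)
clf.fit(X_train, y_train)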

Here, we sample 30 possibly overlapping intervals at each node and randomly select one of the Catch22 features to split the node. The accuracy for this configuration is:

clf.score(X_test, y_test)
0.9733333333333334

References#