Dimension selection

In multivariate settings, it is often useful to reduce the number of dimensions of the time series. Wildboar supports dimension selection through the wildboar.dimension_selection module, which implements a few strategies inspired by traditional feature selection.

Dimension variance threshold

The simplest approach computes, for each dimension, the variance of the pairwise distances between the time series and filters out dimensions where this variance is low or zero.
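The idea can be sketched in plain NumPy. This is a minimal illustration, not Wildboar's implementation: the function name `distance_variance_mask`, the use of Euclidean distance, and the strict `>` comparison are all assumptions made for the example, and the data is synthetic.

```python
import numpy as np


def distance_variance_mask(X, threshold):
    """Return a boolean mask of the dimensions to keep.

    X has shape (n_samples, n_dims, n_timestep). Euclidean distance and the
    strict comparison are assumptions of this sketch.
    """
    n_samples, n_dims, _ = X.shape
    # Indices of the upper triangle, so each unordered pair is used once.
    i, j = np.triu_indices(n_samples, k=1)
    mask = np.empty(n_dims, dtype=bool)
    for d in range(n_dims):
        # Pairwise Euclidean distances between all series in dimension d.
        dist = np.linalg.norm(X[i, d] - X[j, d], axis=1)
        # Keep the dimension only if the distances vary enough.
        mask[d] = dist.var() > threshold
    return mask


# Synthetic data: 40 samples, 3 dimensions, 20 time steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3, 20))
# Make the series in the second dimension nearly identical to each other.
X[:, 1] = 1.0 + 1e-3 * rng.normal(size=(40, 20))

mask = distance_variance_mask(X, threshold=0.01)
X_reduced = X[:, mask]
print(mask, X_reduced.shape)
```

In this synthetic data the series in the second dimension are almost indistinguishable, so their pairwise distances barely vary and that dimension is dropped, while the two noisy dimensions are kept.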

We set the variance threshold to 9 to filter out any dimension with a pairwise distance variance below 9.

t.get_dimensions()
array([ True, False,  True,  True])

The filter removes only the second dimension.

t.transform(X).shape
(300, 3, 65)

The resulting transformation contains only the three remaining dimensions.

Sequential dimension selector

The sequential dimension selector greedily forms a subset of dimensions by adding (forward) or removing (backward) one dimension at a time. At each iteration, the algorithm chooses the best dimension to add or remove based on the cross-validation score of a classifier or regressor.
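The forward variant of this greedy search can be sketched in plain NumPy. This is an illustration under stated assumptions, not Wildboar's implementation: the helper names `loo_1nn_accuracy` and `forward_select` are hypothetical, the score is the leave-one-out accuracy of a 1-nearest-neighbour classifier with Euclidean distance (one possible cross-validation score), and the data is synthetic.

```python
import numpy as np


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    X has shape (n_samples, n_dims, n_timestep); the distance is Euclidean
    over all retained dimensions.
    """
    flat = X.reshape(X.shape[0], -1)
    # Squared Euclidean distance between every pair of samples.
    sq = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq, np.inf)  # a sample may not be its own neighbour
    return float(np.mean(y[sq.argmin(axis=1)] == y))


def forward_select(X, y, n_dims):
    """Greedily add the dimension that yields the best score."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_dims:
        # Score each candidate together with the dimensions chosen so far.
        scores = [loo_1nn_accuracy(X[:, selected + [d]], y) for d in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[selected] = True
    return mask


# Synthetic data: four classes encoded by two binary factors. Only the
# first and last dimensions carry information about the label.
rng = np.random.default_rng(0)
n = 120
a = np.arange(n) % 2
b = (np.arange(n) // 2) % 2
y = 2 * a + b
X = rng.normal(size=(n, 4, 20))
X[:, 0] += 5.0 * a[:, None]
X[:, 3] += 5.0 * b[:, None]

mask = forward_select(X, y, n_dims=2)
print(mask)
```

Because each informative dimension separates only one of the two factors, neither is sufficient on its own; the greedy search is expected to pick one of them first and the other second, leaving the two noise dimensions out.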

We select the two dimensions with the best predictive performance.

t.get_dimensions()
array([ True, False, False,  True])

The resulting transformation contains only those dimensions.

t.transform(X).shape
(300, 2, 65)

Using all dimensions, the Rocket classifier has an accuracy of 0.92.

By using the make_pipeline function from scikit-learn, we can chain the dimension selector with the classifier so that the classifier is fitted on the reduced set of dimensions.

Using only the selected dimensions, the Rocket classifier instead has an accuracy of 0.98.
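The pipeline pattern can be sketched with scikit-learn alone. This is an illustration, not the code behind the accuracies reported above: `ToyDistanceVarianceSelector` is a hypothetical stand-in for a Wildboar dimension selector, a 1-nearest-neighbour classifier on flattened series replaces Rocket, and the data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


class ToyDistanceVarianceSelector(BaseEstimator, TransformerMixin):
    """Hypothetical selector: keep dimensions whose pairwise Euclidean
    distances have a variance above `threshold`."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y=None):
        i, j = np.triu_indices(X.shape[0], k=1)
        # Shape (n_pairs, n_dims): one distance per pair and dimension.
        dist = np.linalg.norm(X[i] - X[j], axis=2)
        self.mask_ = dist.var(axis=0) > self.threshold
        return self

    def transform(self, X):
        return X[:, self.mask_]


def flatten(X):
    """Reshape (n_samples, n_dims, n_timestep) to a 2D feature matrix."""
    return X.reshape(X.shape[0], -1)


# Synthetic data: the first dimension separates the two classes, the
# second is nearly constant, and the third is pure noise.
rng = np.random.default_rng(0)
y = np.arange(80) % 2
X = rng.normal(size=(80, 3, 20))
X[:, 0] += 5.0 * y[:, None]
X[:, 1] = 1.0 + 1e-3 * rng.normal(size=(80, 20))
X_train, X_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]

pipeline = make_pipeline(
    ToyDistanceVarianceSelector(threshold=0.01),
    FunctionTransformer(flatten),
    KNeighborsClassifier(n_neighbors=1),
)
pipeline.fit(X_train, y_train)
score = pipeline.score(X_test, y_test)
print(pipeline[0].mask_, score)
```

Placing the selector inside the pipeline means the dimensions are chosen from the training data only and the same mask is applied automatically at prediction time.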