Dimension selection#

In multivariate settings, it is often useful to reduce the number of dimensions of a time series. Wildboar supports dimension selection in the wildboar.dimension_selection module and implements a few strategies inspired by traditional feature selection.

Dimension variance threshold#

The simplest approach computes the variance of the pairwise distances between time series within each dimension and filters out dimensions where the distances have little or no variance.

from wildboar.datasets import load_ering
from wildboar.dimension_selection import DistanceVarianceThreshold

X, y = load_ering()
t = DistanceVarianceThreshold(threshold=9)
t.fit(X, y)

We set the variance threshold to 9 to filter out any dimensions whose pairwise distance variance falls below this value.

t.get_dimensions()
array([ True, False,  True,  True])

The filter removes only the second dimension.

t.transform(X).shape
(300, 3, 65)

And the resulting transformation contains only the three remaining dimensions.
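If we need the indices of the retained dimensions rather than a Boolean mask, we can convert the mask with NumPy; a minimal sketch based on the mask shown above:

import numpy as np

# Indices of the retained dimensions; the mask above corresponds to 0, 2 and 3.
np.flatnonzero(t.get_dimensions())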

Sequential dimension selector#

The sequential dimension selector greedily forms a subset of dimensions by adding (forward selection) or removing (backward selection) one dimension at a time. At each iteration, it chooses the dimension to add or remove based on the cross-validation score of a classifier or regressor.

from wildboar.datasets import load_ering
from wildboar.dimension_selection import SequentialDimensionSelector
from wildboar.distance import KNeighborsClassifier

X, y = load_ering()
t = SequentialDimensionSelector(KNeighborsClassifier(), n_dims=2)
t.fit(X, y)
SequentialDimensionSelector(estimator=KNeighborsClassifier(), n_dims=2)

We select the two dimensions with the best predictive performance.

t.get_dimensions()
array([ True, False, False,  True])

The resulting transformation contains only those dimensions.

t.transform(X).shape
(300, 2, 65)

To see how dimension selection affects a downstream classifier, we first fit a Rocket classifier using all dimensions of the training data.

from wildboar.linear_model import RocketClassifier

X_train, X_test, y_train, y_test = load_ering(merge_train_test=False)

clf = RocketClassifier(random_state=2)
clf.fit(X_train, y_train)
RocketClassifier(random_state=2)

Using all dimensions, the Rocket classifier has an accuracy of 0.92.
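The accuracy is computed with the standard scikit-learn score method; a minimal sketch, assuming the fitted classifier and the test split from above:

# Mean accuracy on the held-out test set (reported as 0.92 above).
clf.score(X_test, y_test)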

By using the make_pipeline function from scikit-learn, we can reduce the number of dimensions before fitting the classifier.

from sklearn.pipeline import make_pipeline

clf = make_pipeline(
    SequentialDimensionSelector(KNeighborsClassifier(), n_dims=3),
    RocketClassifier(random_state=2),
)
clf.fit(X_train, y_train)
Pipeline(steps=[('sequentialdimensionselector',
                 SequentialDimensionSelector(estimator=KNeighborsClassifier(),
                                             n_dims=3)),
                ('rocketclassifier', RocketClassifier(random_state=2))])

Using only the selected dimensions, the Rocket classifier instead has an accuracy of 0.98.
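To reproduce the reported accuracy and inspect which dimensions the pipeline selected, we can query the fitted pipeline; a minimal sketch assuming the same train/test split and random state as above:

# Mean accuracy on the held-out test set (reported as 0.98 above).
clf.score(X_test, y_test)

# Boolean mask of the dimensions chosen by the selector inside the pipeline.
clf.named_steps["sequentialdimensionselector"].get_dimensions()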