###################
Dimension selection
###################

In multivariate settings, it is often useful to reduce the number of dimensions
of the time series. Wildboar supports dimension selection in the
:mod:`wildboar.dimension_selection` module and implements a few strategies
inspired by traditional feature selection.

****************************
Dimension variance threshold
****************************

The simplest approach computes, for each dimension, the variance of the
pairwise distances between time series, and filters out dimensions where this
variance is low or zero.

.. execute::
   :context:
   :show-return:

   from wildboar.datasets import load_ering
   from wildboar.dimension_selection import DistanceVarianceThreshold

   X, y = load_ering()
   t = DistanceVarianceThreshold(threshold=9)
   t.fit(X, y)

We set the variance threshold to 9 to filter out any dimension with a pairwise
distance variance below 9.

.. execute::
   :context:
   :show-return:

   t.get_dimensions()

The filter removes only the third dimension.

.. execute::
   :context:
   :show-return:

   t.transform(X).shape

The resulting transformation contains only the three remaining dimensions.

*****************************
Sequential dimension selector
*****************************

The sequential dimension selector greedily forms a subset of dimensions by
adding (forward) or removing (backward) one dimension at a time. At each
iteration, the algorithm chooses the best dimension to add or remove based on
the cross-validation score of a classifier or regressor.

.. execute::
   :context:
   :show-return:

   from wildboar.datasets import load_ering
   from wildboar.dimension_selection import SequentialDimensionSelector
   from wildboar.distance import KNeighborsClassifier

   X, y = load_ering()
   t = SequentialDimensionSelector(KNeighborsClassifier(), n_dims=2)
   t.fit(X, y)

We select the two dimensions that give the best predictive performance.

.. execute::
   :context:
   :show-return:

   t.get_dimensions()

The resulting transformation contains only those dimensions.

.. execute::
   :context:
   :show-return:

   t.transform(X).shape

.. execute::
   :context:
   :show-return:

   from wildboar.linear_model import RocketClassifier

   X_train, X_test, y_train, y_test = load_ering(merge_train_test=False)
   clf = RocketClassifier(random_state=2)
   clf.fit(X_train, y_train)

.. execute::
   :context:
   :include-source: no
   :show-output:

   print(f"""
   Using all dimensions, the Rocket classifier has an accuracy of
   {clf.score(X_test, y_test):.2f}.
   """)

By using the :func:`~sklearn.pipeline.make_pipeline` function from
``scikit-learn``, we can reduce the number of dimensions before fitting the
classifier.

.. execute::
   :context:
   :show-return:

   from sklearn.pipeline import make_pipeline

   clf = make_pipeline(
       SequentialDimensionSelector(KNeighborsClassifier(), n_dims=3),
       RocketClassifier(random_state=2)
   )
   clf.fit(X_train, y_train)

.. execute::
   :context:
   :include-source: no
   :show-output:

   print(f"""
   Using only the selected dimensions, the Rocket classifier instead has an
   accuracy of {clf.score(X_test, y_test):.2f}.
   """)
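
The greedy forward strategy behind the sequential dimension selector can be
illustrated with a small, library-independent sketch. This is not Wildboar's
implementation: the names ``forward_select``, ``toy_gain`` and ``toy_score`` are
hypothetical, and the toy score merely stands in for the cross-validation score
that a real selector would compute by fitting an estimator on the candidate
dimensions.

```python
from typing import Callable, List, Sequence


def forward_select(
    n_dims: int,
    n_select: int,
    score: Callable[[Sequence[int]], float],
) -> List[int]:
    """Greedily pick ``n_select`` of ``n_dims`` dimensions (forward mode)."""
    if not 0 < n_select <= n_dims:
        raise ValueError("n_select must be in the range [1, n_dims]")

    selected: List[int] = []
    remaining = list(range(n_dims))
    while len(selected) < n_select:
        # Score every unused dimension together with the current subset
        # and keep the one that yields the highest score.
        best_dim = max(remaining, key=lambda d: score(selected + [d]))
        selected.append(best_dim)
        remaining.remove(best_dim)
    return sorted(selected)


# Hypothetical stand-in for a cross-validation score: each dimension
# contributes a fixed, made-up amount to the total.
toy_gain = {0: 0.10, 1: 0.40, 2: 0.05, 3: 0.30}


def toy_score(dims: Sequence[int]) -> float:
    return sum(toy_gain[d] for d in dims)


chosen = forward_select(n_dims=4, n_select=2, score=toy_score)
print(chosen)  # [1, 3]
```

Backward selection follows the same pattern in reverse: start from all
dimensions and repeatedly drop the one whose removal leaves the highest score.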