#####################
Convolution transform
#####################

Wildboar implements two convolutional transformation methods, `Rocket` [#rocket]_ and
`Hydra` [#hydra]_, both described by Dempster et al. The two algorithms employ random
convolutional kernels, but in slightly different manners. In `Rocket`, each kernel is
applied to each time series, and the maximum activation value and the proportion of
positive activations are recorded. In `Hydra`, the kernels are partitioned into groups
and, for each combination of exponential dilation and padding, each kernel is applied
to each time series while recording how often each kernel in a group has the highest
and the lowest activation value. The features then correspond to the number of times a
kernel had the highest activation within its group and the average of the lowest
activations.

For the purpose of this example, we load the `MoteStrain` dataset from the UCR time
series archive and split it into two parts: one for fitting the transformation and one
for evaluating the predictive performance.

.. execute::
   :context:

   from wildboar.datasets import load_dataset
   from sklearn.model_selection import train_test_split

   X, y = load_dataset("MoteStrain")
   X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

.. execute::
   :context:
   :include-source: no
   :show-output:
   :card-width: 75%

   import numpy as np

   from wildboar.utils.plot import plot_time_domain

   n_samples, n_timestep = X_train.shape
   y_labels, counts = np.unique(y_train, return_counts=True)
   print(f"""
   The dataset contains {n_samples} samples with {n_timestep} time steps each.
   Of the samples, {counts[0]} are labeled as {y_labels[0]} and {counts[1]} are
   labeled as {y_labels[1]}. Here, we plot the time series.
   """)

   plot_time_domain(X_train, y_train, cmap=None)

********************
Hydra transformation
********************

In Wildboar, we make extensive use of ``scikit-learn`` and can employ its
functionality directly. We construct a pipeline that first transforms each time
series into the representation dictated by `Hydra` (using the default parameters
``n_groups=64`` and ``n_kernels=8``). The next step of the pipeline applies a sparse
scaler, which compensates for the sparsity induced by the transformation (recall that
we count the number of occasions on which a kernel has the highest activation, and in
many cases a single kernel may never achieve this). Finally, a standard Ridge
classifier can be fitted on the transformed data.

.. execute::
   :context:
   :show-return:

   from wildboar.datasets.preprocess import SparseScaler
   from wildboar.transform import HydraTransform
   from sklearn.pipeline import make_pipeline

   hydra = make_pipeline(HydraTransform(random_state=1), SparseScaler())
   hydra.fit(X_train, y_train)

We can inspect the resulting transformation by using the ``transform`` function.

.. execute::
   :context:
   :show-return:

   X_test_transform = hydra.transform(X_test)
   X_test_transform[0]

.. execute::
   :context:
   :include-source: no
   :show-output:

   _, n_features = X_test_transform.shape
   print(f"""
   The transformed array contains {n_features} features.
   """)
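The description above mentions a Ridge classifier as the final step, while the
pipeline constructed so far only contains the transformation and the scaler. The
following is a minimal sketch of how such a classifier could be appended and scored
on the held-out data; the choice of :class:`~sklearn.linear_model.RidgeClassifierCV`
and the ``alphas`` grid are assumptions, and any linear classifier would work.

.. code-block:: python

   import numpy as np

   from sklearn.linear_model import RidgeClassifierCV
   from sklearn.pipeline import make_pipeline

   from wildboar.datasets.preprocess import SparseScaler
   from wildboar.transform import HydraTransform

   # Hydra transformation followed by sparse scaling and a Ridge classifier
   # with cross-validated regularization strength.
   hydra_clf = make_pipeline(
       HydraTransform(random_state=1),
       SparseScaler(),
       RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
   )

   # X_train, X_test, y_train, y_test are the MoteStrain splits created above.
   hydra_clf.fit(X_train, y_train)

   # Accuracy on the held-out part of MoteStrain.
   print(hydra_clf.score(X_test, y_test))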
We can use principal component analysis (:class:`~sklearn.decomposition.PCA`) to
identify the combination of attributes that account for most of the variance in the
data.

.. execute::
   :context:
   :include-source: no
   :show-source-link:
   :link-text: Download plot source

   import matplotlib.pyplot as plt

   from sklearn.decomposition import PCA

   pca = PCA(n_components=2)
   X_test_pca = pca.fit_transform(X_test_transform)
   for label in [1, 2]:
       plt.scatter(
           X_test_pca[y_test == label, 0],
           X_test_pca[y_test == label, 1],
           label=f"Label {label}",
       )
   plt.xlabel("Component 0")
   plt.ylabel("Component 1")
   plt.legend()

.. execute::
   :context:
   :include-source: no
   :show-output:

   evr = pca.explained_variance_ratio_
   print(f"""
   The first two components explain {(100 * evr[0]):.2f} and {(100 * evr[1]):.2f}
   percent of the variance.
   """)

****************
Rocket transform
****************

The Rocket transformation employs a large, randomly generated set of `kernels`. By
default, the parameter ``n_kernels`` is set to :math:`10000`. Again, we use a
``scikit-learn`` pipeline, here to standardize the feature representation to zero
mean and unit standard deviation.

.. execute::
   :context:
   :show-return:

   from sklearn.preprocessing import StandardScaler
   from wildboar.transform import RocketTransform

   rocket = make_pipeline(RocketTransform(), StandardScaler())
   rocket.fit(X_train, y_train)

We can inspect the resulting transformation.

.. execute::
   :context:
   :show-return:

   X_test_transform = rocket.transform(X_test)
   X_test_transform[0]

In contrast to Hydra, whose transformation size depends on the number of time steps
in the input, the Rocket transformation has a fixed size that depends only on the
number of kernels. As such, the resulting transformation consists of :math:`10000`
features.

We can use principal component analysis (:class:`~sklearn.decomposition.PCA`) to
identify the combination of attributes that account for most of the variance in the
data.

.. execute::
   :context:
   :include-source: no
   :show-source-link:
   :link-text: Download plot source

   pca = PCA(n_components=2)
   X_test_pca = pca.fit_transform(X_test_transform)
   for label in [1, 2]:
       plt.scatter(
           X_test_pca[y_test == label, 0],
           X_test_pca[y_test == label, 1],
           label=f"Label {label}",
       )
   plt.xlabel("Component 0")
   plt.ylabel("Component 1")
   plt.legend()

.. execute::
   :context:
   :include-source: no
   :show-output:

   evr = pca.explained_variance_ratio_
   print(f"""
   The first two components explain {(100 * evr[0]):.2f} and {(100 * evr[1]):.2f}
   percent of the variance.
   """)

.. [#rocket] Dempster, A., Petitjean, F., and Webb, G. I. (2020). ROCKET:
   exceptionally fast and accurate time series classification using random
   convolutional kernels. Data Mining and Knowledge Discovery.

.. [#hydra] Dempster, A., Schmidt, D. F., and Webb, G. I. (2023). Hydra: competing
   convolutional kernels for fast and accurate time series classification. Data
   Mining and Knowledge Discovery.
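As with Hydra, the Rocket transformation is typically paired with a linear classifier.
The sketch below appends :class:`~sklearn.linear_model.RidgeClassifierCV` to the
Rocket pipeline and scores it on the held-out data; the choice of classifier and the
``alphas`` grid are assumptions rather than part of the example above.

.. code-block:: python

   import numpy as np

   from sklearn.linear_model import RidgeClassifierCV
   from sklearn.pipeline import make_pipeline
   from sklearn.preprocessing import StandardScaler

   from wildboar.transform import RocketTransform

   # Rocket transformation followed by standardization and a Ridge classifier
   # with cross-validated regularization strength.
   rocket_clf = make_pipeline(
       RocketTransform(),
       StandardScaler(),
       RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)),
   )

   # X_train, X_test, y_train, y_test are the MoteStrain splits created earlier.
   rocket_clf.fit(X_train, y_train)
   print(rocket_clf.score(X_test, y_test))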