.. currentmodule:: wildboar

##########################
Transform-based estimators
##########################

Time series transformations convert time series data into traditional
column-based feature matrices suitable as input to subsequent classifiers or
regressors. Notable feature representations are Rocket and Hydra, which employ
convolutional kernels; shapelet-based transformations, which use shapelet
distances; and interval-based transformations, which compute feature values
for overlapping or non-overlapping intervals. Typically, these transformations
are used with a linear estimator such as
:class:`~sklearn.linear_model.RidgeClassifierCV`.

Throughout this section we will use the `TwoLeadECG` dataset.

.. execute::
   :context: reset

   from wildboar.datasets import load_two_lead_ecg
   X_train, X_test, y_train, y_test = load_two_lead_ecg(merge_train_test=False)

************************
Shapelet-based transform
************************

Shapelet-based transformation uses shapelets, i.e., discriminative
subsequences, and a distance metric to construct a feature representation.

Random shapelet transform
=========================

The simplest and often effective approach is to sample a large number of
shapelets and include all of them, without filtering, in the transformation.
This approach is implemented in
:class:`~wildboar.linear_model.RandomShapeletClassifier`.

.. execute::
   :context:
   :show-return:

   from wildboar.linear_model import RandomShapeletClassifier
   clf = RandomShapeletClassifier(random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

We can change the ``metric`` (by default, the `metric` is set to the Euclidean
distance):

.. execute::
   :context:
   :show-return:

   clf = RandomShapeletClassifier(metric="scaled_manhattan", random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

We can also specify multiple metrics, see :ref:`metric specification guide ` for more information on how to format the metrics.
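
To make the idea of a shapelet-distance feature concrete, the following is a
minimal NumPy sketch of a random shapelet transform. It is an illustration
only and not Wildboar's implementation: the helper names
``min_shapelet_distance`` and ``random_shapelet_features`` are hypothetical,
the shapelets have a single fixed size, only the Euclidean distance is used,
and no feature normalization is applied.

```python
import numpy as np


def min_shapelet_distance(x, shapelet):
    """Smallest Euclidean distance between `shapelet` and any window of `x`."""
    # All contiguous windows of x with the same length as the shapelet.
    windows = np.lib.stride_tricks.sliding_window_view(x, len(shapelet))
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())


def random_shapelet_features(X, n_shapelets=10, shapelet_size=5, seed=0):
    """Sample shapelets from X; return an (n_samples, n_shapelets) matrix.

    Hypothetical helper for illustration, not part of Wildboar.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_timesteps = X.shape
    shapelets = []
    for _ in range(n_shapelets):
        # Pick a random series and a random start position within it.
        i = rng.integers(n_samples)
        start = rng.integers(n_timesteps - shapelet_size + 1)
        shapelets.append(X[i, start:start + shapelet_size])
    # One row per time series, one column per sampled shapelet.
    return np.array([[min_shapelet_distance(x, s) for s in shapelets] for x in X])


# Toy data: 6 random series with 30 time steps each.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(6, 30))
features = random_shapelet_features(X_toy, n_shapelets=4, shapelet_size=5)
print(features.shape)
```

Each column of ``features`` corresponds to one sampled shapelet, so the
resulting matrix can be passed to any column-based estimator, for example a
linear classifier.
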
We can also limit the size of shapelets.

.. execute::
   :context:
   :show-return:

   clf = RandomShapeletClassifier(
       metric={"scaled_manhattan": None, "manhattan": None},
       max_shapelet_size=0.2,
       random_state=1,
   )
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

In the example, the `n` time series are transformed into a new representation
consisting of `n` rows and `n_shapelets` features. Here, the `i`-th row holds
the minimum distance from the `i`-th time series to each shapelet in the set
`1, ..., n_shapelets`. By default, each feature is normalized to have a mean
of zero and a standard deviation of one. This normalization can be disabled by
setting the parameter ``normalize=False``.

Dilated shapelet transform
==========================

A more recent approach, described by Guillaume et al. 2022 [#dst]_, constructs
a feature representation that incorporates not only the minimal distance but
also the occurrence counts of shapelets and the index of the minimal distance.
Consequently, each shapelet is characterized by three features rather than
one. Furthermore, the Dilated Shapelet Transform (DST) expands shapelets by
inserting empty values, thereby increasing the "receptive field" of the
shapelets. This method is implemented in Wildboar as
:class:`~wildboar.linear_model.DilatedShapeletClassifier`.

.. execute::
   :context:
   :show-return:

   from wildboar.linear_model import DilatedShapeletClassifier
   clf = DilatedShapeletClassifier(random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

The DST classifier supports only a single `metric`; however, several other
parameters are available for tuning. Specifically, the size of the shapelets
can be adjusted. By default, shapelets of length `7`, `9`, and `11` are used.
This can be modified using the parameters ``shapelet_size``,
``min_shapelet_size``, and ``max_shapelet_size``. For instance:

.. execute::
   :context:
   :show-return:

   clf = DilatedShapeletClassifier(shapelet_size=[7, 11], random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

If the parameters ``min_shapelet_size`` or ``max_shapelet_size`` are
specified, all odd sizes ranging from ``n_timesteps * min_shapelet_size`` to
``n_timesteps * max_shapelet_size`` will be used.

The likelihood of z-normalizing the shapelets can be adjusted by modifying the
``normalize_prob`` parameter. It defaults to `0.8`, indicating that 80 percent
of the shapelets undergo normalization.

.. execute::
   :context:
   :show-return:

   clf = DilatedShapeletClassifier(normalize_prob=0.1, random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

We can also control the occurrence threshold, that is, the distance threshold
used to compute the occurrence counts, by modifying the ``lower`` and
``upper`` parameters. These parameters define the bounds within which the
occurrence threshold is sampled. By default, it is sampled from the 5 to 10
percent smallest distances.

.. execute::
   :context:
   :show-return:

   clf = DilatedShapeletClassifier(lower=0.1, upper=0.3, random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

Castor transform
================

Castor (Competing dilAted Shapelet TransfORm) [#samsten]_ is a transformation
technique for time series data. Analogous to Hydra, Castor enables shapelets
to compete, and akin to DST, it uses the occurrence of shapelets.

Castor is characterized by two principal parameters: the number of groups
(``n_groups``) and the number of shapelets (``n_shapelets``). These parameters
collectively define the dimensions of the transformed feature space. By
convention, we use `64` groups, each comprising `8` shapelets.

.. execute::
   :context:
   :show-return:

   from wildboar.linear_model import CastorClassifier
   clf = CastorClassifier(random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

Castor has several tunable parameters, with ``n_groups`` and ``n_shapelets``
being the most influential in determining classification accuracy. Generally,
increasing the values of ``n_groups`` and ``n_shapelets`` enhances accuracy,
as these parameters adjust the level of competition among features. For
instance, a configuration with ``n_groups=1`` and ``n_shapelets=1024`` results
in maximal competition, leading to a feature representation that closely
resembles a pattern dictionary. Conversely, setting ``n_groups=1024`` and
``n_shapelets=1`` eliminates competition, yielding a transformation akin to a
traditional shapelet-based transform, such as the Dilated Shapelet Transform
(DST). A recommended approach for parameter tuning is to incrementally double
the values of both ``n_groups`` and ``n_shapelets`` in successive iterations.

.. warning::

   For better performance with multivariate datasets, set ``n_shapelets`` to
   `n_shapelets * n_dims` to ensure feature variability.

.. execute::
   :context:
   :show-return:

   from wildboar.linear_model import CastorClassifier
   clf = CastorClassifier(n_groups=128, n_shapelets=16, random_state=1)
   clf.fit(X_train, y_train)
   clf.score(X_test, y_test)

***************************
Convolution-based transform
***************************

Rocket
======

Hydra
=====

**********
References
**********

.. [#wistuba] Wistuba, M., Grabocka, J. and Schmidt-Thieme, L., 2015.
   Ultra-fast shapelets for time series classification. arXiv preprint
   arXiv:1503.05018.

.. [#rocket] Dempster, A., Petitjean, F. and Webb, G.I., 2020. ROCKET:
   exceptionally fast and accurate time series classification using random
   convolutional kernels. Data Mining and Knowledge Discovery, 34(5),
   pp.1454-1495.

.. [#hydra] Dempster, A., Schmidt, D.F. and Webb, G.I., 2023. Hydra:
   Competing convolutional kernels for fast and accurate time series
   classification. Data Mining and Knowledge Discovery, pp.1-27.

.. [#dst] Guillaume, A., Vrain, C. and Elloumi, W., 2022, June. Random
   dilated shapelet transform: A new approach for time series shapelets. In
   International Conference on Pattern Recognition and Artificial
   Intelligence (pp. 653-664). Cham: Springer International Publishing.

.. [#samsten] Samsten, I. and Lee, Z., 2024. Castor: Competing dilated
   shapelet transform. Forthcoming