Transform-based estimators#
Time series transformations are designed to convert time series data into
traditional column-based feature matrices suitable for input into subsequent
classifiers or regressors. Notable feature representations are Rocket and
Hydra, which employ convolutional kernels; shapelet-based transformations, which
utilize shapelet distances; and interval-based transformations, which calculate
feature values for overlapping or non-overlapping intervals. Typically, these
transformations are used with a linear estimator such as RidgeClassifierCV.
Throughout this section, we will use the TwoLeadECG dataset.
from wildboar.datasets import load_two_lead_ecg
X_train, X_test, y_train, y_test = load_two_lead_ecg(merge_train_test=False)
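To illustrate the general pattern before turning to the individual
transforms, the following is a minimal sketch that pairs a transformation
with RidgeClassifierCV in a scikit-learn pipeline. It assumes
wildboar.transform.RocketTransform; any of the transforms described below
can be substituted.
# A minimal sketch of the transform + linear estimator pattern. We assume
# wildboar.transform.RocketTransform here; the shapelet-based transforms
# described below follow the same convention.
from sklearn.linear_model import RidgeClassifierCV
from sklearn.pipeline import make_pipeline
from wildboar.transform import RocketTransform

# Transform each time series into a feature vector, then fit a ridge
# classifier with built-in cross-validated regularization.
pipe = make_pipeline(RocketTransform(random_state=1), RidgeClassifierCV())
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)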
Shapelet-based transform#
Shapelet-based transformation uses shapelets, i.e., discriminatory
subsequences, together with a distance metric to construct a feature
representation.
Random shapelet transform#
The simplest, and often effective, approach is to sample a large number of
shapelets and include all of them, without filtering, in the transformation.
This approach is implemented in RandomShapeletClassifier.
from wildboar.linear_model import RandomShapeletClassifier
clf = RandomShapeletClassifier(random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
0.990342405618964
We can change the metric
(by default the metric is set to the Euclidean
distance):
clf = RandomShapeletClassifier(metric="scaled_manhattan", random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
0.990342405618964
We can also specify multiple metrics; see the metric specification guide for
more information on how to format them. We can also limit the size of the
sampled shapelets. Both options are sketched below.
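The following sketch is hedged: the list-of-tuples metric format and the
min_shapelet_size/max_shapelet_size parameters (expressed as fractions of
the series length) are assumptions here; verify the exact syntax against
the metric specification guide and the RandomShapeletClassifier reference.
# A hedged sketch: the list-of-tuples metric format and the size
# parameters below are assumptions; consult the metric specification
# guide for the exact syntax.
clf = RandomShapeletClassifier(
    metric=[
        ("euclidean", None),
        ("scaled_manhattan", None),
    ],
    min_shapelet_size=0.1,  # shapelets no shorter than 10% of n_timesteps
    max_shapelet_size=0.5,  # and no longer than 50%
    random_state=1,
)
clf.fit(X_train, y_train)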
In these examples, the n input time series are transformed into a new
representation consisting of n rows and n_shapelets columns, where the i-th
time series is characterized by its minimum distance to each of the
n_shapelets shapelets. By default, each feature is normalized to have a mean
of zero and a standard deviation of one. However, this normalization can be
disabled by setting the parameter normalize=False.
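For example, to keep the raw minimum distances as features:
# Keep the raw minimum distances instead of standardized features
# (normalize defaults to True, as described above).
clf = RandomShapeletClassifier(normalize=False, random_state=1)
clf.fit(X_train, y_train)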
Dilated shapelet transform#
A more recent approach, described by Guillaume et al. 2021 [4], constructs
a feature representation that incorporates not only the minimal distance but
also the occurrence counts of shapelets and the index of the minimal distance.
Consequently, each shapelet is characterized by a triad of features rather than
a singular feature. Furthermore, the Dilated Shapelet Transform (DST) expands
shapelets by inserting empty values, thereby increasing the “receptive field”
of the shapelets. This method is implemented within Wildboar as
DilatedShapeletClassifier:
from wildboar.linear_model import DilatedShapeletClassifier
clf = DilatedShapeletClassifier(random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
0.9991220368744512
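To build intuition for dilation, the following standalone sketch (purely
illustrative, not part of the Wildboar API) shows how a dilation of d
compares a shapelet against time steps spaced d apart, widening its
receptive field:
import numpy as np

# Purely illustrative: with dilation d, a shapelet of length 3 is compared
# against time steps spaced d apart instead of consecutive time steps.
x = np.arange(10)
dilation = 3
indices = np.arange(3) * dilation  # compare at time steps 0, 3 and 6
window = x[indices]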
The DST classifier supports only a single metric; however, multiple
parameters are available for tuning. Specifically, the size of the shapelets
can be adjusted: by default, shapelets of length 7, 9, and 11 are utilized.
This can be modified using the parameters shapelet_size, min_shapelet_size,
and max_shapelet_size. For instance:
clf = DilatedShapeletClassifier(shapelet_size=[7, 11], random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
0.9991220368744512
If the parameters min_shapelet_size or max_shapelet_size are specified, all
odd sizes ranging from n_timesteps * min_shapelet_size to
n_timesteps * max_shapelet_size will be utilized, as sketched below.
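For example, assuming the bounds are given as fractions of n_timesteps:
# Assuming min_shapelet_size and max_shapelet_size are fractions of
# n_timesteps, all odd sizes between n_timesteps * 0.1 and
# n_timesteps * 0.3 are used.
clf = DilatedShapeletClassifier(
    min_shapelet_size=0.1, max_shapelet_size=0.3, random_state=1
)
clf.fit(X_train, y_train)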
The likelihood of z-normalizing the shapelets can be adjusted by modifying
the normalize_prob parameter. It defaults to 0.8, indicating that 80 percent
of the shapelets undergo normalization.
clf = DilatedShapeletClassifier(normalize_prob=0.1, random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
0.9920983318700615
We can also control the occurrence threshold, that is, the threshold used
when counting shapelet occurrences, by modifying the lower and upper
parameters. These parameters delineate the bounds within which the
occurrence threshold is sampled. By default, it is sampled from among the 5
to 10 percent smallest distances.
clf = DilatedShapeletClassifier(lower=0.1, upper=0.3, random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
0.9991220368744512
Castor transform#
Castor (Competing dilAted Shapelet TransfORm) [5] is a transformation
technique for time series data. Analogous to Hydra, Castor enables shapelets
to compete, and akin to DST, it utilizes the occurrence of shapelets. Castor
is characterized by two principal parameters: the number of groups
(n_groups) and the number of shapelets per group (n_shapelets). These
parameters collectively define the dimensions of the transformed feature
space. By convention, we employ 64 groups, each comprising 8 shapelets.
from wildboar.linear_model import CastorClassifier
clf = CastorClassifier(random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
1.0
Castor has several tunable parameters, with n_groups and n_shapelets being
the most influential in determining classification accuracy. Generally,
increasing the values of n_groups and n_shapelets enhances accuracy, as
these parameters adjust the level of competition among features. For
instance, a configuration with n_groups=1 and n_shapelets=1024 results in
maximal competition, leading to a feature representation that closely
resembles a pattern dictionary. Conversely, setting n_groups=1024 and
n_shapelets=1 eliminates competition, yielding a transformation akin to a
traditional shapelet-based transform, such as the Dilated Shapelet Transform
(DST). Both extremes are sketched below.
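For illustration, the two extremes can be expressed directly (these settings
demonstrate the trade-off and are not recommendations):
# Maximal competition: one group of 1024 competing shapelets, resembling
# a pattern dictionary.
dictionary_like = CastorClassifier(n_groups=1, n_shapelets=1024, random_state=1)

# No competition: 1024 groups of a single shapelet each, resembling a
# traditional shapelet transform such as DST.
dst_like = CastorClassifier(n_groups=1024, n_shapelets=1, random_state=1)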
A recommended approach for parameter tuning is to incrementally double the
values of both n_groups and n_shapelets in successive iterations, for
example with a grid search as sketched below.
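A hedged sketch of this strategy using scikit-learn's GridSearchCV; the
grids below are illustrative starting points, not recommendations:
from sklearn.model_selection import GridSearchCV

# Double n_groups and n_shapelets in successive steps and let
# cross-validation pick the best combination.
param_grid = {
    "n_groups": [64, 128, 256],
    "n_shapelets": [8, 16, 32],
}
search = GridSearchCV(CastorClassifier(random_state=1), param_grid)
search.fit(X_train, y_train)
print(search.best_params_)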
Warning
For better performance on multivariate datasets, multiply n_shapelets by the
number of dimensions (i.e., set it to n_shapelets * n_dims) to ensure
sufficient feature variability.
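For instance, for a hypothetical 3-dimensional dataset (TwoLeadECG itself is
univariate, so this is illustrative only), the warning suggests:
# Hypothetical multivariate adjustment: scale the per-group shapelet count
# by the number of dimensions (here n_dims = 3), per the warning above.
n_dims = 3
clf = CastorClassifier(n_shapelets=8 * n_dims, random_state=1)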
from wildboar.linear_model import CastorClassifier
clf = CastorClassifier(n_groups=128, n_shapelets=16, random_state=1)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
1.0