wildboar.transform

Package Contents

Classes

FeatureTransform

Transform a time series into a number of features.

IntervalTransform

Embed a time series as a collection of features per interval.

MatrixProfileTransform

Transform each time series in a dataset to its MatrixProfile similarity self-join

PivotTransform

A transform using pivot time series and sampled distance metrics.

RandomShapeletTransform

Transform a time series to the distances to a selection of random shapelets.

RocketTransform

Transform a time series using random convolution features

class wildboar.transform.FeatureTransform(*, summarizer='catch22', n_jobs=None)

Bases: IntervalTransform

Transform a time series into a number of features.

Parameters:
  • summarizer (str or list, optional) –

    The method to summarize each interval.

    • if str, the summarizer is determined by _SUMMARIZERS.keys().

    • if list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.

    The default summarizer summarizes each time series using catch22-features

  • n_jobs (int, optional) – The number of cores to use on multi-core machines.
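The list form of summarizer can be illustrated with plain NumPy. The following is only a sketch of what "a list of functions f(x) -> float" means; the helper names slope and summarize are hypothetical and are not part of wildboar.

```python
import numpy as np


def slope(x):
    """Least-squares slope of x against its index, returned as a float."""
    t = np.arange(x.shape[0])
    return float(np.polyfit(t, x, 1)[0])


# Hypothetical list of summarizers, each of the form f(x) -> float.
summarizers = [np.mean, np.std, slope]


def summarize(X, funcs):
    """Apply every function to every row of X.

    X has shape (n_samples, n_timestep); the result has shape
    (n_samples, len(funcs)).
    """
    return np.array([[f(row) for f in funcs] for row in X], dtype=float)


X = np.array([[0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 2.0, 2.0]])
features = summarize(X, summarizers)
```

Each row of features holds one value per summarizer, which is the shape a list-valued summarizer would be expected to produce for a single interval.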

References

Lubba, Carl H., Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, and Nick S. Jones.

catch22: Canonical time-series characteristics. Data Mining and Knowledge Discovery 33, no. 6 (2019): 1821-1852.

class wildboar.transform.IntervalTransform(n_intervals='sqrt', *, intervals='fixed', sample_size=0.5, min_size=0.0, max_size=1.0, summarizer='mean_var_slope', n_jobs=None, random_state=None)

Bases: wildboar.transform.base.BaseFeatureEngineerTransform

Embed a time series as a collection of features per interval.

Examples

>>> from wildboar.datasets import load_dataset
>>> from wildboar.transform import IntervalTransform
>>> x, y = load_dataset("GunPoint")
>>> t = IntervalTransform(n_intervals=10, summarizer="mean")
>>> t.fit_transform(x)

Each interval (15 timepoints) is transformed to its mean.

>>> import numpy as np
>>> t = IntervalTransform(n_intervals="sqrt", summarizer=[np.mean, np.std])
>>> t.fit_transform(x)

Each interval (150 // 12 timepoints) is transformed to two features: the mean and the standard deviation.
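To make the interval arithmetic concrete, here is a NumPy-only sketch of the first example: 150 timepoints split into 10 fixed intervals of 15 timepoints each, with every interval reduced to its mean. It uses random data as a stand-in for GunPoint and does not call wildboar, so the library's actual splitting may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 150))  # stand-in for a dataset with 150 timepoints

n_intervals = 10
# Split the time axis into 10 contiguous, non-overlapping chunks of 15 timepoints.
chunks = np.array_split(x, n_intervals, axis=1)
# One mean per interval and sample: shape (n_samples, n_intervals).
means = np.stack([chunk.mean(axis=1) for chunk in chunks], axis=1)
```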

Parameters:
  • n_intervals (str, int or float, optional) –

    The number of intervals to use for the transform.

    • if “log”, the number of intervals is log2(n_timestep).

    • if “sqrt”, the number of intervals is sqrt(n_timestep).

    • if int, the number of intervals is n_intervals.

    • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

  • intervals (str, optional) –

    The method for selecting intervals

    • if “fixed”, n_intervals non-overlapping intervals.

    • if “sample”, n_intervals * sample_size non-overlapping intervals.

    • if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep]

  • sample_size (float, optional) – The sample size of fixed intervals if intervals="sample"

  • min_size (float, optional) – The minimum interval size if intervals="random"

  • max_size (float, optional) – The maximum interval size if intervals="random"

  • summarizer (str or list, optional) –

    The method to summarize each interval.

    • if str, the summarizer is determined by _SUMMARIZERS.keys().

    • if list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.

    The default summarizer summarizes each interval as its mean, standard deviation and slope.

  • n_jobs (int, optional) – The number of cores to use on multi-core machines.

  • random_state (int or RandomState) –

    • If int, random_state is the seed used by the random number generator

    • If RandomState instance, random_state is the random number generator

    • If None, the random number generator is the RandomState instance used by np.random.
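The n_intervals and intervals="random" rules above can be sketched as follows. The functions resolve_n_intervals and sample_random_intervals are hypothetical helpers written for illustration. Truncating to an integer and clamping sizes to at least one timepoint are assumptions, not documented wildboar behaviour.

```python
import math

import numpy as np


def resolve_n_intervals(n_intervals, n_timestep):
    """Turn the n_intervals option into an integer count (truncation is an assumption)."""
    if n_intervals == "log":
        return max(1, int(math.log2(n_timestep)))
    if n_intervals == "sqrt":
        return max(1, int(math.sqrt(n_timestep)))
    if isinstance(n_intervals, int):
        return n_intervals
    if isinstance(n_intervals, float):
        if not 0.0 < n_intervals < 1.0:
            raise ValueError("a float n_intervals must satisfy 0 < n_intervals < 1")
        return max(1, int(n_intervals * n_timestep))
    raise ValueError(f"unsupported n_intervals: {n_intervals!r}")


def sample_random_intervals(n_intervals, n_timestep, min_size, max_size, rng):
    """Sample (start, length) pairs with lengths in [min_size, max_size] * n_timestep."""
    lo = max(1, int(min_size * n_timestep))
    hi = max(lo, int(max_size * n_timestep))
    intervals = []
    for _ in range(n_intervals):
        length = int(rng.integers(lo, hi + 1))
        start = int(rng.integers(0, n_timestep - length + 1))
        intervals.append((start, length))
    return intervals


rng = np.random.default_rng(0)
intervals = sample_random_intervals(5, 150, 0.1, 0.5, rng)
```

With 150 timepoints, "sqrt" resolves to 12 intervals under this sketch, matching the 150 // 12 figure in the example above.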

class wildboar.transform.MatrixProfileTransform(window=0.1, exclude=None, n_jobs=None)

Bases: sklearn.base.TransformerMixin, wildboar.base.BaseEstimator

Transform each time series in a dataset to its MatrixProfile similarity self-join

Examples

>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.transform import MatrixProfileTransform
>>> x, y = load_two_lead_ecg()
>>> t = MatrixProfileTransform()
>>> t.fit_transform(x)

Parameters:
  • window (int or float, optional) –

    The subsequence size, by default 0.1.

    • if float, a fraction of n_timestep

    • if int, the exact subsequence size

  • exclude (int or float, optional) –

    The size of the exclusion zone. The default exclusion zone is 0.2

    • if float, expressed as a fraction of the window size

    • if int, exact size (0 < exclude)

  • n_jobs (int, optional) – The number of jobs to use when computing the matrix profile.

fit(x, y=None)

Fit the matrix profile. Sets the expected input dimensions

Parameters:
  • x (array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)) – The samples

  • y (ignored) – The optional labels.

Returns:

self – A fitted instance

transform(x)

Transform the samples to their MatrixProfile self-join.

Parameters:

x (array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)) – The samples

Returns:

mp – The matrix profile of each sample

Return type:

ndarray of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)
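For readers unfamiliar with the self-join, the following is a naive O(n²) sketch of a matrix profile for one time series, with window given as a fraction of the series length and exclude as a fraction of the window. It assumes z-normalized Euclidean distance, which is the common convention but is not confirmed here for wildboar. The function naive_matrix_profile is hypothetical, and its output length (n_timestep - window + 1) may differ from what the library returns.

```python
import numpy as np


def znorm(s):
    """Z-normalize a subsequence; constant subsequences map to zeros."""
    std = s.std()
    return (s - s.mean()) / std if std > 0 else np.zeros_like(s)


def naive_matrix_profile(x, window, exclude):
    """Distance from each subsequence to its nearest non-trivial neighbour."""
    n_sub = x.shape[0] - window + 1
    subs = np.array([znorm(x[i:i + window]) for i in range(n_sub)])
    # All pairwise Euclidean distances between subsequences.
    diff = subs[:, None, :] - subs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Mask trivial matches: subsequences whose start indices are within `exclude`.
    idx = np.arange(n_sub)
    dist[np.abs(idx[:, None] - idx[None, :]) <= exclude] = np.inf
    return dist.min(axis=1)


x = np.sin(np.linspace(0, 8 * np.pi, 80))
window = max(2, int(0.1 * x.shape[0]))  # float window: fraction of n_timestep
exclude = max(1, int(0.2 * window))     # float exclude: fraction of the window size
mp = naive_matrix_profile(x, window, exclude)
```

Production implementations avoid the quadratic memory of this sketch. It is meant only to show what the window and the exclusion zone control.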

class wildboar.transform.PivotTransform(n_pivots=1, *, metric_factories=None, random_state=None, n_jobs=None)

Bases: wildboar.transform.base.BaseFeatureEngineerTransform

A transform using pivot time series and sampled distance metrics.

Parameters:

metric_factories (dict, optional) –

The distance metrics. A dictionary where key is:

  • if str, a named distance factory (See _DISTANCE_FACTORIES.keys())

  • if callable, a function returning a list of DistanceMeasure-objects

and where value is a dict of parameters to the factory.
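One plausible reading of "pivot time series and sampled distance metrics" is sketched below: each feature is the distance from a sample to a randomly chosen pivot series under a randomly chosen metric. This interpretation is an assumption rather than a description of wildboar's implementation. The METRICS pool, fit_pivots and pivot_transform are hypothetical names.

```python
import numpy as np


def euclidean(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    return float(np.sum(np.abs(a - b)))


# Hypothetical pool of metrics to sample from.
METRICS = [euclidean, manhattan]


def fit_pivots(X, n_pivots, rng):
    """Pair each randomly drawn pivot series with a randomly drawn metric."""
    pivot_idx = rng.integers(0, X.shape[0], size=n_pivots)
    metric_idx = rng.integers(0, len(METRICS), size=n_pivots)
    return [(X[i].copy(), METRICS[m]) for i, m in zip(pivot_idx, metric_idx)]


def pivot_transform(X, pivots):
    """One feature per pivot: the distance from each sample to that pivot."""
    return np.array([[metric(row, p) for p, metric in pivots] for row in X])


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 20))
pivots = fit_pivots(X, n_pivots=3, rng=rng)
features = pivot_transform(X, pivots)  # shape (6, 3)
```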

class wildboar.transform.RandomShapeletTransform(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0, max_shapelet_size=1.0, n_jobs=None, random_state=None)

Bases: wildboar.transform.base.BaseFeatureEngineerTransform

Transform a time series to the distances to a selection of random shapelets.

embedding_

The underlying embedding object.

Type:

Embedding

References

Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.

Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).

Parameters:
  • n_shapelets (int, optional) – The number of shapelets in the resulting transform

  • metric (str, optional) –

    Distance metric used to identify the best shapelet.

    See distance._SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters for the distance measure.

    Read more about the parameters in the User guide.

  • min_shapelet_size (float, optional) – Minimum shapelet size.

  • max_shapelet_size (float, optional) – Maximum shapelet size.

  • n_jobs (int, optional) – The number of jobs to run in parallel. None means 1 and -1 means using all processors.

  • random_state (int or RandomState, optional) – The pseudo-random number generator.
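The idea behind the transform can be sketched in NumPy: sample random subsequences (shapelets) from the training data, then describe every series by its minimum sliding Euclidean distance to each shapelet. The helpers sample_shapelets, min_distance and shapelet_transform are hypothetical. Treating the size bounds as fractions of n_timestep and using a minimum length of 2 are assumptions.

```python
import numpy as np


def sample_shapelets(X, n_shapelets, min_size, max_size, rng):
    """Draw random subsequences of X with lengths in [min_size, max_size] * n_timestep."""
    n_samples, n_timestep = X.shape
    lo = max(2, int(min_size * n_timestep))
    hi = max(lo, int(max_size * n_timestep))
    shapelets = []
    for _ in range(n_shapelets):
        length = int(rng.integers(lo, hi + 1))
        start = int(rng.integers(0, n_timestep - length + 1))
        sample = int(rng.integers(0, n_samples))
        shapelets.append(X[sample, start:start + length].copy())
    return shapelets


def min_distance(x, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of x."""
    windows = np.lib.stride_tricks.sliding_window_view(x, shapelet.shape[0])
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())


def shapelet_transform(X, shapelets):
    """One feature per shapelet: shape (n_samples, n_shapelets)."""
    return np.array([[min_distance(row, s) for s in shapelets] for row in X])


rng = np.random.default_rng(0)
X = rng.normal(size=(8, 30))
shapelets = sample_shapelets(X, n_shapelets=5, min_size=0.1, max_size=0.5, rng=rng)
features = shapelet_transform(X, shapelets)
```

Because every shapelet is cut from one of the rows of X, each feature column contains at least one exact zero in this sketch.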

class wildboar.transform.RocketTransform(n_kernels=1000, *, sampling='normal', sampling_params=None, kernel_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, n_jobs=None, random_state=None)

Bases: wildboar.transform.base.BaseFeatureEngineerTransform

Transform a time series using random convolution features

embedding_

The underlying embedding

Type:

Embedding

References

Dempster, Angus, François Petitjean, and Geoffrey I. Webb.

ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery 34.5 (2020): 1454-1495.

Parameters:
  • n_kernels (int, optional) – The number of kernels.

  • n_jobs (int, optional) – The number of jobs to run in parallel. None means 1 and -1 means using all processors.

  • random_state (int or RandomState, optional) – The pseudo-random number generator.
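As a rough illustration of random convolution features, the sketch below follows the general recipe of the cited ROCKET paper in simplified form: normally distributed, mean-centered kernel weights, a uniform bias, and two features per kernel (the maximum response and the proportion of positive values). Dilation and padding are omitted for brevity. The helper names and the candidate kernel sizes are assumptions, and the code does not reproduce wildboar's implementation.

```python
import numpy as np


def sample_kernels(n_kernels, rng, sizes=(7, 9, 11)):
    """Sample (weights, bias) pairs with mean-centered normal weights."""
    kernels = []
    for _ in range(n_kernels):
        weights = rng.normal(size=int(rng.choice(sizes)))
        weights -= weights.mean()
        bias = float(rng.uniform(-1.0, 1.0))
        kernels.append((weights, bias))
    return kernels


def rocket_features(X, kernels):
    """Two features per kernel: max response and proportion of positive values."""
    out = np.empty((X.shape[0], 2 * len(kernels)))
    for i, row in enumerate(X):
        for k, (weights, bias) in enumerate(kernels):
            # Reversing the weights turns np.convolve into a cross-correlation.
            response = np.convolve(row, weights[::-1], mode="valid") + bias
            out[i, 2 * k] = response.max()
            out[i, 2 * k + 1] = np.mean(response > 0)
    return out


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
kernels = sample_kernels(10, rng)
features = rocket_features(X, kernels)  # shape (4, 20)
```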