wildboar.transform

Transform raw time series to tabular representations.

Classes

CastorTransform

Competing Dilated Shapelet Transform.

DerivativeTransform

Perform derivative transformation on time series data.

DiffTransform

A transformer that applies a difference transformation to time series data.

DilatedShapeletTransform

Dilated shapelet transform.

FeatureTransform

Transform a time series as a number of features.

FftTransform

Discrete Fourier Transform.

HydraTransform

A Dictionary based method using convolutional kernels.

IntervalTransform

Embed a time series as a collection of features per interval.

MatrixProfileTransform

Matrix profile transform.

PAA

Piecewise aggregate approximation.

PivotTransform

A transform using pivot time series and sampled distance metrics.

ProximityTransform

Transform time series based on class conditional pivots.

QuantTransform

Quant transformation.

RandomShapeletTransform

Random shapelet transform.

RocketTransform

Transform a time series using random convolution features.

SAX

Symbolic aggregate approximation.

ShapeletTransform

Shapelet Transform.
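The PAA and SAX entries above reduce a series to segment means and then to discrete symbols. The NumPy sketch below illustrates both ideas under a few assumptions. Segments are formed with numpy.array_split. SAX z-normalizes the series before applying PAA. Symbols are integer bin indices, cut at equiprobable standard-normal breakpoints. The helpers `paa` and `sax` and the `n_bins` parameter are hypothetical; only `n_intervals` appears in the signatures listed here. Wildboar's own defaults and binning may differ.

```python
from statistics import NormalDist

import numpy as np


def paa(x, n_intervals):
    """Piecewise aggregate approximation: the mean of each (near-)equal segment."""
    segments = np.array_split(np.asarray(x, dtype=float), n_intervals)
    return np.array([segment.mean() for segment in segments])


def sax(x, n_intervals, n_bins=4):
    """Symbolic aggregate approximation (hypothetical helper, not wildboar's API).

    Assumes z-normalization followed by PAA, with breakpoints that split the
    standard normal distribution into n_bins equiprobable regions.
    """
    x = np.asarray(x, dtype=float)
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    breakpoints = [NormalDist().inv_cdf(i / n_bins) for i in range(1, n_bins)]
    # Symbol i means breakpoints[i - 1] <= value < breakpoints[i].
    return np.digitize(paa(z, n_intervals), breakpoints)


x = np.arange(1, 9)  # 1, 2, ..., 8
print(paa(x, 4))  # [1.5 3.5 5.5 7.5]
print(sax(x, 4))  # [0 1 2 3]
```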

Functions

convolve(X, kernel[, bias, dilation, stride, padding])

Apply 1D convolution over a time series.

piecewice_aggregate_approximation(x, *[, n_intervals, ...])

Piecewise aggregate approximation.

symbolic_aggregate_approximation(x, *[, n_intervals, ...])

Symbolic aggregate approximation.
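The `convolve` signature exposes `bias`, `dilation`, `stride`, and `padding`. The pure NumPy sketch below shows how these options typically interact in ROCKET-style transforms. It rests on three assumptions, none of which describes wildboar's implementation:

- The kernel is applied as a sliding dot product (cross-correlation, no flipping).
- `padding` adds that many zeros to both ends of the series.
- Each output value is the dot product plus `bias`.

`dilated_convolve` is a hypothetical name.

```python
import numpy as np


def dilated_convolve(x, kernel, bias=0.0, dilation=1, stride=1, padding=0):
    """Sliding dot product with dilation, stride and zero padding (hypothetical helper)."""
    x = np.pad(np.asarray(x, dtype=float), padding)  # `padding` zeros on both ends
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation + 1  # time steps covered by the dilated kernel
    outputs = [
        np.dot(x[start : start + span : dilation], kernel) + bias
        for start in range(0, len(x) - span + 1, stride)
    ]
    return np.array(outputs)


print(dilated_convolve([1, 2, 3, 4, 5], [1, 1], dilation=2))  # [4. 6. 8.]
```

With `dilation=2`, each output combines every second time step (1 + 3, 2 + 4, 3 + 5). A larger dilation widens the receptive field without adding kernel weights.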


class wildboar.transform.CastorTransform(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, soft_min=True, soft_max=False, soft_threshold=True, ignore_y=False, random_state=None, n_jobs=None)

Competing Dilated Shapelet Transform.

Parameters:
n_groups : int, optional

The number of groups of dilated shapelets.

n_shapelets : int, optional

The number of dilated shapelets per group.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_prob : float, optional

The probability of standardizing a shapelet with zero mean and unit standard deviation.

shapelet_size : int, optional

The length of the dilated shapelet.

lower : float, optional

The lower percentile to draw distance thresholds above.

upper : float, optional

The upper percentile to draw distance thresholds below.

soft_min : bool, optional

If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.

soft_max : bool, optional

If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.

soft_threshold : bool, optional

If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.

ignore_y : bool, optional

Ignore y and use the same sample from which a shapelet is sampled to estimate the distance threshold.

random_state : int or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobs : int, optional

The number of parallel jobs.

Notes

For better performance with multivariate datasets, set n_shapelets to n_shapelets * n_dims to ensure feature variability.
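One way to read the soft_min, soft_max, and soft_threshold descriptions is as a competition. Within a group, the shapelets compete at every position. Each shapelet is credited either with a count of the positions it wins or with the sum of its winning distances.

The NumPy sketch below encodes that reading for a single group. Its inputs are a precomputed distance-profile matrix D of shape (n_shapelets, n_positions) and one threshold per shapelet. It is an interpretation of the parameter descriptions, not wildboar's implementation. `castor_group_features`, `D`, and `thresholds` are hypothetical names. How soft_max assigns maximal distances is an assumption in particular.

```python
import numpy as np


def castor_group_features(D, thresholds, soft_min=True, soft_max=False, soft_threshold=True):
    """Features for one group of competing shapelets (hypothetical helper).

    D has shape (n_shapelets, n_positions): the distance between each shapelet
    and the time series at each position. thresholds has shape (n_shapelets,).
    """
    D = np.asarray(D, dtype=float)
    n_shapelets, n_positions = D.shape
    positions = np.arange(n_positions)
    winners = D.argmin(axis=0)  # shapelet with the minimal distance at each position
    losers = D.argmax(axis=0)  # shapelet with the maximal distance at each position

    def credit(owner, weights=None):
        # Sum (or count, when weights is None) per shapelet over the positions it owns.
        return np.bincount(owner, weights=weights, minlength=n_shapelets).astype(float)

    min_feature = credit(winners, D[winners, positions] if soft_min else None)
    max_feature = credit(losers, D[losers, positions] if soft_max else None)

    below = (D < np.asarray(thresholds, dtype=float)[:, None]).astype(float)
    if soft_threshold:
        threshold_feature = below.sum(axis=1)  # every shapelet counts its own positions
    else:
        threshold_feature = credit(winners, below[winners, positions])  # only the winner counts
    return np.concatenate([min_feature, max_feature, threshold_feature])


D = np.array([[0.1, 0.5, 0.9], [0.4, 0.2, 0.3]])  # 2 shapelets, 3 positions
print(castor_group_features(D, thresholds=[0.95, 0.35]))
```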

fit(x, y=None)

Fit the transform.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)

Fit the embedding and return the transform of x.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the dataset.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.DerivativeTransform(method='slope')

Perform derivative transformation on time series data.

Parameters:
method : str, optional

The method to use for the derivative transformation. Must be one of: “slope”, “central”, or “backward”.

  • “backward”, computes the derivative at each point using the difference between the current and previous elements.

  • “central”, computes the derivative at each point using the average of the differences between the next and previous elements.

  • “slope”, computes a smoothed derivative at each point by averaging the difference between the current and previous elements with half the difference between the next and previous elements.
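The three formulas above can be written directly in NumPy for the interior points of a univariate series. This is a sketch for illustration only. How DerivativeTransform treats the first and last time steps is not stated here, so the sketch simply omits them. `derivative` is a hypothetical helper.

```python
import numpy as np


def derivative(x, method="slope"):
    """Interior-point derivative of a 1D series (hypothetical helper).

    Returns len(x) - 2 values; the first and last time steps are omitted
    because their treatment in DerivativeTransform is not documented here.
    """
    x = np.asarray(x, dtype=float)
    previous, current, following = x[:-2], x[1:-1], x[2:]
    backward = current - previous  # x[i] - x[i-1]
    central = (following - previous) / 2  # mean of x[i+1]-x[i] and x[i]-x[i-1]
    if method == "backward":
        return backward
    if method == "central":
        return central
    if method == "slope":
        # Average of the backward difference and half the next-minus-previous difference.
        return (backward + central) / 2
    raise ValueError(f"method must be 'slope', 'central' or 'backward', got {method!r}")


x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
for method in ("backward", "central", "slope"):
    print(method, derivative(x, method))
```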

fit(X, y=None)

Fit the model to the provided data.

Only performs input validation.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timesteps)

The input data to fit the model.

y : array-like, optional

Not used.

Returns:
object

Returns the instance itself.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Transform the input data using a derivative transformation.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timesteps)

The input data to be transformed.

Returns:
array

The transformed data.

class wildboar.transform.DiffTransform(order=1)

A transformer that applies a difference transformation to time series data.

Parameters:
order : int, optional

The order of the difference operation. Default is 1.

fit(X, y=None)

Fit the model to the provided data.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timesteps)

The input data to fit the model. Must have at least two timesteps.

y : array-like, optional

Not used.

Returns:
object

Returns the instance of the fitted model.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Transform the input data by computing the discrete differences.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timesteps)

The input data to be transformed.

Returns:
array_like

An array containing the discrete difference of the input data along the last axis.
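The return description suggests that DiffTransform matches numpy.diff applied along the last axis. Under that assumption, the effect of `order` on an (n_samples, n_dims, n_timesteps) array can be previewed as follows: each additional order removes one time step.

```python
import numpy as np

# Two univariate series, shaped (n_samples, n_dims, n_timesteps) = (2, 1, 5).
X = np.array([[[1.0, 2.0, 4.0, 7.0, 11.0]],
              [[0.0, 0.0, 1.0, 0.0, 0.0]]])

first = np.diff(X, n=1, axis=-1)   # comparable to order=1
second = np.diff(X, n=2, axis=-1)  # comparable to order=2

print(first.shape, second.shape)  # (2, 1, 4) (2, 1, 3)
print(first[0, 0])   # [1. 2. 3. 4.]
print(second[0, 0])  # [1. 1. 1.]
```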

class wildboar.transform.DilatedShapeletTransform(n_shapelets=1000, *, metric='euclidean', metric_params=None, normalize_prob=0.5, min_shapelet_size=None, max_shapelet_size=None, shapelet_size=None, lower=0.05, upper=0.1, ignore_y=False, random_state=None, n_jobs=None)

Dilated shapelet transform.

Transform time series to a representation consisting of three values per shapelet: the minimum dilated distance, the index of the time step that minimizes the distance, and the number of subsequences that are below a distance threshold.
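For a single univariate series, the three values per shapelet can be sketched as follows. The sketch assumes plain (unnormalized) Euclidean distance, a dilated subsequence that takes every `dilation`-th time step, and a fixed threshold. In the actual transform, the dilation and threshold are sampled (see lower and upper below). The helper name and its arguments are hypothetical.

```python
import numpy as np


def dilated_shapelet_features(x, shapelet, dilation, threshold):
    """Return (min distance, argmin index, count below threshold) for one shapelet."""
    x = np.asarray(x, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    span = (len(shapelet) - 1) * dilation + 1  # time steps covered by the dilated shapelet
    distances = np.array([
        np.linalg.norm(x[start : start + span : dilation] - shapelet)
        for start in range(len(x) - span + 1)
    ])
    return distances.min(), int(distances.argmin()), int(np.sum(distances < threshold))


x = [0, 1, 0, 1, 0, 1, 0]
print(dilated_shapelet_features(x, shapelet=[1, 1], dilation=2, threshold=0.5))
```

With `dilation=2`, the shapelet `[1, 1]` matches the alternating pattern exactly at positions 1 and 3. Without dilation, it never does.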

Parameters:
n_shapelets : int, optional

The number of dilated shapelets.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_prob : float, optional

The probability of standardizing a shapelet with zero mean and unit standard deviation.

min_shapelet_size : float, optional

The minimum shapelet size. If None, use the discrete sizes in shapelet_size.

max_shapelet_size : float, optional

The maximum shapelet size. If None, use the discrete sizes in shapelet_size.

shapelet_size : array-like, optional

The size of shapelets, by default [7, 9, 11].

lower : float, optional

The lower percentile to draw distance thresholds above.

upper : float, optional

The upper percentile to draw distance thresholds below.

ignore_y : bool, optional

Ignore y and use the same sample from which a shapelet is sampled to estimate the distance threshold.

random_state : int or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobs : int, optional

The number of parallel jobs.

References

Antoine Guillaume, Christel Vrain, Wael Elloumi

Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. Pattern Recognition and Artificial Intelligence, 2022.

fit(x, y=None)

Fit the transform.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)

Fit the embedding and return the transform of x.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the dataset.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.FeatureTransform(*, summarizer='catch22', n_jobs=None)

Transform a time series as a number of features.

Parameters:
summarizer : str or list, optional

The method to summarize each time series.

  • if str, the summarizer is determined by _SUMMARIZERS.keys().

  • if list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.

The default summarizer summarizes each time series using the catch22 features.

n_jobs : int, optional

The number of cores to use for parallel computation.
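When summarizer is a list of functions f(x) -> float, each function maps one time series to one feature. The sketch below shows the resulting layout for univariate data: one row per series and one column per function. It uses a hypothetical `summarize` helper. It is meant to convey the shape of the output, not FeatureTransform's implementation, which also handles multivariate input and parallelism.

```python
import numpy as np


def summarize(X, summarizer):
    """Apply each f(x) -> float to every series; returns (n_samples, len(summarizer))."""
    X = np.asarray(X, dtype=float)
    return np.array([[f(x) for f in summarizer] for x in X])


X = np.array([[1.0, 2.0, 3.0], [4.0, 4.0, 4.0]])
features = summarize(X, [np.mean, np.std, np.max])
print(features.shape)  # (2, 3)
```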

Examples

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import FeatureTransform
>>> X, y = load_gun_point()
>>> X_t = FeatureTransform().fit_transform(X)
>>> X_t[0]
array([-5.19633603e-01, -6.51047206e-01,  1.90000000e+01,  4.80000000e+01,
        7.48441896e-01, -2.73293560e-05,  2.21476510e-01,  4.70000000e+01,
        4.00000000e-02,  0.00000000e+00,  2.70502518e+00,  2.60000000e+01,
        6.42857143e-01,  1.00000000e-01, -3.26666667e-01,  9.89974643e-01,
        2.90000000e+01,  1.31570726e+00,  1.50000000e-01,  8.50000000e-01,
        4.90873852e-02,  1.47311800e-01])

fit(x, y=None)

Fit the transform.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)

Fit the embedding and return the transform of x.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the dataset.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.FftTransform(spectrum='amplitude')

Discrete Fourier Transform.

Parameters:
spectrum : {“amplitude”, “phase”}, optional

The spectrum of the FFT to return: the amplitude or the phase.

fit(X, y=None)

Fit the estimator.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timesteps)

The samples.

y : ignored, optional

Ignored.

Returns:
self

This instance.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Transform the input.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timesteps)

The input samples.

Returns:
ndarray of shape (n_samples, n_dims, m_timesteps)

The transformed data. If n_timesteps is even, m_timesteps is (n_timesteps / 2) + 1; otherwise, it is (n_timesteps + 1) / 2.
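Both cases of the shape rule above equal the length of a real-input FFT, n_timesteps // 2 + 1. The NumPy sketch below assumes that the amplitude and phase spectra correspond to numpy.abs and numpy.angle of numpy.fft.rfft. FftTransform may additionally scale or otherwise post-process the result. `fft_spectrum` is a hypothetical helper.

```python
import numpy as np


def fft_spectrum(X, spectrum="amplitude"):
    """Hypothetical helper: amplitude or phase of the real FFT along the last axis."""
    F = np.fft.rfft(np.asarray(X, dtype=float), axis=-1)
    if spectrum == "amplitude":
        return np.abs(F)
    if spectrum == "phase":
        return np.angle(F)
    raise ValueError(f"spectrum must be 'amplitude' or 'phase', got {spectrum!r}")


even = np.ones((2, 1, 8))  # (n_samples, n_dims, n_timesteps) with 8 time steps
odd = np.ones((2, 1, 9))
print(fft_spectrum(even).shape)  # (2, 1, 5): 8 / 2 + 1
print(fft_spectrum(odd).shape)   # (2, 1, 5): (9 + 1) / 2
```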

class wildboar.transform.HydraTransform(*, n_groups=64, n_kernels=8, kernel_size=9, sampling='normal', sampling_params=None, n_jobs=None, random_state=None)

A dictionary-based method using convolutional kernels.

Parameters:
n_groups : int, optional

The number of groups of kernels.

n_kernels : int, optional

The number of kernels per group.

kernel_size : int, optional

The size of the kernel.

sampling : {“normal”}, optional

The strategy for sampling kernels. By default kernel weights are sampled from a normal distribution with zero mean and unit standard deviation.

sampling_params : dict, optional

Parameters to the sampling approach. The “normal” sampler accepts two parameters: mean and scale.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : int or RandomState, optional

Controls the random sampling of kernels.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

Attributes:
embedding_ : Embedding

The underlying embedding.

See also

HydraClassifier

A classifier using the Hydra transform.

Notes

This implementation does not include the first-order discrete differences described by Dempster et al. (2023). If they are desired, combine native scikit-learn functionality with DiffTransform:

>>> from sklearn.pipeline import make_pipeline, make_union
>>> from wildboar.transform import DiffTransform, HydraTransform
>>> dempster_hydra = make_union(
...     HydraTransform(n_groups=32),
...     make_pipeline(
...         DiffTransform(),
...         HydraTransform(n_groups=32)
...     )
... )

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2023).

Hydra: competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery.
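The referenced paper describes "competing kernels". The idea, in isolation: within a group, every kernel is slid over the series, and at each position only the kernel with the largest response is credited. The NumPy sketch below performs this hard counting for one group, with kernels drawn as the “normal” sampling option describes.

It is a simplified illustration, not HydraTransform's implementation. It assumes no dilation and no kernel normalization, and it records only the count of wins. `hydra_group_counts` is a hypothetical name.

```python
import numpy as np


def hydra_group_counts(x, kernels):
    """Count, per kernel, the positions where it has the largest response.

    x: 1D series; kernels: (n_kernels, kernel_size). Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    # Sliding dot products: one row of responses per kernel.
    responses = np.stack([np.correlate(x, k, mode="valid") for k in kernels])
    winners = responses.argmax(axis=0)  # winning kernel at each position
    return np.bincount(winners, minlength=len(kernels))


# One group of kernels sampled from a normal distribution (mean 0, scale 1).
rng = np.random.default_rng(1)
kernels = rng.normal(loc=0.0, scale=1.0, size=(8, 9))  # n_kernels=8, kernel_size=9
x = np.sin(np.linspace(0, 4 * np.pi, 150))
counts = hydra_group_counts(x, kernels)
print(counts, counts.sum())  # the counts sum to 150 - 9 + 1 = 142 positions
```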

Examples

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import HydraTransform
>>> X, y = load_gun_point()
>>> t = HydraTransform(n_groups=8, n_kernels=4, random_state=1)
>>> t.fit_transform(X)

fit(x, y=None)

Fit the transform.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)

Fit the embedding and return the transform of x.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the dataset.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.IntervalTransform(n_intervals='sqrt', *, intervals='fixed', sample_size=None, depth=None, min_size=0.0, max_size=1.0, coverage_probability=None, variability=1, summarizer='mean_var_slope', summarizer_params=None, n_jobs=None, random_state=None)

Embed a time series as a collection of features per interval.

Parameters:
n_intervals : str, int or float, optional

The number of intervals to use for the transform.

  • if “log2”, the number of intervals is log2(n_timestep).

  • if “sqrt”, the number of intervals is sqrt(n_timestep).

  • if int, the number of intervals is n_intervals.

  • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

Deprecated since version 1.2: The option “log” has been renamed to “log2”.

intervals : str, optional

The method for selecting intervals.

  • if “fixed”, n_intervals non-overlapping intervals.

  • if “dyadic”, 2**depth - 1 + 2**depth - 1 - depth intervals.

  • if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].

Read more in the User Guide.
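As a worked check of the dyadic formula, the count splits into two terms. The first term, 2**depth - 1, equals 1 + 2 + ... + 2**(depth - 1). The second term adds 2**depth - 1 - depth more intervals. The hypothetical helper below only evaluates the formula; it does not reproduce how IntervalTransform places the intervals.

```python
def n_dyadic_intervals(depth):
    """Evaluate 2**depth - 1 + 2**depth - 1 - depth (hypothetical helper)."""
    first_term = 2**depth - 1  # equals 1 + 2 + ... + 2**(depth - 1)
    second_term = 2**depth - 1 - depth
    return first_term + second_term


for depth in range(1, 5):
    print(depth, n_dyadic_intervals(depth))
```

For example, depth=3 gives 7 + 4 = 11 intervals.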

sample_size : float, optional

The sample size of fixed intervals if intervals=“fixed”.

depth : int, optional

The maximum depth for dyadic intervals if intervals=“dyadic”.

min_size : float, optional

The minimum interval size if intervals=“random”. Ignored if coverage_probability is set.

max_size : float, optional

The maximum interval size if intervals=“random”. Ignored if coverage_probability is set.

coverage_probability : float, optional

The probability that a time step is covered by an interval, in the range 0 < coverage_probability <= 1.

  • For larger coverage_probability, we get larger intervals.

  • For smaller coverage_probability, we get shorter intervals.

variability : float, optional

Controls the shape of the Beta distribution used to sample intervals. Defaults to 1.

  • Higher variability creates more uniform intervals.

  • Lower variability creates more variable interval sizes.

summarizer : str or list, optional

The method to summarize each interval.

  • if str, the summarizer is determined by _SUMMARIZERS.keys().

  • if list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.

The default summarizer summarizes each interval as its mean, variance and slope.

Read more in the User Guide.

summarizer_params : dict, optional

A dictionary of parameters to the summarizer.

n_jobs : int, optional

The number of cores to use for parallel computation.

random_state : int or RandomState, optional

  • If int, random_state is the seed used by the random number generator.

  • If RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the RandomState instance used by np.random.

Notes

Parallelization depends on releasing the global interpreter lock (GIL). As such, using custom functions as summarizers reduces performance. Wildboar implements summarizers for the mean (“mean”), variance (“variance”), and slope (“slope”), as well as their combination (“mean_var_slope”) and the full suite of catch22 features (“catch22”). In the future, we will allow downstream projects to implement their own summarizers in Cython, which will allow for releasing the GIL.

References

Lubba, Carl H., Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, and Nick S. Jones.

catch22: Canonical time-series characteristics. Data Mining and Knowledge Discovery 33, no. 6 (2019): 1821-1852.

Examples

>>> import numpy as np
>>> from wildboar.datasets import load_dataset
>>> from wildboar.transform import IntervalTransform
>>> x, y = load_dataset("GunPoint")
>>> t = IntervalTransform(n_intervals=10, summarizer="mean")
>>> t.fit_transform(x)

Each interval (15 time steps) is transformed to its mean.

>>> t = IntervalTransform(n_intervals="sqrt", summarizer=[np.mean, np.std])
>>> t.fit_transform(x)

Each interval (150 // 12 = 12 time steps) is transformed to two features: the mean and the standard deviation.
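For intuition, the fixed-interval case can be approximated in NumPy: split each series into n_intervals contiguous, non-overlapping pieces and summarize each piece with every function. This sketch uses numpy.array_split, which spreads any remainder over the first intervals. IntervalTransform may place interval boundaries differently. `interval_features` is a hypothetical helper, and the random data only mimics GunPoint's length of 150 time steps.

```python
import numpy as np


def interval_features(X, n_intervals, summarizer):
    """Summarize n_intervals fixed, non-overlapping intervals of each series."""
    X = np.asarray(X, dtype=float)
    features = []
    for x in X:
        row = []
        for interval in np.array_split(x, n_intervals):
            row.extend(f(interval) for f in summarizer)
        features.append(row)
    return np.array(features)


X = np.random.default_rng(0).normal(size=(5, 150))  # 5 series, 150 time steps
means = interval_features(X, 10, [np.mean])  # 10 intervals of 15 time steps
n = int(np.sqrt(X.shape[-1]))  # "sqrt" rule: int(sqrt(150)) = 12
mean_std = interval_features(X, n, [np.mean, np.std])
print(means.shape, mean_std.shape)  # (5, 10) (5, 24)
```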

fit(x, y=None)

Fit the transform.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)

Fit the embedding and return the transform of x.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : None, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

transform(x)[source]#

Transform the dataset.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.MatrixProfileTransform(window=0.1, exclude=None, n_jobs=None)[source]#

Matrix profile transform.

Transform each time series in a dataset to its MatrixProfile similarity self-join.

Parameters:
windowint or float, optional

The subsequence size, by default 0.1.

  • if float, a fraction of n_timestep.

  • if int, the exact subsequence size.

excludeint or float, optional

The size of the exclusion zone. The default exclusion zone is 0.2.

  • if float, expressed as a fraction of the window size.

  • if int, exact size (0 < exclude).

n_jobsint, optional

The number of jobs to use when computing the profile.
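To make window and exclude concrete, here is a brute-force matrix profile self-join in plain Python. It is a sketch under stated assumptions, not wildboar's implementation: it uses the z-normalized Euclidean distance, rounds a float exclude up to whole timesteps, and returns one value per subsequence start (n_timestep - window + 1 values), whereas the library's documented output shape may differ. Its quadratic cost only suits short series.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    mu = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - mu) ** 2 for v in seq) / len(seq))
    return [(v - mu) / sd for v in seq] if sd > 0 else [0.0] * len(seq)


def matrix_profile(x, window=0.1, exclude=0.2):
    """Naive O(n^2 * m) self-join matrix profile of a single series."""
    n = len(x)
    # float -> fraction of n_timestep, int -> exact subsequence size
    m = max(2, int(window * n)) if isinstance(window, float) else window
    # float -> fraction of the window size (rounded up here), int -> exact size
    zone = math.ceil(exclude * m) if isinstance(exclude, float) else exclude
    subs = [znorm(x[i:i + m]) for i in range(n - m + 1)]
    profile = []
    for i, a in enumerate(subs):
        best = math.inf
        for j, b in enumerate(subs):
            if abs(i - j) <= zone:  # skip trivial matches close to i
                continue
            best = min(best, math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b))))
        profile.append(best)
    return profile


x = [math.sin(2 * math.pi * i / 10) for i in range(50)]
mp = matrix_profile(x, window=10)  # near zero everywhere: the series repeats every 10 steps
```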

Examples

>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.transform import MatrixProfileTransform
>>> x, y = load_two_lead_ecg()
>>> t = MatrixProfileTransform()
>>> t.fit_transform(x)
fit(x, y=None)[source]#

Fit the matrix profile.

Sets the expected input dimensions.

Parameters:
xarray-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)

The samples.

yignored

The optional labels.

Returns:
self

A fitted instance.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns:
X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

transform(x)[source]#

Transform the samples to their MatrixProfile self-join.

Parameters:
xarray-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)

The samples.

Returns:
ndarray of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)

The matrix profile of each sample.

class wildboar.transform.PAA(n_intervals='sqrt', window=None)[source]#

Piecewise aggregate approximation.

Parameters:
n_intervals{“sqrt”, “log2”}, int or float, optional

The number of intervals.

windowint, optional

The size of an interval. If window is given, n_intervals is ignored.
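As a rough sketch of what PAA computes, the snippet below replaces each interval by its mean. The helpers resolve_n_intervals and paa are hypothetical, and the way "sqrt", "log2" and float values resolve to an integer is assumed to follow the rules documented for SAX (with flooring); the sketch also assumes the number of intervals does not exceed the series length.

```python
import math


def resolve_n_intervals(n_intervals, n_timestep):
    """Hypothetical resolution of n_intervals, following the rules listed for SAX."""
    if n_intervals == "sqrt":
        return max(1, int(math.sqrt(n_timestep)))
    if n_intervals == "log2":
        return max(1, int(math.log2(n_timestep)))
    if isinstance(n_intervals, float):
        return max(1, int(n_intervals * n_timestep))
    return n_intervals


def paa(x, n_intervals="sqrt", window=None):
    """Piecewise aggregate approximation: the mean of each interval."""
    n = len(x)
    if window is not None:
        k = max(1, n // window)  # window takes precedence over n_intervals
    else:
        k = resolve_n_intervals(n_intervals, n)
    bounds = [(i * n // k, (i + 1) * n // k) for i in range(k)]
    return [sum(x[s:e]) / (e - s) for s, e in bounds]


print(paa([1, 1, 3, 3, 5, 5, 7, 7], n_intervals=4))  # [1.0, 3.0, 5.0, 7.0]
```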

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns:
X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.transform.PivotTransform(n_pivots=100, *, metric='auto', metric_params=None, metric_sample=None, random_state=None, n_jobs=None)[source]#

A transform using pivot time series and sampled distance metrics.

Parameters:
n_pivotsint, optional

The number of pivot time series.

metric{‘auto’} or list, optional
  • If str, the metric to compute the distance.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower bound, the upper bound and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, use dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

metric_sample{“uniform”, “weighted”}, optional

If multiple metrics are specified, this parameter controls how they are sampled: “uniform” samples each metric configuration with equal probability, whereas “weighted” samples each metric with equal probability. By default, metric configurations are sampled with equal probability.

random_stateint or np.RandomState, optional

The random state.

n_jobsint, optional

The number of cores to use.
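The interplay between a grid specification and metric_sample can be sketched in plain Python as below. This is an illustration under assumptions, not wildboar's code: expand_grid, sample_metric and the fallback of 10 grid values when num_* is omitted are hypothetical, and “weighted” is modelled as picking a metric uniformly and then one of its configurations uniformly.

```python
import itertools
import random


def linspace(lo, hi, num):
    """num evenly spaced values from lo to hi (inclusive)."""
    if num == 1:
        return [float(lo)]
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def expand_grid(name, spec, default_num=10):
    """Expand e.g. dict(min_r=0, max_r=1, num_r=10) into (name, params) configurations."""
    params = sorted({key.split("_", 1)[1] for key in spec})
    axes = [
        linspace(spec[f"min_{p}"], spec[f"max_{p}"], spec.get(f"num_{p}", default_num))
        for p in params
    ]
    return [(name, dict(zip(params, values))) for values in itertools.product(*axes)]


def sample_metric(metrics, metric_sample, rng):
    """Draw one (name, params) configuration from a list of (name, grid spec) tuples."""
    grids = [expand_grid(name, spec) for name, spec in metrics]
    if metric_sample == "uniform":
        # every configuration, of every metric, is equally likely
        return rng.choice([config for grid in grids for config in grid])
    # "weighted": every metric is equally likely, then one of its configurations
    return rng.choice(rng.choice(grids))


metrics = [("dtw", dict(min_r=0.0, max_r=1.0, num_r=9)), ("euclidean", {})]
rng = random.Random(0)
draws = [sample_metric(metrics, "weighted", rng)[0] for _ in range(10_000)]
print(draws.count("euclidean") / len(draws))  # close to 0.5; with "uniform" close to 0.1
```

With nine DTW configurations and one Euclidean configuration, “uniform” favours DTW nine to one, while “weighted” gives both metrics the same chance.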

fit(x, y=None)[source]#

Fit the transform.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

yNone, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)[source]#

Fit the embedding and return the transform of x.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

yNone, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

transform(x)[source]#

Transform the dataset.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.ProximityTransform(n_pivots=100, metric='auto', metric_params=None, metric_sample='weighted', random_state=None, n_jobs=None)[source]#

Transform time series based on class conditional pivots.

Parameters:
n_pivotsint, optional

The number of pivot time series per class.

metric{‘auto’} or list, optional
  • If str, the metric to compute the distance.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower bound, the upper bound and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, use dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

metric_sample{“uniform”, “weighted”}, optional

If multiple metrics are specified, this parameter controls how they are sampled: “uniform” samples each metric configuration with equal probability, whereas “weighted” samples each metric with equal probability. By default (metric_sample=“weighted”), each metric is sampled with equal probability.

random_stateint or np.RandomState, optional

The random state.

n_jobsint, optional

The number of cores to use.
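The idea of class-conditional pivots can be sketched as follows: draw up to n_pivots series from each class and describe every series by its distances to those pivots. This is a hypothetical illustration, not wildboar's implementation; fit_class_pivots and pivot_features are invented names, pivots are drawn at random, and a single Euclidean distance stands in for the sampled metrics.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def fit_class_pivots(X, y, n_pivots, rng):
    """Sample up to n_pivots pivot series from each class (labels in sorted order)."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    pivots = []
    for label in sorted(by_class):
        members = by_class[label]
        pivots.extend(rng.sample(members, min(n_pivots, len(members))))
    return pivots


def pivot_features(X, pivots, metric=euclidean):
    """One feature per pivot: the distance from each series to that pivot."""
    return [[metric(xi, p) for p in pivots] for xi in X]


X = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0], [5.0, 6.0, 5.0]]
y = [0, 0, 1, 1]
pivots = fit_class_pivots(X, y, n_pivots=1, rng=random.Random(0))
features = pivot_features(X, pivots)  # shape (4, 2): distance to the class-0 and class-1 pivot
```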

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns:
X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.transform.QuantTransform(depth='auto', v=4, n_jobs=None)[source]#

Quant transformation.

Computes quantiles over a fixed set of intervals of the input time series and of their transformations, and uses these quantiles as features for classification.

The Quant transform performs the following steps:

  1. Computes quantiles over fixed, dyadic intervals on the input time series.

  2. Applies three transformations to the time series (first difference, second difference, and Fourier transform).
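The interval and quantile steps can be sketched in plain Python as below, for a single representation. The sketch makes several assumptions that may differ from wildboar and from the paper: depth d splits the series into 2**d equal parts, “auto” floors log2(n_timestep), and the k = max(1, m // v) quantiles per interval use a simple nearest-rank rule without interpolation. All function names are hypothetical.

```python
import math


def dyadic_intervals(n_timestep, depth):
    """Intervals at depths 0..depth-1; depth d splits the series into 2**d parts."""
    intervals = []
    for d in range(depth):
        parts = 2 ** d
        for i in range(parts):
            start, end = i * n_timestep // parts, (i + 1) * n_timestep // parts
            if end > start:  # skip empty intervals for very short series
                intervals.append((start, end))
    return intervals


def interval_quantiles(segment, v=4):
    """k = max(1, m // v) nearest-rank quantiles of a segment of length m."""
    m = len(segment)
    k = max(1, m // v)
    ordered = sorted(segment)
    if k == 1:
        return [ordered[(m - 1) // 2]]  # a single (lower) median
    return [ordered[round(i * (m - 1) / (k - 1))] for i in range(k)]


def quant_features(x, depth="auto", v=4):
    if depth == "auto":
        depth = min(int(math.log2(len(x))) + 1, 6)
    features = []
    for start, end in dyadic_intervals(len(x), depth):
        features.extend(interval_quantiles(x[start:end], v))
    return features


print(quant_features(list(range(16)), depth=2))  # [0, 5, 10, 15, 0, 7, 8, 15]
```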

Parameters:
depth{“auto”} or int, optional

The maximal depth. If set to auto, the depth is min(log2(n_timestep) + 1, 6).

vint, optional

The proportion of quantiles per interval, given as k = m/v, where m is the length of the interval.

n_jobsint, optional

The number of parallel jobs.

Notes

The implementation differs from the original in the following ways:

  1. Does not apply smoothing to the first-order difference.

  2. Does not subtract the mean from every second quantile.

  3. Does not apply first-order differences if the time series are shorter than 2 timesteps.

  4. Does not apply second-order differences if the time series are shorter than 3 timesteps.
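A minimal sketch of the derived representations and the length guards listed above. It assumes the Fourier representation is the magnitude of a naive DFT over frequencies 0..n//2, which may not match wildboar exactly, and the helper names are hypothetical.

```python
import cmath


def diff(x):
    """First-order difference."""
    return [b - a for a, b in zip(x, x[1:])]


def dft_magnitude(x):
    """Magnitude of the (naive, O(n^2)) DFT for frequencies 0..n//2."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def quant_representations(x):
    """Raw series plus derived series, skipping differences that are too short."""
    reps = {"raw": list(x), "fourier": dft_magnitude(x)}
    if len(x) >= 2:
        reps["diff1"] = diff(x)
    if len(x) >= 3:
        reps["diff2"] = diff(diff(x))
    return reps


reps = quant_representations([1.0, 4.0, 9.0, 16.0])  # diff1 [3.0, 5.0, 7.0], diff2 [2.0, 2.0]
```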

References

Dempster, Angus, Daniel F. Schmidt, and Geoffrey I. Webb. “Quant: A Minimalist Interval Method for Time Series Classification.” Data Mining and Knowledge Discovery 38, no. 4 (July 1, 2024): 2377–2402. https://doi.org/10.1007/s10618-024-01036-9.

fit(X, y=None)[source]#

Fit the transform.

Parameters:
Xarray-like of shape (n_samples, n_dims, n_timestep)

The input data.

yignored
Returns:
self

The fitted estimator.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns:
X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

transform(X)[source]#

Transform the input samples.

Parameters:
Xarray-like of shape (n_samples, n_dims, n_timestep)

The input data.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformed samples.

class wildboar.transform.RandomShapeletTransform(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.0, max_shapelet_size=1.0, n_jobs=None, random_state=None)[source]#

Random shapelet transform.

Transform a time series to the distances to a selection of random shapelets.
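In plain Python, the idea can be sketched as: cut random subsequences from the training data, then describe each series by its minimum distance to every shapelet. This is a simplified illustration, not wildboar's implementation; the helper names are hypothetical, plain (non-normalized) Euclidean distance is assumed, shapelets are at least 2 timesteps long, and min_size and max_size are fractions of n_timestep.

```python
import math
import random


def shapelet_distance(shapelet, x):
    """Minimum Euclidean distance between the shapelet and any subsequence of x."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((s - v) ** 2 for s, v in zip(shapelet, x[i:i + m])))
        for i in range(len(x) - m + 1)
    )


def sample_shapelets(X, n_shapelets, min_size, max_size, rng):
    """Cut random subsequences, with sizes given as fractions of n_timestep, from X."""
    n = len(X[0])
    lo = max(2, int(min_size * n))
    hi = max(lo, int(max_size * n))
    shapelets = []
    for _ in range(n_shapelets):
        x = rng.choice(X)
        m = rng.randint(lo, hi)
        start = rng.randint(0, n - m)
        shapelets.append(x[start:start + m])
    return shapelets


def random_shapelet_transform(X, shapelets):
    """One feature per shapelet: its minimum distance to each series."""
    return [[shapelet_distance(s, x) for s in shapelets] for x in X]


X = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0], [0.0] * 8]
shapelets = sample_shapelets(X, n_shapelets=5, min_size=0.0, max_size=1.0, rng=random.Random(0))
features = random_shapelet_transform(X, shapelets)  # shape (2, 5)
```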

Parameters:
n_shapeletsint, optional

The number of shapelets in the resulting transform.

metricstr or list, optional
  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower bound, the upper bound and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, use dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

min_shapelet_sizefloat, optional

Minimum shapelet size.

max_shapelet_sizefloat, optional

Maximum shapelet size.

n_jobsint, optional

The number of jobs to run in parallel. None means 1 and -1 means using all processors.

random_stateint or RandomState, optional
  • If int, random_state is the seed used by the random number generator

  • If RandomState instance, random_state is the random number generator

  • If None, the random number generator is the RandomState instance used by np.random.

Attributes:
embedding_Embedding

The underlying embedding object.

References

Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme. Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).

Examples

Transform each time series to its minimum DTW distance to each shapelet.

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import RandomShapeletTransform
>>> X, y = load_gun_point()
>>> t = RandomShapeletTransform(metric="dtw")
>>> t.fit_transform(X)

Transform each time series to either the minimum DTW distance, with r randomly set between 0 and 1, or the minimum ERP distance, with g randomly set between 0 and 1.

>>> t = RandomShapeletTransform(
...     metric=[
...         ("dtw", dict(min_r=0.0, max_r=1.0)),
...         ("erp", dict(min_g=0.0, max_g=1.0)),
...     ]
... )
>>> t.fit_transform(X)
fit(x, y=None)[source]#

Fit the transform.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

yNone, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)[source]#

Fit the embedding and return the transform of x.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

yNone, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

transform(x)[source]#

Transform the dataset.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.RocketTransform(n_kernels=1000, *, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, n_jobs=None, random_state=None)[source]#

Transform a time series using random convolution features.

Parameters:
n_kernelsint, optional

The number of random convolutional kernels.

sampling{“normal”, “uniform”, “shapelet”}, optional

The sampling of convolutional filters.

  • if “normal”, sample filter according to a normal distribution with mean and scale.

  • if “uniform”, sample filter according to a uniform distribution with lower and upper.

  • if “shapelet”, sample filters as subsequences in the training data.

sampling_paramsdict, optional

Parameters for the sampling strategy.

  • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

  • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.

kernel_sizearray-like, optional

The kernel size, by default [7, 11, 13].

min_sizefloat, optional

The minimum timestep size used for generating kernel sizes. If set, kernel_size is ignored.

max_sizefloat, optional

The maximum timestep size used for generating kernel sizes. If set, kernel_size is ignored.

bias_probfloat, optional

The probability of using the bias term.

normalize_probfloat, optional

The probability of performing normalization.

padding_probfloat, optional

The probability of padding with zeros.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
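The sketch below follows the kernel recipe described in the cited ROCKET paper: normal weights, a bias drawn from U(-1, 1), an exponentially sampled dilation, optional zero padding, and two pooled features per kernel (the proportion of positive values, ppv, and the maximum). It is an illustration, not wildboar's implementation; the helper names are hypothetical, weight normalization (normalize_prob) is omitted, and the series is assumed to be at least as long as the largest kernel.

```python
import math
import random


def sample_kernel(n_timestep, rng, kernel_sizes=(7, 11, 13), bias_prob=1.0, padding_prob=0.5):
    """Sample one random kernel in the style of the ROCKET paper (normalization omitted)."""
    length = rng.choice(kernel_sizes)
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]  # sampling="normal", mean 0, scale 1
    bias = rng.uniform(-1.0, 1.0) if rng.random() < bias_prob else 0.0
    # dilation = floor(2**u), u ~ U(0, A), chosen so the dilated kernel fits in the series
    max_exponent = max(0.0, math.log2((n_timestep - 1) / (length - 1)))
    dilation = int(2 ** rng.uniform(0.0, max_exponent))
    padding = (length - 1) * dilation // 2 if rng.random() < padding_prob else 0
    return weights, bias, dilation, padding


def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated convolution followed by two pooled features: ppv and max."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    span = (len(weights) - 1) * dilation
    out = [
        bias + sum(w * padded[i + j * dilation] for j, w in enumerate(weights))
        for i in range(len(padded) - span)
    ]
    ppv = sum(1 for value in out if value > 0) / len(out)  # proportion of positive values
    return ppv, max(out)


rng = random.Random(1)
x = [math.sin(i / 5) for i in range(150)]
features = [f for _ in range(10) for f in apply_kernel(x, *sample_kernel(len(x), rng))]  # 20 features
```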

Attributes:
embedding_Embedding

The underlying embedding object.

References

Dempster, Angus, François Petitjean, and Geoffrey I. Webb. ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery 34.5 (2020): 1454-1495.

Examples

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import RocketTransform
>>> X, y = load_gun_point()
>>> t = RocketTransform(n_kernels=10, random_state=1)
>>> t.fit_transform(X)
array([[0.51333333, 5.11526939, 0.47333333, ..., 2.04712544, 0.24      ,
        0.82912261],
       [0.52666667, 5.26611524, 0.54      , ..., 1.98047216, 0.24      ,
        0.81260641],
       [0.54666667, 4.71210092, 0.35333333, ..., 2.28841158, 0.25333333,
        0.82203705],
       ...,
       [0.54666667, 4.72938203, 0.45333333, ..., 2.53756324, 0.24666667,
        0.8380654 ],
       [0.68666667, 3.80533684, 0.26      , ..., 2.41709413, 0.25333333,
        0.65634235],
       [0.66      , 3.94724793, 0.32666667, ..., 1.85575661, 0.25333333,
        0.67630249]])
fit(x, y=None)[source]#

Fit the transform.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

yNone, optional

For compatibility.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)[source]#

Fit the embedding and return the transform of x.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

yNone, optional

For compatibility.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

transform(x)[source]#

Transform the dataset.

Parameters:
xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

class wildboar.transform.SAX(*, n_intervals='sqrt', window=None, n_bins=4, binning='normal', estimate='deprecated', scale=True)[source]#

Symbolic aggregate approximation.

Parameters:
n_intervals{“sqrt”, “log2”}, int or float, optional

The number of intervals to use for the transform.

  • if “log2”, the number of intervals is log2(n_timestep).

  • if “sqrt”, the number of intervals is sqrt(n_timestep).

  • if int, the number of intervals is n_intervals.

  • if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

windowint, optional

The window size. If window is set, the value of n_intervals has no effect.

n_binsint, optional

The number of bins.

binningstr, optional

The bin construction. By default the bins are defined according to the normal distribution. Possible values are “normal” for normally distributed bins or “uniform” for uniformly distributed bins.

estimatebool, optional

Estimate the distribution parameters for the binning from data.

If estimate=False, it is assumed that each time series is preprocessed using:

  • datasets.preprocess.normalize when binning=“normal”.

  • datasets.preprocess.minmax_scale when binning=“uniform”.

scalebool, optional

Ensure that the input is correctly scaled.

If scale=False, it is assumed that each time series is preprocessed using:

  • datasets.preprocess.normalize when binning=“normal”.

  • datasets.preprocess.minmax_scale when binning=“uniform”.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
Xarray-like of shape (n_samples, n_features)

Input samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_paramsdict

Additional fit parameters.

Returns:
X_newndarray array of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
selfestimator instance

Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.transform.ShapeletTransform(n_shapelets='auto', *, metric='euclidean', metric_params=None, strategy='random', shapelet_size=0.1, sample_size=1.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=None, random_state=None, n_jobs=None)[source]#

Shapelet Transform.

Transform a time series to the distances to a selection of shapelets. The transform is unsupervised if strategy=“random” and supervised if strategy=“best”.

Parameters:
n_shapeletsint or {“log2”, “sqrt”, “auto”}, optional

The number of shapelets in the resulting transform.

  • If “auto”, the number of shapelets depends on the value of strategy: 1 for “best” and 1000 for “random”.

  • If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.

  • If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.

metricstr or list, optional
  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower bound, the upper bound and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, use dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.
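A grid specification can be pictured as a mapping from argument name to candidate values. The helper `expand_grid` below is hypothetical. Two points are assumptions not stated above: the values are evenly spaced (`np.linspace`), and `num_*` defaults to 10 when omitted. wildboar may build or sample the grid differently.

```python
import numpy as np


def expand_grid(spec, default_num=10):
    """Hypothetical helper: dict(min_r=0, max_r=1, num_r=10) -> {"r": values}.

    Evenly spaced values and default_num=10 are illustrative assumptions.
    """
    grid = {}
    for key, low in spec.items():
        if not key.startswith("min_"):
            continue
        name = key[len("min_"):]
        high = spec["max_" + name]  # mandatory upper bound
        num = spec.get("num_" + name, default_num)  # optional number of values
        grid[name] = np.linspace(low, high, num)
    return grid


grid = expand_grid(dict(min_r=0, max_r=1, num_r=10))
print(grid["r"])
```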

Warning

Multiple metrics are only supported if strategy=“random”.

metric_params : dict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

strategy : {“best”, “random”}, optional

The strategy for selecting shapelets.

  • If “random”, n_shapelets shapelets are randomly selected, with sizes in the range defined by min_shapelet_size and max_shapelet_size.

  • If “best”, n_shapelets shapelets of the size determined by shapelet_size are selected per input sample.

If strategy is set to “best”, the transformation is supervised and requires y.
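In the spirit of the ultra-fast shapelets cited below, the unsupervised “random” strategy draws shapelets as random subsequences of random training series. The sketch below makes three simplifying assumptions. The function name `sample_random_shapelets` is hypothetical. The size bounds are read as fractions of n_timestep, and lengths are drawn uniformly. wildboar can instead shape the length distribution through coverage_probability and variability.

```python
import numpy as np


def sample_random_shapelets(X, n_shapelets, min_shapelet_size=0.0,
                            max_shapelet_size=1.0, seed=None):
    """Hypothetical sketch: draw random subsequences as shapelets.

    Size bounds are fractions of n_timestep; uniform lengths are an assumption.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples, n_timestep = X.shape
    low = max(2, int(min_shapelet_size * n_timestep))  # at least two timesteps
    high = max(low, int(max_shapelet_size * n_timestep))
    shapelets = []
    for _ in range(n_shapelets):
        length = int(rng.integers(low, high + 1))  # inclusive upper bound
        i = int(rng.integers(n_samples))  # random training series
        start = int(rng.integers(n_timestep - length + 1))  # random start
        shapelets.append(X[i, start:start + length].copy())
    return shapelets


X = np.random.default_rng(0).normal(size=(5, 20))
shapelets = sample_random_shapelets(X, 3, min_shapelet_size=0.1,
                                    max_shapelet_size=0.5, seed=1)
print([len(s) for s in shapelets])
```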

shapelet_size : int, float or array-like, optional

The shapelet size if strategy=“best”.

  • If int, the exact shapelet size.

  • If float, a fraction of the number of input timesteps.

  • If array-like, a list of float or int.
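One reading of the three cases is sketched below with the hypothetical helper `resolve_shapelet_sizes`. Rounding a fractional size down to an integer is an assumption. wildboar may round differently or enforce a minimum length.

```python
import numbers


def resolve_shapelet_sizes(shapelet_size, n_timestep):
    """Hypothetical helper mapping shapelet_size to lengths in timesteps.

    int -> exact size; float -> fraction of n_timestep (rounded down, an
    assumption); array-like -> each element handled the same way.
    """
    if isinstance(shapelet_size, numbers.Real):
        shapelet_size = [shapelet_size]
    sizes = []
    for size in shapelet_size:
        if isinstance(size, numbers.Integral):
            sizes.append(int(size))  # exact number of timesteps
        else:
            sizes.append(int(size * n_timestep))  # fraction of n_timestep
    return sizes


print(resolve_shapelet_sizes([0.1, 38], n_timestep=150))  # [15, 38]
```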

sample_size : float, optional

The size of the sample used to determine the shapelets if strategy=“best”.

min_shapelet_size : float, optional

Minimum shapelet size, as a fraction of the number of timesteps.

max_shapelet_size : float, optional

Maximum shapelet size, as a fraction of the number of timesteps.

coverage_probability : float, optional

The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.

  • For larger coverage_probability, we get larger shapelets.

  • For smaller coverage_probability, we get shorter shapelets.

variability : float, optional

Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.

  • Higher variability creates more uniform intervals.

  • Lower variability creates more variable interval sizes.

random_state : int or RandomState, optional
  • If int, random_state is the seed used by the random number generator.

  • If RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the RandomState instance used by np.random.
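The three cases match scikit-learn's `sklearn.utils.check_random_state`. The sketch below uses that function to illustrate them. That wildboar resolves random_state in exactly this way is an assumption based on its scikit-learn foundation.

```python
import numpy as np
from sklearn.utils import check_random_state

# int: a new RandomState seeded with that value, so draws are reproducible
a = check_random_state(42).randint(1000, size=3)
b = check_random_state(42).randint(1000, size=3)

# RandomState instance: returned unchanged
rng = np.random.RandomState(0)
same = check_random_state(rng)

# None: the global RandomState singleton behind np.random
global_rng = check_random_state(None)

print((a == b).all(), same is rng, isinstance(global_rng, np.random.RandomState))
# True True True
```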

n_jobs : int, optional

The number of jobs to run in parallel. None means 1 and -1 means using all processors.

Attributes:
embedding_ : Embedding

The underlying embedding object.

References

Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme. Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).

Examples

Transform each time series to the minimum DTW distance to each shapelet:

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import ShapeletTransform
>>> X, y = load_gun_point()
>>> t = ShapeletTransform(metric="dtw")
>>> t.fit_transform(X)

Transform each time series to either the minimum DTW distance, with r randomly set between 0 and 1, or the minimum ERP distance, with g randomly set between 0 and 1:

>>> t = ShapeletTransform(
...     metric=[
...         ("dtw", dict(min_r=0.0, max_r=1.0)),
...         ("erp", dict(min_g=0.0, max_g=1.0)),
...     ]
... )
>>> t.fit_transform(X)

Transform each time series to the Euclidean distance to the most promising shapelet of size 38:

>>> t = ShapeletTransform(strategy="best", shapelet_size=38)
>>> t.fit_transform(X, y)
fit(x, y=None)

Fit the transform.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : array-like of shape (n_samples,), optional

The target. Required if strategy=“best”; otherwise ignored.

Returns:
BaseAttributeTransform

This object.

fit_transform(x, y=None)

Fit the embedding and return the transform of x.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

y : array-like of shape (n_samples,), optional

The target. Required if strategy=“best”; otherwise ignored.

Returns:
ndarray of shape (n_samples, n_outputs)

The embedding.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the dataset.

Parameters:
x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)

The time series dataset.

Returns:
ndarray of shape (n_samples, n_outputs)

The transformation.

wildboar.transform.convolve(X, kernel, bias=0.0, *, dilation=1, stride=1, padding=0)

Apply 1D convolution over a time series.

Parameters:
X : array-like of shape (n_samples, n_timestep)

The input.

kernel : array-like of shape (kernel_size,)

The kernel.

bias : float, optional

The bias.

dilation : int, optional

The spacing between kernel elements.

stride : int, optional

The stride of the convolving kernel.

padding : int, optional

Implicit padding on both sides of the input time series.

Returns:
ndarray of shape (n_samples, output_size)

The result of the convolution, where output_size is given by:

    floor(
        ((X.shape[1] + 2 * padding) - ((kernel.shape[0] - 1) * dilation + 1)) / stride
        + 1
    )

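A minimal NumPy sketch of the operation follows. It assumes zero padding and a sliding dot product without kernel flipping (cross-correlation, as in most deep-learning libraries). Both are assumptions about wildboar's behavior. The name `convolve_sketch` is hypothetical. Its output length follows the output_size formula above.

```python
import numpy as np


def convolve_sketch(X, kernel, bias=0.0, *, dilation=1, stride=1, padding=0):
    """Hypothetical sketch: dilated, strided sliding dot product plus bias."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    kernel = np.asarray(kernel, dtype=float)
    Xp = np.pad(X, ((0, 0), (padding, padding)))  # zeros on both sides
    span = (kernel.shape[0] - 1) * dilation + 1  # receptive field of the kernel
    output_size = (Xp.shape[1] - span) // stride + 1
    out = np.empty((X.shape[0], output_size))
    for j in range(output_size):
        start = j * stride
        # Every `dilation`-th value inside the receptive field
        window = Xp[:, start:start + span:dilation]
        out[:, j] = window @ kernel + bias
    return out


X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
print(convolve_sketch(X, [1.0, 0.0, -1.0], padding=1, stride=2))  # values -2, -2, 4
```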
wildboar.transform.piecewice_aggregate_approximation(x, *, n_intervals='sqrt', window=None)

Piecewise aggregate approximation.

Parameters:
x : array-like of shape (n_samples, n_timestep)

The input data.

n_intervals : str, int or float, optional

The number of intervals to use for the transform.

  • If “log2”, the number of intervals is log2(n_timestep).

  • If “sqrt”, the number of intervals is sqrt(n_timestep).

  • If int, the number of intervals is n_intervals.

  • If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
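The four cases can be sketched with the hypothetical helper `resolve_n_intervals`. Truncating to an integer and enforcing at least one interval are assumptions. wildboar may round differently.

```python
import math
import numbers


def resolve_n_intervals(n_intervals, n_timestep):
    """Hypothetical helper; truncation to int and the floor of 1 are assumptions."""
    if isinstance(n_intervals, str):
        if n_intervals == "sqrt":
            n = math.sqrt(n_timestep)
        elif n_intervals == "log2":
            n = math.log2(n_timestep)
        else:
            raise ValueError(f"unknown n_intervals: {n_intervals!r}")
    elif isinstance(n_intervals, numbers.Integral):
        n = n_intervals
    elif isinstance(n_intervals, numbers.Real) and 0 < n_intervals < 1:
        n = n_intervals * n_timestep  # fraction of the series length
    else:
        raise ValueError(f"invalid n_intervals: {n_intervals!r}")
    return max(1, int(n))


print([resolve_n_intervals(v, 150) for v in ("sqrt", "log2", 10, 0.1)])  # [12, 7, 10, 15]
```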

window : int, optional

The window size. If window is set, the value of n_intervals has no effect.

Returns:
ndarray of shape (n_samples, n_intervals)

The piecewise aggregate approximation.
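Piecewise aggregate approximation replaces each of n_intervals consecutive segments of a series with the segment mean. The sketch below is a minimal NumPy version with a hypothetical name, `paa`. The source does not say how wildboar splits a series whose length is not divisible by n_intervals. Here `np.array_split` decides, which gives the first segments one extra timestep.

```python
import numpy as np


def paa(x, n_intervals):
    """Hypothetical sketch: mean of each of n_intervals consecutive segments."""
    x = np.asarray(x, dtype=float)
    # np.array_split tolerates n_timestep not divisible by n_intervals
    segments = np.array_split(x, n_intervals, axis=1)
    return np.stack([segment.mean(axis=1) for segment in segments], axis=1)


print(paa([[1, 2, 3, 4, 5, 6]], n_intervals=3))  # segment means 1.5, 3.5, 5.5
```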

wildboar.transform.symbolic_aggregate_approximation(x, *, n_intervals='sqrt', window=None, n_bins=4, binning='normal')

Symbolic aggregate approximation.

Parameters:
x : array-like of shape (n_samples, n_timestep)

The input data.

n_intervals : str, int or float, optional

The number of intervals to use for the transform.

  • If “log2”, the number of intervals is log2(n_timestep).

  • If “sqrt”, the number of intervals is sqrt(n_timestep).

  • If int, the number of intervals is n_intervals.

  • If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.

window : int, optional

The window size. If window is set, the value of n_intervals has no effect.

n_bins : int, optional

The number of bins.

binning : str, optional

The bin construction. By default the bins are defined according to the normal distribution. Possible values are "normal" for normally distributed bins or "uniform" for uniformly distributed bins.

Returns:
ndarray of shape (n_samples, n_intervals)

The symbolic aggregate approximation.
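The classic SAX recipe is sketched below with `binning="normal"`. Each series is z-normalized, reduced with PAA, and each segment mean is mapped to a bin index. The breakpoints split the standard normal distribution into n_bins equiprobable regions. The function name `sax` is hypothetical. Two further points are assumptions about wildboar: the z-normalization step and the integer bin codes as output. The “uniform” option is not sketched.

```python
from statistics import NormalDist

import numpy as np


def sax(x, n_intervals, n_bins=4):
    """Hypothetical sketch: z-normalize, PAA, then map means to bins 0..n_bins-1."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    z = (x - mean) / np.where(std > 0, std, 1.0)  # guard against flat series
    segments = np.array_split(z, n_intervals, axis=1)
    paa = np.stack([s.mean(axis=1) for s in segments], axis=1)
    # "normal" binning: cut points with equal probability mass under N(0, 1)
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    return np.digitize(paa, breakpoints)


print(sax([[1, 2, 3, 4, 5, 6, 7, 8]], n_intervals=4))  # bins 0, 1, 2, 3
```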