wildboar.transform
Transform raw time series to tabular representations.
Classes
- CastorTransform: Competing Dilated Shapelet Transform.
- DerivativeTransform: Perform derivative transformation on time series data.
- DiffTransform: A transformer that applies a difference transformation to time series data.
- DilatedShapeletTransform: Dilated shapelet transform.
- FeatureTransform: Transform a time series into a number of features.
- FftTransform: Discrete Fourier Transform.
- HydraTransform: A dictionary-based method using convolutional kernels.
- IntervalTransform: Embed a time series as a collection of features per interval.
- MatrixProfileTransform: Matrix profile transform.
- PAA: Piecewise aggregate approximation.
- A transform using pivot time series and sampled distance metrics.
- Transform time series based on class conditional pivots.
- Quant transformation.
- Random shapelet transform.
- Transform a time series using random convolution features.
- Symbolic aggregate approximation.
- Shapelet Transform.
Functions
- Apply 1D convolution over a time series.
- Piecewise aggregate approximation.
- Symbolic aggregate approximation.
- class wildboar.transform.CastorTransform(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, soft_min=True, soft_max=False, soft_threshold=True, ignore_y=False, random_state=None, n_jobs=None)
Competing Dilated Shapelet Transform.
- Parameters:
- n_groups : int, optional
The number of groups of dilated shapelets.
- n_shapelets : int, optional
The number of dilated shapelets per group.
- metric : str or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_prob : float, optional
The probability of standardizing a shapelet to zero mean and unit standard deviation.
- shapelet_size : int, optional
The length of the dilated shapelet.
- lower : float, optional
The lower percentile to draw distance thresholds above.
- upper : float, optional
The upper percentile to draw distance thresholds below.
- soft_min : bool, optional
If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.
- soft_max : bool, optional
If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.
- soft_threshold : bool, optional
If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.
- ignore_y : bool, optional
Ignore y and use the sample from which a shapelet is drawn to estimate the distance threshold.
- random_state : int or RandomState, optional
Controls the random sampling of shapelets.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobs : int, optional
The number of parallel jobs.
Notes
For better performance with multivariate datasets, set n_shapelets to n_shapelets * n_dims to ensure feature variability.
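The soft_min, soft_max and soft_threshold options are easier to compare side by side. The sketch below is one possible reading of those parameter descriptions for a single group of shapelets, assuming the distance profiles (distances) and the per-shapelet thresholds (thresholds) have already been computed. The helper name competing_features is hypothetical, and this is not wildboar's implementation, which may aggregate or order the features differently.

```python
import numpy as np


def competing_features(distances, thresholds, soft_min=True, soft_max=False, soft_threshold=True):
    """Hypothetical per-group features following the soft_* parameter descriptions.

    distances: array of shape (n_shapelets, n_positions), the distance profile
        of every shapelet in one group against one time series.
    thresholds: array of shape (n_shapelets,), one distance threshold per shapelet.
    """
    n_shapelets, n_positions = distances.shape
    positions = np.arange(n_positions)
    winner = distances.argmin(axis=0)  # shapelet with the minimal distance per position
    loser = distances.argmax(axis=0)  # shapelet with the maximal distance per position

    # soft: add the distance itself; hard: add a count of one
    min_feat = np.zeros(n_shapelets)
    np.add.at(min_feat, winner, distances[winner, positions] if soft_min else 1.0)
    max_feat = np.zeros(n_shapelets)
    np.add.at(max_feat, loser, distances[loser, positions] if soft_max else 1.0)

    below = distances < thresholds[:, None]
    if soft_threshold:
        # every shapelet counts all of its own positions below its threshold
        thr_feat = below.sum(axis=1)
    else:
        # only positions where the shapelet is also the minimal one are counted
        thr_feat = np.array([np.sum(below[k] & (winner == k)) for k in range(n_shapelets)])
    return np.concatenate([min_feat, max_feat, thr_feat]).astype(float)


distances = np.array([[0.0, 2.0, 1.0],
                      [1.0, 0.5, 3.0]])
thresholds = np.array([1.5, 1.5])
print(competing_features(distances, thresholds))
```

With the defaults, the first shapelet wins positions 0 and 2 (distance sum 1.0), the second wins position 1 (distance 0.5), and the "max" features fall back to counts because soft_max=False.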
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.DerivativeTransform(method='slope')
Perform derivative transformation on time series data.
- Parameters:
- method : str, optional
The method to use for the derivative transformation. Must be one of: “slope”, “central”, or “backward”.
“backward”: computes the derivative at each point as the difference between the current and previous elements.
“central”: computes the derivative at each point as the average of the forward and backward differences, that is, half the difference between the next and previous elements.
“slope”: computes a smoothed derivative at each point by averaging the difference between the current and previous elements with half the difference between the next and previous elements.
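The three methods can be written down directly with NumPy slicing. The sketch below follows the descriptions above for the interior points only; how wildboar handles the first and last time steps (and therefore the output length) is not shown here, and the helper names are hypothetical.

```python
import numpy as np


def backward_derivative(x):
    # x[i] - x[i-1] for i = 1..n-1
    return x[1:] - x[:-1]


def central_derivative(x):
    # (x[i+1] - x[i-1]) / 2 for the interior points i = 1..n-2
    return (x[2:] - x[:-2]) / 2


def slope_derivative(x):
    # average of the backward difference and half the next-minus-previous difference
    backward = x[1:-1] - x[:-2]
    half_central = (x[2:] - x[:-2]) / 2
    return (backward + half_central) / 2


x = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
print(backward_derivative(x))  # [1. 3. 5. 7.]
print(central_derivative(x))   # [2. 4. 6.]
print(slope_derivative(x))     # [1.5 3.5 5.5]
```

The "slope" variant blends a one-sided and a centred estimate, which is why it is described as a smoothed derivative.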
- fit(X, y=None)
Fit the model to the provided data.
Only performs input validation.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timesteps)
The input data to fit the model.
- y : array-like, optional
Not used.
- Returns:
- object
Returns the instance itself.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.DiffTransform(order=1)
A transformer that applies a difference transformation to time series data.
- Parameters:
- order : int, optional
The order of the difference operation. Default is 1.
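A difference of order k is k repeated first-order differences along the time axis, so each order shortens the series by one time step. The NumPy sketch below uses numpy.diff to illustrate this; it is an assumption that DiffTransform produces the same values and output length, and the helper name difference is hypothetical.

```python
import numpy as np


def difference(X, order=1):
    # repeated first-order differences along the time axis (the last axis)
    return np.diff(np.asarray(X, dtype=float), n=order, axis=-1)


X = np.array([[1.0, 4.0, 9.0, 16.0, 25.0]])
print(difference(X, order=1))  # [[3. 5. 7. 9.]]
print(difference(X, order=2))  # [[2. 2. 2.]]
```

Because each order removes one time step, a first-order difference needs at least two time steps, which matches the input requirement stated for fit.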
- fit(X, y=None)
Fit the model to the provided data.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timesteps)
The input data to fit the model. Must have at least two timesteps.
- y : array-like, optional
Not used.
- Returns:
- object
Returns the instance of the fitted model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.DilatedShapeletTransform(n_shapelets=1000, *, metric='euclidean', metric_params=None, normalize_prob=0.5, min_shapelet_size=None, max_shapelet_size=None, shapelet_size=None, lower=0.05, upper=0.1, ignore_y=False, random_state=None, n_jobs=None)
Dilated shapelet transform.
Transform time series to a representation consisting of three values per shapelet: the minimum dilated distance, the index of the timestep that minimizes the distance, and the number of subsequences that are below a distance threshold.
- Parameters:
- n_shapelets : int, optional
The number of dilated shapelets.
- metric : str or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_prob : float, optional
The probability of standardizing a shapelet to zero mean and unit standard deviation.
- min_shapelet_size : float, optional
The minimum shapelet size. If None, use the discrete sizes in shapelet_size.
- max_shapelet_size : float, optional
The maximum shapelet size. If None, use the discrete sizes in shapelet_size.
- shapelet_size : array-like, optional
The size of shapelets, by default [7, 9, 11].
- lower : float, optional
The lower percentile to draw distance thresholds above.
- upper : float, optional
The upper percentile to draw distance thresholds below.
- ignore_y : bool, optional
Ignore y and use the sample from which a shapelet is drawn to estimate the distance threshold.
- random_state : int or RandomState, optional
Controls the random sampling of shapelets.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobs : int, optional
The number of parallel jobs.
References
- Antoine Guillaume, Christel Vrain, Elloumi Wael.
Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. Pattern Recognition and Artificial Intelligence, 2022.
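To make the three features concrete, the following NumPy sketch computes them for one shapelet, one dilation and one univariate time series. It assumes a plain (non-normalized) Euclidean distance and a hand-picked threshold; in the transform itself, thresholds are drawn between the lower and upper percentiles. Both helper names are hypothetical.

```python
import numpy as np


def dilated_distance_profile(x, shapelet, dilation):
    """Euclidean distance between `shapelet` and every dilated subsequence of `x`."""
    x = np.asarray(x, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    span = (len(shapelet) - 1) * dilation + 1  # time steps covered by one placement
    n_positions = len(x) - span + 1
    profile = np.empty(n_positions)
    for i in range(n_positions):
        sub = x[i : i + span : dilation]  # every `dilation`-th value starting at i
        profile[i] = np.sqrt(np.sum((sub - shapelet) ** 2))
    return profile


def dilated_shapelet_features(x, shapelet, dilation, threshold):
    """(minimum distance, index of the minimum, number of positions below threshold)."""
    profile = dilated_distance_profile(x, shapelet, dilation)
    return profile.min(), int(profile.argmin()), int(np.sum(profile < threshold))


x = np.array([0, 1, 0, 1, 0, 1, 0])
print(dilated_shapelet_features(x, shapelet=[0, 0], dilation=2, threshold=0.5))
```

With dilation 2, the shapelet [0, 0] is compared against x[i] and x[i + 2], so it matches exactly at positions 0, 2 and 4: the minimum distance is 0, its first index is 0, and three positions fall below the threshold.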
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.FeatureTransform(*, summarizer='catch22', n_jobs=None)
Transform a time series into a number of features.
- Parameters:
- summarizer : str or list, optional
The method to summarize each time series.
if str, the summarizer is determined by _SUMMARIZERS.keys().
if list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.
The default summarizer summarizes each time series using catch22-features.
- n_jobs : int, optional
The number of cores to use.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import FeatureTransform
>>> X, y = load_gun_point()
>>> X_t = FeatureTransform().fit_transform(X)
>>> X_t[0]
array([-5.19633603e-01, -6.51047206e-01,  1.90000000e+01,  4.80000000e+01,
        7.48441896e-01, -2.73293560e-05,  2.21476510e-01,  4.70000000e+01,
        4.00000000e-02,  0.00000000e+00,  2.70502518e+00,  2.60000000e+01,
        6.42857143e-01,  1.00000000e-01, -3.26666667e-01,  9.89974643e-01,
        2.90000000e+01,  1.31570726e+00,  1.50000000e-01,  8.50000000e-01,
        4.90873852e-02,  1.47311800e-01])
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.FftTransform(spectrum='amplitude')
Discrete Fourier Transform.
- Parameters:
- spectrum : {“amplitude”, “phase”}, optional
The spectrum of the FFT to return: the amplitude spectrum or the phase spectrum.
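As an illustration of the two spectrum options, the NumPy sketch below assumes that “amplitude” means the magnitude and “phase” the angle of each Fourier coefficient along the time axis. Whether FftTransform uses the full or the one-sided (real) FFT, and whether it applies any scaling, is not documented here, and the helper name fft_spectrum is hypothetical.

```python
import numpy as np


def fft_spectrum(X, spectrum="amplitude"):
    # Fourier coefficients of every series along the time axis (the last axis)
    F = np.fft.fft(np.asarray(X, dtype=float), axis=-1)
    if spectrum == "amplitude":
        return np.abs(F)  # magnitude |F_k|
    if spectrum == "phase":
        return np.angle(F)  # angle arg(F_k) in radians
    raise ValueError(f"unknown spectrum: {spectrum!r}")


X = np.array([[1.0, 1.0, 1.0, 1.0],   # constant series: all energy at frequency 0
              [1.0, 0.0, 0.0, 0.0]])  # impulse: flat amplitude spectrum
print(fft_spectrum(X, "amplitude"))
```

A constant series puts all of its amplitude in the zero-frequency coefficient, while an impulse spreads it evenly over all frequencies with zero phase.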
- fit(X, y=None)
Fit the estimator.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timesteps)
The samples.
- y : ignored, optional
Ignored.
- Returns:
- self
This instance.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.HydraTransform(*, n_groups=64, n_kernels=8, kernel_size=9, sampling='normal', sampling_params=None, n_jobs=None, random_state=None)
A dictionary-based method using convolutional kernels.
- Parameters:
- n_groups : int, optional
The number of groups of kernels.
- n_kernels : int, optional
The number of kernels per group.
- kernel_size : int, optional
The size of the kernel.
- sampling : {“normal”}, optional
The strategy for sampling kernels. By default, kernel weights are sampled from a normal distribution with zero mean and unit standard deviation.
- sampling_params : dict, optional
Parameters to the sampling approach. The “normal” sampler accepts two parameters: mean and scale.
- n_jobs : int, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_state : int or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- Attributes:
- embedding_ : Embedding
The underlying embedding.
See also
HydraClassifier
A classifier using the Hydra transform.
Notes
This implementation does not include the first-order discrete differences described by Dempster et al. (2023). If they are desired, one can combine native scikit-learn functionality with the DiffTransform:
>>> from sklearn.pipeline import make_pipeline, make_union
>>> from wildboar.transform import DiffTransform, HydraTransform
>>> dempster_hydra = make_union(
...     HydraTransform(n_groups=32),
...     make_pipeline(
...         DiffTransform(),
...         HydraTransform(n_groups=32)
...     )
... )
References
- Dempster, A., Schmidt, D. F., & Webb, G. I. (2023).
Hydra: competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import HydraTransform
>>> X, y = load_gun_point()
>>> t = HydraTransform(n_groups=8, n_kernels=4, random_state=1)
>>> t.fit_transform(X)
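The competition between kernels within a group can be illustrated with a simplified NumPy sketch. It assumes a single univariate series, one group of already-sampled kernels applied as sliding dot products at one dilation, and it only counts how often each kernel produces the largest response (a hard count of the arg-max). The method published by Dempster et al. uses more elaborate counting, and wildboar's implementation is not reproduced here; the helper name hydra_group_counts is hypothetical.

```python
import numpy as np


def hydra_group_counts(x, kernels, dilation=1):
    """Count how often each kernel in one group gives the largest response."""
    x = np.asarray(x, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    n_kernels, kernel_size = kernels.shape
    span = (kernel_size - 1) * dilation + 1
    n_positions = len(x) - span + 1
    # one row per position: the dilated window the kernels are applied to
    windows = np.stack([x[i : i + span : dilation] for i in range(n_positions)])
    response = kernels @ windows.T  # (n_kernels, n_positions) sliding dot products
    winners = response.argmax(axis=0)  # kernel with the largest response per position
    return np.bincount(winners, minlength=n_kernels)


# kernel weights drawn from a standard normal, in the spirit of sampling="normal"
rng = np.random.default_rng(0)
kernels = rng.normal(0.0, 1.0, size=(8, 9))  # one group: 8 kernels of size 9
x = np.cumsum(rng.normal(size=150))  # a random-walk series of length 150
print(hydra_group_counts(x, kernels, dilation=2))
```

Each group yields one count per kernel, so the counts of a group always sum to the number of valid positions; repeating this for n_groups groups gives the dictionary-like histogram features.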
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.IntervalTransform(n_intervals='sqrt', *, intervals='fixed', sample_size=None, depth=None, min_size=0.0, max_size=1.0, coverage_probability=None, variability=1, summarizer='mean_var_slope', summarizer_params=None, n_jobs=None, random_state=None)
Embed a time series as a collection of features per interval.
- Parameters:
- n_intervals : str, int or float, optional
The number of intervals to use for the transform.
if “log2”, the number of intervals is log2(n_timestep).
if “sqrt”, the number of intervals is sqrt(n_timestep).
if int, the number of intervals is n_intervals.
if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
Deprecated since version 1.2: The option “log” has been renamed to “log2”.
- intervals : str, optional
The method for selecting intervals.
if “fixed”, n_intervals non-overlapping intervals.
if “dyadic”, 2**depth - 1 + 2**depth - 1 - depth intervals.
if “random”, n_intervals possibly overlapping intervals with sizes randomly sampled in [min_size * n_timestep, max_size * n_timestep].
Read more in the User Guide.
- sample_size : float, optional
The sample size of fixed intervals if intervals=“fixed”.
- depth : int, optional
The maximum depth for dyadic intervals if intervals=“dyadic”.
- min_size : float, optional
The minimum interval size if intervals=“random”. Ignored if coverage_probability is set.
- max_size : float, optional
The maximum interval size if intervals=“random”. Ignored if coverage_probability is set.
- coverage_probability : float, optional
The probability that a time step is covered by an interval, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger intervals.
For smaller coverage_probability, we get shorter intervals.
- variability : float, optional
Controls the shape of the Beta distribution used to sample intervals. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- summarizer : str or list, optional
The method to summarize each interval.
if str, the summarizer is determined by _SUMMARIZERS.keys().
if list, the summarizer is a list of functions f(x) -> float, where x is a numpy array.
The default summarizer summarizes each interval as its mean, variance and slope.
Read more in the User Guide.
- summarizer_params : dict, optional
A dictionary of parameters to the summarizer.
- n_jobs : int, optional
The number of cores to use.
- random_state : int or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
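The dyadic interval count can be checked against one layout that reproduces it: at each depth d (starting from 0), the series is cut into 2**d equal intervals, plus 2**d - 1 intervals of the same width shifted by half a width. This layout is an assumption chosen because it matches the count formula; wildboar may place the intervals differently, for example when n_timestep is not divisible by 2**d. The helper name dyadic_intervals is hypothetical.

```python
def dyadic_intervals(n_timestep, depth):
    """Hypothetical dyadic layout as (start, end) pairs with an exclusive end."""
    intervals = []
    for d in range(depth):
        n_splits = 2**d
        width = n_timestep // n_splits
        # 2**d adjacent intervals that tile the series at this depth
        for k in range(n_splits):
            intervals.append((k * width, (k + 1) * width))
        # 2**d - 1 intervals of the same width, shifted by half a width
        for k in range(n_splits - 1):
            start = k * width + width // 2
            intervals.append((start, start + width))
    return intervals


for depth in range(1, 5):
    expected = 2**depth - 1 + 2**depth - 1 - depth
    print(depth, len(dyadic_intervals(150, depth)), expected)
```

Summing 2**d over the depths gives 2**depth - 1 aligned intervals, and summing 2**d - 1 gives 2**depth - 1 - depth shifted ones, which together is the count stated for intervals=“dyadic”.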
Notes
Parallelization depends on releasing the global interpreter lock (GIL). As such, using custom functions as summarizers reduces performance. Wildboar implements summarizers for the mean (“mean”), variance (“variance”) and slope (“slope”), as well as their combination (“mean_var_slope”) and the full suite of catch22 features (“catch22”). In the future, we will allow downstream projects to implement their own summarizers in Cython, which will allow for releasing the GIL.
References
- Lubba, Carl H., Sarab S. Sethi, Philip Knaute, Simon R. Schultz, Ben D. Fulcher, and Nick S. Jones.
catch22: Canonical time-series characteristics. Data Mining and Knowledge Discovery 33, no. 6 (2019): 1821-1852.
Examples
>>> from wildboar.datasets import load_dataset
>>> from wildboar.transform import IntervalTransform
>>> x, y = load_dataset("GunPoint")
>>> t = IntervalTransform(n_intervals=10, summarizer="mean")
>>> t.fit_transform(x)
Each interval (15 time points) is transformed to its mean.
>>> import numpy as np
>>> t = IntervalTransform(n_intervals="sqrt", summarizer=[np.mean, np.std])
>>> t.fit_transform(x)
Each interval (150 // 12 = 12 time points) is transformed to two features: the mean and the standard deviation.
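The examples above can be mimicked in plain NumPy for intervals=“fixed” with a list of summarizers: split the time axis into n_intervals contiguous pieces and apply every summarizer to every piece. The sketch assumes numpy.array_split boundaries (wildboar may distribute leftover time steps differently) and a univariate input; the helper name interval_features is hypothetical.

```python
import numpy as np


def interval_features(X, n_intervals, summarizers):
    """Hypothetical fixed-interval embedding of shape (n_samples, n_intervals * len(summarizers))."""
    X = np.asarray(X, dtype=float)
    columns = []
    for idx in np.array_split(np.arange(X.shape[-1]), n_intervals):
        segment = X[:, idx]  # (n_samples, interval_length)
        for f in summarizers:
            # apply each summarizer f(x) -> float to every sample's segment
            columns.append([f(row) for row in segment])
    return np.array(columns).T


X = np.arange(12, dtype=float).reshape(2, 6)
print(interval_features(X, n_intervals=3, summarizers=[np.mean, np.std]))
```

Here each sample of length 6 is cut into three intervals of two time points, and each interval contributes its mean followed by its standard deviation, giving six features per sample.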
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.MatrixProfileTransform(window=0.1, exclude=None, n_jobs=None)
Matrix profile transform.
Transform each time series in a dataset to its matrix profile (similarity self-join).
- Parameters:
- window : int or float, optional
The subsequence size, by default 0.1.
if float, a fraction of n_timestep.
if int, the exact subsequence size.
- exclude : int or float, optional
The size of the exclusion zone. The default exclusion zone is 0.2.
if float, expressed as a fraction of the window size.
if int, exact size (0 < exclude).
- n_jobs : int, optional
The number of jobs to use when computing the profile.
Examples
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.transform import MatrixProfileTransform
>>> x, y = load_two_lead_ecg()
>>> t = MatrixProfileTransform()
>>> t.fit_transform(x)
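A matrix profile stores, for every subsequence of a series, the distance to its nearest neighbour among the other subsequences of the same series, ignoring neighbours inside the exclusion zone. The brute-force sketch below illustrates that definition; it assumes z-normalized Euclidean distance and an exclusion zone given as a fraction of the window, and it is quadratic in the series length, unlike the optimized computation a library would use. The helper name naive_matrix_profile is hypothetical.

```python
import numpy as np


def naive_matrix_profile(x, window, exclude=0.2):
    """Brute-force self-join matrix profile of a univariate series."""
    x = np.asarray(x, dtype=float)
    n = len(x) - window + 1
    subs = np.stack([x[i : i + window] for i in range(n)])
    # z-normalize each subsequence (assumption), guarding against constant ones
    std = subs.std(axis=1, keepdims=True)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / np.where(std > 0, std, 1.0)
    zone = int(np.ceil(exclude * window))  # exclusion zone in time steps
    profile = np.full(n, np.inf)
    for i in range(n):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        d[max(0, i - zone) : min(n, i + zone + 1)] = np.inf  # skip trivial matches
        profile[i] = d.min()
    return profile


x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
print(naive_matrix_profile(x, window=3))
```

Because the pattern [0, 1, 2] repeats every three time steps, every subsequence has an exact match outside its exclusion zone, so the whole profile is (numerically) zero. Note that this sketch returns n_timestep - window + 1 values.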
- fit(x, y=None)
Fit the matrix profile.
Sets the expected input dimensions.
- Parameters:
- x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)
The samples.
- y : ignored
The optional labels.
- Returns:
- self
A fitted instance.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(x)
Transform the samples to their matrix profile self-join.
- Parameters:
- x : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)
The samples.
- Returns:
- ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timesteps)
The matrix profile of each sample.
- class wildboar.transform.PAA(n_intervals='sqrt', window=None)
Piecewise aggregate approximation.
- Parameters:
- n_intervals : {“sqrt”, “log2”}, int or float, optional
The number of intervals.
- window : int, optional
The size of an interval. If window is given, n_intervals is ignored.
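PAA replaces each interval by its mean. A minimal NumPy sketch, assuming contiguous intervals of (near-)equal width; how wildboar resolves “sqrt” and “log2” and distributes leftover timesteps is not shown, and `paa` is a hypothetical helper rather than the library function.

```python
import numpy as np


def paa(X, n_intervals):
    """Mean of each of `n_intervals` contiguous segments along the last axis."""
    X = np.asarray(X, dtype=float)
    # np.array_split tolerates lengths that are not divisible by n_intervals
    segments = np.array_split(X, n_intervals, axis=-1)
    return np.stack([seg.mean(axis=-1) for seg in segments], axis=-1)


X = np.arange(12, dtype=float).reshape(2, 6)
print(paa(X, 3))  # means [[0.5, 2.5, 4.5], [6.5, 8.5, 10.5]]
```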
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.PivotTransform(n_pivots=100, *, metric='auto', metric_params=None, metric_sample=None, random_state=None, n_jobs=None)
A transform using pivot time series and sampled distance metrics.
- Parameters:
- n_pivots : int, optional
The number of pivot time series.
- metric : {‘auto’} or list, optional
If str, the metric to compute the distance.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower and upper bounds of the values and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_params : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_sample : {“uniform”, “weighted”}, optional
If multiple metrics are specified, this parameter controls how they are sampled. “uniform” samples each metric configuration with equal probability and “weighted” samples each metric with equal probability. By default, metric configurations are sampled with equal probability.
- random_state : int or np.RandomState, optional
The random state.
- n_jobs : int, optional
The number of cores to use.
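The idea behind a pivot transform can be sketched as follows: sample training series as pivots and describe every series by its distance to each pivot. This is a simplification under stated assumptions: it uses plain Euclidean distance for every pivot, whereas PivotTransform samples a metric (and its parameters) per pivot, and `pivot_transform` is a hypothetical helper.

```python
import numpy as np


def pivot_transform(X_train, X, n_pivots, random_state=None):
    """Distances from each series in X to `n_pivots` randomly drawn training series."""
    rng = np.random.default_rng(random_state)
    idx = rng.choice(X_train.shape[0], size=n_pivots, replace=False)
    pivots = X_train[idx]
    # (n_samples, 1, n_timestep) - (1, n_pivots, n_timestep) -> norm over time
    return np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=-1)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 30))
features = pivot_transform(X_train, X_train, n_pivots=5, random_state=1)
print(features.shape)  # (20, 5)
```

Each pivot's own row is at distance zero from it, so every column of the transformed training set contains a 0.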
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.ProximityTransform(n_pivots=100, metric='auto', metric_params=None, metric_sample='weighted', random_state=None, n_jobs=None)
Transform time series based on class conditional pivots.
- Parameters:
- n_pivots : int, optional
The number of pivot time series per class.
- metric : {‘auto’} or list, optional
If str, the metric to compute the distance.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower and upper bounds of the values and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_params : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- metric_sample : {“uniform”, “weighted”}, optional
If multiple metrics are specified, this parameter controls how they are sampled. “uniform” samples each metric configuration with equal probability and “weighted” samples each metric with equal probability. By default (“weighted”), each metric is sampled with equal probability.
- random_state : int or np.RandomState, optional
The random state.
- n_jobs : int, optional
The number of cores to use.
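Here the pivots are drawn per class, which is what makes the transform depend on y. A minimal sketch, assuming n_pivots series are sampled from each class and Euclidean distance is used throughout; ProximityTransform itself samples metrics according to metric and metric_sample, and both helper names are hypothetical.

```python
import numpy as np


def class_conditional_pivots(X, y, n_pivots, random_state=None):
    """Draw `n_pivots` training series from every class in y."""
    rng = np.random.default_rng(random_state)
    pivots = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        # sample with replacement only if the class is smaller than n_pivots
        chosen = rng.choice(members, size=n_pivots, replace=members.size < n_pivots)
        pivots.append(X[chosen])
    return np.concatenate(pivots)


def proximity_features(X, pivots):
    """Euclidean distance from every series to every pivot."""
    return np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=-1)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 20))
y = np.repeat([0, 1, 2], 10)
pivots = class_conditional_pivots(X, y, n_pivots=4, random_state=0)
print(proximity_features(X, pivots).shape)  # (30, 12): 4 pivots x 3 classes
```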
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.QuantTransform(depth='auto', v=4, n_jobs=None)
Quant transformation.
Computes quantiles over a fixed set of intervals on the input time series and their transformations, and uses these quantiles as features for classification.
The Quant transform performs the following steps:
- Computes quantiles over fixed, dyadic intervals on the input time series.
- Applies three transformations to the time series (first difference, second difference, and Fourier transform) and computes the same interval quantiles on each of them.
- Parameters:
- depth : {“auto”} or int, optional
The maximal depth. If set to “auto”, the depth is min(log2(n_timestep) + 1, 6).
- v : int, optional
Controls the number of quantiles per interval, given as k = m/v, where m is the length of the interval.
- n_jobs : int, optional
The number of parallel jobs.
Notes
The implementation differs from the original in the following ways:
- It does not apply smoothing to the first-order difference.
- It does not subtract the mean from every second quantile.
- It does not apply first-order differences if the time series are shorter than 2 timesteps.
- It does not apply second-order differences if the time series are shorter than 3 timesteps.
References
- Dempster, Angus, Daniel F. Schmidt, and Geoffrey I. Webb.
“Quant: A Minimalist Interval Method for Time Series Classification.” Data Mining and Knowledge Discovery 38, no. 4 (July 1, 2024): 2377–2402. https://doi.org/10.1007/s10618-024-01036-9.
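The interval scheme can be sketched in a few lines of NumPy. This is an approximation for illustration only: level d of the hierarchy is assumed to split the series into 2**d equal intervals, the k quantile levels are assumed to be spread evenly as (j + 0.5) / k, and the Fourier representation is assumed to be the magnitude of the real FFT. These details, like the helper names, are hypothetical and may not match QuantTransform or the original paper.

```python
import numpy as np


def dyadic_intervals(n_timestep, depth):
    """(start, end) pairs; level d splits the series into 2**d equal parts."""
    intervals = []
    for level in range(depth):
        edges = np.linspace(0, n_timestep, 2**level + 1).astype(int)
        intervals.extend(zip(edges[:-1].tolist(), edges[1:].tolist()))
    return intervals


def interval_quantiles(x, depth=None, v=4):
    """Concatenate k = m // v (at least 1) quantiles of every dyadic interval."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if depth is None:
        # the "auto" rule from the parameter description: min(log2(n) + 1, 6)
        depth = int(min(np.log2(n) + 1, 6))
    features = []
    for start, end in dyadic_intervals(n, depth):
        segment = x[start:end]
        k = max(1, segment.shape[0] // v)
        levels = (np.arange(k) + 0.5) / k  # evenly spread quantile levels
        features.append(np.quantile(segment, levels))
    return np.concatenate(features)


def quant_sketch(x, depth=None, v=4):
    """Interval quantiles of x, its first and second differences, and |FFT|."""
    x = np.asarray(x, dtype=float)
    representations = [x, np.diff(x), np.diff(x, n=2), np.abs(np.fft.rfft(x))]
    return np.concatenate([interval_quantiles(r, depth, v) for r in representations])


print(interval_quantiles(np.arange(16.0), depth=2, v=4).shape)  # (8,)
```

With depth=2 and 16 timesteps, the intervals are the whole series (4 quantiles) and its two halves (2 quantiles each), giving 8 features.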
- fit(X, y=None)
Fit the transform.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep)
The input data.
- y : ignored
- Returns:
- self
The fitted estimator.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.RandomShapeletTransform(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.0, max_shapelet_size=1.0, n_jobs=None, random_state=None)
Random shapelet transform.
Transform a time series to the distances to a selection of random shapelets.
- Parameters:
- n_shapelets : int, optional
The number of shapelets in the resulting transform.
- metric : str or list, optional
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower and upper bounds of the values and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_params : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- min_shapelet_size : float, optional
Minimum shapelet size.
- max_shapelet_size : float, optional
Maximum shapelet size.
- n_jobs : int, optional
The number of jobs to run in parallel. None means 1 and -1 means using all processors.
- random_state : int or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- Attributes:
- embedding_ : Embedding
The underlying embedding object.
References
- Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.
Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).
Examples
Transform each time series to the minimum DTW distance to each shapelet:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import RandomShapeletTransform
>>> X, y = load_gun_point()
>>> t = RandomShapeletTransform(metric="dtw")
>>> t.fit_transform(X)
Transform each time series to either the minimum DTW distance, with r randomly set between 0 and 1, or the minimum ERP distance, with g randomly set between 0 and 1:
>>> t = RandomShapeletTransform(
...     metric=[
...         ("dtw", dict(min_r=0.0, max_r=1.0)),
...         ("erp", dict(min_g=0.0, max_g=1.0)),
...     ]
... )
>>> t.fit_transform(X)
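Each output column is the minimum distance between a series and one shapelet, taken over all alignments. A minimal sketch, assuming Euclidean distance, shapelet lengths given as integers (RandomShapeletTransform takes min_shapelet_size and max_shapelet_size as sizes in [0, 1]), and shapelets cut from random positions of random training series; the helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def min_distance(shapelet, x):
    """Smallest Euclidean distance between `shapelet` and any window of x."""
    windows = sliding_window_view(x, shapelet.shape[0])
    return np.linalg.norm(windows - shapelet, axis=1).min()


def random_shapelet_features(X, n_shapelets, min_len, max_len, random_state=None):
    """One column per random shapelet: its minimum distance to every series."""
    rng = np.random.default_rng(random_state)
    n_samples, n_timestep = X.shape
    shapelets = []
    for _ in range(n_shapelets):
        length = rng.integers(min_len, max_len + 1)  # inclusive upper bound
        start = rng.integers(0, n_timestep - length + 1)
        shapelets.append(X[rng.integers(n_samples), start:start + length])
    return np.array([[min_distance(s, x) for s in shapelets] for x in X])


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))
F = random_shapelet_features(X, n_shapelets=8, min_len=5, max_len=10, random_state=0)
print(F.shape)  # (10, 8)
```

Because every shapelet is cut from a training series, that series matches it exactly and each column contains a zero.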
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.RocketTransform(n_kernels=1000, *, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, n_jobs=None, random_state=None)
Transform a time series using random convolution features.
- Parameters:
- n_kernels : int, optional
The number of kernels to sample.
- sampling : {“normal”, “uniform”, “shapelet”}, optional
The sampling of convolutional filters.
If “normal”, sample filters according to a normal distribution with mean and scale.
If “uniform”, sample filters according to a uniform distribution with lower and upper.
If “shapelet”, sample filters as subsequences in the training data.
- sampling_params : dict, optional
Parameters for the sampling strategy.
If “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
If “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
- kernel_size : array-like, optional
The kernel size, by default [7, 11, 13].
- min_size : float, optional
The minimum timestep size used for generating kernel sizes. If set, kernel_size is ignored.
- max_size : float, optional
The maximum timestep size used for generating kernel sizes. If set, kernel_size is ignored.
- bias_prob : float, optional
The probability of using the bias term.
- normalize_prob : float, optional
The probability of performing normalization.
- padding_prob : float, optional
The probability of padding with zeros.
- n_jobs : int, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_state : int or RandomState, optional
Controls the random sampling of the kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- Attributes:
- embedding_ : Embedding
The underlying embedding object.
References
- Dempster, Angus, François Petitjean, and Geoffrey I. Webb.
ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels. Data Mining and Knowledge Discovery 34.5 (2020): 1454-1495.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import RocketTransform
>>> X, y = load_gun_point()
>>> t = RocketTransform(n_kernels=10, random_state=1)
>>> t.fit_transform(X)
array([[0.51333333, 5.11526939, 0.47333333, ..., 2.04712544, 0.24      , 0.82912261],
       [0.52666667, 5.26611524, 0.54      , ..., 1.98047216, 0.24      , 0.81260641],
       [0.54666667, 4.71210092, 0.35333333, ..., 2.28841158, 0.25333333, 0.82203705],
       ...,
       [0.54666667, 4.72938203, 0.45333333, ..., 2.53756324, 0.24666667, 0.8380654 ],
       [0.68666667, 3.80533684, 0.26      , ..., 2.41709413, 0.25333333, 0.65634235],
       [0.66      , 3.94724793, 0.32666667, ..., 1.85575661, 0.25333333, 0.67630249]])
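The ROCKET reference above convolves each series with random, dilated kernels and summarizes each output by the proportion of positive values (ppv) and the maximum. The sketch below follows the paper's recipe (mean-centered normal weights, uniform bias in [-1, 1], exponentially sampled dilation, zero padding with probability 0.5) and only borrows kernel_size's documented default. It does not reproduce RocketTransform's sampling options, bias_prob or normalize_prob, assumes the series is longer than the largest kernel, and `rocket_features` is a hypothetical helper.

```python
import numpy as np


def rocket_features(x, n_kernels, kernel_sizes=(7, 11, 13), random_state=None):
    """Return [ppv_1, max_1, ppv_2, max_2, ...] for `n_kernels` random kernels."""
    rng = np.random.default_rng(random_state)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    features = []
    for _ in range(n_kernels):
        size = int(rng.choice(kernel_sizes))
        weights = rng.normal(0.0, 1.0, size)
        weights -= weights.mean()  # mean-centered weights, as in ROCKET
        bias = rng.uniform(-1.0, 1.0)
        # dilation = 2**u with u ~ U(0, log2((n - 1) / (size - 1))),
        # so the dilated kernel never spans more than the series
        dilation = int(2 ** rng.uniform(0, np.log2((n - 1) / (size - 1))))
        span = (size - 1) * dilation
        padding = span // 2 if rng.random() < 0.5 else 0
        xp = np.pad(x, padding)
        out = np.array([
            weights @ xp[i:i + span + 1:dilation] + bias
            for i in range(xp.shape[0] - span)
        ])
        features.extend([(out > 0).mean(), out.max()])
    return np.array(features)


f = rocket_features(np.sin(np.linspace(0, 10, 150)), n_kernels=10, random_state=1)
print(f.shape)  # (20,): one (ppv, max) pair per kernel
```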
- fit(x, y=None)
Fit the transform.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)
Fit the embedding and return the transform of x.
- Parameters:
- x : array-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- y : None, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.SAX(*, n_intervals='sqrt', window=None, n_bins=4, binning='normal', estimate='deprecated', scale=True)
Symbolic aggregate approximation.
- Parameters:
- n_intervals : {“sqrt”, “log2”}, int or float, optional
The number of intervals to use for the transform.
If “log2”, the number of intervals is log2(n_timestep).
If “sqrt”, the number of intervals is sqrt(n_timestep).
If int, the number of intervals is n_intervals.
If float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
- window : int, optional
The window size. If window is set, the value of n_intervals has no effect.
- n_bins : int, optional
The number of bins.
- binning : str, optional
The bin construction. By default, the bins are defined according to the normal distribution. Possible values are “normal” for normally distributed bins or “uniform” for uniformly distributed bins.
- estimate : bool, optional
Estimate the distribution parameters for the binning from data.
If estimate=False, it is assumed that each time series is preprocessed using datasets.preprocess.normalize when binning="normal" and datasets.preprocess.minmax_scale when binning="uniform".
- scale : bool, optional
Ensure that the input is correctly scaled.
If scale=False, it is assumed that each time series is preprocessed using datasets.preprocess.normalize when binning="normal" and datasets.preprocess.minmax_scale when binning="uniform".
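For binning="normal", SAX z-normalizes a series, reduces it with PAA, and maps each interval mean to a symbol using breakpoints that split the standard normal distribution into n_bins equally probable regions. A minimal sketch, assuming near-equal intervals as produced by np.array_split; `sax` is a hypothetical helper, and the breakpoints come from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

import numpy as np


def sax(x, n_intervals, n_bins=4):
    """Symbols 0..n_bins-1 for each interval of a single series."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    # z-normalize, the preprocessing that binning="normal" assumes
    x = (x - x.mean()) / (std if std > 0 else 1.0)
    means = np.array([seg.mean() for seg in np.array_split(x, n_intervals)])
    # n_bins - 1 breakpoints giving equiprobable bins under N(0, 1)
    breakpoints = [NormalDist().inv_cdf(i / n_bins) for i in range(1, n_bins)]
    return np.digitize(means, breakpoints)


x = np.repeat([-2.0, -0.5, 0.5, 2.0], 5)
print(sax(x, n_intervals=4))  # [0 1 2 3]
```

With four bins, the breakpoints are roughly -0.674, 0 and 0.674.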
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.transform.ShapeletTransform(n_shapelets='auto', *, metric='euclidean', metric_params=None, strategy='random', shapelet_size=0.1, sample_size=1.0, min_shapelet_size=0.0, max_shapelet_size=1.0, coverage_probability=None, variability=None, random_state=None, n_jobs=None)
Shapelet Transform.
Transform a time series to the distances to a selection of shapelets. The transform is unsupervised if strategy="random" and supervised if strategy="best".
- Parameters:
- n_shapelets : int or {“log2”, “sqrt”, “auto”}, optional
The number of shapelets in the resulting transform.
If “auto”, the number of shapelets depends on the value of strategy: 1 for “best” and 1000 for “random”.
If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.
If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.
- metric : str or list, optional
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory key-value pairs and one optional key-value pair, defining the lower and upper bounds of the values and the number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
Warning
Multiple metrics are only supported if strategy="random".
- metric_params : dict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- strategy : {“best”, “random”}, optional
The strategy for selecting shapelets.
If “random”, n_shapelets shapelets are randomly selected with sizes in the range defined by min_shapelet_size and max_shapelet_size.
If “best”, n_shapelets shapelets are selected per input sample, with the size determined by shapelet_size.
If strategy is set to “best”, the transformation is supervised and requires y.
- shapelet_size : int, float or array-like, optional
The shapelet size if strategy="best".
If int, the exact shapelet size.
If float, a fraction of the number of input timesteps.
If array-like, a list of float or int.
- sample_size : float, optional
The size of the sample used to determine the shapelets if strategy="best".
- min_shapelet_size : float, optional
Minimum shapelet size.
- max_shapelet_size : float, optional
Maximum shapelet size.
- coverage_probability : float, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variability : float, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- random_state : int or RandomState, optional
If int, random_state is the seed used by the random number generator.
If RandomState instance, random_state is the random number generator.
If None, the random number generator is the RandomState instance used by np.random.
- n_jobs : int, optional
The number of jobs to run in parallel. None means 1 and -1 means using all processors.
- Attributes:
- embedding_ : Embedding
The underlying embedding object.
References
- Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.
Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).
Examples
Transform each time series to the minimum DTW distance to each shapelet:
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.transform import ShapeletTransform
>>> X, y = load_gun_point()
>>> t = ShapeletTransform(metric="dtw")
>>> t.fit_transform(X)
Transform each time series to either the minimum DTW distance, with r randomly set between 0 and 1, or the minimum ERP distance, with g randomly set between 0 and 1:
>>> t = ShapeletTransform(
...     metric=[
...         ("dtw", dict(min_r=0.0, max_r=1.0)),
...         ("erp", dict(min_g=0.0, max_g=1.0)),
...     ]
... )
>>> t.fit_transform(X)
Transform each time series to its Euclidean distance to the most promising shapelet of size 38 (strategy="best" is supervised and requires y):
>>> t = ShapeletTransform(strategy="best", shapelet_size=38)
>>> t.fit_transform(X, y)
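A parameter grid specification such as dict(min_r=0, max_r=1, num_r=10) describes a set of values between the two bounds. The sketch below expands a list of (metric, grid) tuples into concrete configurations and samples one under the “uniform” and “weighted” rules described for metric_sample in PivotTransform and ProximityTransform. It is an illustration, not wildboar's code: evenly spaced values via np.linspace, the cartesian product over several parameters, and the fallback of 10 values when num_<param> is omitted are all assumptions, and both helpers are hypothetical.

```python
import itertools

import numpy as np


def expand_metric_grid(metrics, default_num=10):
    """Turn [(name, {"min_p": .., "max_p": .., "num_p": ..}), ...] into (name, params)."""
    configs = []
    for name, grid in metrics:
        params = sorted(key[len("min_"):] for key in grid if key.startswith("min_"))
        axes = [
            # default_num is an assumed fallback when num_<p> is omitted
            np.linspace(grid[f"min_{p}"], grid[f"max_{p}"], grid.get(f"num_{p}", default_num))
            for p in params
        ]
        for values in itertools.product(*axes):
            configs.append((name, {p: float(v) for p, v in zip(params, values)}))
    return configs


def sample_config(configs, how, rng):
    """'uniform': every configuration equally likely; 'weighted': every metric equally likely."""
    if how == "uniform":
        return configs[rng.integers(len(configs))]
    metric = rng.choice(sorted({name for name, _ in configs}))
    subset = [c for c in configs if c[0] == metric]
    return subset[rng.integers(len(subset))]


configs = expand_metric_grid([
    ("dtw", dict(min_r=0.0, max_r=1.0, num_r=10)),
    ("erp", dict(min_g=0.0, max_g=1.0, num_g=5)),
])
print(len(configs), configs[0])  # 15 ('dtw', {'r': 0.0})
```

Under “uniform”, dtw is drawn two times out of three here because it has twice as many configurations; under “weighted”, each metric is drawn half the time.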
- fit(x, y=None)[source]#
Fit the transform.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- yNone, optional
For compatibility.
- Returns:
- BaseAttributeTransform
This object.
- fit_transform(x, y=None)[source]#
Fit the embedding and return the transform of x.
- Parameters:
- xarray-like of shape (n_samples, n_timestep) or (n_samples, n_dimensions, n_timestep)
The time series dataset.
- yNone, optional
For compatibility.
- Returns:
- ndarray of shape (n_samples, n_outputs)
The embedding.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A
MetadataRequest
encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
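The `<component>__<parameter>` convention described for set_params can be illustrated with a small pure-Python sketch that separates an estimator's own parameters from those destined for nested components. This is a simplified illustration, not scikit-learn's actual implementation, and `split_nested_params` is a hypothetical name.

```python
def split_nested_params(params):
    """Separate plain parameter names from '<component>__<parameter>' keys."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first '__' only; deeper nesting stays in sub_key
        # so that the component can route it further down.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"n_jobs": 2, "transform__metric": "dtw", "clf__est__depth": 3}
)
```

Splitting only at the first separator lets each component forward the remainder of the key to its own sub-components.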
- wildboar.transform.convolve(X, kernel, bias=0.0, *, dilation=1, stride=1, padding=0)[source]#
Apply 1D convolution over a time series.
- Parameters:
- Xarray-like of shape (n_samples, n_timestep)
The input.
- kernelarray-like of shape (kernel_size, )
The kernel.
- biasfloat, optional
The bias.
- dilationint, optional
The spacing between kernel elements.
- strideint, optional
The stride of the convolving kernel.
- paddingint, optional
Implicit padding on both sides of the input time series.
- Returns:
- ndarray of shape (n_samples, output_size)
The result of the convolution, where output_size is given by:
floor(((X.shape[1] + 2 * padding) - ((kernel.shape[0] - 1) * dilation + 1)) / stride + 1).
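A minimal NumPy sketch of a dilated, strided, zero-padded 1D convolution that produces output of the size given above. It assumes, as is common for convolutional time series kernels, that the kernel is applied without flipping (a cross-correlation) and that padding uses zeros; both are assumptions about wildboar's internals, and `convolve_sketch` is a hypothetical name rather than the library function.

```python
import numpy as np


def convolve_sketch(X, kernel, bias=0.0, *, dilation=1, stride=1, padding=0):
    """Dilated, strided 1D cross-correlation with zero padding (sketch)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    kernel = np.asarray(kernel, dtype=float)
    # Zero-pad both ends of every time series.
    Xp = np.pad(X, ((0, 0), (padding, padding)))
    # Number of time steps covered by the dilated kernel.
    span = (kernel.shape[0] - 1) * dilation + 1
    output_size = (Xp.shape[1] - span) // stride + 1
    out = np.empty((X.shape[0], output_size))
    for i in range(output_size):
        start = i * stride
        # Every `dilation`-th value inside the span: exactly len(kernel) values.
        segment = Xp[:, start:start + span:dilation]
        out[:, i] = segment @ kernel + bias
    return out


x = [[1, 2, 3, 4, 5]]
print(convolve_sketch(x, [1, 0, -1]))                        # [[-2. -2. -2.]]
print(convolve_sketch(x, [1, 0, -1], dilation=2))            # [[-4.]]
print(convolve_sketch(x, [1, 0, -1], padding=1, stride=2))   # [[-2. -2.  4.]]
```

With padding=1 and stride=2 the padded series has length 7, the span is 3, and the formula gives floor((7 - 3) / 2 + 1) = 3 outputs.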
- wildboar.transform.piecewice_aggregate_approximation(x, *, n_intervals='sqrt', window=None)[source]#
Piecewise aggregate approximation.
- Parameters:
- xarray-like of shape (n_samples, n_timestep)
The input data.
- n_intervalsstr, int or float, optional
The number of intervals to use for the transform.
- if “log2”, the number of intervals is log2(n_timestep).
- if “sqrt”, the number of intervals is sqrt(n_timestep).
- if int, the number of intervals is n_intervals.
- if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
- windowint, optional
The window size. If window is set, the value of n_intervals has no effect.
- Returns:
- ndarray of shape (n_samples, n_intervals)
The piecewise aggregate approximation.
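Piecewise aggregate approximation replaces each interval of a time series by its mean. The sketch below resolves n_intervals following the rules listed above and then averages consecutive, nearly equal-length intervals. How wildboar rounds the number of intervals, splits series whose length is not divisible, and interprets window are assumptions here (truncation, np.array_split, and non-overlapping windows respectively); `resolve_n_intervals` and `paa_sketch` are hypothetical names.

```python
import math

import numpy as np


def resolve_n_intervals(n_intervals, n_timestep):
    """Map the documented n_intervals options to an integer (hypothetical helper)."""
    if n_intervals == "sqrt":
        n = math.sqrt(n_timestep)
    elif n_intervals == "log2":
        n = math.log2(n_timestep)
    elif isinstance(n_intervals, float):
        if not 0 < n_intervals < 1:
            raise ValueError("a float n_intervals must satisfy 0 < n_intervals < 1")
        n = n_intervals * n_timestep
    elif isinstance(n_intervals, int):
        n = n_intervals
    else:
        raise ValueError(f"unsupported n_intervals: {n_intervals!r}")
    # Truncate to an integer and keep at least one interval (an assumption).
    return max(1, int(n))


def paa_sketch(x, *, n_intervals="sqrt", window=None):
    x = np.atleast_2d(np.asarray(x, dtype=float))
    n_timestep = x.shape[1]
    if window is not None:
        # window overrides n_intervals; assume non-overlapping windows.
        n = max(1, n_timestep // window)
    else:
        n = resolve_n_intervals(n_intervals, n_timestep)
    # Split the time axis into n consecutive intervals and average each one.
    segments = np.array_split(x, n, axis=1)
    return np.stack([seg.mean(axis=1) for seg in segments], axis=1)


x = np.arange(16, dtype=float).reshape(1, 16)
print(paa_sketch(x))                 # sqrt(16) = 4 intervals: [[ 1.5  5.5  9.5 13.5]]
print(paa_sketch(x, n_intervals=2))  # [[ 3.5 11.5]]
```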
- wildboar.transform.symbolic_aggregate_approximation(x, *, n_intervals='sqrt', window=None, n_bins=4, binning='normal')[source]#
Symbolic aggregate approximation.
- Parameters:
- xarray-like of shape (n_samples, n_timestep)
The input data.
- n_intervalsstr, int or float, optional
The number of intervals to use for the transform.
- if “log2”, the number of intervals is log2(n_timestep).
- if “sqrt”, the number of intervals is sqrt(n_timestep).
- if int, the number of intervals is n_intervals.
- if float, the number of intervals is n_intervals * n_timestep, with 0 < n_intervals < 1.
- windowint, optional
The window size. If window is set, the value of n_intervals has no effect.
- n_binsint, optional
The number of bins.
- binningstr, optional
The bin construction. By default, the bins are defined according to the normal distribution. Possible values are "normal" for normally distributed bins or "uniform" for uniformly distributed bins.
- Returns:
- ndarray of shape (n_samples, n_intervals)
The symbolic aggregate approximation.
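Symbolic aggregate approximation discretizes the piecewise aggregate approximation into n_bins symbols. The sketch below shows the classic construction for binning="normal": each series is standardized, averaged over consecutive intervals, and each mean is mapped to the index of the region of N(0, 1) (split into n_bins equiprobable regions) it falls in. Standardizing first, returning integer symbols, and using np.array_split for uneven lengths are assumptions about wildboar's behavior; the "uniform" option is not sketched, and `sax_sketch` is a hypothetical name.

```python
from statistics import NormalDist

import numpy as np


def sax_sketch(x, *, n_intervals=4, n_bins=4):
    x = np.atleast_2d(np.asarray(x, dtype=float))
    # Standardize each series to zero mean and unit standard deviation.
    std = x.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    z = (x - x.mean(axis=1, keepdims=True)) / std
    # Piecewise aggregate approximation: mean of each consecutive interval.
    segments = np.array_split(z, n_intervals, axis=1)
    paa = np.stack([seg.mean(axis=1) for seg in segments], axis=1)
    # Breakpoints splitting N(0, 1) into n_bins equally probable regions.
    breakpoints = [NormalDist().inv_cdf(k / n_bins) for k in range(1, n_bins)]
    # Symbol index in 0..n_bins-1 for every interval mean.
    return np.digitize(paa, breakpoints)


print(sax_sketch([1, 1, 2, 2, 3, 3, 4, 4]))  # [[0 1 2 3]]
```

For n_bins=4 the breakpoints are roughly -0.674, 0 and 0.674, so the four standardized interval means (about -1.34, -0.45, 0.45 and 1.34) each land in a different bin.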