wildboar.datasets.preprocess
Utilities for preprocessing time series.
Classes
Interpolate | Interpolate missing (np.nan) values.
MaxAbsScale | Scale each time series by its maximum absolute value.
MinMaxScale | Normalize time series, ensuring that each value lies within a specified minimum and maximum range.
Standardize | Standardize time series with zero mean and unit standard deviation.
Truncate | A transformer that truncates the input data based on the end of series indicators.
Functions
interpolate | Interpolate the given time series using the specified method.
maxabs_scale | Scale each time series by its maximum absolute value.
minmax_scale | Scale x along the time dimension.
named_preprocess | Get a named preprocessor.
standardize | Scale x along the time dimension.
truncate | Truncate x to the shortest sequence.
- class wildboar.datasets.preprocess.Interpolate(method='linear')
Interpolate missing (np.nan) values.
- Parameters:
- method : str, optional
The interpolation method to use. Default is “linear”.
Notes
If scipy < 1.4, valid method values include “linear”, “pchip”, and “cubic”. Otherwise, method also supports “akima” and “makima”.
- fit(X, y=None)
Fit the model to the provided data.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to fit the model.
- y : array-like, optional
The target values. Ignored.
- Returns:
- object
Returns the instance of the fitted model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X)
Transform the data using the specified interpolation method.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to be transformed.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The transformed data after applying the interpolation method.
- class wildboar.datasets.preprocess.MaxAbsScale
Scale each time series by its maximum absolute value.
- fit(X, y=None)
Fit the model to the provided data.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to fit the model.
- y : array-like, optional
The target values. Ignored.
- Returns:
- object
Returns the instance of the fitted model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X)
Transform the data by scaling each time series by its maximum absolute value.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to be transformed.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The transformed data after scaling by the maximum absolute value.
- class wildboar.datasets.preprocess.MinMaxScale(min=0, max=1)
Normalize time series, ensuring that each value lies within a specified minimum and maximum range.
- Parameters:
- min : float, optional
The minimum value.
- max : float, optional
The maximum value.
Examples
>>> from wildboar.datasets import load_gun_point
>>> from wildboar.datasets.preprocess import MinMaxScale
>>> X, _ = load_gun_point()
>>> MinMaxScale().fit_transform(X).shape
(200, 150)
- fit(X, y=None)
Fit the model to the provided data.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to fit the model.
- y : array-like, optional
The target values. Ignored.
- Returns:
- object
Returns the instance of the fitted model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X)
Transform the data by scaling each time series to the range between min and max.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to be transformed.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The transformed data after scaling to the range between min and max.
- class wildboar.datasets.preprocess.Standardize
Standardize time series with zero mean and unit standard deviation.
- fit(X, y=None)
Fit the model to the provided data.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to fit the model.
- y : array-like, optional
The target values. Ignored.
- Returns:
- object
Returns the instance of the fitted model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X)
Transform the data by standardizing each time series to zero mean and unit standard deviation.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to be transformed.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The standardized data.
- class wildboar.datasets.preprocess.Truncate
A transformer that truncates the input data based on the end of series indicators.
- fit(X, y=None)
Fit the model to the provided data.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to fit the model.
- y : array-like, optional
The target values. Ignored.
- Returns:
- object
Returns the instance of the fitted model.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
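To make the idea of truncating at end-of-series indicators concrete, here is a minimal NumPy sketch. It is not wildboar's implementation: this section does not say which value wildboar uses as its indicator, so the sketch assumes, purely for illustration, that np.nan marks the padding after a series ends. The name truncate_sketch is hypothetical.

```python
import numpy as np


def truncate_sketch(x):
    """Cut every series at the length of the shortest one.

    Assumption: np.nan stands in for the end-of-series indicator.
    wildboar's actual indicator may be a different value.
    """
    x = np.asarray(x, dtype=float)
    is_pad = np.isnan(x)
    # Index of the first padding value per series, or the full length if none.
    first_pad = np.where(is_pad.any(axis=-1), is_pad.argmax(axis=-1), x.shape[-1])
    shortest = int(first_pad.min())
    # Slice along the time dimension (the last axis).
    return x[..., :shortest]


x = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, np.nan, np.nan],  # ends after two time steps
    [7.0, 8.0, 9.0, np.nan],     # ends after three time steps
])
truncated = truncate_sketch(x)
print(truncated.shape)
```

Because the slice is taken along the last axis, the same sketch applies to input of shape (n_samples, n_dims, n_timestep).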
- wildboar.datasets.preprocess.interpolate(X, method='linear')
Interpolate the given time series using the specified method.
- Parameters:
- X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)
The input data to be interpolated. It can be of any shape but must have at least one dimension.
- method : str, optional
The interpolation method to use. Default is “linear”.
- Returns:
- ndarray
The interpolated data.
Notes
If scipy < 1.4, valid method values include “linear”, “pchip”, and “cubic”. Otherwise, method also supports “akima” and “makima”.
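As an illustration of what “linear” interpolation of missing values means, the following NumPy sketch fills np.nan entries along the time dimension with numpy.interp. It is a sketch under stated assumptions, not wildboar's code: the function name interpolate_linear_sketch is hypothetical, and the edge handling (leading and trailing gaps take the nearest observed value) is numpy.interp's behaviour, which may differ from the library's.

```python
import numpy as np


def interpolate_linear_sketch(x):
    """Fill np.nan values in each series by linear interpolation over time."""
    x = np.asarray(x, dtype=float)
    # Work on a flat (n_series, n_timestep) copy, then restore the shape.
    out = x.reshape(-1, x.shape[-1]).copy()
    t = np.arange(x.shape[-1])
    for row in out:  # each row is a view into `out`
        missing = np.isnan(row)
        if missing.any() and not missing.all():
            # Assumption: gaps at the edges are held at the nearest observation.
            row[missing] = np.interp(t[missing], t[~missing], row[~missing])
    return out.reshape(x.shape)


x = np.array([
    [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
    [np.nan, 2.0, 4.0, np.nan, 8.0, np.nan],
])
filled = interpolate_linear_sketch(x)
print(filled)
```

Series that consist only of np.nan are left untouched in this sketch, since there is nothing to interpolate from.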
- wildboar.datasets.preprocess.maxabs_scale(x)
Scale each time series by its maximum absolute value.
- Parameters:
- x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The transformed samples.
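The operation can be sketched in a few lines of NumPy: each series is divided by its largest absolute value, so the result lies in [-1, 1]. This is an illustrative sketch rather than the library's implementation; maxabs_scale_sketch is a hypothetical name, and leaving all-zero series unchanged is an assumption.

```python
import numpy as np


def maxabs_scale_sketch(x):
    """Divide each series by its maximum absolute value."""
    x = np.asarray(x, dtype=float)
    # One maximum per series, kept as a trailing axis for broadcasting.
    max_abs = np.max(np.abs(x), axis=-1, keepdims=True)
    # Assumption: all-zero series are returned unchanged (no division by zero).
    max_abs[max_abs == 0] = 1.0
    return x / max_abs


x = np.array([[1.0, -4.0, 2.0], [0.5, 0.25, -0.1]])
scaled = maxabs_scale_sketch(x)
print(scaled)
```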
- wildboar.datasets.preprocess.minmax_scale(x, min=0, max=1)
Scale x along the time dimension.
Each time series is scaled such that each value is between min and max.
- Parameters:
- x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The samples.
- min : float, optional
The minimum value.
- max : float, optional
The maximum value.
- Returns:
- ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The transformed samples.
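A minimal NumPy sketch of min-max scaling along the time dimension is shown below. It is not taken from wildboar: minmax_scale_sketch is a hypothetical name, the parameter names min and max only mirror the documented signature, and mapping constant series to min is an assumption.

```python
import numpy as np


def minmax_scale_sketch(x, min=0, max=1):
    """Rescale each series so its values span [min, max]."""
    x = np.asarray(x, dtype=float)
    lo = np.min(x, axis=-1, keepdims=True)
    hi = np.max(x, axis=-1, keepdims=True)
    span = hi - lo
    # Assumption: constant series (zero span) are mapped to `min`.
    span[span == 0] = 1.0
    # Map to [0, 1] first, then stretch and shift to [min, max].
    return (x - lo) / span * (max - min) + min


x = np.array([[2.0, 4.0, 6.0], [10.0, 0.0, 5.0]])
scaled = minmax_scale_sketch(x, min=-1, max=1)
print(scaled)
```

Each series is scaled with its own minimum and maximum, so two series with very different magnitudes end up on the same range.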
- wildboar.datasets.preprocess.named_preprocess(name)
Get a named preprocessor.
- Parameters:
- name : str
The name of the preprocessor.
- Returns:
- callable
The preprocessor function.
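Looking up a callable by name is commonly done with a small registry. The sketch below shows that pattern only; it makes no claim about which names wildboar accepts or how it resolves them. The registry, the names "center" and "identity", and every function here are hypothetical.

```python
import numpy as np


def _center(x):
    """Hypothetical preprocessor: subtract the mean of each series."""
    x = np.asarray(x, dtype=float)
    return x - x.mean(axis=-1, keepdims=True)


def _identity(x):
    """Hypothetical preprocessor: return the data unchanged."""
    return np.asarray(x, dtype=float)


# Hypothetical registry; these names are illustrative, not wildboar's.
_PREPROCESSORS = {"center": _center, "identity": _identity}


def named_preprocess_sketch(name):
    """Return the preprocessor registered under `name`."""
    try:
        return _PREPROCESSORS[name]
    except KeyError:
        raise ValueError(f"unknown preprocessor: {name!r}") from None


preprocess = named_preprocess_sketch("center")
result = preprocess([[1.0, 2.0, 3.0]])
print(result)
```

Returning the function itself, rather than applying it, lets callers pass the preprocessor on to other code that expects a callable.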
- wildboar.datasets.preprocess.standardize(x)
Scale x along the time dimension.
The resulting array will have zero mean and unit standard deviation.
- Parameters:
- x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The standardized samples.
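Standardizing along the time dimension can be sketched with NumPy as follows. This is an illustration, not wildboar's implementation: standardize_sketch is a hypothetical name, the population standard deviation (ddof=0) is assumed, and constant series are mapped to zeros by assumption.

```python
import numpy as np


def standardize_sketch(x):
    """Give each series zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    mean = np.mean(x, axis=-1, keepdims=True)
    # Assumption: population standard deviation (ddof=0).
    std = np.std(x, axis=-1, keepdims=True)
    # Assumption: constant series become all zeros instead of dividing by zero.
    std[std == 0] = 1.0
    return (x - mean) / std


x = np.array([[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]])
z = standardize_sketch(x)
print(z.mean(axis=-1), z.std(axis=-1))
```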