wildboar.datasets.preprocess

Utilities for preprocessing time series.

Classes

Interpolate

Interpolate missing (np.nan) values.

MaxAbsScale

Scale each time series by its maximum absolute value.

MinMaxScale

Normalize time series so that each value lies within a specified minimum and maximum range.

Standardize

Standardize time series with zero mean and unit standard deviation.

Truncate

A transformer that truncates the input data based on the end of series indicators.

Functions

interpolate(X[, method])

Interpolate the given time series using the specified method.

maxabs_scale(x)

Scale each time series by its maximum absolute value.

minmax_scale(x[, min, max])

Scale each time series in x so that its values lie between min and max.

named_preprocess(name)

Get a named preprocessor.

standardize(x)

Standardize each time series in x to zero mean and unit standard deviation.

truncate(x)

Truncate x to the shortest sequence.


class wildboar.datasets.preprocess.Interpolate(method='linear')

Interpolate missing (np.nan) values.

Parameters:
method : str, optional

The interpolation method to use. Default is “linear”.

Notes

If scipy < 1.4, valid method values include “linear”, “pchip”, and “cubic”. Otherwise, method also supports “akima” and “makima”.

fit(X, y=None)

Fit the model to the provided data.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to fit the model.

y : array-like, optional

The target values. Ignored.

Returns:
object

Returns the instance of the fitted model.
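Since every preprocessor in this module treats each time series on its own, fit presumably has nothing to learn and simply returns the instance, so fit_transform(X) gives the same result as transform(X). The sketch below illustrates that stateless pattern in plain NumPy. FunctionPreprocessor is a hypothetical name, not part of wildboar, and the documented classes additionally provide the scikit-learn estimator methods listed on this page (get_params, set_params, set_output, get_metadata_routing).

```python
import numpy as np


class FunctionPreprocessor:
    """Hypothetical stateless transformer wrapping a preprocessing function."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        # Each series is processed on its own, so there is nothing to learn;
        # y is accepted only for API compatibility and is ignored.
        return self

    def transform(self, X):
        # Apply the wrapped function to the input as a float array.
        return self.func(np.asarray(X, dtype=float))

    def fit_transform(self, X, y=None):
        # Equivalent to calling fit(X, y) followed by transform(X).
        return self.fit(X, y).transform(X)


# Usage: subtract each series' mean along the last (time) axis.
center = FunctionPreprocessor(lambda x: x - x.mean(axis=-1, keepdims=True))
Xt = center.fit_transform([[1.0, 2.0, 3.0]])
```

Here Xt is [[-1.0, 0.0, 1.0]]. Because fit returns self, calls can be chained in the same way as scikit-learn estimators.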

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
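The <component>__<parameter> naming convention described for set_params can be illustrated by a small helper that groups flat parameter names by the component they address. This is a minimal sketch of the naming rule only and assumes nothing about scikit-learn's internals. split_nested_params and the names scale and verbose are hypothetical, and the real set_params also validates each name against get_params.

```python
def split_nested_params(params):
    """Group flat parameter names by the component they address.

    A key such as "scale__min" is split at its first "__" into the component
    name ("scale") and the remaining parameter name ("min"). Keys without "__"
    are collected under "" and belong to the outer estimator itself. Deeper
    keys such as "a__b__c" keep "b__c", which the component can split again.
    """
    grouped = {}
    for key, value in params.items():
        component, sep, name = key.partition("__")
        if sep:
            grouped.setdefault(component, {})[name] = value
        else:
            grouped.setdefault("", {})[key] = value
    return grouped


print(split_nested_params({"scale__min": -1, "scale__max": 1, "verbose": True}))
# {'scale': {'min': -1, 'max': 1}, '': {'verbose': True}}
```

Splitting only at the first "__" is what lets arbitrarily deep nesting work: each level peels off one component name and forwards the rest.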

transform(X)

Transform the data using the specified interpolation method.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to be transformed.

Returns:
ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The transformed data after applying the interpolation method.

class wildboar.datasets.preprocess.MaxAbsScale

Scale each time series by its maximum absolute value.

fit(X, y=None)

Fit the model to the provided data.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to fit the model.

y : array-like, optional

The target values. Ignored.

Returns:
object

Returns the instance of the fitted model.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Scale each time series by its maximum absolute value.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to be transformed.

Returns:
ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The scaled data.

class wildboar.datasets.preprocess.MinMaxScale(min=0, max=1)

Normalize time series so that each value lies within a specified minimum and maximum range.

Parameters:
min : float, optional

The minimum value of each scaled time series. Default is 0.

max : float, optional

The maximum value of each scaled time series. Default is 1.

Examples

>>> from wildboar.datasets import load_gun_point
>>> from wildboar.datasets.preprocess import MinMaxScale
>>> X, _ = load_gun_point()
>>> MinMaxScale().fit_transform(X).shape
(200, 150)

fit(X, y=None)

Fit the model to the provided data.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to fit the model.

y : array-like, optional

The target values. Ignored.

Returns:
object

Returns the instance of the fitted model.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Scale each time series so that its values lie between min and max.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to be transformed.

Returns:
ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The scaled data.

class wildboar.datasets.preprocess.Standardize

Standardize time series with zero mean and unit standard deviation.

fit(X, y=None)

Fit the model to the provided data.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to fit the model.

y : array-like, optional

The target values. Ignored.

Returns:
object

Returns the instance of the fitted model.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Standardize each time series to zero mean and unit standard deviation.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to be transformed.

Returns:
ndarray of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The standardized data.

class wildboar.datasets.preprocess.Truncate

A transformer that truncates the input data based on the end of series indicators.

fit(X, y=None)

Fit the model to the provided data.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to fit the model.

y : array-like, optional

The target values. Ignored.

Returns:
object

Returns the instance of the fitted model.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Truncate X based on its end-of-series indicators.

Parameters:
X : array-like

Input data to transform.

Returns:
array-like

The truncated input data.

wildboar.datasets.preprocess.interpolate(X, method='linear')

Interpolate the given time series using the specified method.

Parameters:
X : array-like of shape (n_samples, n_dims, n_timestep) or (n_samples, n_timestep)

The input data to be interpolated. It can be of any shape but must have at least two dimensions (samples and time steps).

method : str, optional

The interpolation method to use. Default is “linear”.

Returns:
ndarray

The interpolated data.

Notes

If scipy < 1.4, valid method values include “linear”, “pchip”, and “cubic”. Otherwise, method also supports “akima” and “makima”.
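A minimal NumPy sketch of the “linear” method, assuming missing values are marked with np.nan and interpolation runs along the last (time) axis. interpolate_linear is a hypothetical helper, not wildboar's implementation, and its edge handling (leading and trailing NaNs take the nearest observed value, as np.interp does by default) is an assumption rather than documented behaviour.

```python
import numpy as np


def interpolate_linear(X):
    """Fill NaN values in each time series by linear interpolation.

    Interpolation runs along the last (time) axis. NaNs before the first or
    after the last observed value take the nearest observed value, which is
    np.interp's default edge behaviour. All-NaN series are left unchanged.
    """
    X = np.array(X, dtype=float)  # copy, so the caller's array is not modified
    series = X.reshape(-1, X.shape[-1])  # one row per (sample, dimension)
    t = np.arange(series.shape[1])
    for row in series:
        missing = np.isnan(row)
        if missing.any() and not missing.all():
            # Estimate missing points from the observed (time, value) pairs.
            row[missing] = np.interp(t[missing], t[~missing], row[~missing])
    return series.reshape(X.shape)


X = np.array([[1.0, np.nan, 3.0, np.nan, 5.0]])
print(interpolate_linear(X))  # [[1. 2. 3. 4. 5.]]
```

The same loop handles three-dimensional input because every (sample, dimension) pair is flattened into its own row before interpolating.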

wildboar.datasets.preprocess.maxabs_scale(x)

Scale each time series by its maximum absolute value.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

Returns:
ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The transformed samples.
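A sketch of max-absolute scaling in NumPy, assuming the maximum is taken per series along the last (time) axis so that every scaled value falls in [-1, 1]. maxabs_scale_sketch is a hypothetical name, and leaving all-zero series unchanged is an assumption rather than documented behaviour.

```python
import numpy as np


def maxabs_scale_sketch(x):
    """Divide each time series by its largest absolute value.

    Works for shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    because the maximum is taken along the last (time) axis only.
    """
    x = np.asarray(x, dtype=float)
    max_abs = np.max(np.abs(x), axis=-1, keepdims=True)
    # Assumption: an all-zero series is left as zeros instead of dividing by 0.
    max_abs[max_abs == 0] = 1.0
    return x / max_abs


scaled = maxabs_scale_sketch([[1.0, -4.0, 2.0]])
```

Here scaled is [[0.25, -1.0, 0.5]]: the series is divided by 4, its largest absolute value, so the sign of each point is preserved.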

wildboar.datasets.preprocess.minmax_scale(x, min=0, max=1)

Scale x along the time dimension.

Each time series is scaled such that each value is between min and max.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

min : float, optional

The minimum value of each scaled time series. Default is 0.

max : float, optional

The maximum value of each scaled time series. Default is 1.

Returns:
ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The transformed samples.
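The scaling described above matches the usual min-max transform, x' = (x - x_min) / (x_max - x_min) * (max - min) + min, with x_min and x_max computed per series along the time axis. The sketch below assumes that formula; minmax_scale_sketch is hypothetical, and mapping constant series to min is an assumption rather than documented behaviour.

```python
import numpy as np


def minmax_scale_sketch(x, min=0, max=1):  # names mirror the documented signature
    """Linearly map each time series so that its smallest value becomes
    ``min`` and its largest value becomes ``max``."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    span = x_max - x_min
    # Assumption: a constant series (span 0) is mapped to ``min``.
    span[span == 0] = 1.0
    return (x - x_min) / span * (max - min) + min


scaled = minmax_scale_sketch([[2.0, 4.0, 6.0]], min=-1, max=1)
```

Here scaled is [[-1.0, 0.0, 1.0]]: the series spans 4 units, which are stretched onto the 2 units between -1 and 1.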

wildboar.datasets.preprocess.named_preprocess(name)

Get a named preprocessor.

Parameters:
name : str

The name of the preprocessor.

Returns:
callable

The preprocessor function.
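named_preprocess resolves a string to a preprocessing function. A name-to-function registry is one common way to implement such a lookup, and the sketch below assumes that design. The registry keys “center” and “reverse” and the helper get_preprocessor are hypothetical; they do not reflect the names that named_preprocess actually accepts.

```python
import numpy as np

# Hypothetical registry: these names and functions are illustrative only.
_PREPROCESSORS = {
    "center": lambda x: x - np.mean(x, axis=-1, keepdims=True),
    "reverse": lambda x: np.flip(x, axis=-1),
}


def get_preprocessor(name):
    """Return the preprocessing function registered under ``name``."""
    try:
        return _PREPROCESSORS[name]
    except KeyError:
        # Report the valid choices instead of a bare KeyError.
        raise ValueError(
            f"unknown preprocessor {name!r}; choose from {sorted(_PREPROCESSORS)}"
        ) from None


center = get_preprocessor("center")
centered = center(np.array([[1.0, 2.0, 3.0]]))
```

Raising a ValueError that lists the registered names makes a misspelled name easier to diagnose than a bare KeyError.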

wildboar.datasets.preprocess.standardize(x)

Scale x along the time dimension.

The resulting array will have zero mean and unit standard deviation.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

Returns:
ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The standardized samples.
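A NumPy sketch of per-series standardization, z = (x - mean) / std, with both statistics computed along the last (time) axis. standardize_sketch is hypothetical; it uses NumPy's default population standard deviation (ddof=0), and wildboar may use a different convention or handle constant series differently.

```python
import numpy as np


def standardize_sketch(x):
    """Give each time series zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)  # ddof=0, NumPy's default
    # Assumption: a constant series (std 0) becomes all zeros.
    std[std == 0] = 1.0
    return (x - mean) / std


z = standardize_sketch(np.arange(24.0).reshape(2, 3, 4))
```

Because keepdims=True keeps one mean and one standard deviation per series, broadcasting applies them to every time step, and z has the same shape (2, 3, 4) as the input.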

wildboar.datasets.preprocess.truncate(x)

Truncate x to the shortest sequence.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

Returns:
ndarray of shape (n_samples, n_shortest) or (n_samples, n_dims, n_shortest)

The truncated samples.
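A sketch of truncating to the shortest sequence, assuming each shorter series is padded with an end-of-series marker after its last value. The marker value EOS = np.inf and the helper truncate_sketch are hypothetical; wildboar defines its own end-of-series indicator, which this sketch does not reproduce.

```python
import numpy as np

# Hypothetical end-of-series marker; wildboar's own sentinel may differ.
EOS = np.inf


def truncate_sketch(x, eos=EOS):
    """Cut every series to the length of the shortest one.

    A series ends at its first ``eos`` value; a series without the marker
    uses the full width. Works for 2-D and 3-D input because lengths are
    computed along the last axis and the minimum is taken over all series.
    """
    x = np.asarray(x, dtype=float)
    is_eos = x == eos  # a NaN marker would need np.isnan instead
    # argmax finds the first marker; rows without one keep the full width.
    lengths = np.where(is_eos.any(axis=-1), is_eos.argmax(axis=-1), x.shape[-1])
    return x[..., : lengths.min()]


x = np.array([[1.0, 2.0, 3.0, EOS], [4.0, 5.0, EOS, EOS]])
short = truncate_sketch(x)
```

The second series ends after two values, so short is [[1.0, 2.0], [4.0, 5.0]], matching the documented output shape (n_samples, n_shortest).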