wildboar.datasets.preprocess

Utilities for preprocessing time series.

Module Contents

Functions

maxabs_scale(x)

Scale each time series by its maximum absolute value.

minmax_scale(x[, min, max])

Scale each time series to the range [min, max] along the time dimension.

named_preprocess(name)

Get a named preprocessor.

standardize(x)

Standardize each time series to zero mean and unit standard deviation.

truncate(x[, n_shortest])

Truncate x to the shortest sequence.

wildboar.datasets.preprocess.maxabs_scale(x)

Scale each time series by its maximum absolute value.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

Returns:
ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The transformed samples.
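As an illustration of the computation described above, here is a minimal NumPy sketch. It is not wildboar's implementation: the name maxabs_scale_sketch is hypothetical, and leaving all-zero series unchanged is an assumption of this sketch.

```python
import numpy as np


def maxabs_scale_sketch(x):
    """Divide each time series by its largest absolute value (last axis)."""
    x = np.asarray(x, dtype=float)
    # One maximum per series; keepdims lets it broadcast over the time axis.
    max_abs = np.max(np.abs(x), axis=-1, keepdims=True)
    # Assumption of this sketch: leave all-zero series unchanged.
    max_abs = np.where(max_abs == 0, 1.0, max_abs)
    return x / max_abs


x = np.array([[1.0, -4.0, 2.0], [0.5, 0.25, -0.1]])
scaled = maxabs_scale_sketch(x)
```

After scaling, every value lies in [-1, 1] and the largest absolute value in each series is 1.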

wildboar.datasets.preprocess.minmax_scale(x, min=0, max=1)

Scale each time series in x to a given range along the time dimension.

Each time series is rescaled so that all of its values lie between min and max.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

min : float, optional

The minimum value of each scaled time series. Defaults to 0.

max : float, optional

The maximum value of each scaled time series. Defaults to 1.

Returns:
ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The transformed samples.
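The rescaling can be sketched with the usual min-max formula, (x - x_min) / (x_max - x_min) * (max - min) + min, applied per series. This is a hedged NumPy illustration rather than the library's code: minmax_scale_sketch is a hypothetical name, and mapping constant series to min is an assumption of this sketch.

```python
import numpy as np


def minmax_scale_sketch(x, min=0.0, max=1.0):
    """Rescale each time series (last axis) to the range [min, max]."""
    x = np.asarray(x, dtype=float)
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    span = x_max - x_min
    # Assumption of this sketch: constant series map to `min`
    # instead of producing a division by zero.
    span = np.where(span == 0, 1.0, span)
    return (x - x_min) / span * (max - min) + min


x = np.array([[2.0, 4.0, 6.0], [10.0, 0.0, 5.0]])
unit = minmax_scale_sketch(x)                       # default range [0, 1]
signed = minmax_scale_sketch(x, min=-1.0, max=1.0)  # custom range [-1, 1]
```

The parameter names min and max shadow the Python builtins inside the function, mirroring the documented signature; the sketch therefore uses the array methods x.min and x.max.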

wildboar.datasets.preprocess.named_preprocess(name)

Get a named preprocessor.

Parameters:
name : str

The name of the preprocessor.

Returns:
callable

The preprocessor function.
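A name-to-function lookup of this kind is commonly backed by a dictionary. The sketch below shows that pattern only; the registry, its keys, the helper functions, and the error behavior are all hypothetical and do not reflect the names or behavior that wildboar actually uses.

```python
import numpy as np


def _center(x):
    """Hypothetical preprocessor: subtract the per-series mean."""
    x = np.asarray(x, dtype=float)
    return x - x.mean(axis=-1, keepdims=True)


def _maxabs(x):
    """Hypothetical preprocessor: divide by the per-series max absolute value."""
    x = np.asarray(x, dtype=float)
    return x / np.max(np.abs(x), axis=-1, keepdims=True)


# Hypothetical registry; the keys are illustrative, not wildboar's.
_REGISTRY = {"center": _center, "maxabs": _maxabs}


def named_preprocess_sketch(name):
    """Return the callable registered under `name`."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown preprocessor {name!r}; expected one of: {known}") from None


preprocess = named_preprocess_sketch("maxabs")
result = preprocess(np.array([[1.0, -2.0]]))
```

Returning a callable lets callers pass either a name or a function wherever a preprocessing step is configured.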

wildboar.datasets.preprocess.standardize(x)

Standardize x along the time dimension.

Each resulting time series has zero mean and unit standard deviation.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

Returns:
ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The standardized samples.
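Standardization along the time dimension amounts to a per-series z-score. The following is a minimal NumPy sketch, not the library's code: standardize_sketch is a hypothetical name, and both the population standard deviation (ddof=0) and the handling of constant series are assumptions of this sketch.

```python
import numpy as np


def standardize_sketch(x):
    """Z-score each time series along the last (time) axis."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=-1, keepdims=True)
    # Assumption: population standard deviation (ddof=0).
    std = x.std(axis=-1, keepdims=True)
    # Assumption: constant series are only centered, not divided by zero.
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std


x = np.array([[1.0, 2.0, 3.0], [10.0, 20.0, 60.0]])
z = standardize_sketch(x)

# The same call works for multivariate input of shape
# (n_samples, n_dims, n_timestep): each dimension is scaled separately.
x3d = np.arange(24, dtype=float).reshape(2, 3, 4)
z3d = standardize_sketch(x3d)
```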

wildboar.datasets.preprocess.truncate(x, n_shortest=None)

Truncate x to the shortest sequence.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples.

n_shortest : int, optional

The maximum number of timesteps to keep.

Returns:
ndarray of shape (n_samples, n_shortest) or (n_samples, n_dims, n_shortest)

The truncated samples.
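One way to picture truncation is with series of unequal length stored in a single array and padded at the end. The sketch below assumes NaN padding marks the end of shorter series and that omitting n_shortest means "use the shortest series"; both are assumptions of this illustration, truncate_sketch is a hypothetical name, and the library may mark the end of a series differently.

```python
import numpy as np


def truncate_sketch(x, n_shortest=None):
    """Keep only the leading timesteps shared by every series."""
    x = np.asarray(x, dtype=float)
    if n_shortest is None:
        # Time indices of all padded (NaN) positions, across samples and dims.
        pad_idx = np.nonzero(np.isnan(x))[-1]
        # The earliest padded index equals the length of the shortest series.
        n_shortest = pad_idx.min() if pad_idx.size > 0 else x.shape[-1]
    return x[..., :n_shortest]


x = np.array([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, np.nan, np.nan]])
shortest = truncate_sketch(x)             # cut at the shortest series
first = truncate_sketch(x, n_shortest=1)  # cut at an explicit length
```

Slicing with `...` keeps the sketch valid for both the two-dimensional and the three-dimensional input shapes listed above.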