wildboar.utils

Utility functions.

Submodules

Package Contents

Functions

check_X_y(x, y, *[, dtype, order, copy, ensure_2d, ...])

Check both X and y.

check_array(array, *[, dtype, order, copy, ravel_1d, ...])

Input validation on time-series.

wildboar.utils.check_X_y(x, y, *, dtype=float, order='C', copy=False, ensure_2d=True, ensure_ts_array=False, allow_3d=False, allow_nd=False, force_all_finite=True, multi_output=False, ensure_min_samples=1, ensure_min_timesteps=1, ensure_min_dims=1, allow_eos=False, y_numeric=False, y_contiguous=True, estimator=None)

Check both X and y.

Parameters:
x : array-like

The samples.

y : array-like

The labels.

dtype : dtype, optional

The data type for X.

order : {"C", "F"}, optional

The order of data in memory.

copy : bool, optional

Force a copy of X.

ensure_2d : bool, optional

Ensure that the array is 2d, i.e., (n_samples, n_timesteps).

ensure_ts_array : bool, optional

Ensure that the array is a valid time series array.

allow_3d : bool, optional

Allow X to be 3d, i.e., (n_samples, n_dimensions, n_timesteps).

allow_nd : bool, optional

Allow X to have 2 or more dimensions.

force_all_finite : bool, optional

Require every value in X to be finite.

multi_output : bool, optional

Allow y to be a multi output array.

ensure_min_samples : int, optional

Require X to have at least this many samples.

ensure_min_timesteps : int, optional

Require X to have at least this many timesteps.

ensure_min_dims : int, optional

Require X to have at least this many dimensions.

allow_eos : bool, optional

Allow X to be of variable length.

y_numeric : bool, optional

Ensure that y is numeric with dtype float.

y_contiguous : bool, optional

Ensure that y is memory contiguous.

estimator : object, optional

An estimator object for error reporting.

Returns:
X : ndarray

The validated array X.

y : ndarray

The validated array y.
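The shape conventions named in the parameter descriptions above, (n_samples, n_timesteps) for 2d input and (n_samples, n_dimensions, n_timesteps) for 3d input, can be illustrated with plain NumPy. This is only a sketch: `describe_shape` is a hypothetical helper, not part of wildboar, and treating a 2d array as having one dimension is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Univariate time series: (n_samples, n_timesteps)
X_2d = rng.normal(size=(10, 50))
# Multivariate time series: (n_samples, n_dimensions, n_timesteps)
X_3d = rng.normal(size=(10, 3, 50))
# One label per sample
y = np.repeat([0, 1], 5)


def describe_shape(X):
    """Hypothetical helper: return (n_samples, n_dims, n_timesteps)."""
    if X.ndim == 2:
        n_samples, n_timesteps = X.shape
        n_dims = 1  # assumption: a 2d array holds one dimension per sample
    elif X.ndim == 3:
        n_samples, n_dims, n_timesteps = X.shape
    else:
        raise ValueError(f"expected a 2d or 3d array, got {X.ndim}d")
    return n_samples, n_dims, n_timesteps
```

Under these conventions, an array like `X_3d` would presumably need `allow_3d=True` to pass validation, since `ensure_2d` defaults to True.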

wildboar.utils.check_array(array, *, dtype='numeric', order='C', copy=False, ravel_1d=False, ensure_2d=True, ensure_ts_array=False, allow_3d=False, allow_nd=False, allow_eos=False, force_all_finite=True, ensure_min_samples=1, ensure_min_timesteps=1, ensure_min_dims=1, estimator=None, input_name='')

Input validation on time-series.

Delegate array validation to scikit-learn's sklearn.utils.validation.check_array, with the following wildboar defaults and conventions:

  • we optionally allow end-of-sequence identifiers

  • by default we convert arrays to c-order

  • we optionally specifically allow for 3d-arrays

  • we never allow for sparse arrays

By default, the input is checked to be a non-empty 2D array in c-order containing only finite values, with at least 1 sample, 1 timestep and 1 dimension. If the dtype of the array is object, attempt converting to float, raising on failure.
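The default checks described above can be approximated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not wildboar's implementation: `sketch_default_checks` is a hypothetical name, and the sketch ignores end-of-sequence identifiers, 3d input, and sparse input entirely.

```python
import numpy as np


def sketch_default_checks(array):
    """Hypothetical stand-in mirroring the default checks in the prose."""
    arr = np.asarray(array)
    if arr.dtype == object:
        # Object arrays are converted to float; astype raises on failure.
        arr = arr.astype(float)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2D array, got {arr.ndim}D")
    n_samples, n_timesteps = arr.shape
    if n_samples < 1 or n_timesteps < 1:
        raise ValueError("expected at least 1 sample and 1 timestep")
    if not np.all(np.isfinite(arr)):
        raise ValueError("array contains NaN or infinity")
    # Return the data in C (row-major) order.
    return np.ascontiguousarray(arr)
```

Passing a Fortran-ordered array returns a C-ordered copy, while an array containing NaN raises a `ValueError`.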

Parameters:
array : object

Input object to check / convert.

dtype : "numeric", type, list of type or None, optional

Data type of result. If None, the dtype of the input is preserved. If "numeric", dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion to the first type is only performed if the dtype of the input is not in the list.

order : {'F', 'C', 'T'} or None, optional

Whether an array will be forced to be fortran or c-style. When order is None, then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

copy : bool, optional

Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

ravel_1d : bool, optional

Whether to ravel 1d arrays or column vectors. If the array is neither, an error is raised.

ensure_2d : bool, optional

Whether to raise a value error if array is not 2D.

allow_3d : bool, optional

Whether to allow array.ndim == 3.

allow_nd : bool, optional

Whether to allow array.ndim > 2.

allow_eos : bool, optional

Whether to allow wildboar.utils.variable_len.eos in the array. If False, an error is raised when the array contains it.

force_all_finite : bool or 'allow-nan', default=True

Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

  • True: Force all values of array to be finite.

  • False: accepts np.inf, np.nan, pd.NA in array.

  • ‘allow-nan’: accepts only np.nan and pd.NA values in array. Values cannot be infinite.

ensure_min_samples : int, optional

Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

ensure_min_timesteps : int, optional

Make sure that the 2D array has some minimum number of timesteps (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ensure_2d is True. Setting to 0 disables this check.

ensure_min_dims : int, optional

Make sure that the array has a minimum number of dimensions. Setting to 0 disables this check.

estimator : str or estimator instance, default=None

If passed, include the name of the estimator in warning messages.

input_name : str, default=""

The data name used to construct the error message.

Returns:
object

The converted and validated array.
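The three force_all_finite settings listed in the parameters can be sketched with NumPy predicates. This is an illustration only: `passes_finite_check` is a hypothetical helper, it returns a boolean rather than raising as the documented function does, and it does not handle pd.NA.

```python
import numpy as np


def passes_finite_check(array, force_all_finite=True):
    """Hypothetical helper: would the values pass the given setting?"""
    arr = np.asarray(array, dtype=float)
    if force_all_finite is True:
        # Every value must be finite: no NaN, no infinity.
        return bool(np.isfinite(arr).all())
    if force_all_finite is False:
        # NaN and infinity are both accepted.
        return True
    if force_all_finite == "allow-nan":
        # NaN is accepted, infinity is not.
        return not bool(np.isinf(arr).any())
    raise ValueError(f"unsupported value: {force_all_finite!r}")


with_nan = [[1.0, np.nan]]
with_inf = [[1.0, np.inf]]
```

With these inputs, `with_nan` passes only under `False` and `'allow-nan'`, and `with_inf` passes only under `False`.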