Time series#

A time series is a (temporally) ordered sequence of real values. A univariate time series has a single dimension, whereas a multivariate time series has multiple dimensions. In Wildboar, time series are represented by Numpy-arrays. A single univariate time series is represented as an array of shape (1, n_timestep) (or (n_timestep, )). Wildboar represents a multivariate time series as an array of shape (1, n_dims, n_timestep) (or (n_dims, n_timestep) depending on context). A dataset of time series is an array of n_samples, i.e., for univariate time series an array of shape (n_samples, n_timestep) (or (n_samples, 1, n_timestep)) and a multivarete time series is represented as an array of shape (n_samples, n_dims, n_timestep).

Most algorithms in wildboar assumes that the time series are of equal length and without missing values. However, some datasets contain both missing values and/or have time series or dimensions of unequal length. Wildboar represents the End-of-Sequence identifier as wildboar.eos and value is missing by numpy.nan. The EoS value is a valid IEEE754 NaN value, and will be treated as True by numpy.isnan, whereas wildboar.iseos will treat numpy.nan as False.

Note

By having EoS treated as NaN, we can ignore it and just treat them as missing values.

>>> import numpy as np
>>> import wildboar as wb
>>> t1 = np.array([1, 2, 3, 1, 1, 1], dtype=float)
>>> t2 = np.array([1, 2, 3, 1, wb.eos, wb.eos], dtype=float)
>>> t3 = np.array([1, np.nan, 3, 3, 3, 3], dtype=float)
>>> x = np.vstack([t1, t2, t3]) # dataset of (3, 6)

In the example, we create a dataset with 3 samples, where each sample has 6 timestep, and:

  • x[0] has no missing values and is of length n_samples.

  • x[1] has no missing values and is of length 4 (np.min(np.nonzero(wb.iseos(x[1]))[-1])).

  • x[2] has a single missing value at index 2.

Variable length time series#

Warning

Support for variable length time series is not stable and the API will change