Time series
A time series is a (temporally) ordered sequence of real values. A univariate
time series has a single dimension, whereas a multivariate time series has
multiple dimensions. In Wildboar, time series are represented as NumPy arrays.
A single univariate time series is represented as an array of shape
(1, n_timestep) (or (n_timestep,)). A single multivariate time series is
represented as an array of shape (1, n_dims, n_timestep) (or
(n_dims, n_timestep), depending on context). A dataset of time series is an
array whose first axis has length n_samples: a univariate dataset is an array
of shape (n_samples, n_timestep) (or (n_samples, 1, n_timestep)), and a
multivariate dataset is an array of shape (n_samples, n_dims, n_timestep).
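The shape conventions above can be illustrated with plain NumPy. The array contents below are placeholders, and moving between the equivalent shapes with reshape or np.newaxis is ordinary NumPy, not a wildboar-specific API.

```python
import numpy as np

n_samples, n_dims, n_timestep = 3, 2, 6

# A single univariate time series: (n_timestep,), or equivalently (1, n_timestep).
ts = np.arange(n_timestep, dtype=float)
ts_2d = ts.reshape(1, n_timestep)

# A single multivariate time series: (n_dims, n_timestep), or (1, n_dims, n_timestep).
mts = np.zeros((n_dims, n_timestep))
mts_3d = mts[np.newaxis]

# A univariate dataset: (n_samples, n_timestep), or (n_samples, 1, n_timestep)
# with an explicit dimension axis.
X_uni = np.zeros((n_samples, n_timestep))
X_uni_3d = X_uni[:, np.newaxis, :]

# A multivariate dataset: (n_samples, n_dims, n_timestep).
X_multi = np.zeros((n_samples, n_dims, n_timestep))

print(ts_2d.shape, mts_3d.shape, X_uni_3d.shape, X_multi.shape)
```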
Most algorithms in Wildboar assume that the time series are of equal length and
without missing values. However, some datasets contain missing values, time
series or dimensions of unequal length, or both. Wildboar represents the
end-of-sequence (EoS) marker as wildboar.eos and missing values as
numpy.nan. The EoS value is a valid IEEE 754 NaN value and is therefore
treated as True by numpy.isnan, whereas wildboar.iseos treats numpy.nan
as False.
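How a value can be a NaN for numpy.isnan and still be distinguishable from numpy.nan is not spelled out here. The following hypothetical sketch shows one common technique: give the marker a distinct NaN payload and compare raw bit patterns. EOS, _EOS_BITS and is_eos are illustrative names, and this is an assumption, not necessarily how wildboar.eos and wildboar.iseos are implemented.

```python
import numpy as np

# Hypothetical EoS encoding (an assumption for illustration, not wildboar's
# documented implementation): a quiet NaN with payload 1 in its low mantissa bits.
_EOS_BITS = np.uint64(0x7FF8000000000001)
EOS = float(np.array([_EOS_BITS], dtype=np.uint64).view(np.float64)[0])


def is_eos(x):
    """Return True where x holds exactly the hypothetical EoS bit pattern."""
    x = np.ascontiguousarray(x, dtype=np.float64)
    # Compare raw bit patterns, since NaN == NaN is always False.
    return x.view(np.uint64) == _EOS_BITS


t = np.array([1.0, np.nan, EOS])
print(np.isnan(t).tolist())  # [False, True, True]
print(is_eos(t).tolist())    # [False, False, True]
```

A payload like this survives copying and stacking arrays, but IEEE 754 does not guarantee that it survives arithmetic, so such a marker is best treated as a storage convention.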
Note
Since EoS is treated as NaN, code that does not need to distinguish the end of
a sequence can simply treat EoS values as missing values.
>>> import numpy as np
>>> import wildboar as wb
>>> t1 = np.array([1, 2, 3, 1, 1, 1], dtype=float)
>>> t2 = np.array([1, 2, 3, 1, wb.eos, wb.eos], dtype=float)
>>> t3 = np.array([1, np.nan, 3, 3, 3, 3], dtype=float)
>>> x = np.vstack([t1, t2, t3])  # dataset of shape (3, 6)
In the example, we create a dataset with 3 samples, each with 6 timesteps, where:
- x[0] has no missing values and is of length n_timestep (6).
- x[1] has no missing values and is of length 4 (np.min(np.nonzero(wb.iseos(x[1]))[-1])), i.e., the index of its first EoS value.
- x[2] has a single missing value, at index 1.
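The per-series lengths and missing values listed above can be computed in a vectorized way. So that the sketch runs without wildboar installed, it uses hypothetical stand-ins EOS and is_eos (a NaN with a distinct payload, compared bit-wise) in place of wb.eos and wb.iseos; this encoding is an assumption, not wildboar's documented implementation. It also shows the point of the note: NaN-aware NumPy reductions skip EoS and missing values alike.

```python
import numpy as np

# Hypothetical stand-ins for wb.eos / wb.iseos (assumed encoding: a quiet NaN
# with payload 1, detected by comparing raw bit patterns).
_EOS_BITS = np.uint64(0x7FF8000000000001)
EOS = float(np.array([_EOS_BITS], dtype=np.uint64).view(np.float64)[0])


def is_eos(x):
    x = np.ascontiguousarray(x, dtype=np.float64)
    return x.view(np.uint64) == _EOS_BITS


x = np.array([
    [1, 2, 3, 1, 1, 1],
    [1, 2, 3, 1, EOS, EOS],
    [1, np.nan, 3, 3, 3, 3],
], dtype=float)

eos = is_eos(x)
# Length of each series: index of the first EoS, or n_timestep if there is none.
lengths = np.where(eos.any(axis=1), eos.argmax(axis=1), x.shape[1])
# Missing values are NaNs that are not EoS markers.
missing = np.isnan(x) & ~eos
# EoS is a NaN, so np.nanmean ignores it exactly like a missing value.
means = np.nanmean(x, axis=1)

print(lengths.tolist())  # [6, 4, 6]
print(np.argwhere(missing).tolist())  # [[2, 1]]
print(means)
```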
Variable length time series
Warning
Support for variable length time series is not stable and the API will change