Time series

A time series is a (temporally) ordered sequence of real values. A univariate time series has a single dimension, whereas a multivariate time series has multiple dimensions. In wildboar, time series are represented as NumPy arrays. A single univariate time series is represented as an array of shape (1, n_timestep) (or (n_timestep, )). A multivariate time series is represented as an array of shape (1, n_dims, n_timestep) (or (n_dims, n_timestep), depending on context). A dataset of time series stacks n_samples such series along the first axis: a univariate dataset is an array of shape (n_samples, n_timestep) (or (n_samples, 1, n_timestep)), and a multivariate dataset is an array of shape (n_samples, n_dims, n_timestep).
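The shape conventions above can be illustrated with plain NumPy. This is a minimal sketch using only NumPy reshaping; the variable names and sizes are arbitrary and chosen for illustration, and no wildboar function is called.

```python
import numpy as np

# A single univariate time series with 5 timesteps: shape (n_timestep,)
ts = np.arange(5, dtype=float)

# The same series as a one-sample dataset: shape (1, n_timestep)
ts_2d = ts.reshape(1, -1)

# A univariate dataset of 3 samples: shape (n_samples, n_timestep)
X_uni = np.zeros((3, 5))

# The same dataset with an explicit dimension axis: shape (n_samples, 1, n_timestep)
X_uni_3d = X_uni[:, np.newaxis, :]

# Dropping the dimension axis again recovers (n_samples, n_timestep)
X_uni_back = X_uni_3d.squeeze(axis=1)

# A multivariate dataset: 3 samples, 2 dimensions, 5 timesteps
X_multi = np.zeros((3, 2, 5))

print(ts.shape, ts_2d.shape, X_uni.shape, X_uni_3d.shape, X_multi.shape)
# (5,) (1, 5) (3, 5) (3, 1, 5) (3, 2, 5)
```

Both univariate layouts hold the same values; adding or removing the size-1 dimension axis only changes the view of the data.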

Most algorithms in wildboar assume that the time series are of equal length and have no missing values. However, some datasets contain missing values and/or time series (or dimensions) of unequal length. In wildboar, the end of a sequence is represented by -np.inf and a missing value by np.nan.

>>> import numpy as np
>>> t1 = np.array([1, 2, 3, 1, 1, 1], dtype=float)
>>> t2 = np.array([1, 2, 3, 1, -np.inf, -np.inf], dtype=float)
>>> t3 = np.array([1, np.nan, 3, 3, 3, 3], dtype=float)
>>> x = np.vstack([t1, t2, t3]) # dataset of shape (3, 6)

In the example, we create a dataset with 3 samples, each with 6 timesteps. x[0] has no missing values and has the full length of 6 (n_timestep). x[1] has no missing values and has length 4, i.e., the index of its first end-of-sequence marker (np.min(np.nonzero(np.isneginf(x[1]))[-1])). x[2] has a single missing value at index 1.
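The per-sample length and missing-value positions from the example can also be computed for the whole dataset at once. The sketch below uses a hypothetical helper, `series_lengths`, that is not part of wildboar's API; it assumes that -np.inf only appears as trailing padding, so the first -np.inf marks the end of the series.

```python
import numpy as np

def series_lengths(x):
    """Hypothetical helper (not a wildboar function).

    Returns the length of each series along the last axis, taken as the
    index of the first -np.inf (end-of-sequence marker), or the full
    n_timestep if the series has no such marker.
    """
    eos = np.isneginf(x)
    # argmax gives the index of the first True; it returns 0 when a row
    # has no True at all, so fall back to the full length in that case
    first = np.argmax(eos, axis=-1)
    return np.where(eos.any(axis=-1), first, x.shape[-1])

t1 = np.array([1, 2, 3, 1, 1, 1], dtype=float)
t2 = np.array([1, 2, 3, 1, -np.inf, -np.inf], dtype=float)
t3 = np.array([1, np.nan, 3, 3, 3, 3], dtype=float)
x = np.vstack([t1, t2, t3])

lengths = series_lengths(x)
print(lengths)  # [6 4 6]

# Positions of missing values (np.nan) as (sample, timestep) pairs
missing = np.argwhere(np.isnan(x))
print(missing)  # [[2 1]]
```

Because the helper works along the last axis, the same call also applies to a multivariate array of shape (n_samples, n_dims, n_timestep), returning one length per sample and dimension.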