wildboar tutorial#
Machine learning#
In general, the machine learning problem setting consists of n samples of data an the goal is to predict properties of unknown data.
wildboar, in particular consider machine learning problems in which the samples are data series, e.g., time series or
otherwise temporally or logically ordered data.
Note
For solving general machine learning problems with Python, consider using scikit-learn
Similar to general machine learning problems, temporal machine learning consider problems that fall into different categories
Supervised learning, in which the data series are labeled with additional information. The additional information can be either numerical or nominal
In classification problems the time series belong to one of two or more labels and the goal is to learn a function that can label unlabeled time series.
In regression problems the time series are labeled with a numerical attribute and the task is to assigned a new numerical value to an unlabeled time series.
Loading an example dataset#
Wildboar bundles a few standard datasets (no https) from the time series community.
In the example, we load the dataset synthetic_control and the TwoLeadECG dataset.
>>> from wildboar.datasets import load_synthetic_control, load_two_lead_ecg
>>> x, y = load_synthetic_control()
>>> x_train, x_test, y_train, y_test = load_two_lead_ecg(merge_train_test=False)
The datasets are Numpy ndarray with x.ndim==2 and y.ndim==1. We can get the number of samples and time points.
>>> n_samples, n_timestep = x.shape
Note
By setting merge_train_test to False, the original training and testing splits from the UCR repository are preserved.
A more robust and reliable method for splitting the datasets into training and testing partitions is to use the model selection functions from scikit-learn.
>> from sklearn.model_selection import train_test_split
>> x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
Learning and predicting#
All estimators in wildboar implements the same interface as all estimators of scikit-learn. We can fit an estimator to an input dataset and predict the label of a new sample.
An example of a temporal estimator is the wildboar.ensemble.ShapeletForestClassifier which implements a random shapelet forest classifier.
>>> from wildboar.ensemble import ShapeletForestClassifier
>>> clf = ShapeletForestClassifier()
>>> clf.fit(x_train, y_train)
The classifier (clf) is fitted using the training samples, i.e., a model is inferred.
>>> clf.predict(x_test[-1:, :])
array([6.])
Note
The predict function expects an ndarray of shape (n_samples, n_timestep), where n_timestep is the size
of training timestep.
Model persistence#
All wildboar models can be persisted to disk using pickle
>>> import pickle
>>> repr = pickle.dumps(clf) # clf fitted earlier
>>> clf_ = pickle.loads(repr)
>>> clf_.predict(x_test[-1:, :])
array([6.])
Note
Models persisted using an older versions of wildboar is not guaranteed to work when using a newer version (or vice versa).