`wildboar.distance`#

Submodules#

wildboar.distance.dtw

Package Contents#

Functions#

`distance`(x, y, *[, dim, sample, metric, ...])	Computes the distance between x and the samples of y
`matches`(x, y, threshold, *[, dim, sample, metric, ...])	Return the positions in y (one list per sample) where x is closer than threshold.

wildboar.distance.distance(x, y, *, dim=0, sample=None, metric='euclidean', metric_params=None, subsequence_distance=True, return_index=False)#

Computes the distance between x and the samples of y

Parameters:

x (array-like of shape (x_timestep, )) – A 1-dimensional float array
y (array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)) – The collection of samples
dim (int, optional) – The time series dimension to search
sample (int or array-like, optional) –
The samples to compare to
- if sample=None, the distances to all samples in y are returned
- if sample is an int, the distance to that sample is returned
- if sample is an array-like, the distances to all samples in sample are returned
- if n_samples=1, sample is an int, or len(sample)==1, a scalar is returned
- otherwise, an array is returned
metric ({'euclidean', 'scaled_euclidean', 'dtw', 'scaled_dtw'} or callable, optional) –
The distance metric
- if str, use the optimized implementation of the named distance measure
- if callable, a function taking two arrays as input
metric_params (dict, optional) –
Parameters to the metric
- 'euclidean' and 'scaled_euclidean' take no parameters
- 'dtw' and 'scaled_dtw' take a single parameter 'r'. If 'r' <= 1, it is interpreted as a fraction of the time series length. If 'r' > 1, it is interpreted as an exact time warping window. Use 'r' == 0 for a window size of exactly 1.
subsequence_distance (bool, optional) –
- if True, compute the minimum subsequence distance
- if False, compute the distance between two arrays of the same length, unless the specified metric supports unaligned arrays
return_index (bool, optional) –
- if True, return the index of the best match. If there are several equally good matches, the first match is returned.

Returns:

dist (float, ndarray) – The smallest distance to each time series
indices (int, ndarray) – The start position of the best match in each time series. Returned only if return_index=True
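To make the meaning of the minimum subsequence distance concrete, here is a minimal pure-Python sketch for the 'euclidean' case. It is not wildboar's implementation; the helper name `min_subsequence_distance` is hypothetical and exists only for illustration. It slides x over a single time series, keeps the smallest distance, and breaks ties in favour of the first match, as described for return_index above.

```python
import math


def min_subsequence_distance(x, series):
    """Return (distance, start) of the window in `series` closest to `x`.

    Hypothetical illustration helper, not part of wildboar.
    """
    m = len(x)
    if m > len(series):
        raise ValueError("x must not be longer than the time series")

    best_dist, best_start = math.inf, -1
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, window)))
        # Strict comparison: on ties, the first (leftmost) match is kept.
        if d < best_dist:
            best_dist, best_start = d, start
    return best_dist, best_start


series = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
dist, index = min_subsequence_distance([1.0, 2.0], series)
print(dist, index)
```

The pattern [1.0, 2.0] occurs exactly at positions 1 and 5 of the example series; the sketch reports position 1 because it is the first of the equally good matches.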

See also

matches: find shapelets within a threshold

Examples

>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.distance import distance
>>> x, y = load_two_lead_ecg()
>>> _, i = distance(x[0, 10:20], x, sample=[0, 1, 2, 3, 5, 10],
...                 metric="scaled_euclidean", return_index=True)
>>> i
[10 29  9 72 20 30]
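The rules for the 'r' parameter of 'dtw' and 'scaled_dtw' can be sketched as a small function. This is one reading of the parameter description, not wildboar's code: the helper name `resolve_warp_window` is hypothetical, and both the rounding down of fractional windows and the minimum window of 1 are assumptions that the description does not specify.

```python
import math


def resolve_warp_window(r, n_timesteps):
    """Translate `r` into a warping window size (hypothetical helper).

    Assumptions: fractional windows are rounded down and never
    become smaller than 1.
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    if r == 0:
        # r == 0 requests a window size of exactly 1.
        return 1
    if r <= 1:
        # r is a fraction of the time series length.
        return max(1, math.floor(r * n_timesteps))
    # r > 1 is taken as the exact window size.
    return int(r)


for r in (0, 0.25, 1, 5):
    print(r, resolve_warp_window(r, 100))
```

Under these assumptions, a series of length 100 gives windows of 1, 25, 100 and 5 for r = 0, 0.25, 1 and 5 respectively.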

wildboar.distance.matches(x, y, threshold, *, dim=0, sample=None, metric='euclidean', metric_params=None, return_distance=False)#

Return the positions in y (one list per sample) where x is closer than threshold.

Parameters:

x (array-like of shape (x_timestep, )) – A 1-dimensional float array
y (array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)) – The collection of samples
threshold (float) – The maximum threshold to consider a match
dim (int, optional) – The time series dimension to search
sample (int or array-like, optional) –
The samples to compare to
- if sample=None, the distances to all samples in y are returned
- if sample is an int, the distance to that sample is returned
- if sample is an array-like, the distances to all samples in sample are returned
- if n_samples=1, sample is an int, or len(sample)==1, a scalar is returned
- otherwise, an array is returned
metric ({'euclidean', 'scaled_euclidean'}, optional) – The distance metric
metric_params (dict, optional) – Parameters to the metric
return_distance (bool, optional) –
- if True, also return the distances of the matches.

Returns:

dist (list) – The distances of the matching positions
matches (list) – The start position of the matches in each time series

Warning

‘scaled_dtw’ is not supported.
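The behaviour of matches can be illustrated with a pure-Python sketch for the 'euclidean' metric. It is not wildboar's implementation: the helper name `find_matches` is hypothetical, and treating a distance equal to the threshold as a match is an assumption based on the phrase "maximum threshold". The result contains one list of start positions per sample of y.

```python
import math


def find_matches(x, y, threshold):
    """Return, per series in `y`, the start positions where `x` matches.

    Hypothetical illustration helper, not part of wildboar.
    Assumption: a window matches if its distance is <= threshold.
    """
    m = len(x)
    all_matches = []
    for series in y:
        positions = []
        for start in range(len(series) - m + 1):
            window = series[start:start + m]
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, window)))
            if d <= threshold:
                positions.append(start)
        all_matches.append(positions)
    return all_matches


y = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [5.0, 5.0, 5.0, 5.0, 5.0],
]
print(find_matches([1.0, 2.0], y, threshold=0.5))
```

With a threshold of 0.5, only the exact occurrence at position 1 of the first series matches; the second series yields an empty list because every window is at distance 5 from x.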
