wildboar.distance
Submodules
Package Contents
Functions
distance: Computes the distance between x and the samples of y.
matches: Return the positions in y (one list per sample) where x is closer than threshold.
- wildboar.distance.distance(x, y, *, dim=0, sample=None, metric='euclidean', metric_params=None, subsequence_distance=True, return_index=False)
Computes the distance between x and the samples of y
- Parameters:
x (array-like of shape (x_timestep, )) – A 1-dimensional float array
y (array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)) – The collection of samples
dim (int, optional) – The time series dimension to search
sample (int or array-like, optional) –
The samples to compare to
if sample=None, the distances to all samples in y are returned
if sample is an int, the distance to that sample is returned
if sample is an array-like, the distances to all samples in sample are returned
if n_samples=1, sample is an int, or len(sample)==1, a scalar is returned; otherwise an array is returned
metric ({'euclidean', 'scaled_euclidean', 'dtw', 'scaled_dtw'} or callable, optional) –
The distance metric
if str, use the optimized implementation of the named distance measure
if callable, a function taking two arrays as input
metric_params (dict, optional) –
Parameters to the metric
'euclidean' and 'scaled_euclidean' take no parameters
'dtw' and 'scaled_dtw' take a single parameter 'r'. If 'r' <= 1, it is interpreted as a fraction of the time series length. If 'r' > 1, it is interpreted as an exact time warping window. Use 'r' == 0 for a window size of exactly 1.
subsequence_distance (bool, optional) –
if True, compute the minimum subsequence distance
if False, compute the distance between two arrays of the same length, unless the specified metric supports unaligned arrays
return_index (bool, optional) –
if True, return the index of the best match. If there are several equally good matches, the first match is returned.
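The interpretation of 'r' described under metric_params can be sketched as follows. This is only an illustration: the helper name warping_window is hypothetical and not part of wildboar, and rounding the fractional case down is an assumption that the text above does not confirm.

```python
def warping_window(r, n_timesteps):
    """Translate 'r' into a warping window measured in timesteps.

    Hypothetical helper, not part of wildboar. Rounding down in the
    fractional case is an assumption.
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    if r <= 1:
        # r is a fraction of the series length. The window is never
        # smaller than one timestep, which covers the r == 0 case.
        return max(1, int(r * n_timesteps))
    # r > 1 is taken as an exact window size.
    return int(r)


print(warping_window(0, 100))     # 1
print(warping_window(0.25, 100))  # 25
print(warping_window(5, 100))     # 5
```

Note that under this reading r == 1 means the full series length rather than a window of one timestep, which is why r == 0 is the documented way to request a window size of exactly 1.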
- Returns:
dist (float, ndarray) – The smallest distance to each time series
indices (int, ndarray) – The start position of the best match in each time series
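To make the two return values concrete, here is a minimal pure-Python sketch of what a minimum subsequence distance with return_index=True computes for a single time series under the Euclidean metric. The function name min_subsequence_distance is hypothetical, and this brute-force scan is an assumption made for illustration, not wildboar's optimized implementation.

```python
import math


def min_subsequence_distance(x, series):
    """Return (distance, start) of the best match of x inside series.

    Brute-force illustration only; not wildboar's implementation.
    """
    m = len(x)
    if m > len(series):
        raise ValueError("x must not be longer than the series")
    best_dist, best_start = math.inf, -1
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, window)))
        # Strict comparison keeps the first of several equally good matches.
        if d < best_dist:
            best_dist, best_start = d, start
    return best_dist, best_start


series = [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
print(min_subsequence_distance([1.0, 2.0, 1.0], series))  # (0.0, 1)
```

The strict `<` comparison mirrors the documented tie-breaking rule: when several positions are equally good, the earliest start position is reported.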
See also
matches : find shapelets within a threshold
Examples
>>> from wildboar.datasets import load_two_lead_ecg
>>> from wildboar.distance import distance
>>> x, y = load_two_lead_ecg()
>>> _, i = distance(x[0, 10:20], x, sample=[0, 1, 2, 3, 5, 10],
...                 metric="scaled_euclidean", return_index=True)
>>> i
[10 29 9 72 20 30]
- wildboar.distance.matches(x, y, threshold, *, dim=0, sample=None, metric='euclidean', metric_params=None, return_distance=False)
Return the positions in y (one list per sample) where x is closer than threshold.
- Parameters:
x (array-like of shape (x_timestep, )) – A 1-dimensional float array
y (array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)) – The collection of samples
threshold (float) – The maximum threshold to consider a match
dim (int, optional) – The time series dimension to search
sample (int or array-like, optional) –
The samples to compare to
if sample=None, the distances to all samples in y are returned
if sample is an int, the distance to that sample is returned
if sample is an array-like, the distances to all samples in sample are returned
if n_samples=1, sample is an int, or len(sample)==1, a scalar is returned; otherwise an array is returned
metric ({'euclidean', 'scaled_euclidean'}, optional) – The distance metric
metric_params (dict, optional) – Parameters to the metric
return_distance (bool, optional) –
if True, return the distances of the matching positions.
- Returns:
dist (list) – The distances of the matching positions
matches (list) – The start position of the matches in each time series
Warning
'scaled_dtw' is not supported.
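As a rough illustration of the "one list per sample" result, the sketch below scans every sample with a Euclidean sliding window and collects the start positions whose distance is within the threshold. The name find_matches is hypothetical, the brute-force scan is not wildboar's implementation, and treating the threshold as inclusive (<=) is an assumption.

```python
import math


def find_matches(x, samples, threshold):
    """Return, per sample, the start positions and distances within threshold.

    Illustration only; the inclusive comparison (<=) is an assumption.
    """
    m = len(x)
    all_starts, all_dists = [], []
    for series in samples:
        starts, dists = [], []
        for start in range(len(series) - m + 1):
            window = series[start:start + m]
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, window)))
            if d <= threshold:
                starts.append(start)
                dists.append(d)
        all_starts.append(starts)
        all_dists.append(dists)
    return all_dists, all_starts


samples = [
    [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0],
    [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
]
dists, starts = find_matches([1.0, 2.0], samples, threshold=0.5)
print(starts)  # [[1, 5], []]
print(dists)   # [[0.0, 0.0], []]
```

A sample without any position inside the threshold yields an empty list in this sketch, so the outer list always has one entry per sample.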