`wildboar.distance`#

Submodules#

wildboar.distance.dtw

Package Contents#

Functions#

`matrix_profile`(x[, y, window, dim, exclude, n_jobs, ...])	Compute the matrix profile.
`paired_distance`(x, y, *[, dim, metric, metric_params, ...])	Compute the distance between the i:th time series of x and y
`paired_subsequence_distance`(y, x, *[, dim, metric, ...])	Compute the minimum subsequence distance between the i:th subsequence and time series
`paired_subsequence_match`(y, x[, threshold, dim, ...])	Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold
`pairwise_distance`(x[, y, dim, metric, metric_params, ...])	Compute the distance between all pairs of time series
`pairwise_subsequence_distance`(y, x, *[, dim, metric, ...])	Compute the minimum subsequence distance between subsequences and time series
`subsequence_match`(y, x[, threshold, dim, metric, ...])	Find the positions where the distance is less than the threshold between the subsequence and all time series

wildboar.distance.matrix_profile(x, y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)

Compute the matrix profile.

If only x is given, compute the similarity self-join of every subsequence in x of size window to its nearest neighbor in x excluding trivial matches according to the exclude parameter.
If both x and y are given, compute the similarity join of every subsequence in y of size window to its nearest neighbor in x excluding matches according to the exclude parameter.
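The two cases above can be sketched with a brute-force reference in plain Python. This is not wildboar's implementation: it assumes z-normalized Euclidean distance (as in Yeh et al.), applies the exclusion zone only in the self-join, and treats a match as trivial when the start positions differ by at most `exclude`. The helper names `znorm` and `naive_matrix_profile` are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; constant sequences map to all zeros."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq))
    if std == 0.0:
        return [0.0] * len(seq)
    return [(v - mean) / std for v in seq]


def naive_matrix_profile(x, y=None, window=5, exclude=0):
    """Brute-force join: nearest neighbor in x of every window of y (or of x)."""
    self_join = y is None
    query = x if self_join else y
    x_windows = [znorm(x[j:j + window]) for j in range(len(x) - window + 1)]
    profile, index = [], []
    for i in range(len(query) - window + 1):
        q = znorm(query[i:i + window])
        best, best_j = math.inf, -1
        for j, w in enumerate(x_windows):
            # Assumption: in a self-join, positions within `exclude` of the
            # query position are trivial matches and are skipped.
            if self_join and abs(i - j) <= exclude:
                continue
            d = math.dist(q, w)
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


x = [0, 1, 0, 1, 0, 1, 0, 1]
mp, mpi = naive_matrix_profile(x, window=3, exclude=1)          # self-join
mp_join, mpi_join = naive_matrix_profile(x, y=[5, 6, 5], window=3)  # join
```

The brute-force loop is quadratic in the number of subsequences, so it is only useful for checking intuition on small inputs.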

Parameters:

x (array-like of shape (n_timestep, ), (n_samples, xn_timestep) or (n_samples, n_dim, xn_timestep)) – The first time series
y (array-like of shape (n_timestep, ), (n_samples, yn_timestep) or (n_samples, n_dim, yn_timestep), optional) – The optional second time series. y is broadcast to the shape of x if possible.
window (int or float, optional) –
The subsequence size, by default 5
- if float, a fraction of y.shape[-1]
- if int, the exact subsequence size
dim (int, optional) – The dim to compute the matrix profile for, by default 0
exclude (int or float, optional) –
The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.
- if float, expressed as a fraction of the window size
- if int, exact size (0 <= exclude < window)
n_jobs (int, optional) – The number of jobs to use when computing the matrix profile
return_index (bool, optional) – Return the matrix profile index

Returns:

mp (ndarray of shape (profile_size, ) or (n_samples, profile_size)) – The matrix profile
mpi (ndarray of shape (profile_size, ) or (n_samples, profile_size), optional) – The matrix profile index

Notes

The profile_size depends on the input.

If y is None, profile_size is x.shape[-1] - window + 1
If y is not None, profile_size is y.shape[-1] - window + 1
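As a worked example of the two rules above, the following sketch computes the profile size from the number of timesteps. The function `profile_size` and its argument names are hypothetical and only restate the arithmetic; window is assumed to be already resolved to an integer.

```python
def profile_size(x_timesteps, window, y_timesteps=None):
    """Number of subsequences of length `window` covered by the profile."""
    # The profile follows y when y is given, otherwise x.
    n_timestep = x_timesteps if y_timesteps is None else y_timesteps
    if not 0 < window <= n_timestep:
        raise ValueError("window must be between 1 and the number of timesteps")
    return n_timestep - window + 1


self_join_size = profile_size(100, window=5)             # 100 - 5 + 1
join_size = profile_size(100, window=5, y_timesteps=40)  # 40 - 5 + 1
```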

References

Yeh, C. C. M. et al. (2016). Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th International Conference on Data Mining (ICDM).

wildboar.distance.paired_distance(x, y, *, dim=0, metric='euclidean', metric_params=None, n_jobs=None)

Compute the distance between the i:th time series of x and the i:th time series of y.

Parameters:

x (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
y (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data. y will be broadcast to the shape of x if possible.
dim (int, optional) – The dim to compute the distance for
metric (str or callable, optional) –
The distance metric

See _DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.

Read more about the parameters in the User guide.
n_jobs (int, optional) – The number of parallel jobs.

Returns:

distance – The distances. Return depends on input:

if ndim > 1, return an ndarray of shape (n_samples, )
if ndim == 1, return ndarray of shape (n_matches, ) or None

Return type:

ndarray
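To illustrate what "paired" means here, the sketch below computes one distance per row pair in plain Python. It assumes the default Euclidean metric and equally sized inputs, and is not wildboar's implementation; `paired_euclidean` is a hypothetical name.

```python
import math


def paired_euclidean(x, y):
    """Distance between x[i] and y[i] for every i (one value per sample)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    return [math.dist(a, b) for a, b in zip(x, y)]


x = [[0, 0, 0], [1, 2, 3]]
y = [[3, 4, 0], [1, 2, 3]]
distances = paired_euclidean(x, y)  # one distance per sample pair
```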

wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, return_index=False, n_jobs=None)

Compute the minimum subsequence distance between the i:th subsequence and time series

Parameters:

y (list or ndarray of shape (n_samples, m_timestep)) –
Input time series.
- if list, a list of array-like of shape (m_timestep, )
x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
dim (int, optional) – The dim to search for shapelets
metric (str or callable, optional) –
The distance metric

See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.

Read more about the parameters in the User guide.
return_index (bool, optional) –
- if True return the index of the best match. If there are many equally good matches, the first match is returned.
n_jobs (int, optional) – The number of parallel jobs to run. This parameter is ignored.

Returns:

dist (float, ndarray) – An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample
indices (int, ndarray, optional) – An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample
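The returned values can be illustrated with a plain-Python sketch that slides the i:th subsequence over the i:th sample and keeps the first best alignment. It assumes the default Euclidean metric; the function names are hypothetical and this is not wildboar's implementation.

```python
import math


def min_subsequence_distance(subsequence, series):
    """Return (distance, start index) of the best match; first match wins ties."""
    m = len(subsequence)
    best, best_i = math.inf, -1
    for i in range(len(series) - m + 1):
        d = math.dist(subsequence, series[i:i + m])
        if d < best:  # strict comparison keeps the first of equal matches
            best, best_i = d, i
    return best, best_i


def paired_min_subsequence_distance(y, x):
    """Match the i:th subsequence in y against the i:th sample in x."""
    pairs = [min_subsequence_distance(s, t) for s, t in zip(y, x)]
    return [d for d, _ in pairs], [i for _, i in pairs]


y = [[1, 2], [5, 5]]
x = [[0, 1, 2, 1, 2], [5, 5, 0, 5, 5]]
dist, indices = paired_min_subsequence_distance(y, x)
```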

wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, max_matches=None, return_distance=False, n_jobs=None)

Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold.

Parameters:

y (list or ndarray of shape (n_samples, n_timestep)) –
Input time series.
- if list, a list of array-like of shape (n_timestep, ) with length n_samples
x (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
threshold (float) – The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
dim (int, optional) – The dim to search for shapelets
metric (str or callable, optional) –
The distance metric

See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.

Read more about the parameters in the User guide.
max_matches (int, optional) –
Return the top max_matches matches below threshold.
- If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
- If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
- If both threshold and max_matches are given, the top matches are returned ordered by distance.
return_distance (bool, optional) –
- if True, return the distance of the match
n_jobs (int, optional) – The number of parallel jobs to run. This parameter is ignored.

Returns:

indices (ndarray) – The start index of matching subsequences. Return depends on input:
- if x.ndim > 1, return an ndarray of shape (n_samples, )
- if x.ndim == 1, return ndarray of shape (n_matches, ) or None
For each sample, the ndarray contains the start indices of the matching subsequences.
distance (ndarray, optional) – The distances of matching subsequences. Return depends on input:
- if x.ndim > 1, return an ndarray of shape (n_samples, )
- if x.ndim == 1, return ndarray of shape (n_matches, ) or None

wildboar.distance.pairwise_distance(x, y=None, *, dim=0, metric='euclidean', metric_params=None, n_jobs=None)

Compute the distance between all pairs of time series in x and y. If y is not given, the distances are computed between all pairs of time series in x.

Parameters:

x (ndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)) – The input data
y (ndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional) – The input data
dim (int, optional) – The dim to compute the distance for
metric (str or callable, optional) –
The distance metric

See _DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.

Read more about the parameters in the User guide.
n_jobs (int, optional) – The number of parallel jobs.

Returns:

dist – The distances. Return depends on input.

if x.ndim > 1 and y is None, return array of shape (x_samples, x_samples)
if x.ndim > 1 and y.ndim > 1, return array of shape (x_samples, y_samples)
if x.ndim == 1 and y.ndim > 1, return array of shape (y_samples, )
if y.ndim == 1 and x.ndim > 1, return array of shape (x_samples, )
if x.ndim == 1 and y.ndim == 1, return scalar

Return type:

float or ndarray
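The two-dimensional return shapes listed above can be sketched in plain Python. The example assumes the default Euclidean metric and covers only the cases where x and y hold several samples; `pairwise_euclidean` is a hypothetical name, not wildboar's implementation.

```python
import math


def pairwise_euclidean(x, y=None):
    """Entry [i][j] is the distance between x[i] and y[j] (or x[j] if y is None)."""
    y = x if y is None else y
    return [[math.dist(a, b) for b in y] for a in x]


x = [[0, 0], [3, 4]]
y = [[0, 0], [6, 8], [3, 4]]
between = pairwise_euclidean(x, y)  # shape (x_samples, y_samples) = (2, 3)
within = pairwise_euclidean(x)      # shape (x_samples, x_samples) = (2, 2)
```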

wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, return_index=False, n_jobs=None)

Compute the minimum subsequence distance between subsequences and time series

Parameters:

y (list or ndarray of shape (n_subsequences, n_timestep)) –
Input time series.
- if list, a list of array-like of shape (n_timestep, )
x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
dim (int, optional) – The dim to search for subsequences
metric (str or callable, optional) –
The distance metric

See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.

Read more about the parameters in the User guide.
return_index (bool, optional) –
- if True return the index of the best match. If there are many equally good matches, the first match is returned.

Returns:

dist (float, ndarray) – The minimum distance. Return depends on input:
- if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
- if len(y) == 1, return an array of shape (n_samples, ).
- if x.ndim == 1, return an array of shape (n_subsequences, ).
- if x.ndim == 1 and len(y) == 1, return scalar.
indices (int, ndarray, optional) – The start index of the minimum distance. Return depends on input:
- if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
- if len(y) == 1, return an array of shape (n_samples, ).
- if x.ndim == 1, return an array of shape (n_subsequences, ).
- if x.ndim == 1 and len(y) == 1, return scalar.
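For the case with several subsequences and several samples, the (n_samples, n_subsequences) layout can be sketched as follows. The example assumes the default Euclidean metric and uses hypothetical helper names; it is not wildboar's implementation.

```python
import math


def min_distance(subsequence, series):
    """Smallest Euclidean distance over all alignments of subsequence in series."""
    m = len(subsequence)
    return min(
        math.dist(subsequence, series[i:i + m])
        for i in range(len(series) - m + 1)
    )


def pairwise_min_subsequence_distance(y, x):
    """Row i holds the distances from sample x[i] to every subsequence in y."""
    return [[min_distance(s, sample) for s in y] for sample in x]


y = [[1, 2], [0, 0]]              # two subsequences
x = [[0, 1, 2, 3], [0, 0, 0, 4]]  # two samples
dist = pairwise_min_subsequence_distance(y, x)  # 2 rows (samples), 2 columns
```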

wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, max_matches=None, exclude=None, return_distance=False, n_jobs=None)

Find the positions where the distance is less than the threshold between the subsequence and all time series.

If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.
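The three rules above can be sketched as a small selection function over precomputed distances, one per start position. This is only an illustration: `select_matches` is a hypothetical helper, the comparison against the threshold is assumed to be strict, and ties are assumed to keep their order of occurrence.

```python
def select_matches(distances, threshold=None, max_matches=None):
    """Pick start indices from `distances` following the documented rules."""
    if threshold is None and max_matches is None:
        max_matches = 10  # no threshold: default to the top 10 matches
    candidates = [
        i for i, d in enumerate(distances)
        if threshold is None or d < threshold  # assumption: strict comparison
    ]
    if max_matches is None:
        return candidates  # threshold only: order of occurrence
    # Otherwise order by distance (stable sort) and keep the top matches.
    candidates.sort(key=lambda i: distances[i])
    return candidates[:max_matches]


distances = [0.9, 0.1, 0.5, 0.05, 0.7]
by_occurrence = select_matches(distances, threshold=0.6)
top_two = select_matches(distances, threshold=0.6, max_matches=2)
no_threshold = select_matches(distances)
```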

Parameters:

y (array-like of shape (yn_timestep, )) – The subsequence
x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
threshold (str, float or callable, optional) –
The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
- if float, return all matches closer than threshold
- if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence
- if str, return all matches according to the named threshold.
dim (int, optional) – The dim to search for shapelets
metric (str or callable, optional) –
The distance metric

See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.

Read more about the parameters in the User guide.
max_matches (int, optional) – Return the top max_matches matches below threshold.
exclude (float or int, optional) –
Exclude trivial matches in the vicinity of the match.
- if float, the exclusion zone is computed as math.ceil(exclude * y.size)
- if int, the exclusion zone is exact
A match is considered trivial if a match with lower distance is within exclude timesteps of another match with higher distance.
return_distance (bool, optional) –
- if True, return the distance of the match
n_jobs (int, optional) – The number of parallel jobs to run. This parameter is ignored.

Returns:

indices (ndarray) – The start index of matching subsequences. Return depends on input:
- if x.ndim > 1, return an ndarray of shape (n_samples, )
- if x.ndim == 1, return ndarray of shape (n_matches, ) or None
For each sample, the ndarray contains the start indices of the matching subsequences.
distance (ndarray, optional) – The distances of matching subsequences. Return depends on input:
- if x.ndim > 1, return an ndarray of shape (n_samples, )
- if x.ndim == 1, return ndarray of shape (n_matches, ) or None
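The exclude parameter described above can be illustrated with a greedy sketch: matches are visited from lowest to highest distance, and a match is dropped when a better match that was already kept lies within the exclusion zone. The greedy strategy and the inclusive boundary are assumptions, and both function names are hypothetical; only the `math.ceil(exclude * y.size)` rule comes from the parameter description.

```python
import math


def exclusion_zone(exclude, subsequence_size):
    """Resolve `exclude` to a number of timesteps."""
    if isinstance(exclude, float):
        return math.ceil(exclude * subsequence_size)
    return exclude


def drop_trivial_matches(indices, distances, exclude):
    """Keep a match only if no better kept match is within `exclude` timesteps."""
    order = sorted(range(len(indices)), key=lambda k: distances[k])
    kept = []
    for k in order:
        # Assumption: a distance of exactly `exclude` timesteps still counts
        # as being inside the exclusion zone.
        if all(abs(indices[k] - indices[j]) > exclude for j in kept):
            kept.append(k)
    return [indices[k] for k in kept]


zone = exclusion_zone(0.25, 8)  # ceil(0.25 * 8)
matches = drop_trivial_matches(
    indices=[10, 11, 12, 30],
    distances=[0.3, 0.1, 0.2, 0.4],
    exclude=zone,
)
```

In this example the matches at positions 10 and 12 are dropped because the better match at position 11 is within two timesteps of both.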
