wildboar.distance
Submodules
Package Contents
Functions
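| matrix_profile | Compute the matrix profile. |
| paired_distance | Compute the distance between the i:th time series of x and the i:th time series of y. |
| paired_subsequence_distance | Compute the minimum subsequence distance between the i:th subsequence and the i:th time series. |
| paired_subsequence_match | Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold. |
| pairwise_distance | Compute the distance between all pairs of time series. |
| pairwise_subsequence_distance | Compute the minimum subsequence distance between subsequences and time series. |
| subsequence_match | Find the positions where the distance is less than the threshold between the subsequence and all time series. |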
- wildboar.distance.matrix_profile(x, y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)
Compute the matrix profile.
If only x is given, compute the similarity self-join of every subsequence in x of size window to its nearest neighbor in x, excluding trivial matches according to the exclude parameter.
If both x and y are given, compute the similarity join of every subsequence in y of size window to its nearest neighbor in x, excluding matches according to the exclude parameter.
- Parameters:
x (array-like of shape (n_timestep, ), (n_samples, xn_timestep) or (n_samples, n_dim, xn_timestep)) – The first time series
y (array-like of shape (n_timestep, ), (n_samples, yn_timestep) or (n_samples, n_dim, yn_timestep), optional) – The optional second time series. y is broadcast to the shape of x if possible.
window (int or float, optional) –
The subsequence size, by default 5
if float, a fraction of y.shape[-1]
if int, the exact subsequence size
dim (int, optional) – The dim to compute the matrix profile for, by default 0
exclude (int or float, optional) –
The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.
if float, expressed as a fraction of the window size
if int, exact size (0 <= exclude < window)
n_jobs (int, optional) – The number of jobs to use when computing the matrix profile.
return_index (bool, optional) – Return the matrix profile index
- Returns:
mp (ndarray of shape (profile_size, ) or (n_samples, profile_size)) – The matrix profile
mpi (ndarray of shape (profile_size, ) or (n_samples, profile_size), optional) – The matrix profile index
Notes
The profile_size depends on the input.
If y is None, profile_size is x.shape[-1] - window + 1.
If y is not None, profile_size is y.shape[-1] - window + 1.
References
- Yeh, C. C. M. et al. (2016). Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th International Conference on Data Mining (ICDM).
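The self-join described above can be illustrated with a brute-force sketch in plain Python. This is not wildboar's implementation: the helper name naive_matrix_profile is hypothetical, it uses plain Euclidean distance without any normalization the library may apply, and it assumes that a float exclusion zone is rounded up to whole timesteps.

```python
import math


def naive_matrix_profile(x, window=5, exclude=0.2):
    """Brute-force self-join: nearest-neighbour distance of every subsequence."""
    # Same size as in the Notes: x.shape[-1] - window + 1
    profile_size = len(x) - window + 1
    # Assumption: a float exclusion zone is a fraction of window, rounded up.
    zone = math.ceil(exclude * window) if isinstance(exclude, float) else exclude
    mp = [math.inf] * profile_size   # distance to the nearest neighbour
    mpi = [-1] * profile_size        # start index of the nearest neighbour
    for i in range(profile_size):
        for j in range(profile_size):
            if abs(i - j) <= zone:
                continue  # skip trivial matches close to position i
            d = math.dist(x[i:i + window], x[j:j + window])
            if d < mp[i]:
                mp[i], mpi[i] = d, j
    return mp, mpi


# A series that repeats every 4 timesteps, so every subsequence has an exact match.
x = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1]
mp, mpi = naive_matrix_profile(x, window=4, exclude=0.2)
print(len(mp))  # profile_size = 12 - 4 + 1
```

The quadratic loop is only meant to make the definition concrete; it says nothing about how the library computes the profile.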
- wildboar.distance.paired_distance(x, y, *, dim=0, metric='euclidean', metric_params=None, n_jobs=None)
Compute the distance between the i:th time series of x and the i:th time series of y.
- Parameters:
x (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data. y will be broadcast to the shape of x if possible.
y (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
dim (int, optional) – The dim to compute the distance for.
metric (str or callable, optional) –
The distance metric.
See _DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.
Read more about the parameters in the User guide.
n_jobs (int, optional) – The number of parallel jobs.
- Returns:
distance – The distances. Return depends on input:
if ndim > 1, return an ndarray of shape (n_samples, )
if ndim == 1, return ndarray of shape (n_matches, ) or None
- Return type:
ndarray
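As a rough illustration of what "paired" means here, the sketch below computes one Euclidean distance per sample, matching row i of x with row i of y. The helper paired_euclidean is hypothetical and only mirrors the documented output shape (n_samples, ); it is not the library's code and ignores dim, metric_params and broadcasting.

```python
import math


def paired_euclidean(x, y):
    """Distance between x[i] and y[i] for every i (one value per sample)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    return [math.dist(xi, yi) for xi, yi in zip(x, y)]


x = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
y = [[3.0, 4.0, 0.0], [1.0, 2.0, 3.0]]
distances = paired_euclidean(x, y)
print(distances)  # [5.0, 0.0]
```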
- wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, return_index=False, n_jobs=None)
Compute the minimum subsequence distance between the i:th subsequence and the i:th time series.
- Parameters:
y (list or ndarray of shape (n_samples, m_timestep)) –
Input time series.
if list, a list of array-like of shape (m_timestep, )
x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
dim (int, optional) – The dim to search for shapelets
metric (str or callable, optional) –
The distance metric
See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.
Read more about the parameters in the User guide.
return_index (bool, optional) –
if True return the index of the best match. If there are many equally good matches, the first match is returned.
n_jobs (int, optional) – The number of parallel jobs to run. Ignored
- Returns:
dist (float, ndarray) – An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample
indices (int, ndarray, optional) – An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample
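The minimum subsequence distance can be pictured as sliding the subsequence over the time series and keeping the best alignment. The sketch below is a plain-Python illustration under that reading, using Euclidean distance; the helper names are hypothetical and not part of wildboar.

```python
import math


def min_subsequence_distance(s, t):
    """Return (distance, start index) of the best alignment of s inside t."""
    best, best_start = math.inf, -1
    for start in range(len(t) - len(s) + 1):
        d = math.dist(s, t[start:start + len(s)])
        if d < best:  # strict '<' keeps the first of several equally good matches
            best, best_start = d, start
    return best, best_start


def paired_min_subsequence_distance(y, x):
    """Pair the i:th subsequence in y with the i:th sample in x."""
    pairs = [min_subsequence_distance(s, t) for s, t in zip(y, x)]
    return [d for d, _ in pairs], [i for _, i in pairs]


# Subsequences may differ in length when y is given as a list.
y = [[1.0, 2.0], [0.0, 0.0, 0.0]]
x = [[0.0, 1.0, 2.0, 1.0, 2.0], [5.0, 0.0, 0.0, 0.0, 5.0]]
dist, indices = paired_min_subsequence_distance(y, x)
print(dist, indices)  # [0.0, 0.0] [1, 1]
```

In the first sample the subsequence also matches exactly at position 3, but the first match is the one reported, in line with the description of return_index.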
- wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, max_matches=None, return_distance=False, n_jobs=None)
Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold.
- Parameters:
y (list or ndarray of shape (n_samples, n_timestep)) –
Input time series.
if list, a list of array-like of shape (n_timestep, ) with length n_samples
x (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
threshold (float) – The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
dim (int, optional) – The dim to search for shapelets
metric (str or callable, optional) –
The distance metric
See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.
Read more about the parameters in the User guide.
max_matches (int, optional) –
Return the top max_matches matches below threshold.
If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.
return_distance (bool, optional) –
if True, return the distance of the match
n_jobs (int, optional) – The number of parallel jobs to run. Ignored
- Returns:
indices (ndarray) – The start index of matching subsequences. Return depends on input:
if x.ndim > 1, return an ndarray of shape (n_samples, )
if x.ndim == 1, return ndarray of shape (n_matches, ) or None
For each sample, the ndarray contains the .
distance (ndarray, optional) – The distances of matching subsequences. Return depends on input:
if x.ndim > 1, return an ndarray of shape (n_samples, )
if x.ndim == 1, return ndarray of shape (n_matches, ) or None
- wildboar.distance.pairwise_distance(x, y=None, *, dim=0, metric='euclidean', metric_params=None, n_jobs=None)
Compute the distance between all pairs of time series.
- Parameters:
x (ndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)) – The input data
y (ndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional) – The input data
dim (int, optional) – The dim to compute the distance for.
metric (str or callable, optional) –
The distance metric.
See _DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.
Read more about the parameters in the User guide.
n_jobs (int, optional) – The number of parallel jobs.
- Returns:
dist – The distances. Return depends on input.
if x.ndim > 1 and y is None, return array of shape (x_samples, x_samples)
if x.ndim > 1 and y.ndim > 1, return array of shape (x_samples, y_samples)
if x.ndim == 1 and y.ndim > 1, return array of shape (y_samples, )
if y.ndim == 1 and x.ndim > 1, return array of shape (x_samples, )
if x.ndim == 1 and y.ndim == 1, return scalar
- Return type:
float or ndarray
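The two-dimensional return shapes listed above can be made concrete with a small sketch. It assumes Euclidean distance and 2-D inputs only; pairwise_euclidean is a hypothetical helper for illustration, not wildboar's function.

```python
import math


def pairwise_euclidean(x, y=None):
    """Distance between every sample in x and every sample in y (or x itself)."""
    y = x if y is None else y
    return [[math.dist(xi, yj) for yj in y] for xi in x]


x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
y = [[0.0, 0.0], [3.0, 4.0]]

d_self = pairwise_euclidean(x)   # shape (x_samples, x_samples)
d_xy = pairwise_euclidean(x, y)  # shape (x_samples, y_samples)
print(d_xy)  # [[0.0, 5.0], [5.0, 0.0], [10.0, 5.0]]
```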
- wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, return_index=False, n_jobs=None)
Compute the minimum subsequence distance between subsequences and time series
- Parameters:
y (list or ndarray of shape (n_subsequences, n_timestep)) –
Input time series.
if list, a list of array-like of shape (n_timestep, )
x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
dim (int, optional) – The dim to search for subsequences.
metric (str or callable, optional) –
The distance metric.
See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.
Read more about the parameters in the User guide.
return_index (bool, optional) –
if True return the index of the best match. If there are many equally good matches, the first match is returned.
- Returns:
dist (float, ndarray) – The minimum distance. Return depends on input:
if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
if len(y) == 1, return an array of shape (n_samples, ).
if x.ndim == 1, return an array of shape (n_subsequences, ).
if x.ndim == 1 and len(y) == 1, return scalar.
indices (int, ndarray, optional) – The start index of the minimum distance. Return depends on input:
if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
if len(y) == 1, return an array of shape (n_samples, ).
if x.ndim == 1, return an array of shape (n_subsequences, ).
if x.ndim == 1 and len(y) == 1, return scalar.
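For the general case (several subsequences, several samples), the (n_samples, n_subsequences) layout can be sketched as follows. The code assumes Euclidean distance and uses hypothetical helper names; it illustrates the documented shape rather than the library's algorithm.

```python
import math


def min_dist(s, t):
    """Smallest distance between s and any equally long window of t."""
    m = len(s)
    return min(math.dist(s, t[i:i + m]) for i in range(len(t) - m + 1))


def pairwise_min_subsequence_distance(y, x):
    """One row per sample in x, one column per subsequence in y."""
    return [[min_dist(s, t) for s in y] for t in x]


y = [[1.0, 2.0], [0.0, 0.0, 0.0], [2.0, 1.0]]              # n_subsequences = 3
x = [[0.0, 1.0, 2.0, 1.0, 2.0], [5.0, 0.0, 0.0, 0.0, 5.0]]  # n_samples = 2
dist = pairwise_min_subsequence_distance(y, x)
print(len(dist), len(dist[0]))  # 2 3
```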
- wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, max_matches=None, exclude=None, return_distance=False, n_jobs=None)
Find the positions where the distance is less than the threshold between the subsequence and all time series.
If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.
- Parameters:
y (array-like of shape (yn_timestep, )) – The subsequence
x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data
threshold (str, float or callable, optional) –
The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
if float, return all matches closer than threshold
if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence
if str, return all matches according to the named threshold.
dim (int, optional) – The dim to search for shapelets
metric (str or callable, optional) –
The distance metric
See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.
metric_params (dict, optional) –
Parameters to the metric.
Read more about the parameters in the User guide.
max_matches (int, optional) – Return the top max_matches matches below threshold.
exclude (float or int, optional) –
Exclude trivial matches in the vicinity of the match.
if float, the exclusion zone is computed as math.ceil(exclude * y.size)
if int, the exclusion zone is exact
A match is considered trivial if another match with a lower distance lies within exclude timesteps of it.
return_distance (bool, optional) –
if True, return the distance of the match
n_jobs (int, optional) – The number of parallel jobs to run. Ignored
- Returns:
indices (ndarray) – The start index of matching subsequences. Return depends on input:
if x.ndim > 1, return an ndarray of shape (n_samples, )
if x.ndim == 1, return ndarray of shape (n_matches, ) or None
For each sample, the ndarray contains the .
distance (ndarray, optional) – The distances of matching subsequences. Return depends on input:
if x.ndim > 1, return an ndarray of shape (n_samples, )
if x.ndim == 1, return ndarray of shape (n_matches, ) or None
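The interplay between threshold and max_matches described for this function can be sketched for a single 1-D time series. The helper below is hypothetical and simplified: it assumes Euclidean distance, accepts only a float threshold, and leaves out the exclude handling entirely.

```python
import math


def match_subsequence(y, x, threshold=None, max_matches=None):
    """Start indices in x where y matches, plus the corresponding distances."""
    m = len(y)
    dists = [math.dist(y, x[i:i + m]) for i in range(len(x) - m + 1)]
    candidates = list(range(len(dists)))
    if threshold is not None:
        # Keep matches closer than the threshold, in order of occurrence.
        candidates = [i for i in candidates if dists[i] < threshold]
    elif max_matches is None:
        max_matches = 10  # documented default when no threshold is given
    if max_matches is not None:
        # sorted() is stable, so equally good matches keep their order of occurrence.
        candidates = sorted(candidates, key=lambda i: dists[i])[:max_matches]
    return candidates, [dists[i] for i in candidates]


x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.5, 0.0, 1.0, 2.0]
y = [1.0, 2.0]

by_occurrence, _ = match_subsequence(y, x, threshold=1.0)
top_two, _ = match_subsequence(y, x, threshold=1.0, max_matches=2)
top_ten, _ = match_subsequence(y, x)
print(by_occurrence)  # [1, 4, 7]
print(top_two)        # [1, 7]
```

With only a threshold, the matches come back in order of occurrence; adding max_matches switches the ordering to distance, which is why position 7 (an exact match) precedes position 4 in the second call.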