wildboar.distance

Package Contents

Functions

matrix_profile(x[, y, window, dim, exclude, n_jobs, ...])

Compute the matrix profile.

paired_distance(x, y, *[, dim, metric, metric_params, ...])

Compute the distance between the i:th time series of x and y

paired_subsequence_distance(y, x, *[, dim, metric, ...])

Compute the minimum subsequence distance between the i:th subsequence and the i:th time series

paired_subsequence_match(y, x[, threshold, dim, ...])

Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold

pairwise_distance(x[, y, dim, metric, metric_params, ...])

Compute the distance between all pairs of time series

pairwise_subsequence_distance(y, x, *[, dim, metric, ...])

Compute the minimum subsequence distance between subsequences and time series

subsequence_match(y, x[, threshold, dim, metric, ...])

Find the positions where the distance is less than the threshold between the subsequence and all time series

wildboar.distance.matrix_profile(x, y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)

Compute the matrix profile.

  • If only x is given, compute the similarity self-join of every subsequence in x of size window to its nearest neighbor in x excluding trivial matches according to the exclude parameter.

  • If both x and y are given, compute the similarity join of every subsequence in y of size window to its nearest neighbor in x excluding matches according to the exclude parameter.

Parameters:
  • x (array-like of shape (n_timestep, ), (n_samples, xn_timestep) or (n_samples, n_dim, xn_timestep)) – The first time series

  • y (array-like of shape (n_timestep, ), (n_samples, yn_timestep) or (n_samples, n_dim, yn_timestep), optional) – The optional second time series. y is broadcast to the shape of x if possible.

  • window (int or float, optional) –

    The subsequence size, by default 5

    • if float, a fraction of y.shape[-1]

    • if int, the exact subsequence size

  • dim (int, optional) – The dim to compute the matrix profile for, by default 0

  • exclude (int or float, optional) –

    The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.

    • if float, expressed as a fraction of the window size

    • if int, exact size (0 <= exclude < window)

  • n_jobs (int, optional) – The number of jobs to use when computing the matrix profile, by default -1

  • return_index (bool, optional) – Return the matrix profile index

Returns:

  • mp (ndarray of shape (profile_size, ) or (n_samples, profile_size)) – The matrix profile

  • mpi (ndarray of shape (profile_size, ) or (n_samples, profile_size), optional) – The matrix profile index

Notes

The profile_size depends on the input.

  • If y is None, profile_size is x.shape[-1] - window + 1

  • If y is not None, profile_size is y.shape[-1] - window + 1
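To make the self-join and the profile_size rule concrete, here is a brute-force sketch in plain Python. It is not wildboar's implementation: the helper name naive_matrix_profile is hypothetical, the distance is plain Euclidean (the library may normalize subsequences), and rounding a float exclusion zone up with math.ceil is an assumption.

```python
import math


def naive_matrix_profile(x, window=5, exclude=0.2):
    """Brute-force self-join matrix profile of a 1-D series (illustrative only)."""
    # Assumption: a float exclusion zone is a fraction of window, rounded up.
    zone = math.ceil(exclude * window) if isinstance(exclude, float) else exclude
    profile_size = len(x) - window + 1  # the y=None case from the Notes
    subsequences = [x[i:i + window] for i in range(profile_size)]

    mp, mpi = [], []
    for i, s in enumerate(subsequences):
        best_dist, best_idx = math.inf, -1
        for j, t in enumerate(subsequences):
            if abs(i - j) <= zone:
                continue  # skip trivial matches at or next to position i
            d = math.dist(s, t)  # plain Euclidean distance (assumption)
            if d < best_dist:
                best_dist, best_idx = d, j
        mp.append(best_dist)
        mpi.append(best_idx)
    return mp, mpi


series = [0, 1, 2, 0, 1, 2, 0, 1, 2, 5]
mp, mpi = naive_matrix_profile(series, window=3, exclude=0.2)
print(len(mp))  # profile_size = 10 - 3 + 1
```

The quadratic loop is only meant to show what each entry of mp and mpi represents; it is far too slow for real data.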

References

Yeh, C. C. M. et al. (2016). Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th International Conference on Data Mining (ICDM).

wildboar.distance.paired_distance(x, y, *, dim=0, metric='euclidean', metric_params=None, n_jobs=None)

Compute the distance between the i:th time series of x and y

Parameters:
  • x (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data.

  • y (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data. y will be broadcast to the shape of x if possible.

  • dim (int, optional) – The dim to compute distance

  • metric (str or callable, optional) –

    The distance metric

    See _DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters to the metric.

    Read more about the parameters in the User guide.

  • n_jobs (int, optional) – The number of parallel jobs.

Returns:

distance – The distances. Return depends on input:

  • if ndim > 1, return an ndarray of shape (n_samples, )

  • if ndim == 1, return ndarray of shape (n_matches, ) or None

Return type:

ndarray
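The word "paired" means that sample i of x is compared only with sample i of y, giving one distance per sample. A minimal sketch of that pairing, assuming the Euclidean metric and using a hypothetical helper paired_euclidean (not the library routine):

```python
import math


def paired_euclidean(x, y):
    """Distance between x[i] and y[i] for every i (illustrative only)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    return [math.dist(a, b) for a, b in zip(x, y)]


x = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
y = [[3.0, 4.0], [1.0, 1.0], [2.0, 5.0]]
distances = paired_euclidean(x, y)  # one value per sample
```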

wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, return_index=False, n_jobs=None)

Compute the minimum subsequence distance between the i:th subsequence and the i:th time series

Parameters:
  • y (list or ndarray of shape (n_samples, m_timestep)) –

    Input time series.

    • if list, a list of array-like of shape (m_timestep, )

  • x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data

  • dim (int, optional) – The dim to search for shapelets

  • metric (str or callable, optional) –

    The distance metric

    See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters to the metric.

    Read more about the parameters in the User guide.

  • return_index (bool, optional) –

    • if True return the index of the best match. If there are many equally good matches, the first match is returned.

  • n_jobs (int, optional) – The number of parallel jobs to run. Ignored

Returns:

  • dist (float, ndarray) – An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample

  • indices (int, ndarray, optional) – An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample
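The minimum subsequence distance can be pictured as sliding the subsequence over the time series and keeping the smallest distance. The sketch below assumes the Euclidean metric, and both helper names are hypothetical stand-ins rather than wildboar functions.

```python
import math


def min_subsequence_distance(s, t):
    """Return (distance, start index) of the best match of s inside t."""
    best_dist, best_idx = math.inf, -1
    for start in range(len(t) - len(s) + 1):
        d = math.dist(s, t[start:start + len(s)])
        if d < best_dist:  # strict '<' keeps the first of equally good matches
            best_dist, best_idx = d, start
    return best_dist, best_idx


def paired_min_subsequence_distance(y, x):
    """Pair y[i] with x[i]; illustrative stand-in, not the library routine."""
    results = [min_subsequence_distance(s, t) for s, t in zip(y, x)]
    dist = [d for d, _ in results]
    indices = [i for _, i in results]
    return dist, indices


y = [[1, 2], [5, 5]]
x = [[0, 1, 2, 1, 2], [5, 5, 0, 5, 5]]
dist, indices = paired_min_subsequence_distance(y, x)
```

Both samples contain their subsequence twice; the strict comparison is what makes the sketch report the first occurrence, mirroring the return_index description.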

wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, max_matches=None, return_distance=False, n_jobs=None)

Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold

Parameters:
  • y (list or ndarray of shape (n_samples, n_timestep)) –

    Input time series.

    • if list, a list of array-like of shape (n_timestep, ) with length n_samples

  • x (ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data

  • threshold (float) – The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.

  • dim (int, optional) – The dim to search for shapelets

  • metric (str or callable, optional) –

    The distance metric

    See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters to the metric.

    Read more about the parameters in the User guide.

  • max_matches (int, optional) –

    Return the top max_matches matches below threshold.

    • If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.

    • If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.

    • If both threshold and max_matches are given, the top matches are returned ordered by distance.

  • return_distance (bool, optional) –

    • if True, return the distance of the match

  • n_jobs (int, optional) – The number of parallel jobs to run. Ignored

Returns:

  • indices (ndarray) – The start index of matching subsequences. Return depends on input:

    • if x.ndim > 1, return an ndarray of shape (n_samples, )

    • if x.ndim == 1, return ndarray of shape (n_matches, ) or None

    For each sample, the ndarray contains the .

  • distance (ndarray, optional) – The distances of matching subsequences. Return depends on input:

    • if x.ndim > 1, return an ndarray of shape (n_samples, )

    • if x.ndim == 1, return ndarray of shape (n_matches, ) or None
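The interplay between threshold and max_matches is easier to follow in code. This is a sketch of the three documented cases for a single subsequence and a single time series, assuming the Euclidean metric and a strict "less than" comparison; match_positions is a hypothetical helper, not part of wildboar.

```python
import math


def match_positions(s, t, threshold=None, max_matches=None):
    """Start indices where s matches t (illustrative sketch only)."""
    n = len(t) - len(s) + 1
    dists = [(math.dist(s, t[i:i + len(s)]), i) for i in range(n)]

    if threshold is None:
        # No threshold: return the top matches, 10 unless told otherwise.
        top = max_matches if max_matches is not None else 10
        return [i for _, i in sorted(dists)[:top]]

    below = [(d, i) for d, i in dists if d < threshold]
    if max_matches is None:
        return [i for _, i in below]  # all matches, in order of occurrence
    return [i for _, i in sorted(below)[:max_matches]]  # ordered by distance


s = [1, 2]
t = [1, 2, 9, 1, 2.5, 1, 2]
by_occurrence = match_positions(s, t, threshold=1.0)
by_distance = match_positions(s, t, threshold=1.0, max_matches=2)
top_ranked = match_positions(s, t)
```

A paired variant would apply the same helper to each (y[i], x[i]) pair, for example `[match_positions(s, t, 1.0) for s, t in zip(y, x)]`.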

wildboar.distance.pairwise_distance(x, y=None, *, dim=0, metric='euclidean', metric_params=None, n_jobs=None)

Compute the distance between all pairs of time series

Parameters:
  • x (ndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)) – The input data

  • y (ndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional) – The input data

  • dim (int, optional) – The dim to compute distance

  • metric (str or callable, optional) –

    The distance metric

    See _DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters to the metric.

    Read more about the parameters in the User guide.

  • n_jobs (int, optional) – The number of parallel jobs.

Returns:

dist – The distances. Return depends on input.

  • if x.ndim > 1 and y is None, return array of shape (x_samples, x_samples)

  • if x.ndim > 1 and y.ndim > 1, return array of shape (x_samples, y_samples)

  • if x.ndim == 1 and y.ndim > 1, return array of shape (y_samples, )

  • if y.ndim == 1 and x.ndim > 1, return array of shape (x_samples, )

  • if x.ndim == 1 and y.ndim == 1, return scalar

Return type:

float or ndarray
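Unlike the paired variant, the pairwise variant compares every sample of x with every sample of y. The following sketch shows the two matrix-shaped cases from the list above, assuming the Euclidean metric; pairwise_euclidean is a hypothetical helper and not the library function.

```python
import math


def pairwise_euclidean(x, y=None):
    """All-pairs distance matrix with x_samples rows and y_samples columns."""
    if y is None:
        y = x  # distances among the samples of x itself
    return [[math.dist(a, b) for b in y] for a in x]


x = [[0.0, 0.0], [3.0, 4.0]]
y = [[0.0, 0.0], [6.0, 8.0], [3.0, 4.0]]
dist = pairwise_euclidean(x, y)    # 2 rows, 3 columns
self_dist = pairwise_euclidean(x)  # 2 rows, 2 columns, zero diagonal
```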

wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, return_index=False, n_jobs=None)

Compute the minimum subsequence distance between subsequences and time series

Parameters:
  • y (list or ndarray of shape (n_subsequences, n_timestep)) –

    Input time series.

    • if list, a list of array-like of shape (n_timestep, )

  • x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data

  • dim (int, optional) – The dim to search for subsequence

  • metric (str or callable, optional) –

    The distance metric

    See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters to the metric.

    Read more about the parameters in the User guide.

  • return_index (bool, optional) –

    • if True return the index of the best match. If there are many equally good matches, the first match is returned.

Returns:

  • dist (float, ndarray) – The minimum distance. Return depends on input:

    • if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).

    • if len(y) == 1, return an array of shape (n_samples, ).

    • if x.ndim == 1, return an array of shape (n_subsequences, ).

    • if x.ndim == 1 and len(y) == 1, return scalar.

  • indices (int, ndarray, optional) – The start index of the minimum distance. Return depends on input:

    • if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).

    • if len(y) == 1, return an array of shape (n_samples, ).

    • if x.ndim == 1, return an array of shape (n_subsequences, ).

    • if x.ndim == 1 and len(y) == 1, return scalar.
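The orientation of the result, with samples along the rows and subsequences along the columns, is easy to get backwards. A small sketch of the (n_samples, n_subsequences) case, assuming the Euclidean metric and a hypothetical helper name:

```python
import math


def pairwise_min_subsequence_distance(y, x):
    """Rows index the samples in x, columns index the subsequences in y."""
    def best(s, t):
        # Smallest distance of s to any equally long window of t.
        return min(
            math.dist(s, t[i:i + len(s)]) for i in range(len(t) - len(s) + 1)
        )
    return [[best(s, t) for s in y] for t in x]


y = [[1, 2], [0, 0]]                           # n_subsequences = 2
x = [[0, 0, 1, 2], [1, 2, 3, 4], [5, 5, 5, 5]]  # n_samples = 3
dist = pairwise_min_subsequence_distance(y, x)
```

Here dist has three rows and two columns, so dist[i][j] is the minimum distance between subsequence j and sample i.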

wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, max_matches=None, exclude=None, return_distance=False, n_jobs=None)

Find the positions where the distance is less than the threshold between the subsequence and all time series.

  • If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence

  • If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance

  • If both threshold and max_matches are given, the top matches are returned ordered by distance.

Parameters:
  • y (array-like of shape (yn_timestep, )) – The subsequence

  • x (ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)) – The input data

  • threshold (str, float or callable, optional) –

    The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.

    • if float, return all matches closer than threshold

    • if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence

    • if str, return all matches according to the named threshold.

  • dim (int, optional) – The dim to search for shapelets

  • metric (str or callable, optional) –

    The distance metric

    See _SUBSEQUENCE_DISTANCE_MEASURE.keys() for a list of supported metrics.

  • metric_params (dict, optional) –

    Parameters to the metric.

    Read more about the parameters in the User guide.

  • max_matches (int, optional) – Return the top max_matches matches below threshold.

  • exclude (float or int, optional) –

    Exclude trivial matches in the vicinity of the match.

    • if float, the exclusion zone is computed as math.ceil(exclude * y.size)

    • if int, the exclusion zone is exact

    A match is considered trivial if another match with a lower distance lies within exclude timesteps of it.

  • return_distance (bool, optional) –

    • if True, return the distance of the match

  • n_jobs (int, optional) – The number of parallel jobs to run. Ignored

Returns:

  • indices (ndarray) – The start index of matching subsequences. Return depends on input:

    • if x.ndim > 1, return an ndarray of shape (n_samples, )

    • if x.ndim == 1, return ndarray of shape (n_matches, ) or None

    For each sample, the ndarray contains the .

  • distance (ndarray, optional) – The distances of matching subsequences. Return depends on input:

    • if x.ndim > 1, return an ndarray of shape (n_samples, )

    • if x.ndim == 1, return ndarray of shape (n_matches, ) or None
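The exclude parameter removes trivial matches, that is, matches that sit next to a better one. One way to picture it is a greedy filter that accepts matches from best to worst and rejects any candidate too close to an accepted one. The sketch below is an assumption about the general idea, not wildboar's algorithm; drop_trivial_matches is a hypothetical helper, and treating "within" as a gap of at most exclude timesteps is also an assumption.

```python
import math


def drop_trivial_matches(matches, exclude, subsequence_size):
    """Keep the best match in each neighbourhood of `exclude` timesteps.

    matches is a list of (distance, start_index) pairs.
    """
    if isinstance(exclude, float):
        # Float exclusion zones are a fraction of the subsequence size.
        exclude = math.ceil(exclude * subsequence_size)
    kept = []
    for dist, idx in sorted(matches):  # lowest distance first
        # Assumption: "within" means a gap of at most `exclude` timesteps.
        if all(abs(idx - k) > exclude for _, k in kept):
            kept.append((dist, idx))
    return sorted(kept, key=lambda m: m[1])  # back to order of occurrence


matches = [(0.4, 10), (0.1, 11), (0.3, 12), (0.2, 30)]
kept = drop_trivial_matches(matches, exclude=0.5, subsequence_size=4)
```

With a subsequence of size 4 and exclude=0.5, the zone is 2 timesteps, so the matches at positions 10 and 12 are dropped in favour of the better match at position 11.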