`wildboar.distance`#

Fast distance computations.

The wildboar.distance module includes functions for computing paired and pairwise distances between time series and between time series and subsequences.

See the User Guide for more details and examples.

Classes#

`KMeans`	KMeans clustering with support for DTW and weighted DTW.
`KMedoids`	KMedoid algorithm.
`KNeighborsClassifier`	Classifier implementing k-nearest neighbors.
`MDS`	Multidimensional scaling.

Functions#

`argmin_distance`(x[, y, dim, k, metric, metric_params, ...])	Find the indices of the samples with the lowest distance in Y.
`argmin_subsequence_distance`(y, x, *[, dim, k, metric, ...])	Compute the k:th closest subsequences.
`distance_profile`(y, x, *[, dilation, padding, dim, ...])	Compute the distance profile.
`matrix_profile`(X[, Y, dim, window, exclude, kind, ...])	Compute the matrix profile of every subsequence in X.
`paired_distance`(x, y, *[, dim, metric, metric_params, ...])	Compute the distance between the i:th time series.
`paired_matrix_profile`(X[, Y, window, dim, exclude, ...])	Compute the matrix profile.
`paired_subsequence_distance`(y, x, *[, dim, metric, ...])	Minimum subsequence distance between the i:th subsequence and time series.
`paired_subsequence_match`(y, x[, threshold, dim, ...])	Find matching subsequences.
`pairwise_distance`(x[, y, dim, metric, metric_params, ...])	Compute the distance between every time series in X and Y.
`pairwise_subsequence_distance`(y, x, *[, dim, metric, ...])	Minimum subsequence distance between subsequences and time series.
`subsequence_match`(y, x[, threshold, dim, metric, ...])	Find matching subsequences.

class wildboar.distance.KMeans(n_clusters=8, *, metric='euclidean', r=1.0, g=None, init='random', n_init='auto', max_iter=300, tol=0.001, verbose=0, random_state=None)[source]#

KMeans clustering with support for DTW and weighted DTW.

Parameters:

n_clustersint, optional: The number of clusters.
metric{“euclidean”, “dtw”}, optional: The metric.
rfloat, optional: The size of the warping window.
gfloat, optional: SoftDTW penalty. If None, traditional DTW is used.
init{“random”}, optional: Cluster initialization. If “random”, randomly initialize n_clusters.
n_init“auto” or int, optional: Number of times the algorithm is re-initialized with new centroids.
max_iterint, optional: The maximum number of iterations for a single run of the algorithm.
tolfloat, optional: Relative tolerance to declare convergence of two consecutive iterations.
verboseint, optional: Print diagnostic messages during convergence.
random_stateRandomState or int, optional: Determines random number generation for centroid initialization and barycentering when fitting with metric=”dtw”.

Attributes:

n_iter_int: The number of iterations before convergence.
cluster_centers_ndarray of shape (n_clusters, n_timestep): The cluster centers.
labels_ndarray of shape (n_samples, ): The cluster assignment.
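
To illustrate what the r parameter controls when metric="dtw", the following is a minimal sketch of dynamic time warping with a band constraint. It is not Wildboar's implementation. The function name `dtw_distance` is hypothetical, and the sketch assumes that r is a fraction of the series length and that the distance is the square root of the accumulated squared differences.

```python
import math

import numpy as np


def dtw_distance(a, b, r=1.0):
    """DTW between two 1-D series with a warping window (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Assumption: r is a fraction of the longest series length.
    w = max(math.floor(r * max(n, m)), abs(n - m))
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only cells within w steps of the diagonal are reachable.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return math.sqrt(cost[n, m])


a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
unconstrained = dtw_distance(a, b, r=1.0)  # the shifted peak can be aligned
diagonal_only = dtw_distance(a, b, r=0.0)  # reduces to the Euclidean distance
```

Under these assumptions, a wide window lets the shifted peaks align, while r=0.0 forbids any warping.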

fit(x, y=None)[source]#

Compute the kmeans-clustering.

Parameters:

xunivariate time-series: The input samples.
yIgnored, optional: Not used.

Returns:

object: Fitted estimator.

fit_predict(X, y=None, **kwargs)[source]#

Perform clustering on X and return cluster labels.

Parameters:

Xarray-like of shape (n_samples, n_features): Input data.
yIgnored: Not used, present for API consistency by convention.
**kwargsdict: Arguments to be passed to fit.

Added in version 1.4.

Returns:

labelsndarray of shape (n_samples,), dtype=np.int64: Cluster labels.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

Xarray-like of shape (n_samples, n_features): Input samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_paramsdict: Additional fit parameters.

Returns:

X_newndarray of shape (n_samples, n_features_new): Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(x)[source]#

Predict the closest cluster for each sample.

Parameters:

xunivariate time-series: The input samples.

Returns:

ndarray of shape (n_samples, ): Index of the cluster each sample belongs to.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

selfestimator instance: Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(x)[source]#

Transform the input to a cluster distance space.

Parameters:

xunivariate time-series: The input samples.

Returns:

ndarray of shape (n_samples, n_clusters): The distance between each sample and each cluster.
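
The relation between transform and predict can be sketched in plain NumPy for the Euclidean case. This is an illustration under the assumption that predict selects the cluster with the smallest distance. The helper `cluster_distance_space` is hypothetical and not part of Wildboar.

```python
import numpy as np


def cluster_distance_space(x, centers):
    """Euclidean distance from every sample to every cluster center."""
    # x: (n_samples, n_timestep), centers: (n_clusters, n_timestep)
    diff = x[:, np.newaxis, :] - centers[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


x = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [4.0, 4.0, 4.0]])
centers = np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]])

distances = cluster_distance_space(x, centers)  # shape (n_samples, n_clusters)
labels = distances.argmin(axis=1)  # index of the closest cluster per sample
```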

class wildboar.distance.KMedoids(n_clusters=8, metric='euclidean', metric_params=None, init='random', n_init='auto', algorithm='fast', max_iter=30, tol=0.0001, verbose=0, n_jobs=None, random_state=None)[source]#

KMedoid algorithm.

Parameters:

n_clustersint, optional: The number of clusters.
metricstr, optional: The metric.
metric_paramsdict, optional: The metric parameters. Read more about the metrics and their parameters in the User guide.
init{“auto”, “random”, “min”}, optional: Cluster initialization. If “random”, randomly initialize n_clusters, if “min” select the samples with the smallest distance to the other samples.
n_init“auto” or int, optional: Number of times the algorithm is re-initialized with new centroids.
algorithm{“fast”, “pam”}, optional: The algorithm for updating cluster assignments. If “pam”, use the Partitioning Around Medoids algorithm.
max_iterint, optional: The maximum number of iterations for a single run of the algorithm.
tolfloat, optional: Relative tolerance to declare convergence of two consecutive iterations.
verboseint, optional: Print diagnostic messages during convergence.
n_jobsint, optional: The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
random_stateRandomState or int, optional: Determines random number generation for centroid initialization and barycentering when fitting with metric=”dtw”.

Attributes:

n_iter_int: The number of iterations before convergence.
cluster_centers_ndarray of shape (n_clusters, n_timestep): The cluster centers.
medoid_indices_ndarray of shape (n_clusters, ): The index of the medoid in the input samples.
labels_ndarray of shape (n_samples, ): The cluster assignment.
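
As a rough sketch of the "min" initialization described above, the snippet below picks the samples with the smallest total distance to all other samples from a precomputed distance matrix and then assigns each sample to its nearest medoid. The helpers `min_init` and `assign` are hypothetical, and the actual implementation may select and update medoids differently.

```python
import numpy as np


def min_init(dist, n_clusters):
    """Pick the samples with the smallest total distance to all others."""
    total = dist.sum(axis=1)
    return np.argsort(total, kind="stable")[:n_clusters]


def assign(dist, medoid_indices):
    """Assign every sample to its closest medoid."""
    return dist[:, medoid_indices].argmin(axis=1)


# Five one-dimensional points; the distance is the absolute difference.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
dist = np.abs(points[:, None] - points[None, :])

medoid_indices = min_init(dist, n_clusters=2)
labels = assign(dist, medoid_indices)
```

Because the medoids are input samples, only the pairwise distance matrix is needed, which is why any metric can be used.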

fit(x, y=None)[source]#

Compute the kmedoids-clustering.

Parameters:

xunivariate time-series: The input samples.
yIgnored, optional: Not used.

Returns:

object: Fitted estimator.

fit_predict(X, y=None, **kwargs)[source]#

Perform clustering on X and return cluster labels.

Parameters:

Xarray-like of shape (n_samples, n_features): Input data.
yIgnored: Not used, present for API consistency by convention.
**kwargsdict: Arguments to be passed to fit.

Added in version 1.4.

Returns:

labelsndarray of shape (n_samples,), dtype=np.int64: Cluster labels.

fit_transform(X, y=None, **fit_params)[source]#

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

Xarray-like of shape (n_samples, n_features): Input samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None: Target values (None for unsupervised transformations).
**fit_paramsdict: Additional fit parameters.

Returns:

X_newndarray of shape (n_samples, n_features_new): Transformed array.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(x)[source]#

Predict the closest cluster for each sample.

Parameters:

xunivariate time-series: The input samples.

Returns:

ndarray of shape (n_samples, ): Index of the cluster each sample belongs to.

set_output(*, transform=None)[source]#

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform{“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

selfestimator instance: Estimator instance.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

transform(x)[source]#

Transform the input to a cluster distance space.

Parameters:

xunivariate time-series: The input samples.

Returns:

ndarray of shape (n_samples, n_clusters): The distance between each sample and each cluster.

class wildboar.distance.KNeighborsClassifier(n_neighbors=5, *, metric='euclidean', metric_params=None, n_jobs=None)[source]#

Classifier implementing k-nearest neighbors.

Parameters:

n_neighborsint, optional

The number of neighbors.

metricstr, optional

The distance metric.

metric_paramsdict, optional

Optional parameters to the distance metric.

Read more about the metrics and their parameters in the User guide.

n_jobsint, optional

The number of parallel jobs.

Attributes:

classes_ndarray of shape (n_classes, ): Known class labels.

fit(x, y)[source]#

Fit the classifier to the training data.

Parameters:

xunivariate time-series or multivariate time-series: The input samples.
yarray-like of shape (n_samples, ): The input labels.

Returns:

KNeighborsClassifier: This instance.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

predict(x)[source]#

Compute the class label for the samples in x.

Parameters:

xunivariate time-series or multivariate time-series: The input samples.

Returns:

ndarray of shape (n_samples, ): The class label for each sample.

predict_proba(x)[source]#

Compute probability estimates for the samples in x.

Parameters:

xunivariate time-series or multivariate time-series: The input samples.

Returns:

ndarray of shape (n_samples, len(self.classes_)): The probability of each class for each sample.
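
One common way to obtain such probability estimates is to take the fraction of the k nearest neighbors that belong to each class. The sketch below shows this for the Euclidean metric with uniform neighbor weights. The reference above does not state how the estimates are computed, so treat this as an assumption. `knn_predict_proba` is a hypothetical helper.

```python
import numpy as np


def knn_predict_proba(x_train, y_train, x_test, n_neighbors=3):
    """Class probabilities as the share of votes among the nearest neighbors."""
    classes = np.unique(y_train)
    diff = x_test[:, None, :] - x_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the n_neighbors closest training samples per test sample.
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :n_neighbors]
    neighbor_labels = y_train[nearest]  # (n_test, n_neighbors)
    votes = neighbor_labels[:, :, None] == classes[None, None, :]
    return classes, votes.mean(axis=1)  # (n_test, n_classes)


x_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1])
x_test = np.array([[0.2, 0.2], [5.0, 5.4]])

classes, proba = knn_predict_proba(x_train, y_train, x_test, n_neighbors=3)
predicted = classes[proba.argmax(axis=1)]
```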

score(X, y, sample_weight=None)[source]#

Return accuracy on provided data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:

Xarray-like of shape (n_samples, n_features): Test samples.
yarray-like of shape (n_samples,) or (n_samples, n_outputs): True labels for X.
sample_weightarray-like of shape (n_samples,), default=None: Sample weights.

Returns:

scorefloat: Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

class wildboar.distance.MDS(n_components=2, *, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=None, random_state=None, dissimilarity='euclidean', dissimilarity_params=None, normalized_stress='auto')[source]#

Multidimensional scaling.

Parameters:

n_componentsint, optional

Number of dimensions in which to immerse the dissimilarities.

metricbool, optional

If True, perform metric MDS; otherwise, perform nonmetric MDS. When False (i.e. non-metric MDS), dissimilarities with 0 are considered as missing values.

n_initint, optional

Number of times the SMACOF algorithm will be run with different initializations. The final results will be the best output of the runs, determined by the run with the smallest final stress.

max_iterint, optional

Maximum number of iterations of the SMACOF algorithm for a single run.

verboseint, optional

Level of verbosity.

epsfloat, optional

Relative tolerance with respect to stress at which to declare convergence. The value of eps should be tuned separately depending on whether or not normalized_stress is being used.

n_jobsint, optional

The number of jobs to use for the computation. If multiple initializations are used (n_init), each run of the algorithm is computed in parallel.

random_stateint, RandomState instance or None, optional

Determines the random number generator used to initialize the centers. Pass an int for reproducible results across multiple function calls.

dissimilaritystr, optional

The dissimilarity measure.

See _METRICS.keys() for a list of supported metrics.

dissimilarity_paramsdict, optional

Parameters to the dissimilarity measure.

Read more about the parameters in the User guide.

normalized_stressbool or “auto”, optional

Whether to use and return the normalized stress value (Stress-1) instead of the raw stress calculated by default. Only supported in non-metric MDS.

Notes

This implementation is a convenience wrapper around sklearn.manifold.MDS for use with Wildboar metrics.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routingMetadataRequest: A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deepbool, default=True: If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

paramsdict: Parameter names mapped to their values.

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**paramsdict: Estimator parameters.

Returns:

selfestimator instance: Estimator instance.

wildboar.distance.argmin_distance(x, y=None, *, dim=0, k=1, metric='euclidean', metric_params=None, lower_bound=None, sorted=False, return_distance=False, n_jobs=None)[source]#

Find the indices of the samples with the lowest distance in Y.

Parameters:

xarray-like of shape (x_samples, x_timestep): The needle.
yarray-like of shape (y_samples, y_timestep), optional: The haystack.
dimint, optional: The dimension where the distance is computed.
kint, optional: The number of closest samples.
metricstr, optional: The distance metric. See _METRICS.keys() for a list of supported metrics.
metric_paramsdict, optional: Parameters to the metric. Read more about the parameters in the User guide.
lower_boundarray-like of shape (x_samples, y_samples), optional: Lower bound on the distance metric. Read more about supported lower bounds in the User Guide.
sortedbool, optional: Sort the indices from smallest to largest distance.
return_distancebool, optional: Return the distance for the k samples.
n_jobsint, optional: The number of parallel jobs.

Returns:

indicesndarray of shape (n_samples, k): The indices of the samples in Y with the smallest distance.
distancendarray of shape (n_samples, k), optional: The distance of the samples in Y with the smallest distance.

Warning

Passing a callable to the metric parameter has a significant performance implication.
Assigning an invalid value to the parameter lower_bound will yield an incorrect outcome.

Examples

>>> import numpy as np
>>> from wildboar.distance import argmin_distance
>>> X = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
>>> Y = np.array([[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]])
>>> argmin_distance(X, Y, k=2, return_distance=True)
(array([[0, 1],
        [1, 2]]),
 array([[ 8.24621125,  4.79583152],
        [10.24695077, 10.        ]]))

Using a lower bound:

>>> from wildboar.datasets import load_basic_motions
>>> from wildboar.distance import argmin_distance
>>> from wildboar.distance.lb import DtwKeoghLowerBound
>>> X, y = load_basic_motions()
>>> X = X.reshape(X.shape[0], -1)
>>> lbkeogh = DtwKeoghLowerBound(r=1.0).fit(X[30:])
>>> argmin_distance(
...     X[:30],
...     X[30:],
...     metric="dtw",
...     metric_params={"r": 1.0},
...     lower_bound=lbkeogh.transform(X[:30])
... )
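
For intuition, the first example can be reproduced with a brute-force NumPy sketch that computes all pairwise Euclidean distances and keeps the k smallest per row. `brute_force_argmin` is a hypothetical helper, not the library's algorithm. It always returns the neighbors ordered by distance, which corresponds to sorted=True.

```python
import numpy as np


def brute_force_argmin(x, y, k=1):
    """Indices and distances of the k samples in y closest to each sample in x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Full (x_samples, y_samples) Euclidean distance matrix.
    dist = np.sqrt(((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1))
    # Order each row by distance and keep the k smallest entries.
    indices = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return indices, np.take_along_axis(dist, indices, axis=1)


X = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
Y = np.array([[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]])
indices, distances = brute_force_argmin(X, Y, k=2)
# indices holds the rows [1, 0] and [2, 1]: the same neighbors as in the
# example above, but ordered from smallest to largest distance.
```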

wildboar.distance.argmin_subsequence_distance(y, x, *, dim=0, k=1, metric='euclidean', metric_params=None, scale=False, return_distance=False, n_jobs=None)[source]#

Compute the k:th closest subsequences.

For the i:th shapelet and the i:th sample return the index and, optionally, the distance of the k closest matches.

Parameters:

yarray-like of shape (n_samples, m_timestep) or list of 1d-arrays

The subsequences.

xarray-like of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples. If x.ndim == 1, it is broadcast to have the same number of samples as y.

dimint, optional

The dimension in x to find subsequences in.

kint, optional

The number of closest subsequences to find.

metricstr, optional

The metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

scalebool, optional

If True, scale the subsequences before distance computation.

return_distancebool, optional

Return the distance for the k closest subsequences.

n_jobsint, optional

The number of parallel jobs.

Returns:

indicesndarray of shape (n_samples, k): The indices of the k closest subsequences.
distancendarray of shape (n_samples, k), optional: The distance of the k closest subsequences.

Warning

Passing a callable to the metric parameter has a significant performance implication.

Examples

>>> import numpy as np
>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import argmin_subsequence_distance
>>> X, _ = load_dataset("ECG200")
>>> s = np.lib.stride_tricks.sliding_window_view(X[0], window_shape=10)
>>> x = np.broadcast_to(X[0], shape=(s.shape[0], X.shape[1]))
>>> argmin_subsequence_distance(s, x, k=4)

wildboar.distance.distance_profile(y, x, *, dilation=1, padding=0, dim=0, metric='mass', metric_params=None, scale=False, n_jobs=None)[source]#

Compute the distance profile.

The distance profile corresponds to the distance of the subsequences in y for every time point of the samples in x.

Parameters:

yarray-like of shape (m_timestep, ) or (n_samples, m_timestep)

The subsequences. If y.ndim is 1, we will broadcast y to have the same number of samples as x.

xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples. If x.ndim is 1, we will broadcast x to have the same number of samples as y.

dilationint, optional

The dilation, i.e., the spacing between points in the subsequences.

paddingint or {“same”}, optional

The amount of padding applied to the input time series. If “same”, the output size is the same as the input size.

dimint, optional

The dim to search for shapelets.

metricstr or callable, optional

The distance metric

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

scalebool, optional

If True, scale the subsequences before distance computation.

n_jobsint, optional

The number of parallel jobs to run.

Returns:

ndarray of shape (n_samples, output_size) or (output_size, ): The distance profile. output_size is given by: n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1. If x contains a single sample and y a single subsequence, the output is squeezed.

Warning

Passing a callable to the metric parameter has a significant performance implication.

Examples

>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import distance_profile
>>> X, _ = load_dataset("ECG200")
>>> distance_profile(X[0], X[1:].reshape(-1))
array([14.00120332, 14.41943788, 14.81597243, ...,  4.75219094,
       5.72681005,  6.70155561])

>>> distance_profile(
...     X[0, 0:9], X[1:5], metric="dtw", dilation=2, padding="same"
... )[0, :10]
array([8.01881424, 7.15083281, 7.48856368, 6.83139294, 6.75595579,
       6.30073636, 6.65346307, 6.27919601, 6.25666948, 6.0961576 ])
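
The effect of dilation and padding on the output size can be sketched with a naive Euclidean distance profile. This is only an illustration: it assumes the usual dilated-window size formula and zero padding on both sides, neither of which is confirmed by this reference, and it does not use the "mass" metric. `output_size` and `naive_distance_profile` are hypothetical helpers.

```python
import numpy as np


def output_size(n_timestep, m_timestep, dilation=1, padding=0):
    """Number of positions at which a dilated subsequence fits."""
    return n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1


def naive_distance_profile(y, x, dilation=1, padding=0):
    """Euclidean distance of subsequence y at every position of series x."""
    x = np.pad(np.asarray(x, dtype=float), padding)  # assumption: zero padding
    y = np.asarray(y, dtype=float)
    offsets = np.arange(y.shape[0]) * dilation  # spacing between compared points
    size = x.shape[0] - offsets[-1]
    return np.array([np.linalg.norm(x[i + offsets] - y) for i in range(size)])


x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 3]
plain = naive_distance_profile(y, x)                # 4 positions
dilated = naive_distance_profile(y, x, dilation=2)  # 2 positions
padded = naive_distance_profile(y, x, padding=1)    # 6 positions
```

Under the same assumptions, a padding of (m_timestep - 1) * dilation / 2 keeps the output as long as the input, which is what padding="same" describes.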

wildboar.distance.matrix_profile(X, Y=None, *, dim=0, window=5, exclude=None, kind='warn', return_index=False, n_jobs=None)[source]#

Compute the matrix profile of every subsequence in X.

If Y is given, compute the matrix profile of every subsequence in X by finding the minimum distance to any time series in Y; otherwise, find the minimum distance to any time series in X. The former corresponds to an AB-join and the latter to a self-join.

The output approximately corresponds to that of matrix_profile applied to X.flatten(), but without computing distances for subsequences that span two time series. The outputs correspond exactly when X.shape[0] == 1.

Parameters:

Xarray-like of shape (x_samples, x_timestep)

The time series for which the matrix profile is computed.

Yarray-like of shape (y_samples, y_timestep), optional

The time series used to annotate X. If None, X is used to annotate.

dimint, optional

The dimension.

windowint or float, optional

The window size.

If float, the window size is a fraction of x_timestep.
If int, the window size is exact.

excludeint or float, optional

The exclusion zone.

If float, the exclusion zone is a fraction of window.
If int, the exclusion zone is exact.
If None, the exclusion zone is determined automatically. If Y is None, (self-join) the value is 0.2, otherwise (AB-join) the value is 0.0.

kind{“paired”, “default”}, optional

The kind of matrix profile.

If “paired”, compute the matrix profile for each time series in X, optionally annotated with each time series in Y.
If “default”, compute the matrix profile for every subsequence in every time series in X, optionally annotated with every time series in Y.

return_indexbool, optional

Return the matrix profile index.

n_jobsint, optional

The number of parallel jobs.

Returns:

mpndarray of shape (x_samples, profile_size): The matrix profile.
(mpi_sample, mpi_start)ndarray of shape (x_samples, profile_size), optional: The matrix profile index sample and start positions. Returned if return_index=True.

wildboar.distance.paired_distance(x, y, *, dim='mean', metric='euclidean', metric_params=None, n_jobs=None)[source]#

Compute the distance between the i:th time series.

Parameters:

xndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

yndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data. y will be broadcast to the shape of x.

dimint or {‘mean’, ‘full’}, optional

The dim to compute distance.

metricstr or callable, optional

The distance metric

See _METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

n_jobsint, optional

The number of parallel jobs.

Returns:

ndarray

The distances. Return depends on input:

if x.ndim == 1, return scalar.
if dim=’full’, return ndarray of shape (n_dims, n_samples).
if x.ndim > 1, return an ndarray of shape (n_samples, ).

Warning

Passing a callable to the metric parameter has a significant performance implication.
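
A paired distance compares the i:th row of x only with the i:th row of y. The NumPy sketch below shows this for univariate input and the Euclidean metric. `paired_euclidean` is a hypothetical helper, not the library function, and multivariate input and the dim parameter are not covered.

```python
import numpy as np


def paired_euclidean(x, y):
    """Euclidean distance between the i:th rows of x and y."""
    # y is broadcast to the shape of x, so a single series can be
    # compared against every row of x.
    x, y = np.broadcast_arrays(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    return np.sqrt(((x - y) ** 2).sum(axis=-1))


x = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
y = np.array([[1, 2, 11, 2], [10, 11, 2, 3]])

paired = paired_euclidean(x, y)            # shape (n_samples, )
single = paired_euclidean([0, 0], [3, 4])  # 1-D input gives a scalar
```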

wildboar.distance.paired_matrix_profile(X, Y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)[source]#

Compute the matrix profile.

If only X is given, compute the similarity self-join of every subsequence in X of size window to its nearest neighbor in X excluding trivial matches according to the exclude parameter.
If both X and Y are given, compute the similarity join of every subsequence in X of size window to its nearest neighbor in Y excluding matches according to the exclude parameter.

Parameters:

Xarray-like of shape (n_timestep, ), (n_samples, x_timestep) or (n_samples, n_dim, x_timestep)

The first time series.

Yarray-like of shape (n_timestep, ), (n_samples, y_timestep) or (n_samples, n_dim, y_timestep), optional

The optional second time series. Y is broadcast to the shape of X if possible.

windowint or float, optional

The subsequence size, by default 5

if float, a fraction of y_timestep
if int, the exact subsequence size.

dimint, optional

The dim to compute the matrix profile for, by default 0.

excludeint or float, optional

The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.

if float, expressed as a fraction of the windows size.
if int, exact size (0 <= exclude < window).

n_jobsint, optional

The number of jobs to use when computing the profile.

return_indexbool, optional

Return the matrix profile index.

Returns:

mpndarray of shape (profile_size, ) or (n_samples, profile_size): The matrix profile.
mpindarray of shape (profile_size, ) or (n_samples, profile_size), optional: The matrix profile index.

Notes

The profile_size is X.shape[-1] - window + 1.
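
The role of window and exclude in a self-join can be sketched with a quadratic-time NumPy implementation. This is an illustration only: it uses the plain Euclidean distance without normalization, takes exclude as an integer number of positions, and treats positions within exclude steps of a subsequence as trivial matches. All three are assumptions, and `naive_self_join` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def naive_self_join(x, window, exclude):
    """Nearest-neighbor distance and index for every subsequence of x."""
    x = np.asarray(x, dtype=float)
    subs = sliding_window_view(x, window)  # (profile_size, window)
    dist = np.sqrt(((subs[:, None, :] - subs[None, :, :]) ** 2).sum(axis=-1))
    # Mask trivial matches: subsequences that start too close to each other.
    pos = np.arange(subs.shape[0])
    dist[np.abs(pos[:, None] - pos[None, :]) <= exclude] = np.inf
    return dist.min(axis=1), dist.argmin(axis=1)


x = [0, 1, 2, 0, 0, 0, 1, 2, 0, 3]
mp, mpi = naive_self_join(x, window=3, exclude=1)
# The pattern [0, 1, 2] starts at positions 0 and 5, so these two
# subsequences are each other's nearest neighbor at distance zero.
```

In this toy series the last subsequence has the largest profile value, which is how discords are usually identified.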

References

Yeh, C. C. M. et al. (2016).: Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM)

wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)[source]#

Minimum subsequence distance between the i:th subsequence and time series.

Parameters:

ylist or ndarray of shape (n_samples, m_timestep)

Input time series.

if list, a list of array-like of shape (m_timestep, ).

xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

dimint, optional

The dim to search for shapelets.

metricstr or callable, optional

The distance metric

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

scalebool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

return_indexbool, optional

If True, return the index of the best match. If there are many equally good matches, the first match is returned.

n_jobsint, optional

The number of parallel jobs to run.

Returns:

distfloat, ndarray: An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample.
indicesint, ndarray, optional: An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample.

Warning

Passing a callable to the metric parameter has a significant performance implication.
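
The minimum subsequence distance slides the i:th subsequence over the i:th sample and keeps the smallest distance. A naive Euclidean sketch without scaling follows. `paired_min_subsequence` is a hypothetical helper that accepts a list of subsequences of different lengths.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def paired_min_subsequence(y, x):
    """Minimum distance and start index of y[i] within x[i]."""
    dist, index = [], []
    for sub, sample in zip(y, x):
        sub = np.asarray(sub, dtype=float)
        windows = sliding_window_view(np.asarray(sample, dtype=float), len(sub))
        d = np.linalg.norm(windows - sub, axis=1)
        best = int(d.argmin())  # argmin returns the first of equally good matches
        dist.append(d[best])
        index.append(best)
    return np.array(dist), np.array(index)


y = [np.array([1, 2]), np.array([0, 0, 2])]
x = np.array([[0, 1, 2, 3], [0, 0, 1, 1]])
dist, index = paired_min_subsequence(y, x)
```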

wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, return_distance=False, n_jobs=None)[source]#

Find matching subsequences.

Find the positions where the distance is less than the threshold between the i:th subsequences and time series.

If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance and time series.

Parameters:

ylist or ndarray of shape (n_samples, n_timestep)

Input time series.

if list, a list of array-like of shape (n_timestep, ) with length n_samples.

xndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

thresholdfloat, optional

The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.

dimint, optional

The dim to search for shapelets.

metricstr or callable, optional

The distance metric

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

scalebool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

max_matchesint, optional

Return the top max_matches matches below threshold.

If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.

return_distancebool, optional

If True, return the distance of the match.

n_jobsint, optional

The number of parallel jobs to run. Ignored.

Returns:

indicesndarray of shape (n_samples, ): The start index of matching subsequences.
distancendarray of shape (n_samples, ), optional: The distances of matching subsequences.

Warning

Passing a callable to the metric parameter has a significant performance implication.
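
The interaction between threshold and max_matches can be sketched for a single subsequence and a single time series as follows. `match_positions` is a hypothetical helper using the Euclidean distance. The strict comparison and the tie-breaking order are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def match_positions(sub, sample, threshold=None, max_matches=None):
    """Start indices of the subsequences in sample that match sub."""
    sub = np.asarray(sub, dtype=float)
    windows = sliding_window_view(np.asarray(sample, dtype=float), len(sub))
    d = np.linalg.norm(windows - sub, axis=1)
    if threshold is None:
        # No threshold: the closest matches, ordered by distance.
        top = 10 if max_matches is None else max_matches
        return np.argsort(d, kind="stable")[:top]
    matches = np.flatnonzero(d < threshold)  # order of occurrence
    if max_matches is not None:
        # Both given: the closest matches below the threshold, by distance.
        matches = matches[np.argsort(d[matches], kind="stable")[:max_matches]]
    return matches


sample = [0, 2, 0, 1, 0, 1.5]
sub = [0, 1]
by_occurrence = match_positions(sub, sample, threshold=1.2)
best_two = match_positions(sub, sample, threshold=1.2, max_matches=2)
top_three = match_positions(sub, sample, max_matches=3)
```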

wildboar.distance.pairwise_distance(x, y=None, *, dim='mean', metric='euclidean', metric_params=None, n_jobs=None)[source]#

Compute the distance between every time series in X and Y.

Parameters:

xndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)

The input data.

yndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional

The input data.

dimint or {‘mean’, ‘full’}, optional

The dim to compute distance.

metricstr or callable, optional

The distance metric

See _METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

n_jobsint, optional

The number of parallel jobs.

Returns:

float or ndarray

The distances. Return depends on input.

if x.ndim == 1 and y.ndim == 1, scalar.
if dim="full", array of shape (n_dims, x_samples, y_samples).
if dim="full" and y is None, array of shape (n_dims, x_samples, x_samples).
if x.ndim > 1 and y is None, array of shape (x_samples, x_samples).
if x.ndim > 1 and y.ndim > 1, array of shape (x_samples, y_samples).
if x.ndim == 1 and y.ndim > 1, array of shape (y_samples, ).
if y.ndim == 1 and x.ndim > 1, array of shape (x_samples, ).
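
The return shapes listed above can be checked with a NumPy-only sketch. The function `pairwise_euclidean` is a hypothetical stand-in for the Euclidean case with 1-D or 2-D input; it is not wildboar's implementation and does not cover multivariate input or the dim parameter.

```python
import numpy as np


def pairwise_euclidean(x, y=None):
    """Hypothetical stand-in for the Euclidean case (1-D or 2-D input)."""
    x = np.asarray(x, dtype=float)
    y_arr = x if y is None else np.asarray(y, dtype=float)
    xs, ys = np.atleast_2d(x), np.atleast_2d(y_arr)
    # Broadcast to (x_samples, y_samples, n_timestep), then reduce over time.
    dist = np.sqrt(((xs[:, None, :] - ys[None, :, :]) ** 2).sum(axis=-1))
    if x.ndim == 1 and y_arr.ndim == 1:
        return float(dist[0, 0])  # scalar
    if x.ndim == 1:
        return dist[0]            # shape (y_samples, )
    if y_arr.ndim == 1:
        return dist[:, 0]         # shape (x_samples, )
    return dist                   # shape (x_samples, y_samples)


X = np.arange(12.0).reshape(3, 4)  # 3 time series with 4 timesteps
Y = np.ones((2, 4))                # 2 time series with 4 timesteps

print(pairwise_euclidean(X).shape)         # (3, 3)
print(pairwise_euclidean(X, Y).shape)      # (3, 2)
print(pairwise_euclidean(X[0], Y).shape)   # (2,)
print(pairwise_euclidean(X, Y[0]).shape)   # (3,)
```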

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)[source]#

Minimum subsequence distance between subsequences and time series.

Parameters:

ylist or ndarray of shape (n_subsequences, n_timestep)

Input time series.

if list, a list of array-like of shape (n_timestep, ).

xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

dimint, optional

The dim to search for subsequence.

metricstr or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

scalebool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

return_indexbool, optional

If True, return the index of the best match. If there are many equally good matches, the first match is returned.

n_jobsint, optional

The number of parallel jobs.

Returns:

distfloat, ndarray

The minimum distance. Return depends on input:

if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
if len(y) == 1, return an array of shape (n_samples, ).
if x.ndim == 1, return an array of shape (n_subsequences, ).
if x.ndim == 1 and len(y) == 1, return scalar.

indicesint, ndarray, optional

The start index of the minimum distance. Return depends on input:

if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
if len(y) == 1, return an array of shape (n_samples, ).
if x.ndim == 1, return an array of shape (n_subsequences, ).
if x.ndim == 1 and len(y) == 1, return scalar.
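
As an illustration of what the minimum subsequence distance and its start index mean, the following NumPy-only sketch slides a subsequence over a single univariate time series. The helper `min_subsequence_distance` is hypothetical, assumes the Euclidean metric without scaling, and is not how wildboar computes the result.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def min_subsequence_distance(y, x):
    """Hypothetical helper: best Euclidean match of 1-D y inside 1-D x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # All windows of len(y) timesteps: shape (n_timestep - len(y) + 1, len(y)).
    windows = sliding_window_view(x, y.size)
    dists = np.sqrt(((windows - y) ** 2).sum(axis=1))
    # np.argmin returns the first index among equally good matches.
    idx = int(np.argmin(dists))
    return float(dists[idx]), idx


x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
# A list of subsequences may contain arrays of different lengths.
subsequences = [np.array([2.0, 3.0, 2.0]), np.array([1.0, 0.0])]

results = [min_subsequence_distance(s, x) for s in subsequences]
print(results)  # [(0.0, 2), (0.0, 5)]
```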

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, exclude=None, return_distance=False, n_jobs=None)[source]#

Find matching subsequences.

Find the positions where the distance is less than the threshold between the subsequence and all time series.

If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.

Parameters:

yarray-like of shape (yn_timestep, )

The subsequence.

xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

threshold{“auto”}, float or callable, optional

The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.

if float, return all matches closer than threshold
if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence
if str, return all matches according to the named threshold.
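
A minimal sketch of the float and callable cases, assuming the callable receives the array of distances and returns a single number. The helper `resolve_threshold` is hypothetical and not part of wildboar; named (str) thresholds are not covered.

```python
import numpy as np


def resolve_threshold(threshold, distances):
    """Hypothetical helper: turn a float or callable threshold into a number."""
    if callable(threshold):
        # Assumption: the callable maps all distances to one threshold value.
        return float(threshold(distances))
    return float(threshold)


distances = np.array([4.0, 1.0, 2.5, 0.5, 6.0])

# Fixed threshold: matches closer than 2.0.
fixed = np.flatnonzero(distances < resolve_threshold(2.0, distances))
# Data-dependent threshold: matches closer than the mean distance (2.8).
adaptive = np.flatnonzero(distances < resolve_threshold(np.mean, distances))

print(fixed)     # [1 3]
print(adaptive)  # [1 2 3]
```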

dimint, optional

The dim to search for subsequences.

metricstr or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

scalebool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

max_matchesint, optional

Return the top max_matches matches below threshold.

excludefloat or int, optional

Exclude trivial matches in the vicinity of the match.

if float, the exclusion zone is computed as math.ceil(exclude * y.size)
if int, the exclusion zone is exact

A match is considered trivial if another match with a lower distance lies within exclude timesteps of it.
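
The exclusion rule can be sketched as follows. `exclusion_zone` follows the formula stated above; `drop_trivial_matches` is a hypothetical greedy filter that assumes "within" is inclusive, and it is not wildboar's implementation.

```python
import math

import numpy as np


def exclusion_zone(exclude, y_size):
    """Zone width: a fraction of len(y) if float, exact if int."""
    if isinstance(exclude, float):
        return math.ceil(exclude * y_size)
    return int(exclude)


def drop_trivial_matches(indices, distances, zone):
    """Hypothetical greedy filter: keep the best match in each neighbourhood."""
    kept = []
    # Visit candidates from lowest to highest distance.
    for i in np.argsort(distances, kind="stable"):
        # Assumption: a candidate is trivial if a kept match is <= zone away.
        if all(abs(int(indices[i]) - int(indices[j])) > zone for j in kept):
            kept.append(int(i))
    return sorted(int(indices[i]) for i in kept)


zone = exclusion_zone(0.25, 10)  # math.ceil(2.5) == 3
matches = drop_trivial_matches(
    indices=[10, 11, 12, 30], distances=[0.3, 0.1, 0.2, 0.4], zone=zone
)
print(zone, matches)  # 3 [11, 30]
```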

return_distancebool, optional

If True, return the distance of the match.

n_jobsint, optional

The number of parallel jobs to run.

Returns:

indicesndarray of shape (n_samples, ) or (n_matches, ): The start index of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.
distancendarray of shape (n_samples, ), optional: The distances of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.

Warning

Passing a callable to the metric parameter has a significant performance implication.
