wildboar.distance

Fast distance computations.

The wildboar.distance module includes functions for computing paired and pairwise distances between time series, and between subsequences and time series.

See the User Guide for more details and examples.

Submodules

Package Contents

Classes

KMeans

KMeans clustering with support for DTW and weighted DTW.

KMedoids

KMedoids clustering algorithm.

KNeighborsClassifier

Classifier implementing k-nearest neighbors.

MDS

Multidimensional scaling.

Functions

argmin_distance(x[, y, dim, k, metric, metric_params, ...])

Find the indices of the samples in y with the lowest distance to each sample in x.

argmin_subsequence_distance(y, x, *[, dim, k, metric, ...])

Compute the k closest subsequences.

distance_profile(y, x, *[, dilation, padding, dim, ...])

Compute the distance profile.

matrix_profile(x[, y, window, dim, exclude, n_jobs, ...])

Compute the matrix profile.

paired_distance(x, y, *[, dim, metric, metric_params, ...])

Compute the distance between the i:th time series in x and y.

paired_subsequence_distance(y, x, *[, dim, metric, ...])

Minimum subsequence distance between the i:th subsequence and time series.

paired_subsequence_match(y, x[, threshold, dim, ...])

Find matching subsequences.

pairwise_distance(x[, y, dim, metric, metric_params, ...])

Compute the pairwise distance between time series.

pairwise_subsequence_distance(y, x, *[, dim, metric, ...])

Minimum subsequence distance between subsequences and time series.

subsequence_match(y, x[, threshold, dim, metric, ...])

Find matching subsequences.

class wildboar.distance.KMeans(n_clusters=8, *, metric='euclidean', r=1.0, g=None, init='random', n_init='auto', max_iter=300, tol=0.001, verbose=0, random_state=None)

KMeans clustering with support for DTW and weighted DTW.

Parameters:
n_clusters : int, optional

The number of clusters.

metric : {“euclidean”, “dtw”}, optional

The metric.

r : float, optional

The size of the warping window.

g : float, optional

SoftDTW penalty. If None, traditional DTW is used.

init : {“random”}, optional

Cluster initialization. If “random”, the n_clusters centroids are initialized randomly.

n_init : “auto” or int, optional

Number of times the algorithm is re-initialized with new centroids.

max_iter : int, optional

The maximum number of iterations for a single run of the algorithm.

tol : float, optional

Relative tolerance between two consecutive iterations used to declare convergence.

verbose : int, optional

Print diagnostic messages during convergence.

random_state : RandomState or int, optional

Determines random number generation for centroid initialization and barycentering when fitting with metric="dtw".

Attributes:
n_iter_ : int

The number of iterations before convergence.

cluster_centers_ : ndarray of shape (n_clusters, n_timestep)

The cluster centers.

labels_ : ndarray of shape (n_samples, )

The cluster assignment.
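
For the Euclidean case, the fitting loop can be illustrated with Lloyd's algorithm: assign each series to its closest centroid, move each centroid to the mean of its members, and stop once the centroids barely move. This is a minimal pure-Python sketch under stated assumptions, not wildboar's implementation: it covers only metric="euclidean" (no DTW barycentering), takes the first n_clusters samples as initial centroids instead of random initialization, uses an absolute rather than relative tolerance, and the helper names euclidean and kmeans_sketch are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans_sketch(x, n_clusters, max_iter=300, tol=1e-3):
    """Hypothetical Lloyd-style k-means for equal-length univariate series.

    Assumptions: the first n_clusters samples are the initial centroids and
    convergence uses an absolute centroid shift, unlike wildboar.
    """
    centers = [list(map(float, s)) for s in x[:n_clusters]]
    labels = [0] * len(x)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Assignment step: index of the closest centroid for each sample.
        labels = [
            min(range(n_clusters), key=lambda c: euclidean(s, centers[c]))
            for s in x
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centers = []
        for c in range(n_clusters):
            members = [s for s, label in zip(x, labels) if label == c]
            if members:
                new_centers.append([sum(v) / len(members) for v in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep the centroid of an empty cluster
        shift = max(euclidean(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # Analogous to transform(): distance from each sample to each centroid.
    distances = [[euclidean(s, c) for c in centers] for s in x]
    return centers, labels, distances, n_iter


x = [[0, 0, 1], [0, 1, 1], [10, 10, 9], [10, 9, 9]]
centers, labels, distances, n_iter = kmeans_sketch(x, n_clusters=2)
print(labels)  # [0, 0, 1, 1]
```

The returned distances matrix has shape (n_samples, n_clusters), mirroring what transform is documented to return.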

fit(x, y=None)

Compute the k-means clustering.

Parameters:
x : univariate time-series

The input samples.

y : Ignored, optional

Not used.

Returns:
object

Fitted estimator.

fit_predict(X, y=None, **kwargs)

Perform clustering on X and return cluster labels.

Parameters:
X : array-like of shape (n_samples, n_features)

Input data.

y : Ignored

Not used, present for API consistency by convention.

**kwargs : dict

Arguments to be passed to fit.

Added in version 1.4.

Returns:
labels : ndarray of shape (n_samples,), dtype=np.int64

Cluster labels.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(x)

Predict the closest cluster for each sample.

Parameters:
x : univariate time-series

The input samples.

Returns:
ndarray of shape (n_samples, )

Index of the cluster each sample belongs to.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the input to a cluster distance space.

Parameters:
x : univariate time-series

The input samples.

Returns:
ndarray of shape (n_samples, n_clusters)

The distance between each sample and each cluster.

class wildboar.distance.KMedoids(n_clusters=8, metric='euclidean', metric_params=None, init='random', n_init='auto', algorithm='fast', max_iter=30, tol=0.0001, verbose=0, n_jobs=None, random_state=None)

KMedoids clustering algorithm.

Parameters:
n_clusters : int, optional

The number of clusters.

metric : str, optional

The metric.

metric_params : dict, optional

The metric parameters. Read more about the metrics and their parameters in the User Guide.

init : {“auto”, “random”, “min”}, optional

Cluster initialization. If “random”, the n_clusters medoids are initialized randomly; if “min”, the samples with the smallest distance to the other samples are selected.

n_init : “auto” or int, optional

Number of times the algorithm is re-initialized with new medoids.

algorithm : {“fast”, “pam”}, optional

The algorithm for updating cluster assignments. If “pam”, use the Partitioning Around Medoids algorithm.

max_iter : int, optional

The maximum number of iterations for a single run of the algorithm.

tol : float, optional

Relative tolerance between two consecutive iterations used to declare convergence.

verbose : int, optional

Print diagnostic messages during convergence.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : RandomState or int, optional

Determines random number generation for medoid initialization.

Attributes:
n_iter_ : int

The number of iterations before convergence.

cluster_centers_ : ndarray of shape (n_clusters, n_timestep)

The cluster centers.

medoid_indices_ : ndarray of shape (n_clusters, )

The index of each medoid in the input samples.

labels_ : ndarray of shape (n_samples, )

The cluster assignment.
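
The assignment and update cycle of k-medoids can be illustrated with the simple alternating (Voronoi iteration) scheme on a precomputed distance matrix: assign every sample to its closest medoid, then make the member with the smallest total distance to its cluster the new medoid. This is a pure-Python sketch under stated assumptions; it is neither PAM nor necessarily what algorithm="fast" does, the initial medoids are passed in explicitly, and the helper names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmedoids_sketch(dist, medoids, max_iter=30):
    """Hypothetical alternating k-medoids on a precomputed distance matrix.

    dist: square list-of-lists with the pairwise distances between samples.
    medoids: indices of the initial medoids (e.g. selected at random).
    """
    n = len(dist)
    medoids = list(medoids)
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each sample joins the cluster of its closest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(n)
        ]
        # Update step: the new medoid minimizes the total distance to its members.
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(old)
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:  # converged: the medoids no longer change
            break
        medoids = new_medoids
    return medoids, labels


x = [[0, 0], [0, 1], [0, 2], [10, 10], [10, 11], [10, 12]]
dist = [[euclidean(a, b) for b in x] for a in x]
medoids, labels = kmedoids_sketch(dist, medoids=[0, 3])
print(medoids, labels)  # [1, 4] [0, 0, 0, 1, 1, 1]
```

Unlike k-means, the cluster centers are always actual input samples, which is why the estimator can expose medoid_indices_.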

fit(x, y=None)

Compute the k-medoids clustering.

Parameters:
x : univariate time-series

The input samples.

y : Ignored, optional

Not used.

Returns:
object

Fitted estimator.

fit_predict(X, y=None, **kwargs)

Perform clustering on X and return cluster labels.

Parameters:
X : array-like of shape (n_samples, n_features)

Input data.

y : Ignored

Not used, present for API consistency by convention.

**kwargs : dict

Arguments to be passed to fit.

Added in version 1.4.

Returns:
labels : ndarray of shape (n_samples,), dtype=np.int64

Cluster labels.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(x)

Predict the closest cluster for each sample.

Parameters:
x : univariate time-series

The input samples.

Returns:
ndarray of shape (n_samples, )

Index of the cluster each sample belongs to.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(x)

Transform the input to a cluster distance space.

Parameters:
x : univariate time-series

The input samples.

Returns:
ndarray of shape (n_samples, n_clusters)

The distance between each sample and each cluster.

class wildboar.distance.KNeighborsClassifier(n_neighbors=5, *, metric='euclidean', metric_params=None, n_jobs=None)

Classifier implementing k-nearest neighbors.

Parameters:
n_neighbors : int, optional

The number of neighbors.

metric : str, optional

The distance metric.

metric_params : dict, optional

Optional parameters to the distance metric.

Read more about the metrics and their parameters in the User Guide.

n_jobs : int, optional

The number of parallel jobs.

Attributes:
classes_ : ndarray of shape (n_classes, )

Known class labels.
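
Conceptually, the classifier ranks the training samples by distance to each query and lets the n_neighbors closest ones vote. The pure-Python sketch below assumes plain Euclidean distance, uniform (unweighted) votes, probabilities equal to vote fractions and ties resolved toward the first class in sorted order; wildboar's own weighting and tie-breaking may differ, and knn_predict_proba and knn_predict are hypothetical names.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict_proba(x_train, y_train, x_test, n_neighbors=5):
    """Hypothetical k-NN: class probabilities are the fractions of neighbor votes."""
    classes = sorted(set(y_train))  # plays the role of classes_
    k = min(n_neighbors, len(x_train))
    proba = []
    for s in x_test:
        # Indices of the training samples ordered by distance to s.
        order = sorted(range(len(x_train)), key=lambda i: euclidean(s, x_train[i]))
        votes = Counter(y_train[i] for i in order[:k])
        proba.append([votes[c] / k for c in classes])
    return classes, proba


def knn_predict(x_train, y_train, x_test, n_neighbors=5):
    """Majority vote; ties go to the first class in sorted order (assumption)."""
    classes, proba = knn_predict_proba(x_train, y_train, x_test, n_neighbors)
    return [classes[max(range(len(classes)), key=p.__getitem__)] for p in proba]


x_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
y_train = ["a", "a", "a", "b", "b"]
x_test = [[0, 0.5], [5, 5.5]]
print(knn_predict(x_train, y_train, x_test, n_neighbors=3))  # ['a', 'b']
```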

fit(x, y)

Fit the classifier to the training data.

Parameters:
x : univariate time-series or multivariate time-series

The input samples.

y : array-like of shape (n_samples, )

The input labels.

Returns:
KNeighborsClassifier

This instance.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(x)

Compute the class label for the samples in x.

Parameters:
x : univariate time-series or multivariate time-series

The input samples.

Returns:
ndarray of shape (n_samples, )

The class label for each sample.

predict_proba(x)

Compute probability estimates for the samples in x.

Parameters:
x : univariate time-series or multivariate time-series

The input samples.

Returns:
ndarray of shape (n_samples, len(self.classes_))

The probability of each class for each sample.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.distance.MDS(n_components=2, *, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=None, random_state=None, dissimilarity='euclidean', dissimilarity_params=None, normalized_stress='warn')

Multidimensional scaling.

Parameters:
n_components : int, optional

Number of dimensions in which to immerse the dissimilarities.

metric : bool, optional

If True, perform metric MDS; otherwise, perform nonmetric MDS. When False (i.e. non-metric MDS), dissimilarities with 0 are considered as missing values.

n_init : int, optional

Number of times the SMACOF algorithm will be run with different initializations. The final results will be the best output of the runs, determined by the run with the smallest final stress.

max_iter : int, optional

Maximum number of iterations of the SMACOF algorithm for a single run.

verbose : int, optional

Level of verbosity.

eps : float, optional

Relative tolerance with respect to stress at which to declare convergence. The value of eps should be tuned separately depending on whether or not normalized_stress is being used.

n_jobs : int, optional

The number of jobs to use for the computation. If multiple initializations are used (n_init), each run of the algorithm is computed in parallel.

random_state : int, RandomState instance or None, optional

Determines the random number generator used to initialize the centers. Pass an int for reproducible results across multiple function calls.

dissimilarity : str, optional

The dissimilarity measure.

See _METRICS.keys() for a list of supported metrics.

dissimilarity_params : dict, optional

Parameters to the dissimilarity measure.

Read more about the parameters in the User Guide.

normalized_stress : bool or “auto”, optional

Whether to use and return the normalized stress value (Stress-1) instead of the raw stress, which is calculated by default. Only supported in non-metric MDS.

Notes

This implementation is a convenience wrapper around sklearn.manifold.MDS for use with Wildboar metrics.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

wildboar.distance.argmin_distance(x, y=None, *, dim=0, k=1, metric='euclidean', metric_params=None, sorted=False, return_distance=False, n_jobs=None)

Find the indices of the samples in y with the lowest distance to each sample in x.

Parameters:
x : univariate time-series or multivariate time-series

The needle, i.e., the query samples.

y : univariate time-series or multivariate time-series, optional

The haystack, i.e., the samples to search.

dim : int, optional

The dimension where the distance is computed.

k : int, optional

The number of closest samples.

metric : str, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

sorted : bool, optional

Sort the indices from smallest to largest distance.

return_distance : bool, optional

Return the distance for the k samples.

n_jobs : int, optional

The number of parallel jobs.

Returns:
indices : ndarray of shape (n_samples, k)

The indices of the samples in y with the smallest distance.

distance : ndarray of shape (n_samples, k), optional

The distance of the samples in y with the smallest distance.

Warning

Passing a callable to the metric parameter has a significant performance implication.

Examples

>>> import numpy as np
>>> from wildboar.distance import argmin_distance
>>> X = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
>>> Y = np.array([[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]])
>>> argmin_distance(X, Y, k=2, return_distance=True)
(array([[0, 1],
        [1, 2]]),
 array([[ 8.24621125,  4.79583152],
        [10.24695077, 10.        ]]))
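
For the default metric="euclidean", the result can be reproduced with a brute-force nearest-neighbor search. The pure-Python sketch below (hypothetical helper names, not wildboar's implementation) returns the neighbors ordered by increasing distance, comparable to sorted=True; this is why its order differs from the unsorted (sorted=False) output of the example above, while the neighbors and distances are the same.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def argmin_distance_sketch(x, y, k=1):
    """For each query in x, the k indices in y with the smallest distance,
    ordered by increasing distance."""
    indices, distances = [], []
    for query in x:
        d = [euclidean(query, s) for s in y]
        best = sorted(range(len(y)), key=d.__getitem__)[:k]
        indices.append(best)
        distances.append([d[i] for i in best])
    return indices, distances


X = [[1, 2, 3, 4], [10, 1, 2, 3]]
Y = [[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]]
indices, distances = argmin_distance_sketch(X, Y, k=2)
print(indices)  # [[1, 0], [2, 1]]
```
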

wildboar.distance.argmin_subsequence_distance(y, x, *, dim=0, k=1, metric='euclidean', metric_params=None, scale=False, return_distance=False, n_jobs=None)

Compute the k closest subsequences.

For the i:th subsequence and the i:th sample, return the index and, optionally, the distance of the k closest matches.

Parameters:
y : array-like of shape (n_samples, m_timestep) or list of 1d-arrays

The subsequences.

x : array-like of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples. If x.ndim == 1, it is broadcast to have the same number of samples as y.

dim : int, optional

The dimension in x to find subsequences in.

k : int, optional

The number of closest subsequences to find.

metric : str, optional

The metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

scale : bool, optional

If True, scale the subsequences before distance computation.

return_distance : bool, optional

Return the distance for the k closest subsequences.

n_jobs : int, optional

The number of parallel jobs.

Returns:
indices : ndarray of shape (n_samples, k)

The indices of the k closest subsequences.

distance : ndarray of shape (n_samples, k), optional

The distance of the k closest subsequences.

Warning

Passing a callable to the metric parameter has a significant performance implication.

Examples

>>> import numpy as np
>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import argmin_subsequence_distance
>>> X, _ = load_dataset("ECG200")
>>> s = np.lib.stride_tricks.sliding_window_view(X[0], window_shape=10)
>>> x = np.broadcast_to(X[0], shape=(s.shape[0], X.shape[1]))
>>> argmin_subsequence_distance(s, x, k=4)
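
Under plain Euclidean distance with scale=False, the search slides each subsequence y[i] over its paired sample x[i] and keeps the k start positions with the smallest distance. The pure-Python sketch below illustrates this pairing; the helper names are hypothetical, no exclusion of overlapping matches is applied, and wildboar's handling of ties may differ.

```python
import math


def sliding_distances(sub, series):
    """Euclidean distance between sub and every window of series of the same length."""
    m = len(sub)
    return [
        math.sqrt(sum((sub[j] - series[i + j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    ]


def argmin_subsequence_sketch(y, x, k=1):
    """For the i:th subsequence y[i] and i:th sample x[i], the start indices
    and distances of the k closest windows, ordered by distance."""
    indices, distances = [], []
    for sub, series in zip(y, x):
        d = sliding_distances(sub, series)
        best = sorted(range(len(d)), key=d.__getitem__)[:k]
        indices.append(best)
        distances.append([d[i] for i in best])
    return indices, distances


series = [0, 1, 2, 3, 2, 1, 0]
indices, distances = argmin_subsequence_sketch([[2, 3, 2]], [series], k=2)
print(indices)  # [[2, 1]]
```
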

wildboar.distance.distance_profile(y, x, *, dilation=1, padding=0, dim=0, metric='mass', metric_params=None, scale=False, n_jobs=None)

Compute the distance profile.

The distance profile contains the distance between the subsequences in y and the samples in x at every time point of x.

Parameters:
y : array-like of shape (m_timestep, ) or (n_samples, m_timestep)

The subsequences. If y.ndim is 1, y is broadcast to have the same number of samples as x.

x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The samples. If x.ndim is 1, x is broadcast to have the same number of samples as y.

dilation : int, optional

The dilation, i.e., the spacing between points in the subsequences.

padding : int or {“same”}, optional

The amount of padding applied to the input time series. If “same”, the output size is the same as the input size.

dim : int, optional

The dimension to search for subsequences.

metric : str or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

scale : bool, optional

If True, scale the subsequences before distance computation.

n_jobs : int, optional

The number of parallel jobs to run.

Returns:
ndarray of shape (n_samples, output_size) or (output_size, )

The distance profile. output_size is given by n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1. If both x and y contain a single subsequence and a single sample, the output is squeezed.

Warning

Passing a callable to the metric parameter has a significant performance implication.

Examples

>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import distance_profile
>>> X, _ = load_dataset("ECG200")
>>> distance_profile(X[0], X[1:].reshape(-1))
array([14.00120332, 14.41943788, 14.81597243, ...,  4.75219094,
       5.72681005,  6.70155561])
>>> distance_profile(
...     X[0, 0:9], X[1:5], metric="dtw", dilation=2, padding="same"
... )[0, :10]
array([8.01881424, 7.15083281, 7.48856368, 6.83139294, 6.75595579,
       6.30073636, 6.65346307, 6.27919601, 6.25666948, 6.0961576 ])
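
The output_size formula follows from sliding a dilated subsequence, which spans (m_timestep - 1) * dilation + 1 time steps, across a series padded on both sides. The pure-Python sketch below checks this; it assumes zero padding and uses plain Euclidean distance instead of the default "mass" metric, and distance_profile_sketch is a hypothetical name.

```python
import math


def distance_profile_sketch(sub, series, dilation=1, padding=0):
    """Euclidean distance between a dilated subsequence and every position of series.

    Assumptions: zero padding on both sides and plain (not z-normalized)
    Euclidean distance; wildboar's default metric is "mass".
    """
    padded = [0.0] * padding + list(series) + [0.0] * padding
    span = (len(sub) - 1) * dilation + 1  # length covered by the dilated subsequence
    output_size = len(padded) - span + 1  # n_timestep + 2 * padding - span + 1
    return [
        math.sqrt(sum((sub[j] - padded[i + j * dilation]) ** 2 for j in range(len(sub))))
        for i in range(output_size)
    ]


series = list(range(10))
profile = distance_profile_sketch([0, 2, 4], series, dilation=2)
print(len(profile), profile[0])  # 6 0.0
```

With padding=2, which equals (span - 1) // 2 for this subsequence, the profile has the same length as the series (10), matching the intent of padding="same".
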

wildboar.distance.matrix_profile(x, y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)

Compute the matrix profile.

  • If only x is given, compute the similarity self-join of every subsequence in x of size window to its nearest neighbor in x, excluding trivial matches according to the exclude parameter.

  • If both x and y are given, compute the similarity join of every subsequence in y of size window to its nearest neighbor in x, excluding matches according to the exclude parameter.

Parameters:
x : array-like of shape (n_timestep, ), (n_samples, xn_timestep) or (n_samples, n_dim, xn_timestep)

The first time series.

y : array-like of shape (n_timestep, ), (n_samples, yn_timestep) or (n_samples, n_dim, yn_timestep), optional

The optional second time series. y is broadcast to the shape of x if possible.

window : int or float, optional

The subsequence size, by default 5.

  • if float, a fraction of y.shape[-1].

  • if int, the exact subsequence size.

dim : int, optional

The dimension to compute the matrix profile for, by default 0.

exclude : int or float, optional

The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.

  • if float, expressed as a fraction of the window size.

  • if int, the exact size (0 <= exclude < window).

n_jobs : int, optional

The number of jobs to use when computing the profile.

return_index : bool, optional

Return the matrix profile index.

Returns:
mp : ndarray of shape (profile_size, ) or (n_samples, profile_size)

The matrix profile.

mpi : ndarray of shape (profile_size, ) or (n_samples, profile_size), optional

The matrix profile index.

Notes

The profile_size depends on the input.

  • If y is None, profile_size is x.shape[-1] - window + 1

  • If y is not None, profile_size is y.shape[-1] - window + 1
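
The self-join and the profile_size rule can be illustrated with a naive O(n² · window) computation: for every window of x, find the closest other window outside the exclusion zone. This pure-Python sketch is not how wildboar computes the profile; it uses plain Euclidean distance (matrix profiles in the Yeh et al. formulation typically use z-normalized distance), takes exclude as an integer half-width so that candidates with abs(i - j) <= exclude are skipped (an assumption), and matrix_profile_sketch is a hypothetical name.

```python
import math


def matrix_profile_sketch(x, window, exclude):
    """Naive self-join matrix profile of a single series.

    Candidates j with abs(i - j) <= exclude are treated as trivial matches
    (assumption); this always drops the self-match i == j.
    """
    profile_size = len(x) - window + 1  # x.shape[-1] - window + 1
    windows = [x[i:i + window] for i in range(profile_size)]
    mp, mpi = [], []
    for i, a in enumerate(windows):
        best, best_j = math.inf, -1
        for j, b in enumerate(windows):
            if abs(i - j) <= exclude:
                continue  # trivial match inside the exclusion zone
            d = math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
            if d < best:
                best, best_j = d, j
        mp.append(best)
        mpi.append(best_j)
    return mp, mpi


x = [0, 1, 2, 0, 1, 2, 0, 1]
mp, mpi = matrix_profile_sketch(x, window=3, exclude=1)
print(mpi)  # [3, 4, 5, 0, 1, 2]
```

Because the series repeats with period 3, every window has an exact match three steps away, so all profile values are zero.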

References

Yeh, C. C. M. et al. (2016).

Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM)

wildboar.distance.paired_distance(x, y, *, dim='warn', metric='euclidean', metric_params=None, n_jobs=None)

Compute the distance between the i:th time series in x and the i:th time series in y.

Parameters:
x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

y : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data. y is broadcast to the shape of x.

dim : int or {‘mean’, ‘full’}, optional

The dimension for which to compute the distance.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

n_jobs : int, optional

The number of parallel jobs.

Returns:
ndarray

The distances. The return value depends on the input:

  • if x.ndim == 1, return a scalar.

  • if dim="full", return an ndarray of shape (n_dims, n_samples).

  • if x.ndim > 1, return an ndarray of shape (n_samples, ).
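
Paired distances compare x[i] only with y[i], giving one value per sample rather than a full matrix. The pure-Python sketch below shows this for univariate series under Euclidean distance; reusing a single flat series y for every sample is a simplified stand-in for NumPy broadcasting, and paired_distance_sketch is a hypothetical name.

```python
import math


def paired_distance_sketch(x, y):
    """Euclidean distance between x[i] and y[i] for every i.

    A single series y (a flat list) is reused for every sample, a simplified
    stand-in for NumPy-style broadcasting.
    """
    if y and not isinstance(y[0], (list, tuple)):
        y = [y] * len(x)
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t))) for s, t in zip(x, y)
    ]


print(paired_distance_sketch([[0, 0], [3, 4]], [[0, 0], [0, 0]]))  # [0.0, 5.0]
```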

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)

Minimum subsequence distance between the i:th subsequence and time series.

Parameters:
y : list or ndarray of shape (n_samples, m_timestep)

Input subsequences.

  • if list, a list of array-like of shape (m_timestep, ).

x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

dim : int, optional

The dimension to search for subsequences.

metric : str or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

scale : bool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

return_index : bool, optional

If True, return the index of the best match. If there are several equally good matches, the first match is returned.

n_jobs : int, optional

The number of parallel jobs to run.

Returns:
dist : float, ndarray

An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample.

indices : int, ndarray, optional

An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample.
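
Each pair is handled independently: the subsequence y[i] slides over x[i], and the smallest distance and its first starting position are kept. The pure-Python sketch below assumes plain Euclidean distance with scale=False; paired_subsequence_distance_sketch is a hypothetical name. The strict comparison is what makes the first of several equally good matches win, as described for return_index.

```python
import math


def paired_subsequence_distance_sketch(y, x):
    """For each pair (y[i], x[i]): the minimum Euclidean distance over all
    windows of x[i] and the start index of the first best match."""
    dists, indices = [], []
    for sub, series in zip(y, x):
        m = len(sub)
        best, best_i = math.inf, -1
        for i in range(len(series) - m + 1):
            d = math.sqrt(sum((sub[j] - series[i + j]) ** 2 for j in range(m)))
            if d < best:  # strict '<' keeps the first of equally good matches
                best, best_i = d, i
        dists.append(best)
        indices.append(best_i)
    return dists, indices


# Subsequences of different lengths, as allowed by the list form of y.
dists, indices = paired_subsequence_distance_sketch([[1, 2], [5]], [[0, 1, 2, 1, 2], [1, 5, 5]])
print(dists, indices)  # [0.0, 0.0] [1, 1]
```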

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, return_distance=False, n_jobs=None)

Find matching subsequences.

Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold.

  • If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.

  • If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.

  • If both threshold and max_matches are given, the top matches are returned ordered by distance and time series.
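
The three ordering rules can be made concrete for a single pair (y[i], x[i]); paired_subsequence_match applies matching to every pair. This pure-Python sketch assumes plain Euclidean distance and a strict "less than" comparison with the threshold, and match_sketch is a hypothetical name, not part of wildboar.

```python
import math


def match_sketch(sub, series, threshold=None, max_matches=None):
    """Hypothetical single-pair version of the matching rules, using plain
    Euclidean distance."""
    m = len(sub)
    d = [
        math.sqrt(sum((sub[j] - series[i + j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    ]
    if threshold is None:
        # No threshold: the top max_matches (default 10) ordered by distance.
        limit = 10 if max_matches is None else max_matches
        return sorted(range(len(d)), key=d.__getitem__)[:limit]
    # Threshold only: every window closer than the threshold, in order of occurrence.
    matches = [i for i, di in enumerate(d) if di < threshold]
    if max_matches is not None:
        # Threshold and max_matches: the best max_matches, ordered by distance.
        matches = sorted(matches, key=d.__getitem__)[:max_matches]
    return matches


sub, series = [1, 2], [1, 2, 0, 1, 2, 9, 1, 3]
print(match_sketch(sub, series, threshold=1.5))  # [0, 2, 3, 6]
```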

Parameters:
y : list or ndarray of shape (n_samples, n_timestep)

Input subsequences.

  • if list, a list of array-like of shape (n_timestep, ) with length n_samples.

x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

threshold : float, optional

The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.

dim : int, optional

The dimension to search for subsequences.

metric : str or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

scale : bool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

max_matches : int, optional

Return the top max_matches matches below threshold.

  • If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.

  • If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.

  • If both threshold and max_matches are given, the top matches are returned ordered by distance.

return_distance : bool, optional

If True, return the distance of the match.

n_jobs : int, optional

The number of parallel jobs to run. Ignored.

Returns:
indices : ndarray of shape (n_samples, )

The start index of matching subsequences.

distance : ndarray of shape (n_samples, ), optional

The distances of matching subsequences.

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.pairwise_distance(x, y=None, *, dim='warn', metric='euclidean', metric_params=None, n_jobs=None)

Compute the pairwise distance between the time series in x and y, or between the time series in x if y is None.

Parameters:
x : ndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)

The input data.

y : ndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional

The input data. If None, the distances are computed between the samples in x.

dim : int or {‘mean’, ‘full’}, optional

The dimension for which to compute the distance.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

n_jobs : int, optional

The number of parallel jobs.

Returns:
float or ndarray

The distances. The return value depends on the input:

  • if x.ndim == 1 and y.ndim == 1, scalar.

  • if dim="full", array of shape (n_dims, x_samples, y_samples).

  • if dim="full" and y is None, array of shape (n_dims, x_samples, x_samples).

  • if x.ndim > 1 and y is None, array of shape (x_samples, x_samples).

  • if x.ndim > 1 and y.ndim > 1, array of shape (x_samples, y_samples).

  • if x.ndim == 1 and y.ndim > 1, array of shape (y_samples, ).

  • if y.ndim == 1 and x.ndim > 1, array of shape (x_samples, ).
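
In contrast to paired_distance, every sample in x is compared with every sample in y, which yields the (x_samples, y_samples) shape listed above. The pure-Python sketch below assumes univariate series and Euclidean distance; pairwise_distance_sketch is a hypothetical name.

```python
import math


def pairwise_distance_sketch(x, y=None):
    """Euclidean distance between every x[i] and every y[j]; with y=None the
    distances are computed between the samples in x."""
    if y is None:
        y = x
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t))) for t in y] for s in x
    ]


x = [[0, 0], [3, 4]]
print(pairwise_distance_sketch(x))  # [[0.0, 5.0], [5.0, 0.0]]
```

With y=None the result is the symmetric (x_samples, x_samples) matrix with zeros on the diagonal.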

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)

Minimum subsequence distance between subsequences and time series.

Parameters:
y : list or ndarray of shape (n_subsequences, n_timestep)

Input subsequences.

  • if list, a list of array-like of shape (n_timestep, ).

x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

dim : int, optional

The dimension to search for subsequences.

metric : str or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

scale : bool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

return_index : bool, optional

If True, return the index of the best match. If there are several equally good matches, the first match is returned.

n_jobs : int, optional

The number of parallel jobs.

Returns:
dist : float, ndarray

The minimum distance. The return value depends on the input:

  • if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).

  • if len(y) == 1, return an array of shape (n_samples, ).

  • if x.ndim == 1, return an array of shape (n_subsequences, ).

  • if x.ndim == 1 and len(y) == 1, return scalar.

indices : int, ndarray, optional

The start index of the minimum distance. The return value depends on the input:

  • if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).

  • if len(y) == 1, return an array of shape (n_samples, ).

  • if x.ndim == 1, return an array of shape (n_subsequences, ).

  • if x.ndim == 1 and len(y) == 1, return scalar.
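
For the common case len(y) > 1 and x.ndim > 1, the result is a table with one row per sample and one column per subsequence, each entry the best match of that subsequence anywhere in that sample. The pure-Python sketch below assumes plain Euclidean distance with scale=False; the helper names are hypothetical.

```python
import math


def min_subsequence_distance(sub, series):
    """Smallest Euclidean distance between sub and any window of series."""
    m = len(sub)
    return min(
        math.sqrt(sum((sub[j] - series[i + j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )


def pairwise_subsequence_distance_sketch(y, x):
    """Entry [i][k] is the minimum distance between subsequence y[k] and sample x[i],
    i.e. an array of shape (n_samples, n_subsequences)."""
    return [[min_subsequence_distance(sub, series) for sub in y] for series in x]


y = [[1, 2], [5]]  # subsequences of different lengths, as in the list form of y
x = [[0, 1, 2], [5, 5, 1]]
result = pairwise_subsequence_distance_sketch(y, x)
print(result)
```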

Warning

Passing a callable to the metric parameter has a significant performance implication.

wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, exclude=None, return_distance=False, n_jobs=None)

Find matching subsequences.

Find the positions where the distance between the subsequence and each time series is less than the threshold.

  • If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.

  • If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.

  • If both threshold and max_matches are given, the top matches are returned ordered by distance.

Parameters:
y : array-like of shape (yn_timestep, )

The subsequence.

x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)

The input data.

threshold : {“auto”}, float or callable, optional

The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.

  • if float, return all matches closer than threshold.

  • if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence.

  • if str, return all matches according to the named threshold.

dim : int, optional

The dimension to search for subsequences.

metric : str or callable, optional

The distance metric.

See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User Guide.

scale : bool, optional

If True, scale the subsequences before distance computation.

Added in version 1.3.

max_matches : int, optional

Return the top max_matches matches below threshold.

exclude : float or int, optional

Exclude trivial matches in the vicinity of the match.

  • if float, the exclusion zone is computed as math.ceil(exclude * y.size).

  • if int, the exclusion zone is exact.

A match is considered trivial if it lies within exclude timesteps of another match with a lower distance.
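
The exclusion rule can be illustrated with a greedy filter: visit candidate matches from lowest to highest distance and drop any candidate that lies too close to a match already kept. This pure-Python sketch is an assumption-laden illustration, not wildboar's implementation: it reads "within exclude timesteps" as a gap strictly smaller than the exclusion zone, and exclusion_zone and remove_trivial_matches are hypothetical names.

```python
import math


def exclusion_zone(exclude, subsequence_length):
    """Float: math.ceil(exclude * subsequence_length); int: used as is."""
    if isinstance(exclude, float):
        return math.ceil(exclude * subsequence_length)
    return exclude


def remove_trivial_matches(candidates, distances, zone):
    """Hypothetical greedy filter: visit candidates from lowest to highest
    distance and drop any candidate closer than zone timesteps to a match
    that was already kept."""
    kept = []
    for i in sorted(candidates, key=lambda c: distances[c]):
        if all(abs(i - k) >= zone for k in kept):
            kept.append(i)
    return sorted(kept)  # report the matches in order of occurrence


distances = [0.1, 0.05, 0.3, 0.9, 0.9, 0.9, 0.2, 0.25]
zone = exclusion_zone(0.25, 10)  # math.ceil(2.5) == 3
print(remove_trivial_matches([0, 1, 2, 6, 7], distances, zone))  # [1, 6]
```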

return_distance : bool, optional

If True, return the distance of the match.

n_jobs : int, optional

The number of parallel jobs to run.

Returns:
indices : ndarray of shape (n_samples, ) or (n_matches, )

The start index of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.

distance : ndarray of shape (n_samples, ) or (n_matches, ), optional

The distances of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.

Warning

Passing a callable to the metric parameter has a significant performance implication.