wildboar.distance
Fast distance computations.
The wildboar.distance module includes functions for computing
paired and pairwise distances between time series and between time series and
subsequences.
See the User Guide for more details and examples.
Classes
- KMeans: KMeans clustering with support for DTW and weighted DTW.
- KMedoids: K-medoids algorithm.
- KNeighborsClassifier: Classifier implementing k-nearest neighbors.
- MDS: Multidimensional scaling.
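Functions
- argmin_distance: Find the indices of the samples with the lowest distance in Y.
- argmin_subsequence_distance: Compute the k closest subsequences.
- distance_profile: Compute the distance profile.
- matrix_profile: Compute the matrix profile of every subsequence in X.
- paired_distance: Compute the distance between the i:th time series in x and the i:th time series in y.
- paired_matrix_profile: Compute the matrix profile.
- paired_subsequence_distance: Minimum subsequence distance between the i:th subsequence and the i:th time series.
- paired_subsequence_match: Find matching subsequences between the i:th subsequence and the i:th time series.
- pairwise_distance: Compute the distance between every time series in X and Y.
- pairwise_subsequence_distance: Minimum subsequence distance between subsequences and time series.
- subsequence_match: Find matching subsequences of a subsequence in all time series.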
- class wildboar.distance.KMeans(n_clusters=8, *, metric='euclidean', r=1.0, g=None, init='random', n_init='auto', max_iter=300, tol=0.001, verbose=0, random_state=None)
  KMeans clustering with support for DTW and weighted DTW.
- Parameters:
  - n_clusters : int, optional
    The number of clusters.
  - metric : {"euclidean", "dtw"}, optional
    The metric.
  - r : float, optional
    The size of the warping window.
  - g : float, optional
    SoftDTW penalty. If None, traditional DTW is used.
  - init : {"random"}, optional
    Cluster initialization. If "random", randomly initialize n_clusters centroids.
  - n_init : "auto" or int, optional
    Number of times the algorithm is re-initialized with new centroids.
  - max_iter : int, optional
    The maximum number of iterations for a single run of the algorithm.
  - tol : float, optional
    Relative tolerance to declare convergence of two consecutive iterations.
  - verbose : int, optional
    Print diagnostic messages during convergence.
  - random_state : RandomState or int, optional
    Determines random number generation for centroid initialization and barycentering when fitting with metric="dtw".
- Attributes:
  - n_iter_ : int
    The number of iterations before convergence.
  - cluster_centers_ : ndarray of shape (n_clusters, n_timestep)
    The cluster centers.
  - labels_ : ndarray of shape (n_samples, )
    The cluster assignment.
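The r parameter limits how far a DTW alignment may move away from the diagonal. The pure-Python sketch below shows banded DTW between two equal-length series. It assumes, for illustration only, that r is a fraction of the series length (so r=1.0 means unconstrained warping) and that the window is truncated to whole time steps. The helper dtw_distance is hypothetical. It is not Wildboar's implementation and does not cover the SoftDTW variant selected by g.

```python
import math


def dtw_distance(x, y, r=1.0):
    """Banded DTW distance between two equal-length series (hypothetical helper).

    Assumption: r is the warping window as a fraction of the series length,
    truncated to whole time steps; r=0.0 reduces to the Euclidean distance.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("this sketch only handles equal-length series")
    window = int(r * n)  # assumed mapping from a fraction to time steps
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within `window` steps of the diagonal are filled in.
        for j in range(max(1, i - window), min(n, i + window) + 1):
            step = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])


print(dtw_distance([0, 0, 1, 2], [0, 1, 2, 2], r=1.0))  # 0.0
print(dtw_distance([0, 0, 1, 2], [0, 1, 2, 2], r=0.0))  # 1.4142135623730951
```

With r=0.0 the band shrinks to the diagonal, so the result equals the Euclidean distance. With a full window, the shifted series align perfectly.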
- fit(x, y=None)
  Compute the k-means clustering.
- Parameters:
  - x : univariate time series
    The input samples.
  - y : Ignored, optional
    Not used.
- Returns:
  - object
    Fitted estimator.
- fit_predict(X, y=None, **kwargs)
  Perform clustering on X and return cluster labels.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Input data.
  - y : Ignored
    Not used, present for API consistency by convention.
  - **kwargs : dict
    Arguments to be passed to fit.
    Added in version 1.4.
- Returns:
  - labels : ndarray of shape (n_samples,), dtype=np.int64
    Cluster labels.
- fit_transform(X, y=None, **fit_params)
  Fit to data, then transform it.
  Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Input samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
    Target values (None for unsupervised transformations).
  - **fit_params : dict
    Additional fit parameters.
- Returns:
  - X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.
- get_metadata_routing()
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(x)
  Predict the closest cluster for each sample.
- Parameters:
  - x : univariate time series
    The input samples.
- Returns:
  - ndarray of shape (n_samples, )
    Index of the cluster each sample belongs to.
- set_output(*, transform=None)
  Set output container.
  See Introducing the set_output API for an example on how to use the API.
- Parameters:
  - transform : {"default", "pandas", "polars"}, default=None
    Configure output of transform and fit_transform.
    "default": Default output format of a transformer
    "pandas": DataFrame output
    "polars": Polars output
    None: Transform configuration is unchanged
    Added in version 1.4: "polars" option was added.
- Returns:
  - self : estimator instance
    Estimator instance.
- set_params(**params)
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- class wildboar.distance.KMedoids(n_clusters=8, metric='euclidean', metric_params=None, init='random', n_init='auto', algorithm='fast', max_iter=30, tol=0.0001, verbose=0, n_jobs=None, random_state=None)
  K-medoids algorithm.
- Parameters:
  - n_clusters : int, optional
    The number of clusters.
  - metric : str, optional
    The metric.
  - metric_params : dict, optional
    The metric parameters. Read more about the metrics and their parameters in the User guide.
  - init : {"auto", "random", "min"}, optional
    Cluster initialization. If "random", randomly initialize n_clusters medoids; if "min", select the samples with the smallest distance to the other samples.
  - n_init : "auto" or int, optional
    Number of times the algorithm is re-initialized with new centroids.
  - algorithm : {"fast", "pam"}, optional
    The algorithm for updating cluster assignments. If "pam", use the Partitioning Around Medoids algorithm.
  - max_iter : int, optional
    The maximum number of iterations for a single run of the algorithm.
  - tol : float, optional
    Relative tolerance to declare convergence of two consecutive iterations.
  - verbose : int, optional
    Print diagnostic messages during convergence.
  - n_jobs : int, optional
    The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
  - random_state : RandomState or int, optional
    Determines random number generation for centroid initialization and barycentering when fitting with metric="dtw".
- Attributes:
  - n_iter_ : int
    The number of iterations before convergence.
  - cluster_centers_ : ndarray of shape (n_clusters, n_timestep)
    The cluster centers.
  - medoid_indices_ : ndarray of shape (n_clusters, )
    The index of the medoid in the input samples.
  - labels_ : ndarray of shape (n_samples, )
    The cluster assignment.
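The alternating idea behind k-medoids can be sketched on a precomputed distance matrix. First, assign each sample to its closest medoid. Then move each medoid to the cluster member with the smallest total distance to the other members. Repeat until the medoids stop changing. The pure-Python sketch below (kmedoids_sketch is a hypothetical name) follows this simple alternating scheme. It is not necessarily what algorithm="fast" does, and "pam" instead evaluates swaps between medoids and non-medoids.

```python
def kmedoids_sketch(dist, medoids, max_iter=30):
    """Alternating k-medoids on a square distance matrix (hypothetical helper).

    dist[i][j] is the distance between samples i and j; medoids holds the
    initial medoid indices, e.g., drawn at random as with init="random".
    """
    medoids = list(medoids)
    n = len(dist)

    def assign(current):
        # Assignment step: index of the closest medoid for every sample.
        return [min(range(len(current)), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        # Update step: pick the member with the smallest summed within-cluster distance.
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid for an empty cluster
                updated.append(old)
                continue
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(kmedoids_sketch(dist, [0, 1]))  # ([1, 4], [0, 0, 0, 1, 1, 1])
```

Because the sketch works only with dist, any time series distance (for example DTW) can be plugged in by precomputing the matrix.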
- fit(x, y=None)
  Compute the k-medoids clustering.
- Parameters:
  - x : univariate time series
    The input samples.
  - y : Ignored, optional
    Not used.
- Returns:
  - object
    Fitted estimator.
- fit_predict(X, y=None, **kwargs)
  Perform clustering on X and return cluster labels.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Input data.
  - y : Ignored
    Not used, present for API consistency by convention.
  - **kwargs : dict
    Arguments to be passed to fit.
    Added in version 1.4.
- Returns:
  - labels : ndarray of shape (n_samples,), dtype=np.int64
    Cluster labels.
- fit_transform(X, y=None, **fit_params)
  Fit to data, then transform it.
  Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Input samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
    Target values (None for unsupervised transformations).
  - **fit_params : dict
    Additional fit parameters.
- Returns:
  - X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.
- get_metadata_routing()
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(x)
  Predict the closest cluster for each sample.
- Parameters:
  - x : univariate time series
    The input samples.
- Returns:
  - ndarray of shape (n_samples, )
    Index of the cluster each sample belongs to.
- set_output(*, transform=None)
  Set output container.
  See Introducing the set_output API for an example on how to use the API.
- Parameters:
  - transform : {"default", "pandas", "polars"}, default=None
    Configure output of transform and fit_transform.
    "default": Default output format of a transformer
    "pandas": DataFrame output
    "polars": Polars output
    None: Transform configuration is unchanged
    Added in version 1.4: "polars" option was added.
- Returns:
  - self : estimator instance
    Estimator instance.
- set_params(**params)
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- class wildboar.distance.KNeighborsClassifier(n_neighbors=5, *, metric='euclidean', metric_params=None, n_jobs=None)
  Classifier implementing k-nearest neighbors.
- Parameters:
  - n_neighbors : int, optional
    The number of neighbors.
  - metric : str, optional
    The distance metric.
  - metric_params : dict, optional
    Optional parameters to the distance metric.
    Read more about the metrics and their parameters in the User guide.
  - n_jobs : int, optional
    The number of parallel jobs.
- Attributes:
  - classes_ : ndarray of shape (n_classes, )
    Known class labels.
- fit(x, y)
  Fit the classifier to the training data.
- Parameters:
  - x : univariate time series or multivariate time series
    The input samples.
  - y : array-like of shape (n_samples, )
    The input labels.
- Returns:
  - KNeighborsClassifier
    This instance.
- get_metadata_routing()
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- predict(x)
  Compute the class label for the samples in x.
- Parameters:
  - x : univariate time series or multivariate time series
    The input samples.
- Returns:
  - ndarray of shape (n_samples, )
    The class label for each sample.
- predict_proba(x)
  Compute probability estimates for the samples in x.
- Parameters:
  - x : univariate time series or multivariate time series
    The input samples.
- Returns:
  - ndarray of shape (n_samples, len(self.classes_))
    The probability of each class for each sample.
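The pure-Python sketch below shows the decision rule behind predict and predict_proba, assuming uniformly weighted votes. It finds the n_neighbors closest training series and reports the share of votes for each class. The helpers knn_predict_proba and knn_predict are hypothetical, and Wildboar's weighting and tie-breaking may differ.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_predict_proba(X_train, y_train, x, n_neighbors=5, metric=euclidean):
    """Share of the n_neighbors closest training series in each class (hypothetical helper)."""
    classes = sorted(set(y_train))
    order = sorted(range(len(X_train)), key=lambda i: metric(X_train[i], x))
    nearest = order[:n_neighbors]
    votes = Counter(y_train[i] for i in nearest)
    return classes, [votes[c] / len(nearest) for c in classes]


def knn_predict(X_train, y_train, x, n_neighbors=5, metric=euclidean):
    """Class with the most votes; ties go to the first class in sorted order."""
    classes, proba = knn_predict_proba(X_train, y_train, x, n_neighbors, metric)
    return classes[max(range(len(classes)), key=proba.__getitem__)]


X_train = [[0, 0, 0], [0, 0, 1], [5, 5, 5], [5, 5, 6], [5, 6, 6]]
y_train = ["a", "a", "b", "b", "b"]
print(knn_predict_proba(X_train, y_train, [0, 0, 0.5], n_neighbors=3))
print(knn_predict(X_train, y_train, [0, 0, 0.5], n_neighbors=3))  # a
```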
- score(X, y, sample_weight=None)
  Return accuracy on provided data and labels.
  In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires the entire label set of each sample to be correctly predicted.
- Parameters:
  - X : array-like of shape (n_samples, n_features)
    Test samples.
  - y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.
  - sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.
- Returns:
  - score : float
    Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- class wildboar.distance.MDS(n_components=2, *, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=None, random_state=None, dissimilarity='euclidean', dissimilarity_params=None, normalized_stress='auto')
  Multidimensional scaling.
- Parameters:
  - n_components : int, optional
    Number of dimensions in which to immerse the dissimilarities.
  - metric : bool, optional
    If True, perform metric MDS; otherwise, perform nonmetric MDS. When False (i.e., non-metric MDS), dissimilarities with 0 are considered as missing values.
  - n_init : int, optional
    Number of times the SMACOF algorithm will be run with different initializations. The final results will be the best output of the runs, determined by the run with the smallest final stress.
  - max_iter : int, optional
    Maximum number of iterations of the SMACOF algorithm for a single run.
  - verbose : int, optional
    Level of verbosity.
  - eps : float, optional
    Relative tolerance with respect to stress at which to declare convergence. The value of eps should be tuned separately depending on whether or not normalized_stress is being used.
  - n_jobs : int, optional
    The number of jobs to use for the computation. If multiple initializations are used (n_init), each run of the algorithm is computed in parallel.
  - random_state : int, RandomState instance or None, optional
    Determines the random number generator used to initialize the centers. Pass an int for reproducible results across multiple function calls.
  - dissimilarity : str, optional
    The dissimilarity measure.
    See _METRICS.keys() for a list of supported metrics.
  - dissimilarity_params : dict, optional
    Parameters to the dissimilarity measure.
    Read more about the parameters in the User guide.
  - normalized_stress : bool or "auto", optional
    Whether to use and return the normalized stress value (Stress-1) instead of the raw stress calculated by default. Only supported in non-metric MDS.
Notes
This implementation is a convenience wrapper around sklearn.manifold.MDS for use with Wildboar metrics.
- get_metadata_routing()
  Get metadata routing of this object.
  Please check the User Guide on how the routing mechanism works.
- Returns:
  - routing : MetadataRequest
    A MetadataRequest encapsulating routing information.
- get_params(deep=True)
  Get parameters for this estimator.
- Parameters:
  - deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
  - params : dict
    Parameter names mapped to their values.
- set_params(**params)
  Set the parameters of this estimator.
  The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
  - **params : dict
    Estimator parameters.
- Returns:
  - self : estimator instance
    Estimator instance.
- wildboar.distance.argmin_distance(x, y=None, *, dim=0, k=1, metric='euclidean', metric_params=None, lower_bound=None, sorted=False, return_distance=False, n_jobs=None)
  Find the indices of the samples with the lowest distance in Y.
- Parameters:
  - x : array-like of shape (x_samples, x_timestep)
    The needle, i.e., the time series to find neighbors for.
  - y : array-like of shape (y_samples, y_timestep), optional
    The haystack, i.e., the time series to search.
  - dim : int, optional
    The dimension where the distance is computed.
  - k : int, optional
    The number of closest samples.
  - metric : str, optional
    The distance metric. See _METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric. Read more about the parameters in the User guide.
  - lower_bound : array-like of shape (x_samples, y_samples), optional
    Lower bound on the distance metric. Read more about supported lower bounds in the User Guide.
  - sorted : bool, optional
    Sort the indices from smallest to largest distance.
  - return_distance : bool, optional
    Return the distance for the k samples.
  - n_jobs : int, optional
    The number of parallel jobs.
- Returns:
  - indices : ndarray of shape (n_samples, k)
    The indices of the samples in Y with the smallest distance.
  - distance : ndarray of shape (n_samples, k), optional
    The distance of the samples in Y with the smallest distance.
Warning
Passing a callable to the metric parameter has a significant performance implication.
Assigning an invalid value to the lower_bound parameter, for example a bound that exceeds the true distance, will yield an incorrect outcome.
Examples
>>> import numpy as np
>>> from wildboar.distance import argmin_distance
>>> X = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
>>> Y = np.array([[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]])
>>> argmin_distance(X, Y, k=2, return_distance=True)
(array([[0, 1], [1, 2]]), array([[ 8.24621125, 4.79583152], [10.24695077, 10. ]]))
Using a lower bound:
>>> from wildboar.datasets import load_basic_motions
>>> from wildboar.distance import argmin_distance
>>> from wildboar.distance.lb import DtwKeoghLowerBound
>>> X, y = load_basic_motions()
>>> X = X.reshape(X.shape[0], -1)
>>> lbkeogh = DtwKeoghLowerBound(r=1.0).fit(X[30:])
>>> argmin_distance(
...     X[:30],
...     X[30:],
...     metric="dtw",
...     metric_params={"r": 1.0},
...     lower_bound=lbkeogh.transform(X[:30])
... )
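A brute-force reference is useful when checking results or trying out a metric: compute every distance and keep the k smallest. For the Euclidean metric, the pure-Python sketch below reproduces the indices and distances of the first example above. The names argmin_distance_bruteforce and sort_by_distance are hypothetical. The sketch assumes that unsorted results are reported in index order, which matches that example but is not a documented guarantee. It also does no lower-bound pruning.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def argmin_distance_bruteforce(X, Y, k=1, sort_by_distance=False, metric=euclidean):
    """Indices and distances of the k closest series in Y for each series in X (hypothetical helper)."""
    indices, distances = [], []
    for x in X:
        dist = [metric(x, y) for y in Y]
        best = sorted(range(len(Y)), key=lambda j: dist[j])[:k]
        if not sort_by_distance:
            best.sort()  # assumption: unsorted results are reported in index order
        indices.append(best)
        distances.append([dist[j] for j in best])
    return indices, distances


X = [[1, 2, 3, 4], [10, 1, 2, 3]]
Y = [[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]]
print(argmin_distance_bruteforce(X, Y, k=2)[0])  # [[0, 1], [1, 2]]
print(argmin_distance_bruteforce(X, Y, k=2, sort_by_distance=True)[0])  # [[1, 0], [2, 1]]
```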
- wildboar.distance.argmin_subsequence_distance(y, x, *, dim=0, k=1, metric='euclidean', metric_params=None, scale=False, return_distance=False, n_jobs=None)
  Compute the k closest subsequences.
  For the i:th shapelet and the i:th sample, return the index and, optionally, the distance of the k closest matches.
- Parameters:
  - y : array-like of shape (n_samples, m_timestep) or list of 1d-arrays
    The subsequences.
  - x : array-like of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    The samples. If x.ndim == 1, it will be broadcast to have the same number of samples as y.
  - dim : int, optional
    The dimension in x to find subsequences in.
  - k : int, optional
    The number of closest subsequences to find.
  - metric : str, optional
    The metric.
    See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric.
    Read more about the parameters in the User guide.
  - scale : bool, optional
    If True, scale the subsequences before distance computation.
  - return_distance : bool, optional
    Return the distance for the k closest subsequences.
  - n_jobs : int, optional
    The number of parallel jobs.
- Returns:
  - indices : ndarray of shape (n_samples, k)
    The indices of the k closest subsequences.
  - distance : ndarray of shape (n_samples, k), optional
    The distance of the k closest subsequences.
Warning
Passing a callable to the metric parameter has a significant performance implication.
Examples
>>> import numpy as np
>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import argmin_subsequence_distance
>>> X, _ = load_dataset("ECG200")
>>> s = np.lib.stride_tricks.sliding_window_view(X[0], window_shape=10)
>>> x = np.broadcast_to(X[0], shape=(s.shape[0], X.shape[1]))
>>> argmin_subsequence_distance(s, x, k=4)
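Conceptually, each subsequence y[i] slides over sample x[i], and the start positions of the k smallest distances are kept. The pure-Python sketch below uses the hypothetical names k_closest_positions and window_distances. It uses the Euclidean distance without scaling, orders matches by distance, and makes no attempt to suppress overlapping matches.

```python
import math


def window_distances(s, x):
    """Euclidean distance between s and every window of x with the same length."""
    m = len(s)
    return [
        math.sqrt(sum((s[j] - x[i + j]) ** 2 for j in range(m)))
        for i in range(len(x) - m + 1)
    ]


def k_closest_positions(S, X, k=1):
    """Start positions of the k closest windows, pairing S[i] with X[i] (hypothetical helper)."""
    result = []
    for s, x in zip(S, X):
        d = window_distances(s, x)
        # Sort window starts by distance; ties keep the earlier position first.
        result.append(sorted(range(len(d)), key=d.__getitem__)[:k])
    return result


print(k_closest_positions([[1, 2]], [[0, 1, 2, 3, 1, 2]], k=2))  # [[1, 4]]
```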
- wildboar.distance.distance_profile(y, x, *, dilation=1, padding=0, dim=0, metric='mass', metric_params=None, scale=False, n_jobs=None)
  Compute the distance profile.
  The distance profile is the distance between the subsequences in y and the samples in x at every time point of the samples.
- Parameters:
  - y : array-like of shape (m_timestep, ) or (n_samples, m_timestep)
    The subsequences. If y.ndim is 1, y is broadcast to have the same number of samples as x.
  - x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    The samples. If x.ndim is 1, x is broadcast to have the same number of samples as y.
  - dilation : int, optional
    The dilation, i.e., the spacing between points in the subsequences.
  - padding : int or {"same"}, optional
    The amount of padding applied to the input time series. If "same", the output size is the same as the input size.
  - dim : int, optional
    The dimension to search for shapelets.
  - metric : str or callable, optional
    The distance metric.
    See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric.
    Read more about the parameters in the User guide.
  - scale : bool, optional
    If True, scale the subsequences before distance computation.
  - n_jobs : int, optional
    The number of parallel jobs to run.
- Returns:
  - ndarray of shape (n_samples, output_size) or (output_size, )
    The distance profile. output_size is given by n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1. If both x and y contain a single sample and a single subsequence, the output is squeezed.
Warning
Passing a callable to the metric parameter has a significant performance implication.
Examples
>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import distance_profile
>>> X, _ = load_dataset("ECG200")
>>> distance_profile(X[0], X[1:].reshape(-1))
array([14.00120332, 14.41943788, 14.81597243, ..., 4.75219094, 5.72681005, 6.70155561])
>>> distance_profile(
...     X[0, 0:9], X[1:5], metric="dtw", dilation=2, padding="same"
... )[0, :10]
array([8.01881424, 7.15083281, 7.48856368, 6.83139294, 6.75595579, 6.30073636, 6.65346307, 6.27919601, 6.25666948, 6.0961576 ])
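The output length follows the same arithmetic as a stride-1 one-dimensional convolution: a dilated subsequence covers (m_timestep - 1) * dilation + 1 time steps. The pure-Python sketch below uses the hypothetical helpers output_size and distance_profile_sketch to check that arithmetic and compute a distance profile with dilation and padding. It assumes zero padding and uses the plain Euclidean distance instead of the default "mass" metric, so its values will generally differ from Wildboar's output.

```python
import math


def output_size(n_timestep, m_timestep, dilation=1, padding=0):
    """Length of the distance profile (stride-1, 1-D convolution arithmetic)."""
    return n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1


def distance_profile_sketch(y, x, dilation=1, padding=0):
    """Euclidean distance between the dilated y and every position of x (hypothetical helper).

    Assumption: x is zero-padded with `padding` values on both sides.
    """
    padded = [0.0] * padding + list(x) + [0.0] * padding
    span = (len(y) - 1) * dilation + 1  # time steps covered by the dilated y
    profile = []
    for start in range(len(padded) - span + 1):
        window = [padded[start + j * dilation] for j in range(len(y))]
        profile.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(y, window))))
    return profile


print(distance_profile_sketch([1, 1], [0, 1, 1, 0, 1], dilation=2))  # [1.0, 1.0, 0.0]
print(output_size(5, 2, dilation=2))  # 3
print(output_size(96, 9, dilation=2, padding=8))  # 96: this padding keeps the length
```

When the dilated span is odd, a padding of (span - 1) / 2 keeps the output as long as the input, which is the effect padding="same" describes.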
- wildboar.distance.matrix_profile(X, Y=None, *, dim=0, window=5, exclude=None, kind='warn', return_index=False, n_jobs=None)
  Compute the matrix profile of every subsequence in X.
  If Y is given, compute the matrix profile of every subsequence in X by finding the minimum distance in any time series in Y; otherwise, find the minimum distance in any time series in X. The former corresponds to an AB-join and the latter to a self-join.
  The output approximately corresponds to that of paired_matrix_profile applied to X.flatten(), except that no distance is computed for subsequences that overlap two time series. The outputs correspond exactly when X.shape[0] == 1.
- Parameters:
  - X : array-like of shape (x_samples, x_timestep)
    The time series for which the matrix profile is computed.
  - Y : array-like of shape (y_samples, y_timestep), optional
    The time series used to annotate X. If None, X is used to annotate.
  - dim : int, optional
    The dimension.
  - window : int or float, optional
    The window size.
    If float, the window size is a fraction of x_timestep.
    If int, the window size is exact.
  - exclude : int or float, optional
    The exclusion zone.
    If float, the exclusion zone is a fraction of window.
    If int, the exclusion zone is exact.
    If None, the exclusion zone is determined automatically: 0.2 if Y is None (self-join), otherwise 0.0 (AB-join).
  - kind : {"paired", "default"}, optional
    The kind of matrix profile.
    If "paired", compute the matrix profile for each time series in X, optionally annotated with each time series in Y.
    If "default", compute the matrix profile for every subsequence in every time series in X, optionally annotated with every time series in Y.
  - return_index : bool, optional
    Return the matrix profile index.
  - n_jobs : int, optional
    The number of parallel jobs.
- Returns:
  - mp : ndarray of shape (x_samples, profile_size)
    The matrix profile.
  - (mpi_sample, mpi_start) : ndarray of shape (x_samples, profile_size), optional
    The matrix profile index sample and start positions. Returned if return_index=True.
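The self-join described above can be sketched by brute force. Every window of every series is compared with every window of every series, and trivial matches that start within the exclusion zone of the same series are skipped. Windows never cross from one series into the next. The pure-Python sketch below (matrix_profile_sketch is a hypothetical name) returns the profile together with the sample and start indices. It uses the plain Euclidean distance without z-normalization, and the rounding of the fractional exclusion zone is an assumption.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def matrix_profile_sketch(X, window=3, exclude=0.2):
    """Brute-force self-join over a collection of series (hypothetical helper).

    Windows never span two series, and the exclusion zone only removes
    matches that start close together within the same series.
    """
    zone = max(1, math.ceil(exclude * window))  # assumed rounding of the fraction
    candidates = [
        (t, j, X[t][j:j + window])
        for t in range(len(X))
        for j in range(len(X[t]) - window + 1)
    ]
    mp, mpi_sample, mpi_start = [], [], []
    for s in range(len(X)):
        row, row_sample, row_start = [], [], []
        for i in range(len(X[s]) - window + 1):
            query = X[s][i:i + window]
            best = (math.inf, -1, -1)
            for t, j, other in candidates:
                if t == s and abs(i - j) < zone:
                    continue  # trivial match inside the exclusion zone
                d = euclidean(query, other)
                if d < best[0]:
                    best = (d, t, j)
            row.append(best[0])
            row_sample.append(best[1])
            row_start.append(best[2])
        mp.append(row)
        mpi_sample.append(row_sample)
        mpi_start.append(row_start)
    return mp, mpi_sample, mpi_start


mp, mpi_sample, mpi_start = matrix_profile_sketch([[0, 1, 2, 3], [5, 0, 1, 2]], window=3)
print(mpi_sample, mpi_start)  # [[1, 0], [0, 0]] [[1, 0], [1, 0]]
```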
- wildboar.distance.paired_distance(x, y, *, dim='mean', metric='euclidean', metric_params=None, n_jobs=None)
  Compute the distance between the i:th time series in x and the i:th time series in y.
- Parameters:
  - x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    The input data.
  - y : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    The input data. y will be broadcast to the shape of x.
  - dim : int or {'mean', 'full'}, optional
    The dimension to compute the distance for.
  - metric : str or callable, optional
    The distance metric.
    See _METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric.
    Read more about the parameters in the User guide.
  - n_jobs : int, optional
    The number of parallel jobs.
- Returns:
  - ndarray
    The distances. The return value depends on the input:
    If x.ndim == 1, return a scalar.
    If dim='full', return an ndarray of shape (n_dims, n_samples).
    If x.ndim > 1, return an ndarray of shape (n_samples, ).
Warning
Passing a callable to the metric parameter has a significant performance implication.
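Unlike pairwise_distance, paired_distance compares the i:th series of x only with the i:th series of y, so the result contains one value per sample. The minimal pure-Python sketch below covers univariate series and the Euclidean metric, including broadcasting a single y. The helper paired_euclidean is hypothetical.

```python
import math


def paired_euclidean(x, y):
    """Distance between x[i] and y[i] for every i (hypothetical helper).

    A single series y is broadcast against every series in x.
    """
    if y and not isinstance(y[0], (list, tuple)):
        y = [y] * len(x)  # broadcast one series to all samples
    if len(y) != len(x):
        raise ValueError("x and y must contain the same number of series")
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, yi))) for xi, yi in zip(x, y)]


x = [[0, 0], [3, 4]]
print(paired_euclidean(x, [[0, 0], [0, 0]]))  # [0.0, 5.0]
print(paired_euclidean(x, [0, 0]))  # [0.0, 5.0]
```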
- wildboar.distance.paired_matrix_profile(X, Y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)
  Compute the matrix profile.
  If only X is given, compute the similarity self-join of every subsequence in X of size window to its nearest neighbor in X, excluding trivial matches according to the exclude parameter.
  If both X and Y are given, compute the similarity join of every subsequence in X of size window to its nearest neighbor in Y, excluding matches according to the exclude parameter.
- Parameters:
  - X : array-like of shape (n_timestep, ), (n_samples, x_timestep) or (n_samples, n_dim, x_timestep)
    The first time series.
  - Y : array-like of shape (n_timestep, ), (n_samples, y_timestep) or (n_samples, n_dim, y_timestep), optional
    The optional second time series. Y is broadcast to the shape of X if possible.
  - window : int or float, optional
    The subsequence size, by default 5.
    If float, a fraction of y_timestep.
    If int, the exact subsequence size.
  - dim : int, optional
    The dim to compute the matrix profile for, by default 0.
  - exclude : int or float, optional
    The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.
    If float, expressed as a fraction of the window size.
    If int, the exact size (0 <= exclude < window).
  - n_jobs : int, optional
    The number of jobs to use when computing the profile.
  - return_index : bool, optional
    Return the matrix profile index.
- Returns:
  - mp : ndarray of shape (profile_size, ) or (n_samples, profile_size)
    The matrix profile.
  - mpi : ndarray of shape (profile_size, ) or (n_samples, profile_size), optional
    The matrix profile index.
Notes
The profile_size is X.shape[-1] - window + 1.
References
- Yeh, C. C. M. et al. (2016).
  Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th International Conference on Data Mining (ICDM).
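When both X and Y are given and the default exclusion zone of 0.0 for a similarity join applies, each of the profile_size = X.shape[-1] - window + 1 windows of X is matched to its nearest window in Y. The minimal pure-Python sketch below handles two univariate series and uses the plain (non-normalized) Euclidean distance. The name ab_join_sketch is hypothetical.

```python
import math


def ab_join_sketch(x, y, window=5):
    """Nearest window of y for every window of x (hypothetical helper, no exclusion zone)."""
    profile_size = len(x) - window + 1
    mp, mpi = [], []
    for i in range(profile_size):
        query = x[i:i + window]
        dists = [
            math.sqrt(sum((a - b) ** 2 for a, b in zip(query, y[j:j + window])))
            for j in range(len(y) - window + 1)
        ]
        best = min(range(len(dists)), key=dists.__getitem__)  # first minimum wins ties
        mp.append(dists[best])
        mpi.append(best)
    return mp, mpi


print(ab_join_sketch([1, 2, 3, 4], [0, 1, 2, 3, 4, 5], window=2))  # ([0.0, 0.0, 0.0], [1, 2, 3])
```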
- wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)
  Minimum subsequence distance between the i:th subsequence and the i:th time series.
- Parameters:
  - y : list or ndarray of shape (n_samples, m_timestep)
    The subsequences.
    If list, a list of array-like of shape (m_timestep, ).
  - x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    The input data.
  - dim : int, optional
    The dimension to search for shapelets.
  - metric : str or callable, optional
    The distance metric.
    See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric.
    Read more about the parameters in the User guide.
  - scale : bool, optional
    If True, scale the subsequences before distance computation.
    Added in version 1.3.
  - return_index : bool, optional
    If True, return the index of the best match. If there are many equally good matches, the first match is returned.
  - n_jobs : int, optional
    The number of parallel jobs to run.
- Returns:
  - dist : float, ndarray
    An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample.
  - indices : int, ndarray, optional
    An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample.
Warning
Passing a callable to the metric parameter has a significant performance implication.
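The i:th subsequence slides over the i:th sample and the smallest distance is kept. With return_index=True, the start of the first best match is reported as well. The minimal pure-Python sketch below assumes univariate data, the Euclidean metric and no scaling. The names best_match and paired_subsequence_sketch are hypothetical.

```python
import math


def best_match(s, x):
    """Minimum Euclidean distance of s to any window of x, and its start (hypothetical helper)."""
    best_d, best_i = math.inf, -1
    for i in range(len(x) - len(s) + 1):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, x[i:i + len(s)])))
        if d < best_d:  # strict comparison keeps the first of equally good matches
            best_d, best_i = d, i
    return best_d, best_i


def paired_subsequence_sketch(Y, X):
    """Pair Y[i] with X[i]; return (distances, indices)."""
    matches = [best_match(s, x) for s, x in zip(Y, X)]
    return [d for d, _ in matches], [i for _, i in matches]


print(paired_subsequence_sketch([[1, 2], [5]], [[0, 1, 2, 1, 2], [4, 6, 5]]))  # ([0.0, 0.0], [1, 2])
```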
- wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, return_distance=False, n_jobs=None)
  Find matching subsequences.
  Find the positions where the distance between the i:th subsequence and the i:th time series is less than the threshold.
  If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
  If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
  If both threshold and max_matches are given, the top matches are returned ordered by distance and time series.
- Parameters:
  - y : list or ndarray of shape (n_samples, n_timestep)
    The subsequences.
    If list, a list of array-like of shape (n_timestep, ) with length n_samples.
  - x : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
    The input data.
  - threshold : float, optional
    The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
  - dim : int, optional
    The dimension to search for shapelets.
  - metric : str or callable, optional
    The distance metric.
    See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric.
    Read more about the parameters in the User guide.
  - scale : bool, optional
    If True, scale the subsequences before distance computation.
    Added in version 1.3.
  - max_matches : int, optional
    Return the top max_matches matches below threshold.
    If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
    If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
    If both threshold and max_matches are given, the top matches are returned ordered by distance.
  - return_distance : bool, optional
    If True, return the distance of the match.
  - n_jobs : int, optional
    The number of parallel jobs to run. This parameter is ignored.
- Returns:
  - indices : ndarray of shape (n_samples, )
    The start index of matching subsequences.
  - distance : ndarray of shape (n_samples, ), optional
    The distances of matching subsequences.
Warning
Passing a callable to the metric parameter has a significant performance implication.
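The three selection rules above can be summarized for one subsequence and one univariate series. The pure-Python sketch below (match_sketch is a hypothetical name) uses the Euclidean distance without scaling. It treats "less than the threshold" as a strict comparison and breaks ties in distance by position; both choices are assumptions.

```python
import math


def match_sketch(s, x, threshold=None, max_matches=None):
    """Start positions and distances of windows of x matching s (hypothetical helper)."""
    m = len(s)
    scored = [
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(s, x[i:i + m]))), i)
        for i in range(len(x) - m + 1)
    ]
    if threshold is None:
        # No threshold: the best max_matches (default 10) ordered by distance.
        best = sorted(scored)[: 10 if max_matches is None else max_matches]
    else:
        # Threshold only: every match below it, in order of occurrence.
        best = [(d, i) for d, i in scored if d < threshold]
        if max_matches is not None:
            # Both: the best max_matches below the threshold, ordered by distance.
            best = sorted(best)[:max_matches]
    return [i for _, i in best], [d for d, _ in best]


x = [1, 2, 0, 1, 2, 5, 1, 3]
print(match_sketch([1, 2], x, threshold=1.5)[0])  # [0, 2, 3, 6]
print(match_sketch([1, 2], x, threshold=1.5, max_matches=2)[0])  # [0, 3]
print(match_sketch([1, 2], x, max_matches=3)[0])  # [0, 3, 6]
```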
- wildboar.distance.pairwise_distance(x, y=None, *, dim='mean', metric='euclidean', metric_params=None, n_jobs=None)
  Compute the distance between every time series in X and Y.
- Parameters:
  - x : ndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)
    The input data.
  - y : ndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional
    The input data. If None, x is compared with itself.
  - dim : int or {'mean', 'full'}, optional
    The dimension to compute the distance for.
  - metric : str or callable, optional
    The distance metric.
    See _METRICS.keys() for a list of supported metrics.
  - metric_params : dict, optional
    Parameters to the metric.
    Read more about the parameters in the User guide.
  - n_jobs : int, optional
    The number of parallel jobs.
- Returns:
  - float or ndarray
    The distances. The return value depends on the input:
    If x.ndim == 1 and y.ndim == 1, a scalar.
    If dim="full", an array of shape (n_dims, x_samples, y_samples).
    If dim="full" and y is None, an array of shape (n_dims, x_samples, x_samples).
    If x.ndim > 1 and y is None, an array of shape (x_samples, x_samples).
    If x.ndim > 1 and y.ndim > 1, an array of shape (x_samples, y_samples).
    If x.ndim == 1 and y.ndim > 1, an array of shape (y_samples, ).
    If y.ndim == 1 and x.ndim > 1, an array of shape (x_samples, ).
Warning
Passing a callable to the metric parameter has a significant performance implication.
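For univariate input, the (x_samples, y_samples) result contains the distance for every combination of rows. The minimal pure-Python sketch below uses the Euclidean metric and the hypothetical helper pairwise_euclidean. It covers both the y=None case and an explicit y.

```python
import math


def pairwise_euclidean(x, y=None):
    """All distances between series in x and series in y (hypothetical helper).

    When y is None, x is compared with itself, giving a symmetric matrix
    with zeros on the diagonal.
    """
    if y is None:
        y = x
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, yj))) for yj in y] for xi in x]


print(pairwise_euclidean([[0, 0], [3, 4]]))  # [[0.0, 5.0], [5.0, 0.0]]
print(pairwise_euclidean([[0, 0], [3, 4]], [[0, 0], [6, 8], [3, 4]]))  # [[0.0, 10.0, 5.0], [5.0, 5.0, 0.0]]
```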
- wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)[source]#
 Minimum subsequence distance between subsequences and time series.
- Parameters:
 - ylist or ndarray of shape (n_subsequences, n_timestep)
 Input time series.
if list, a list of array-like of shape (n_timestep, ).
- x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
 The input data.
- dim : int, optional
 The dimension to search for subsequences in.
- metric : str or callable, optional
 The distance metric.
See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
 Parameters to the metric.
Read more about the parameters in the User Guide.
- scale : bool, optional
 If True, scale the subsequences before distance computation.
Added in version 1.3.
- return_index : bool, optional
 If True, return the index of the best match. If several matches are equally good, the first one is returned.
- n_jobs : int, optional
 The number of parallel jobs.
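The scale option is documented only as scaling the subsequences before the distance is computed. A common choice for time series, and the assumption made in this pure-Python sketch, is z-normalization of both the subsequence and each compared window; the helper names, and mapping a constant window to zeros, are hypothetical choices rather than wildboar's documented behaviour. Under that assumption, the distance ignores differences in offset and amplitude.

```python
import math
from statistics import fmean, pstdev


def znorm(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        # Hypothetical choice: a constant window becomes all zeros.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def scaled_euclidean(subsequence, window):
    """Euclidean distance after z-normalizing both sequences."""
    return euclidean(znorm(subsequence), znorm(window))


shape = [1.0, 2.0, 3.0]
window = [2 * v + 5 for v in shape]  # same shape, shifted and stretched
print(euclidean(shape, window))         # large: offset and amplitude differ
print(scaled_euclidean(shape, window))  # close to 0.0
```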
- Returns:
 - dist : float or ndarray
 The minimum distance. The return value depends on the input:
If len(y) > 1 and x.ndim > 1, an array of shape (n_samples, n_subsequences).
If len(y) == 1, an array of shape (n_samples, ).
If x.ndim == 1, an array of shape (n_subsequences, ).
If x.ndim == 1 and len(y) == 1, a scalar.
- indices : int or ndarray, optional
 The start index of the minimum distance, returned only if return_index is True. The return value depends on the input:
If len(y) > 1 and x.ndim > 1, an array of shape (n_samples, n_subsequences).
If len(y) == 1, an array of shape (n_samples, ).
If x.ndim == 1, an array of shape (n_subsequences, ).
If x.ndim == 1 and len(y) == 1, a scalar.
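The minimum subsequence distance and the return_index tie rule can be sketched in pure Python for the Euclidean metric on univariate data: slide the subsequence over every start position, keep the smallest distance, and keep the first index when several windows are equally close. The function names are hypothetical and this is not wildboar's compiled implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def min_subsequence_distance(subsequence, series):
    """Return (distance, start index) of the best-matching window in series."""
    m = len(subsequence)
    best_dist, best_idx = math.inf, -1
    for start in range(len(series) - m + 1):
        d = euclidean(subsequence, series[start:start + m])
        if d < best_dist:  # strict '<' keeps the first of equally good matches
            best_dist, best_idx = d, start
    return best_dist, best_idx


def pairwise_min_distance(ys, xs):
    """Minimum distances with shape (n_samples, n_subsequences)."""
    return [[min_subsequence_distance(y, x)[0] for y in ys] for x in xs]


series = [0.0, 1.0, 0.0, 1.0, 5.0]
print(min_subsequence_distance([0.0, 1.0], series))  # (0.0, 0): first of two exact matches
print(pairwise_min_distance([[0.0, 1.0], [5.0]], [series, [2.0, 2.0]]))
```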
Warning
Passing a callable to the metric parameter can significantly degrade performance.
- wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, exclude=None, return_distance=False, n_jobs=None)
 Find matching subsequences.
Find the positions in every time series where the distance to the subsequence is less than the threshold.
If a threshold is given, the default behaviour is to return all matching indices in order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top max_matches matches are returned ordered by distance.
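The selection rules of subsequence_match can be sketched in pure Python, given a distance profile (the distance between the subsequence and the window starting at each index of one time series). The function `select_matches` is hypothetical; it ignores the exclude option and named (string) thresholds and is not wildboar's implementation.

```python
def select_matches(distances, threshold=None, max_matches=None):
    """Hypothetical sketch of the documented match-selection rules.

    distances[i] is the distance between the subsequence and the window
    starting at index i. Returns a list of start indices.
    """
    if threshold is None and max_matches is None:
        max_matches = 10  # documented default when no threshold is given
    if callable(threshold):
        # A callable receives all distances and returns a numeric threshold.
        threshold = threshold(distances)

    candidates = [
        i for i, d in enumerate(distances) if threshold is None or d < threshold
    ]
    if max_matches is None:
        return candidates  # threshold only: order of occurrence
    # Keep the best max_matches candidates, ordered by distance.
    return sorted(candidates, key=lambda c: distances[c])[:max_matches]


profile = [0.5, 0.1, 0.9, 0.3]
print(select_matches(profile, threshold=0.6))                 # [0, 1, 3]
print(select_matches(profile, threshold=0.6, max_matches=2))  # [1, 3]
print(select_matches(profile))                                # [1, 3, 0, 2]
print(select_matches(profile, threshold=lambda d: sum(d) / len(d)))  # [1, 3]
```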
- Parameters:
 - y : array-like of shape (yn_timestep, )
 The subsequence.
- x : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
 The input data.
- threshold : {“auto”}, float or callable, optional
 The distance threshold used to consider a subsequence matching. If no threshold is given, max_matches defaults to 10.
If float, return all matches closer than threshold.
If callable, return all matches closer than the threshold computed by the callable, given all distances to the subsequence.
If str, return all matches according to the named threshold.
- dim : int, optional
 The dimension to search for the subsequence in.
- metric : str or callable, optional
 The distance metric.
See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
 Parameters to the metric.
Read more about the parameters in the User Guide.
- scale : bool, optional
 If True, scale the subsequences before distance computation.
Added in version 1.3.
- max_matches : int, optional
 Return at most max_matches matches below the threshold, ordered by distance.
- exclude : float or int, optional
 Exclude trivial matches in the vicinity of a match.
If float, the exclusion zone is computed as math.ceil(exclude * y.size).
If int, the exclusion zone is exactly exclude timesteps.
A match is considered trivial if it lies within the exclusion zone of another match with lower distance.
- return_distance : bool, optional
 If True, also return the distances of the matches.
- n_jobs : int, optional
 The number of parallel jobs to run.
- Returns:
 - indices : ndarray of shape (n_samples, ) or (n_matches, )
 The start indices of matching subsequences. If x.ndim == 1, a single array of shape (n_matches, ) is returned. If no matches are found for a sample, the corresponding element is None.
- distance : ndarray of shape (n_samples, ) or (n_matches, ), optional
 The distances of matching subsequences, returned only if return_distance is True. If x.ndim == 1, a single array of shape (n_matches, ) is returned. If no matches are found for a sample, the corresponding element is None.
Warning
Passing a callable to the metric parameter can significantly degrade performance.
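The exclude option of subsequence_match can be sketched as a greedy pass in pure Python: compute the exclusion zone as documented, then visit candidate matches from lowest to highest distance and drop any candidate within the zone of a match already kept. The greedy order, the inclusive boundary, and the helper names `exclusion_zone` and `exclude_trivial` are assumptions for illustration; wildboar's own procedure may differ.

```python
import math


def exclusion_zone(exclude, subsequence_length):
    """Documented rule: a float is relative to the subsequence length."""
    if isinstance(exclude, float):
        return math.ceil(exclude * subsequence_length)
    return exclude  # an int is used as-is


def exclude_trivial(candidates, distances, zone):
    """Greedy sketch: keep the best matches, drop worse ones inside the zone.

    A candidate is treated as trivial if an already kept match with lower
    distance starts at most `zone` timesteps away (assumed inclusive bound).
    """
    kept = []
    for i in sorted(candidates, key=lambda c: distances[c]):
        if all(abs(i - j) > zone for j in kept):
            kept.append(i)
    return kept


print(exclusion_zone(0.25, 10))  # 3, i.e. math.ceil(2.5)
print(exclusion_zone(2, 10))     # 2
profile = [0.2, 0.1, 0.3, 0.9, 0.9, 0.05]
print(exclude_trivial([0, 1, 2, 5], profile, zone=1))  # [5, 1]
```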