wildboar.distance
#
Fast distance computations.
The wildboar.distance module includes functions for computing paired and pairwise distances between time series and between time series and subsequences.
See the User Guide for more details and examples.
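The difference between paired and pairwise distances can be sketched in plain Python. This is an illustration only, assuming Euclidean distance and list-based inputs; the helper names `paired` and `pairwise` are hypothetical and are not the wildboar functions.

```python
import math


def paired(x, y):
    """One distance per index: x[i] against y[i]."""
    return [math.dist(a, b) for a, b in zip(x, y)]


def pairwise(x, y):
    """One distance per combination: every x[i] against every y[j]."""
    return [[math.dist(a, b) for b in y] for a in x]


x = [[0.0, 0.0], [1.0, 1.0]]
y = [[3.0, 4.0], [1.0, 1.0]]

paired_result = paired(x, y)      # one value per pair of samples
pairwise_result = pairwise(x, y)  # rows follow x, columns follow y
print(paired_result)
print(pairwise_result)
```

The paired result is the diagonal of the pairwise result, which is why the paired functions require inputs with matching sample counts.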
Submodules#
Package Contents#
Classes#
| KMeans | KMeans clustering with support for DTW and weighted DTW. |
| KMedoids | KMedoid algorithm. |
| KNeighborsClassifier | Classifier implementing k-nearest neighbors. |
| MDS | Multidimensional scaling. |
Functions#
| argmin_distance | Find the indices of the samples with the lowest distance in Y. |
| argmin_subsequence_distance | Compute the k:th closest subsequences. |
| distance_profile | Compute the distance profile. |
| matrix_profile | Compute the matrix profile. |
| paired_distance | Compute the distance between the i:th time series. |
| paired_subsequence_distance | Minimum subsequence distance between the i:th subsequence and time series. |
| paired_subsequence_match | Find matching subsequences. |
| pairwise_distance | Compute the distance between subsequences and time series. |
| pairwise_subsequence_distance | Minimum subsequence distance between subsequences and time series. |
| subsequence_match | Find matching subsequences. |
- class wildboar.distance.KMeans(n_clusters=8, *, metric='euclidean', r=1.0, g=None, init='random', n_init='auto', max_iter=300, tol=0.001, verbose=0, random_state=None)[source]#
KMeans clustering with support for DTW and weighted DTW.
- Parameters:
- n_clustersint, optional
The number of clusters.
- metric{“euclidean”, “dtw”}, optional
The metric.
- rfloat, optional
The size of the warping window.
- gfloat, optional
SoftDTW penalty. If None, traditional DTW is used.
- init{“random”}, optional
Cluster initialization. If “random”, randomly initialize n_clusters.
- n_init“auto” or int, optional
Number of times the algorithm is re-initialized with new centroids.
- max_iterint, optional
The maximum number of iterations for a single run of the algorithm.
- tolfloat, optional
Relative tolerance to declare convergence of two consecutive iterations.
- verboseint, optional
Print diagnostic messages during convergence.
- random_stateRandomState or int, optional
Determines random number generation for centroid initialization and barycentering when fitting with metric=”dtw”.
- Attributes:
- n_iter_int
The number of iterations before convergence.
- cluster_centers_ndarray of shape (n_clusters, n_timestep)
The cluster centers.
- labels_ndarray of shape (n_samples, )
The cluster assignment.
- fit(x, y=None)[source]#
Compute the kmeans-clustering.
- Parameters:
- xunivariate time-series
The input samples.
- yIgnored, optional
Not used.
- Returns:
- object
Fitted estimator.
- fit_predict(X, y=None, **kwargs)[source]#
Perform clustering on X and returns cluster labels.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input data.
- yIgnored
Not used, present for API consistency by convention.
- **kwargsdict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
- labelsndarray of shape (n_samples,), dtype=np.int64
Cluster labels.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x)[source]#
Predict the closest cluster for each sample.
- Parameters:
- xunivariate time-series
The input samples.
- Returns:
- ndarray of shape (n_samples, )
Index of the cluster each sample belongs to.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
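As an illustration of what the r parameter of KMeans constrains, here is a minimal DTW sketch with a warping window. It assumes that r is a fraction of the series length, that point costs are squared differences, and that the final distance is the square root of the accumulated cost; wildboar's own implementation may differ in these details. The function name `dtw` is hypothetical.

```python
import math


def dtw(a, b, r=1.0):
    """DTW distance restricted to a band of half-width floor(r * max(len(a), len(b)))."""
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no warping path exists.
    w = max(math.floor(r * max(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0]

unconstrained = dtw(a, b, r=1.0)  # any alignment is allowed
diagonal_only = dtw(a, b, r=0.0)  # no warping, same as Euclidean distance
print(unconstrained, diagonal_only)
```

A smaller window can only raise the distance, since it removes candidate alignments; here the shifted peak is matched almost perfectly when warping is allowed.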
- class wildboar.distance.KMedoids(n_clusters=8, metric='euclidean', metric_params=None, init='random', n_init='auto', algorithm='fast', max_iter=30, tol=0.0001, verbose=0, n_jobs=None, random_state=None)[source]#
KMedoid algorithm.
- Parameters:
- n_clustersint, optional
The number of clusters.
- metricstr, optional
The metric.
- metric_paramsdict, optional
The metric parameters. Read more about the metrics and their parameters in the User guide.
- init{“auto”, “random”, “min”}, optional
Cluster initialization. If “random”, randomly initialize n_clusters, if “min” select the samples with the smallest distance to the other samples.
- n_init“auto” or int, optional
Number of times the algorithm is re-initialized with new centroids.
- algorithm{“fast”, “pam”}, optional
The algorithm for updating cluster assignments. If “pam”, use the Partitioning Around Medoids algorithm.
- max_iterint, optional
The maximum number of iterations for a single run of the algorithm.
- tolfloat, optional
Relative tolerance to declare convergence of two consecutive iterations.
- verboseint, optional
Print diagnostic messages during convergence.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateRandomState or int, optional
Determines random number generation for centroid initialization and barycentering when fitting with metric=”dtw”.
- Attributes:
- n_iter_int
The number of iterations before convergence.
- cluster_centers_ndarray of shape (n_clusters, n_timestep)
The cluster centers.
- medoid_indices_ndarray of shape (n_clusters, )
The index of the medoid in the input samples.
- labels_ndarray of shape (n_samples, )
The cluster assignment.
- fit(x, y=None)[source]#
Compute the kmedoids-clustering.
- Parameters:
- xunivariate time-series
The input samples.
- yIgnored, optional
Not used.
- Returns:
- object
Fitted estimator.
- fit_predict(X, y=None, **kwargs)[source]#
Perform clustering on X and returns cluster labels.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input data.
- yIgnored
Not used, present for API consistency by convention.
- **kwargsdict
Arguments to be passed to fit.
Added in version 1.4.
- Returns:
- labelsndarray of shape (n_samples,), dtype=np.int64
Cluster labels.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x)[source]#
Predict the closest cluster for each sample.
- Parameters:
- xunivariate time-series
The input samples.
- Returns:
- ndarray of shape (n_samples, )
Index of the cluster each sample belongs to.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
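Unlike a centroid, a medoid is always one of the input samples, which is what medoid_indices_ refers to. The sketch below shows one common way to pick the medoid of a group of samples, assuming Euclidean distance; it is not the update rule used by either the "fast" or the "pam" algorithm, and `medoid_index` is a hypothetical helper.

```python
import math


def medoid_index(samples):
    """Index of the sample with the smallest total distance to all other samples."""
    totals = [sum(math.dist(a, b) for b in samples) for a in samples]
    return min(range(len(samples)), key=totals.__getitem__)


cluster = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]]
center = medoid_index(cluster)
print(center, cluster[center])
```

The outlier at 10.0 pulls the mean of the first coordinate to 3.2, while the medoid stays at an observed sample.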
- class wildboar.distance.KNeighborsClassifier(n_neighbors=5, *, metric='euclidean', metric_params=None, n_jobs=None)[source]#
Classifier implementing k-nearest neighbors.
- Parameters:
- n_neighborsint, optional
The number of neighbors.
- metricstr, optional
The distance metric.
- metric_paramsdict, optional
Optional parameters to the distance metric.
Read more about the metrics and their parameters in the User guide.
- n_jobsint, optional
The number of parallel jobs.
- Attributes:
- classes_ndarray of shape (n_classes, )
Known class labels.
- fit(x, y)[source]#
Fit the classifier to the training data.
- Parameters:
- xunivariate time-series or multivariate time-series
The input samples.
- yarray-like of shape (n_samples, )
The input labels.
- Returns:
- KNeighborsClassifier
This instance.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- predict(x)[source]#
Compute the class label for the samples in x.
- Parameters:
- xunivariate time-series or multivariate time-series
The input samples.
- Returns:
- ndarray of shape (n_samples, )
The class label for each sample.
- predict_proba(x)[source]#
Compute probability estimates for the samples in x.
- Parameters:
- xunivariate time-series or multivariate time-series
The input samples.
- Returns:
- ndarray of shape (n_samples, len(self.classes_))
The probability of each class for each sample.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
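The relationship between predict and predict_proba in a k-nearest neighbors classifier can be sketched as follows. The sketch assumes Euclidean distance and uniform votes, so that each probability is the fraction of the n_neighbors closest training samples carrying that label; whether wildboar weights votes differently is not stated in this reference. The helper `knn_predict_proba` is hypothetical.

```python
import math
from collections import Counter


def knn_predict_proba(train_x, train_y, query, n_neighbors=5):
    """Return the sorted class labels and the vote fraction of each class."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    classes = sorted(set(train_y))
    return classes, [votes[c] / n_neighbors for c in classes]


train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_y = ["a", "a", "b", "b"]

classes, proba = knn_predict_proba(train_x, train_y, [0.2, 0.1], n_neighbors=3)
predicted = classes[proba.index(max(proba))]  # the label with the most votes
print(classes, proba, predicted)
```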
- class wildboar.distance.MDS(n_components=2, *, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=None, random_state=None, dissimilarity='euclidean', dissimilarity_params=None, normalized_stress='warn')[source]#
Multidimensional scaling.
- Parameters:
- n_componentsint, optional
Number of dimensions in which to immerse the dissimilarities.
- metricbool, optional
If True, perform metric MDS; otherwise, perform nonmetric MDS. When False (i.e. non-metric MDS), dissimilarities with 0 are considered as missing values.
- n_initint, optional
Number of times the SMACOF algorithm will be run with different initializations. The final results will be the best output of the runs, determined by the run with the smallest final stress.
- max_iterint, optional
Maximum number of iterations of the SMACOF algorithm for a single run.
- verboseint, optional
Level of verbosity.
- epsfloat, optional
Relative tolerance with respect to stress at which to declare convergence. The value of eps should be tuned separately depending on whether or not normalized_stress is being used.
- n_jobsint, optional
The number of jobs to use for the computation. If multiple initializations are used (n_init), each run of the algorithm is computed in parallel.
- random_stateint, RandomState instance or None, optional
Determines the random number generator used to initialize the centers. Pass an int for reproducible results across multiple function calls.
- dissimilaritystr, optional
The dissimilarity measure.
See _METRICS.keys() for a list of supported metrics.
- dissimilarity_paramsdict, optional
Parameters to the dissimilarity measure.
Read more about the parameters in the User guide.
- normalized_stressbool or “auto”, optional
Whether to use and return the normalized stress value (Stress-1) instead of the raw stress calculated by default. Only supported in non-metric MDS.
Notes
This implementation is a convenience wrapper around sklearn.manifold.MDS for use with Wildboar metrics.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- wildboar.distance.argmin_distance(x, y=None, *, dim=0, k=1, metric='euclidean', metric_params=None, sorted=False, return_distance=False, n_jobs=None)[source]#
Find the indices of the samples with the lowest distance in Y.
- Parameters:
- xunivariate time-series or multivariate time-series
The needle.
- yunivariate time-series or multivariate time-series, optional
The haystack.
- dimint, optional
The dimension where the distance is computed.
- kint, optional
The number of closest samples.
- metricstr, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- sortedbool, optional
Sort the indices from smallest to largest distance.
- return_distancebool, optional
Return the distance for the k samples.
- n_jobsint, optional
The number of parallel jobs.
- Returns:
- indicesndarray of shape (n_samples, k)
The indices of the samples in Y with the smallest distance.
- distancendarray of shape (n_samples, k), optional
The distance of the samples in Y with the smallest distance.
Warning
Passing a callable to the metric parameter has a significant performance implication.
Examples
>>> import numpy as np
>>> from wildboar.distance import argmin_distance
>>> X = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
>>> Y = np.array([[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]])
>>> argmin_distance(X, Y, k=2, return_distance=True)
(array([[0, 1],
        [1, 2]]),
 array([[ 8.24621125,  4.79583152],
        [10.24695077, 10.        ]]))
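A brute-force reference for the same data makes the role of the sorted parameter visible. This sketch assumes Euclidean distance and always returns the closest index first; `argmin_distance_ref` is a hypothetical helper, not part of wildboar.

```python
import math


def argmin_distance_ref(x, y, k=1):
    """For each row of x, the indices of the k closest rows of y, closest first."""
    result = []
    for a in x:
        order = sorted(range(len(y)), key=lambda j: math.dist(a, y[j]))
        result.append(order[:k])
    return result


X = [[1, 2, 3, 4], [10, 1, 2, 3]]
Y = [[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]]

nearest = argmin_distance_ref(X, Y, k=2)
print(nearest)
```

The sketch selects the same two neighbors per row as the documented output, but orders them by distance, whereas the documented call uses the default sorted=False and therefore makes no ordering guarantee.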
- wildboar.distance.argmin_subsequence_distance(y, x, *, dim=0, k=1, metric='euclidean', metric_params=None, scale=False, return_distance=False, n_jobs=None)[source]#
Compute the k:th closest subsequences.
For the i:th shapelet and the i:th sample return the index and, optionally, the distance of the k closest matches.
- Parameters:
- yarray-like of shape (n_samples, m_timestep) or list of 1d-arrays
The subsequences.
- xarray-like of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The samples. If x.ndim == 1, it will be broadcast to have the same number of samples as y.
- dimint, optional
The dimension in x to find subsequences in.
- kint, optional
The number of closest subsequences to find.
- metricstr, optional
The metric.
See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- scalebool, optional
If True, scale the subsequences before distance computation.
- return_distancebool, optional
Return the distance for the k closest subsequences.
- n_jobsint, optional
The number of parallel jobs.
- Returns:
- indicesndarray of shape (n_samples, k)
The indices of the k closest subsequences.
- distancendarray of shape (n_samples, k), optional
The distance of the k closest subsequences.
Warning
Passing a callable to the metric parameter has a significant performance implication.
Examples
>>> import numpy as np
>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import argmin_subsequence_distance
>>> X, _ = load_dataset("ECG200")  # assumed dataset, as in the distance_profile example
>>> s = np.lib.stride_tricks.sliding_window_view(X[0], window_shape=10)
>>> x = np.broadcast_to(X[0], shape=(s.shape[0], X.shape[1]))
>>> argmin_subsequence_distance(s, x, k=4)
- wildboar.distance.distance_profile(y, x, *, dilation=1, padding=0, dim=0, metric='mass', metric_params=None, scale=False, n_jobs=None)[source]#
Compute the distance profile.
The distance profile corresponds to the distance of the subsequences in y for every time point of the samples in x.
- Parameters:
- yarray-like of shape (m_timestep, ) or (n_samples, m_timestep)
The subsequences. If y.ndim is 1, y is broadcast to have the same number of samples as x.
- xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The samples. If x.ndim is 1, we will broadcast x to have the same number of samples as y.
- dilationint, optional
The dilation, i.e., the spacing between points in the subsequences.
- paddingint or {“same”}, optional
The amount of padding applied to the input time series. If “same”, the output size is the same as the input size.
- dimint, optional
The dim to search for shapelets.
- metricstr or callable, optional
The distance metric. See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- scalebool, optional
If True, scale the subsequences before distance computation.
- n_jobsint, optional
The number of parallel jobs to run.
- Returns:
- ndarray of shape (n_samples, output_size) or (output_size, )
The distance profile. output_size is given by: n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1. If y contains a single subsequence and x contains a single sample, the output is squeezed.
Warning
Passing a callable to the metric parameter has a significant performance implication.
Examples
>>> from wildboar.datasets import load_dataset
>>> from wildboar.distance import distance_profile
>>> X, _ = load_dataset("ECG200")
>>> distance_profile(X[0], X[1:].reshape(-1))
array([14.00120332, 14.41943788, 14.81597243, ..., 4.75219094,
       5.72681005, 6.70155561])
>>> distance_profile(
...     X[0, 0:9], X[1:5], metric="dtw", dilation=2, padding="same"
... )[0, :10]
array([8.01881424, 7.15083281, 7.48856368, 6.83139294, 6.75595579,
       6.30073636, 6.65346307, 6.27919601, 6.25666948, 6.0961576 ])
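The effect of dilation and padding on the profile length can be checked with a naive reference. This sketch assumes that a dilated subsequence of length m_timestep spans (m_timestep - 1) * dilation + 1 time points, that padding adds zeros on both sides, and that the distance is plain Euclidean rather than the default "mass" metric. Both helper names are hypothetical.

```python
import math


def output_size(n_timestep, m_timestep, dilation=1, padding=0):
    """Number of positions at which a dilated subsequence fits the padded series."""
    return n_timestep + 2 * padding - ((m_timestep - 1) * dilation + 1) + 1


def distance_profile_ref(y, x, dilation=1, padding=0):
    """Euclidean distance from y to every dilated window of the zero-padded x."""
    padded = [0.0] * padding + list(x) + [0.0] * padding
    span = (len(y) - 1) * dilation + 1
    profile = []
    for start in range(len(padded) - span + 1):
        window = padded[start : start + span : dilation]
        profile.append(math.dist(y, window))
    return profile


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0]

plain = distance_profile_ref(y, x)
dilated = distance_profile_ref(y, x, dilation=2)
same = distance_profile_ref(y, x, dilation=2, padding=2)
print(len(plain), len(dilated), len(same))
```

With dilation=2 and a subsequence of length 3, a padding of 2 on each side gives a profile as long as the input, which is the situation padding="same" describes.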
- wildboar.distance.matrix_profile(x, y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)[source]#
Compute the matrix profile.
If only x is given, compute the similarity self-join of every subsequence in x of size window to its nearest neighbor in x, excluding trivial matches according to the exclude parameter.
If both x and y are given, compute the similarity join of every subsequence in y of size window to its nearest neighbor in x, excluding matches according to the exclude parameter.
- Parameters:
- xarray-like of shape (n_timestep, ), (n_samples, xn_timestep) or (n_samples, n_dim, xn_timestep)
The first time series.
- yarray-like of shape (n_timestep, ), (n_samples, yn_timestep) or (n_samples, n_dim, yn_timestep), optional
The optional second time series. y is broadcast to the shape of x if possible.
- windowint or float, optional
The subsequence size, by default 5
if float, a fraction of y.shape[-1].
if int, the exact subsequence size.
- dimint, optional
The dim to compute the matrix profile for, by default 0.
- excludeint or float, optional
The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.
if float, expressed as a fraction of the window size.
if int, exact size (0 <= exclude < window).
- n_jobsint, optional
The number of jobs to use when computing the profile.
- return_indexbool, optional
Return the matrix profile index.
- Returns:
- mpndarray of shape (profile_size, ) or (n_samples, profile_size)
The matrix profile.
- mpindarray of shape (profile_size, ) or (n_samples, profile_size), optional
The matrix profile index.
Notes
The profile_size depends on the input.
If y is None, profile_size is
x.shape[-1] - window + 1
If y is not None, profile_size is
y.shape[-1] - window + 1
References
- Yeh, C. C. M. et al. (2016).
Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th international conference on data mining (ICDM)
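A quadratic-time reference for the self-join case shows what the profile and its index contain. The sketch assumes z-normalized Euclidean distance and an exclusion zone of math.ceil(exclude * window) positions on either side of each subsequence; both are assumptions about details this reference does not spell out, and the helper names are hypothetical.

```python
import math


def znorm(w):
    """Scale a window to zero mean and unit standard deviation."""
    mean = sum(w) / len(w)
    std = math.sqrt(sum((v - mean) ** 2 for v in w) / len(w))
    return [(v - mean) / std if std > 0 else 0.0 for v in w]


def matrix_profile_ref(x, window, exclude=0.2):
    """Distance and index of the nearest non-trivial neighbor of every subsequence."""
    zone = math.ceil(exclude * window)
    subs = [znorm(x[i : i + window]) for i in range(len(x) - window + 1)]
    mp, mpi = [], []
    for i, a in enumerate(subs):
        best, best_j = math.inf, -1
        for j, b in enumerate(subs):
            if abs(i - j) <= zone:  # skip trivial matches close to i
                continue
            d = math.dist(a, b)
            if d < best:
                best, best_j = d, j
        mp.append(best)
        mpi.append(best_j)
    return mp, mpi


# The pattern 0, 1, 3, 1 occurs at positions 0 and 8.
x = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 9.0, 4.0, 0.0, 1.0, 3.0, 1.0, 0.0]
mp, mpi = matrix_profile_ref(x, window=4)
print(len(mp), mp[0], mpi[0])
```

The profile has x.shape[-1] - window + 1 entries, and the repeated pattern shows up as a zero in the profile with the index pointing at the other occurrence.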
- wildboar.distance.paired_distance(x, y, *, dim='warn', metric='euclidean', metric_params=None, n_jobs=None)[source]#
Compute the distance between the i:th time series.
- Parameters:
- xndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input data.
- yndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input data. y will be broadcast to the shape of x.
- dimint or {‘mean’, ‘full’}, optional
The dim to compute distance.
- metricstr or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- n_jobsint, optional
The number of parallel jobs.
- Returns:
- ndarray
The distances. Return depends on input:
if x.ndim == 1, return scalar.
if dim=’full’, return ndarray of shape (n_dims, n_samples).
if x.ndim > 1, return an ndarray of shape (n_samples, ).
Warning
Passing a callable to the metric parameter has a significant performance implication.
- wildboar.distance.paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)[source]#
Minimum subsequence distance between the i:th subsequence and time series.
- Parameters:
- ylist or ndarray of shape (n_samples, m_timestep)
Input time series.
if list, a list of array-like of shape (m_timestep, ).
- xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input data.
- dimint, optional
The dim to search for shapelets.
- metricstr or callable, optional
The distance metric. See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- scalebool, optional
If True, scale the subsequences before distance computation.
Added in version 1.3.
- return_indexbool, optional
If True, return the index of the best match. If there are many equally good matches, the first match is returned.
- n_jobsint, optional
The number of parallel jobs to run.
- Returns:
- distfloat, ndarray
An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample.
- indicesint, ndarray, optional
An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample.
Warning
Passing a callable to the metric parameter has a significant performance implication.
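The minimum subsequence distance and the tie-breaking rule for return_index can be sketched as follows, assuming Euclidean distance without scaling. The helper names are hypothetical.

```python
import math


def min_subsequence_distance(s, t):
    """Smallest distance between s and any window of t, with the window start."""
    best, best_start = math.inf, -1
    for start in range(len(t) - len(s) + 1):
        d = math.dist(s, t[start : start + len(s)])
        if d < best:  # strict comparison keeps the first of equally good matches
            best, best_start = d, start
    return best, best_start


def paired_subsequence_distance_ref(y, x):
    """Match the i:th subsequence in y against the i:th series in x."""
    return [min_subsequence_distance(s, t) for s, t in zip(y, x)]


y = [[1.0, 2.0], [5.0, 5.0]]
x = [[0.0, 1.0, 2.0, 1.0, 2.0], [5.0, 5.0, 0.0, 5.0, 5.0]]

result = paired_subsequence_distance_ref(y, x)
print(result)
```

In both series the subsequence occurs twice with distance zero, and the reported index is that of the first occurrence.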
- wildboar.distance.paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, return_distance=False, n_jobs=None)[source]#
Find matching subsequences.
Find the positions where the distance is less than the threshold between the i:th subsequence and the i:th time series.
If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance and time series.
- Parameters:
- ylist or ndarray of shape (n_samples, n_timestep)
Input time series.
if list, a list of array-like of shape (n_timestep, ) with length n_samples.
- xndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input data.
- thresholdfloat, optional
The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
- dimint, optional
The dim to search for shapelets.
- metricstr or callable, optional
The distance metric. See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- scalebool, optional
If True, scale the subsequences before distance computation.
Added in version 1.3.
- max_matchesint, optional
Return the top max_matches matches below threshold.
If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.
- return_distancebool, optional
If True, return the distance of the match.
- n_jobsint, optional
The number of parallel jobs to run. Ignored.
- Returns:
- indicesndarray of shape (n_samples, )
The start index of matching subsequences.
- distancendarray of shape (n_samples, ), optional
The distances of matching subsequences.
Warning
Passing a callable to the metric parameter has a significant performance implication.
- wildboar.distance.pairwise_distance(x, y=None, *, dim='warn', metric='euclidean', metric_params=None, n_jobs=None)[source]#
Compute the distance between subsequences and time series.
- Parameters:
- xndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)
The input data.
- yndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional
The input data.
- dimint or {‘mean’, ‘full’}, optional
The dim to compute distance.
- metricstr or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- n_jobsint, optional
The number of parallel jobs.
- Returns:
- float or ndarray
The distances. Return depends on input.
if x.ndim == 1 and y.ndim == 1, scalar.
if dim=”full”, array of shape (n_dims, x_samples, y_samples).
if dim=”full” and y is None, array of shape (n_dims, x_samples, x_samples).
if x.ndim > 1 and y is None, array of shape (x_samples, x_samples).
if x.ndim > 1 and y.ndim > 1, array of shape (x_samples, y_samples).
if x.ndim == 1 and y.ndim > 1, array of shape (y_samples, ).
if y.ndim == 1 and x.ndim > 1, array of shape (x_samples, ).
Warning
Passing a callable to the metric parameter has a significant performance implication.
- wildboar.distance.pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)[source]#
Minimum subsequence distance between subsequences and time series.
- Parameters:
- ylist or ndarray of shape (n_subsequences, n_timestep)
Input time series.
if list, a list of array-like of shape (n_timestep, ).
- xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input data.
- dimint, optional
The dim to search for subsequence.
- metricstr or callable, optional
The distance metric. See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- scalebool, optional
If True, scale the subsequences before distance computation.
Added in version 1.3.
- return_indexbool, optional
If True, return the index of the best match. If there are many equally good matches, the first match is returned.
- n_jobsint, optional
The number of parallel jobs.
- Returns:
- distfloat, ndarray
The minimum distance. Return depends on input:
if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
if len(y) == 1, return an array of shape (n_samples, ).
if x.ndim == 1, return an array of shape (n_subsequences, ).
if x.ndim == 1 and len(y) == 1, return scalar.
- indicesint, ndarray, optional
The start index of the minimum distance. Return depends on input:
if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
if len(y) == 1, return an array of shape (n_samples, ).
if x.ndim == 1, return an array of shape (n_subsequences, ).
if x.ndim == 1 and len(y) == 1, return scalar.
Warning
Passing a callable to the metric parameter has a significant performance implication.
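The orientation of the pairwise result, with samples along the rows and subsequences along the columns, can be illustrated with a small sketch. It assumes Euclidean distance without scaling and list-based inputs; the helper names are hypothetical.

```python
import math


def min_dist(s, t):
    """Smallest distance between s and any equally long window of t."""
    return min(math.dist(s, t[i : i + len(s)]) for i in range(len(t) - len(s) + 1))


def pairwise_subsequence_distance_ref(y, x):
    """Rows follow the samples in x, columns follow the subsequences in y."""
    return [[min_dist(s, t) for s in y] for t in x]


y = [[1.0, 2.0], [0.0, 0.0]]  # n_subsequences = 2
x = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
]  # n_samples = 3

dist = pairwise_subsequence_distance_ref(y, x)
print(dist)
```

The third sample contains both subsequences exactly, so its row is all zeros, while the first two samples each contain only one of them.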
- wildboar.distance.subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, exclude=None, return_distance=False, n_jobs=None)[source]#
Find matching subsequences.
Find the positions where the distance is less than the threshold between the subsequence and all time series.
If a threshold is given, the default behaviour is to return all matching indices in the order of occurrence.
If no threshold is given, the default behaviour is to return the top 10 matching indices ordered by distance.
If both threshold and max_matches are given, the top matches are returned ordered by distance.
- Parameters:
- yarray-like of shape (yn_timestep, )
The subsequence.
- xndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
The input data.
- threshold{“auto”}, float or callable, optional
The distance threshold used to consider a subsequence matching. If no threshold is selected, max_matches defaults to 10.
if float, return all matches closer than threshold
if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence
if str, return all matches according to the named threshold.
- dimint, optional
The dim to search for shapelets.
- metricstr or callable, optional
The distance metric. See _SUBSEQUENCE_METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- scalebool, optional
If True, scale the subsequences before distance computation.
Added in version 1.3.
- max_matchesint, optional
Return the top max_matches matches below threshold.
- excludefloat or int, optional
Exclude trivial matches in the vicinity of the match.
if float, the exclusion zone is computed as
math.ceil(exclude * y.size)
if int, the exclusion zone is exact
A match is considered trivial if a match with lower distance is within exclude timesteps of another match with higher distance.
- return_distancebool, optional
If True, return the distance of the match.
- n_jobsint, optional
The number of parallel jobs to run.
- Returns:
- indicesndarray of shape (n_samples, ) or (n_matches, )
The start index of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.
- distancendarray of shape (n_samples, ), optional
The distances of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.
Warning
Passing a callable to the metric parameter has a significant performance implication.
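How threshold, max_matches and exclude interact can be sketched for a single time series. The sketch assumes Euclidean distance, a strict "less than" comparison against a float threshold, and an exclusion zone that drops any match lying within exclude timesteps of a better match; these are readings of the description above rather than confirmed implementation details, and `subsequence_match_ref` is a hypothetical helper.

```python
import math


def subsequence_match_ref(y, x, threshold=None, max_matches=None, exclude=None):
    """Start indices of the windows of x that match the subsequence y."""
    m = len(y)
    dists = [(math.dist(y, x[i : i + m]), i) for i in range(len(x) - m + 1)]
    if threshold is None and max_matches is None:
        max_matches = 10  # documented default when no threshold is given
    if threshold is not None:
        dists = [(d, i) for d, i in dists if d < threshold]
    if exclude is not None:
        zone = math.ceil(exclude * m) if isinstance(exclude, float) else exclude
        kept = []
        for d, i in sorted(dists):  # best matches claim their zone first
            if all(abs(i - j) > zone for _, j in kept):
                kept.append((d, i))
        dists = sorted(kept, key=lambda pair: pair[1])  # back to order of occurrence
    if max_matches is not None:
        dists = sorted(dists)[:max_matches]  # top matches ordered by distance
    return [i for _, i in dists]


x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.1, 5.0, 0.0, 1.0, 2.0]
y = [0.0, 1.0, 2.0]

in_order = subsequence_match_ref(y, x, threshold=0.5)
top_two = subsequence_match_ref(y, x, threshold=0.5, max_matches=2)
loose = subsequence_match_ref(y, x, threshold=3.0)
no_trivial = subsequence_match_ref(y, x, threshold=3.0, exclude=1)
print(in_order, top_two, loose, no_trivial)
```

With a loose threshold, the windows that merely overlap an exact occurrence also match; the exclusion zone removes them and leaves one index per occurrence.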