***************************
:py:mod:`wildboar.distance`
***************************

.. py:module:: wildboar.distance

.. autoapi-nested-parse::

   Fast distance computations.

   The :py:mod:`wildboar.distance` module includes functions for computing paired and pairwise distances between time series and between time series and subsequences.

   See the :ref:`User Guide ` for more details and examples.

Submodules
==========

.. toctree::
   :titlesonly:
   :maxdepth: 1

   dtw/index.rst

Package Contents
----------------

Classes
-------

.. autoapisummary::

   wildboar.distance.KMeans
   wildboar.distance.KMedoids
   wildboar.distance.KNeighborsClassifier
   wildboar.distance.MDS

Functions
---------

.. autoapisummary::

   wildboar.distance.argmin_distance
   wildboar.distance.argmin_subsequence_distance
   wildboar.distance.distance_profile
   wildboar.distance.matrix_profile
   wildboar.distance.paired_distance
   wildboar.distance.paired_subsequence_distance
   wildboar.distance.paired_subsequence_match
   wildboar.distance.pairwise_distance
   wildboar.distance.pairwise_subsequence_distance
   wildboar.distance.subsequence_match

.. py:class:: KMeans(n_clusters=8, *, metric='euclidean', r=1.0, g=None, init='random', n_init='auto', max_iter=300, tol=0.001, verbose=0, random_state=None)

   KMeans clustering with support for DTW and weighted DTW.

   :Parameters:
      **n_clusters** : int, optional
         The number of clusters.
      **metric** : {"euclidean", "dtw"}, optional
         The metric.
      **r** : float, optional
         The size of the warping window.
      **g** : float, optional
         SoftDTW penalty. If None, traditional DTW is used.
      **init** : {"random"}, optional
         Cluster initialization. If "random", randomly initialize `n_clusters` centroids.
      **n_init** : "auto" or int, optional
         The number of times the algorithm is re-initialized with new centroids.
      **max_iter** : int, optional
         The maximum number of iterations for a single run of the algorithm.
      **tol** : float, optional
         Relative tolerance to declare convergence of two consecutive iterations.
      **verbose** : int, optional
         Print diagnostic messages during convergence.
      **random_state** : RandomState or int, optional
         Determines random number generation for centroid initialization and barycentering when fitting with `metric="dtw"`.

   :Attributes:
      **n_iter_** : int
         The number of iterations before convergence.
      **cluster_centers_** : ndarray of shape (n_clusters, n_timestep)
         The cluster centers.
      **labels_** : ndarray of shape (n_samples, )
         The cluster assignment.

   .. py:method:: fit(x, y=None)

      Compute the kmeans-clustering.

      :Parameters:
         **x** : univariate time-series
            The input samples.
         **y** : Ignored, optional
            Not used.

      :Returns:
         object
            Fitted estimator.

   .. py:method:: fit_predict(X, y=None, **kwargs)

      Perform clustering on `X` and return cluster labels.

      :Parameters:
         **X** : array-like of shape (n_samples, n_features)
            Input data.
         **y** : Ignored
            Not used, present for API consistency by convention.
         **\*\*kwargs** : dict
            Arguments to be passed to ``fit``.

            .. versionadded:: 1.4

      :Returns:
         **labels** : ndarray of shape (n_samples,), dtype=np.int64
            Cluster labels.

   .. py:method:: fit_transform(X, y=None, **fit_params)

      Fit to data, then transform it.

      Fits transformer to `X` and `y` with optional parameters `fit_params` and returns a transformed version of `X`.

      :Parameters:
         **X** : array-like of shape (n_samples, n_features)
            Input samples.
         **y** : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
            Target values (None for unsupervised transformations).
         **\*\*fit_params** : dict
            Additional fit parameters.

      :Returns:
         **X_new** : ndarray array of shape (n_samples, n_features_new)
            Transformed array.

   .. py:method:: get_metadata_routing()

      Get metadata routing of this object.

      Please check :ref:`User Guide ` on how the routing mechanism works.

      :Returns:
         **routing** : MetadataRequest
            A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

   .. py:method:: get_params(deep=True)

      Get parameters for this estimator.

      :Parameters:
         **deep** : bool, default=True
            If True, will return the parameters for this estimator and contained subobjects that are estimators.

      :Returns:
         **params** : dict
            Parameter names mapped to their values.

   .. py:method:: predict(x)

      Predict the closest cluster for each sample.

      :Parameters:
         **x** : univariate time-series
            The input samples.

      :Returns:
         ndarray of shape (n_samples, )
            Index of the cluster each sample belongs to.

   .. py:method:: set_output(*, transform=None)

      Set output container.

      See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py` for an example on how to use the API.

      :Parameters:
         **transform** : {"default", "pandas", "polars"}, default=None
            Configure output of `transform` and `fit_transform`.

            - `"default"`: Default output format of a transformer
            - `"pandas"`: DataFrame output
            - `"polars"`: Polars output
            - `None`: Transform configuration is unchanged

            .. versionadded:: 1.4
               `"polars"` option was added.

      :Returns:
         **self** : estimator instance
            Estimator instance.

   .. py:method:: set_params(**params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :Parameters:
         **\*\*params** : dict
            Estimator parameters.

      :Returns:
         **self** : estimator instance
            Estimator instance.

   .. py:method:: transform(x)

      Transform the input to a cluster distance space.

      :Parameters:
         **x** : univariate time-series
            The input samples.

      :Returns:
         ndarray of shape (n_samples, n_clusters)
            The distance between each sample and each cluster.
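For illustration, a minimal sketch of the typical workflow on synthetic data (the data, seeds and parameter values below are arbitrary):

.. code-block:: python

   import numpy as np

   from wildboar.distance import KMeans

   # Cluster 20 synthetic univariate time series of length 50 into three
   # groups, using DTW with a warping window of 10% of the series length.
   X = np.random.default_rng(0).normal(size=(20, 50))
   km = KMeans(n_clusters=3, metric="dtw", r=0.1, random_state=0)
   km.fit(X)
   print(km.labels_)                 # cluster assignment, shape (20,)
   print(km.cluster_centers_.shape)  # (3, 50)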
.. py:class:: KMedoids(n_clusters=8, metric='euclidean', metric_params=None, init='random', n_init='auto', algorithm='fast', max_iter=30, tol=0.0001, verbose=0, n_jobs=None, random_state=None)

   KMedoids algorithm.

   :Parameters:
      **n_clusters** : int, optional
         The number of clusters.
      **metric** : str, optional
         The metric.
      **metric_params** : dict, optional
         The metric parameters. Read more about the metrics and their parameters in the :ref:`User guide `.
      **init** : {"auto", "random", "min"}, optional
         Cluster initialization. If "random", randomly initialize `n_clusters` medoids; if "min", select the samples with the smallest distance to the other samples.
      **n_init** : "auto" or int, optional
         The number of times the algorithm is re-initialized with new centroids.
      **algorithm** : {"fast", "pam"}, optional
         The algorithm for updating cluster assignments. If "pam", use the Partitioning Around Medoids algorithm.
      **max_iter** : int, optional
         The maximum number of iterations for a single run of the algorithm.
      **tol** : float, optional
         Relative tolerance to declare convergence of two consecutive iterations.
      **verbose** : int, optional
         Print diagnostic messages during convergence.
      **n_jobs** : int, optional
         The number of jobs to run in parallel. A value of `None` means using a single core and a value of `-1` means using all cores. Positive integers mean the exact number of cores.
      **random_state** : RandomState or int, optional
         Determines random number generation for centroid initialization and barycentering when fitting with `metric="dtw"`.

   :Attributes:
      **n_iter_** : int
         The number of iterations before convergence.
      **cluster_centers_** : ndarray of shape (n_clusters, n_timestep)
         The cluster centers.
      **medoid_indices_** : ndarray of shape (n_clusters, )
         The index of the medoid in the input samples.
      **labels_** : ndarray of shape (n_samples, )
         The cluster assignment.

   .. py:method:: fit(x, y=None)

      Compute the kmedoids-clustering.

      :Parameters:
         **x** : univariate time-series
            The input samples.
         **y** : Ignored, optional
            Not used.

      :Returns:
         object
            Fitted estimator.

   .. py:method:: fit_predict(X, y=None, **kwargs)

      Perform clustering on `X` and return cluster labels.

      :Parameters:
         **X** : array-like of shape (n_samples, n_features)
            Input data.
         **y** : Ignored
            Not used, present for API consistency by convention.
         **\*\*kwargs** : dict
            Arguments to be passed to ``fit``.

            .. versionadded:: 1.4

      :Returns:
         **labels** : ndarray of shape (n_samples,), dtype=np.int64
            Cluster labels.

   .. py:method:: fit_transform(X, y=None, **fit_params)

      Fit to data, then transform it.

      Fits transformer to `X` and `y` with optional parameters `fit_params` and returns a transformed version of `X`.

      :Parameters:
         **X** : array-like of shape (n_samples, n_features)
            Input samples.
         **y** : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
            Target values (None for unsupervised transformations).
         **\*\*fit_params** : dict
            Additional fit parameters.

      :Returns:
         **X_new** : ndarray array of shape (n_samples, n_features_new)
            Transformed array.

   .. py:method:: get_metadata_routing()

      Get metadata routing of this object.

      Please check :ref:`User Guide ` on how the routing mechanism works.

      :Returns:
         **routing** : MetadataRequest
            A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

   .. py:method:: get_params(deep=True)

      Get parameters for this estimator.

      :Parameters:
         **deep** : bool, default=True
            If True, will return the parameters for this estimator and contained subobjects that are estimators.

      :Returns:
         **params** : dict
            Parameter names mapped to their values.

   .. py:method:: predict(x)

      Predict the closest cluster for each sample.

      :Parameters:
         **x** : univariate time-series
            The input samples.

      :Returns:
         ndarray of shape (n_samples, )
            Index of the cluster each sample belongs to.

   .. py:method:: set_output(*, transform=None)

      Set output container.

      See :ref:`sphx_glr_auto_examples_miscellaneous_plot_set_output.py` for an example on how to use the API.

      :Parameters:
         **transform** : {"default", "pandas", "polars"}, default=None
            Configure output of `transform` and `fit_transform`.

            - `"default"`: Default output format of a transformer
            - `"pandas"`: DataFrame output
            - `"polars"`: Polars output
            - `None`: Transform configuration is unchanged

            .. versionadded:: 1.4
               `"polars"` option was added.

      :Returns:
         **self** : estimator instance
            Estimator instance.

   .. py:method:: set_params(**params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :Parameters:
         **\*\*params** : dict
            Estimator parameters.

      :Returns:
         **self** : estimator instance
            Estimator instance.

   .. py:method:: transform(x)

      Transform the input to a cluster distance space.

      :Parameters:
         **x** : univariate time-series
            The input samples.

      :Returns:
         ndarray of shape (n_samples, n_clusters)
            The distance between each sample and each cluster.
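A similar sketch for :class:`KMedoids` on arbitrary synthetic data; it assumes that the ``"dtw"`` metric accepts an ``r`` (warping window) parameter via `metric_params`, as described in the User guide:

.. code-block:: python

   import numpy as np

   from wildboar.distance import KMedoids

   X = np.random.default_rng(1).normal(size=(20, 50))
   km = KMedoids(
       n_clusters=3, metric="dtw", metric_params={"r": 0.1}, random_state=1
   )
   labels = km.fit_predict(X)  # cluster assignment, shape (20,)
   print(km.medoid_indices_)   # index of the medoid sample of each cluster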
.. py:class:: KNeighborsClassifier(n_neighbors=5, *, metric='euclidean', metric_params=None, n_jobs=None)

   Classifier implementing k-nearest neighbors.

   :Parameters:
      **n_neighbors** : int, optional
         The number of neighbors.
      **metric** : str, optional
         The distance metric.
      **metric_params** : dict, optional
         Optional parameters to the distance metric. Read more about the metrics and their parameters in the :ref:`User guide `.
      **n_jobs** : int, optional
         The number of parallel jobs.

   :Attributes:
      **classes_** : ndarray of shape (n_classes, )
         Known class labels.

   .. py:method:: fit(x, y)

      Fit the classifier to the training data.

      :Parameters:
         **x** : univariate time-series or multivariate time-series
            The input samples.
         **y** : array-like of shape (n_samples, )
            The input labels.

      :Returns:
         KNeighborsClassifier
            This instance.

   .. py:method:: get_metadata_routing()

      Get metadata routing of this object.

      Please check :ref:`User Guide ` on how the routing mechanism works.

      :Returns:
         **routing** : MetadataRequest
            A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

   .. py:method:: get_params(deep=True)

      Get parameters for this estimator.

      :Parameters:
         **deep** : bool, default=True
            If True, will return the parameters for this estimator and contained subobjects that are estimators.

      :Returns:
         **params** : dict
            Parameter names mapped to their values.

   .. py:method:: predict(x)

      Compute the class label for the samples in x.

      :Parameters:
         **x** : univariate time-series or multivariate time-series
            The input samples.

      :Returns:
         ndarray of shape (n_samples, )
            The class label for each sample.

   .. py:method:: predict_proba(x)

      Compute probability estimates for the samples in x.

      :Parameters:
         **x** : univariate time-series or multivariate time-series
            The input samples.

      :Returns:
         ndarray of shape (n_samples, len(self.classes_))
            The probability of each class for each sample.

   .. py:method:: score(X, y, sample_weight=None)

      Return the mean accuracy on the given test data and labels.

      In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

      :Parameters:
         **X** : array-like of shape (n_samples, n_features)
            Test samples.
         **y** : array-like of shape (n_samples,) or (n_samples, n_outputs)
            True labels for `X`.
         **sample_weight** : array-like of shape (n_samples,), default=None
            Sample weights.

      :Returns:
         **score** : float
            Mean accuracy of ``self.predict(X)`` w.r.t. `y`.

   .. py:method:: set_params(**params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :Parameters:
         **\*\*params** : dict
            Estimator parameters.

      :Returns:
         **self** : estimator instance
            Estimator instance.
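For illustration, a minimal classification sketch on random data (labels and parameter values are arbitrary; the ``r`` metric parameter is assumed to be accepted by the ``"dtw"`` metric):

.. code-block:: python

   import numpy as np

   from wildboar.distance import KNeighborsClassifier

   rng = np.random.default_rng(2)
   X = rng.normal(size=(40, 100))   # 40 univariate time series
   y = rng.integers(0, 2, size=40)  # binary labels
   clf = KNeighborsClassifier(n_neighbors=3, metric="dtw", metric_params={"r": 0.2})
   clf.fit(X, y)
   print(clf.predict(X[:5]))              # predicted labels for five series
   print(clf.predict_proba(X[:5]).shape)  # (5, n_classes)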
.. py:class:: MDS(n_components=2, *, metric=True, n_init=4, max_iter=300, verbose=0, eps=0.001, n_jobs=None, random_state=None, dissimilarity='euclidean', dissimilarity_params=None, normalized_stress='warn')

   Multidimensional scaling.

   :Parameters:
      **n_components** : int, optional
         Number of dimensions in which to immerse the dissimilarities.
      **metric** : bool, optional
         If `True`, perform metric MDS; otherwise, perform nonmetric MDS. When `False` (i.e. non-metric MDS), dissimilarities with 0 are considered as missing values.
      **n_init** : int, optional
         Number of times the SMACOF algorithm will be run with different initializations. The final results will be the best output of the runs, determined by the run with the smallest final stress.
      **max_iter** : int, optional
         Maximum number of iterations of the SMACOF algorithm for a single run.
      **verbose** : int, optional
         Level of verbosity.
      **eps** : float, optional
         Relative tolerance with respect to stress at which to declare convergence. The value of `eps` should be tuned separately depending on whether or not `normalized_stress` is being used.
      **n_jobs** : int, optional
         The number of jobs to use for the computation. If multiple initializations are used (``n_init``), each run of the algorithm is computed in parallel.
      **random_state** : int, RandomState instance or None, optional
         Determines the random number generator used to initialize the centers. Pass an int for reproducible results across multiple function calls.
      **dissimilarity** : str, optional
         The dissimilarity measure. See `_METRICS.keys()` for a list of supported metrics.
      **dissimilarity_params** : dict, optional
         Parameters to the dissimilarity measure. Read more about the parameters in the :ref:`User guide `.
      **normalized_stress** : bool or "auto", optional
         Whether to use and return normalized stress (Stress-1) instead of the raw stress calculated by default. Only supported in non-metric MDS.

   .. rubric:: Notes

   This implementation is a convenience wrapper around :class:`sklearn.manifold.MDS` for use with Wildboar metrics.

   .. py:method:: get_metadata_routing()

      Get metadata routing of this object.

      Please check :ref:`User Guide ` on how the routing mechanism works.

      :Returns:
         **routing** : MetadataRequest
            A :class:`~sklearn.utils.metadata_routing.MetadataRequest` encapsulating routing information.

   .. py:method:: get_params(deep=True)

      Get parameters for this estimator.

      :Parameters:
         **deep** : bool, default=True
            If True, will return the parameters for this estimator and contained subobjects that are estimators.

      :Returns:
         **params** : dict
            Parameter names mapped to their values.

   .. py:method:: set_params(**params)

      Set the parameters of this estimator.

      The method works on simple estimators as well as on nested objects (such as :class:`~sklearn.pipeline.Pipeline`). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

      :Parameters:
         **\*\*params** : dict
            Estimator parameters.

      :Returns:
         **self** : estimator instance
            Estimator instance.
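A minimal embedding sketch on arbitrary synthetic data; it assumes that `fit_transform` is available as inherited from :class:`sklearn.manifold.MDS` (it is not listed among the methods above):

.. code-block:: python

   import numpy as np

   from wildboar.distance import MDS

   # Embed 30 synthetic time series in two dimensions using DTW dissimilarities.
   X = np.random.default_rng(3).normal(size=(30, 60))
   mds = MDS(n_components=2, dissimilarity="dtw", random_state=3)
   X_embedded = mds.fit_transform(X)
   print(X_embedded.shape)  # (30, 2)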
.. py:function:: argmin_distance(x, y=None, *, dim=0, k=1, metric='euclidean', metric_params=None, sorted=False, return_distance=False, n_jobs=None)

   Find the indices of the samples with the lowest distance in `Y`.

   :Parameters:
      **x** : univariate time-series or multivariate time-series
         The needle.
      **y** : univariate time-series or multivariate time-series, optional
         The haystack.
      **dim** : int, optional
         The dimension where the distance is computed.
      **k** : int, optional
         The number of closest samples.
      **metric** : str, optional
         The distance metric. See ``_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **sorted** : bool, optional
         Sort the indices from smallest to largest distance.
      **return_distance** : bool, optional
         Return the distance for the `k` samples.
      **n_jobs** : int, optional
         The number of parallel jobs.

   :Returns:
      **indices** : ndarray of shape (n_samples, k)
         The indices of the samples in `Y` with the smallest distance.
      **distance** : ndarray of shape (n_samples, k), optional
         The distance of the samples in `Y` with the smallest distance.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.

   .. rubric:: Examples

   >>> import numpy as np
   >>> from wildboar.distance import argmin_distance
   >>> X = np.array([[1, 2, 3, 4], [10, 1, 2, 3]])
   >>> Y = np.array([[1, 2, 11, 2], [2, 4, 6, 7], [10, 11, 2, 3]])
   >>> argmin_distance(X, Y, k=2, return_distance=True)
   (array([[0, 1], [1, 2]]), array([[ 8.24621125,  4.79583152], [10.24695077, 10.        ]]))

.. py:function:: argmin_subsequence_distance(y, x, *, dim=0, k=1, metric='euclidean', metric_params=None, scale=False, return_distance=False, n_jobs=None)

   Compute the k:th closest subsequences.

   For the i:th shapelet and the i:th sample return the index and, optionally, the distance of the `k` closest matches.

   :Parameters:
      **y** : array-like of shape (n_samples, m_timestep) or list of 1d-arrays
         The subsequences.
      **x** : array-like of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The samples. If x.ndim == 1, it will be broadcast to have the same number of samples as y.
      **dim** : int, optional
         The dimension in x to find subsequences in.
      **k** : int, optional
         The number of closest subsequences to find.
      **metric** : str, optional
         The metric. See ``_SUBSEQUENCE_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **scale** : bool, optional
         If True, scale the subsequences before distance computation.
      **return_distance** : bool, optional
         Return the distance for the `k` closest subsequences.
      **n_jobs** : int, optional
         The number of parallel jobs.

   :Returns:
      **indices** : ndarray of shape (n_samples, k)
         The indices of the `k` closest subsequences.
      **distance** : ndarray of shape (n_samples, k), optional
         The distance of the `k` closest subsequences.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.
   .. rubric:: Examples

   >>> import numpy as np
   >>> from wildboar.datasets import load_dataset
   >>> from wildboar.distance import argmin_subsequence_distance
   >>> X, _ = load_dataset("ECG200")
   >>> s = np.lib.stride_tricks.sliding_window_view(X[0], window_shape=10)
   >>> x = np.broadcast_to(X[0], shape=(s.shape[0], X.shape[1]))
   >>> argmin_subsequence_distance(s, x, k=4)

.. py:function:: distance_profile(y, x, *, dilation=1, padding=0, dim=0, metric='mass', metric_params=None, scale=False, n_jobs=None)

   Compute the distance profile.

   The distance profile corresponds to the distance of the subsequences in y for every time point of the samples in x.

   :Parameters:
      **y** : array-like of shape (m_timestep, ) or (n_samples, m_timestep)
         The subsequences. If `y.ndim` is 1, we will broadcast `y` to have the same number of samples as `x`.
      **x** : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The samples. If `x.ndim` is 1, we will broadcast `x` to have the same number of samples as `y`.
      **dilation** : int, optional
         The dilation, i.e., the spacing between points in the subsequences.
      **padding** : int or {"same"}, optional
         The amount of padding applied to the input time series. If "same", the output size is the same as the input size.
      **dim** : int, optional
         The dim to search for shapelets.
      **metric** : str or callable, optional
         The distance metric. See ``_SUBSEQUENCE_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **scale** : bool, optional
         If True, scale the subsequences before distance computation.
      **n_jobs** : int, optional
         The number of parallel jobs to run.

   :Returns:
      ndarray of shape (n_samples, output_size) or (output_size, )
         The distance profile. `output_size` is given by ``n_timestep + 2 * padding - (m_timestep - 1) * dilation``. If both `x` and `y` contain a single subsequence and a single sample, the output is squeezed.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.

   .. rubric:: Examples

   >>> from wildboar.datasets import load_dataset
   >>> from wildboar.distance import distance_profile
   >>> X, _ = load_dataset("ECG200")
   >>> distance_profile(X[0], X[1:].reshape(-1))
   array([14.00120332, 14.41943788, 14.81597243, ..., 4.75219094, 5.72681005, 6.70155561])
   >>> distance_profile(
   ...     X[0, 0:9], X[1:5], metric="dtw", dilation=2, padding="same"
   ... )[0, :10]
   array([8.01881424, 7.15083281, 7.48856368, 6.83139294, 6.75595579, 6.30073636, 6.65346307, 6.27919601, 6.25666948, 6.0961576 ])

.. py:function:: matrix_profile(x, y=None, *, window=5, dim=0, exclude=None, n_jobs=-1, return_index=False)

   Compute the matrix profile.

   - If only ``x`` is given, compute the similarity self-join of every subsequence in ``x`` of size ``window`` to its nearest neighbor in `x`, excluding trivial matches according to the ``exclude`` parameter.
   - If both ``x`` and ``y`` are given, compute the similarity join of every subsequence in ``y`` of size ``window`` to its nearest neighbor in ``x``, excluding matches according to the ``exclude`` parameter.

   :Parameters:
      **x** : array-like of shape (n_timestep, ), (n_samples, xn_timestep) or (n_samples, n_dim, xn_timestep)
         The first time series.
      **y** : array-like of shape (n_timestep, ), (n_samples, yn_timestep) or (n_samples, n_dim, yn_timestep), optional
         The optional second time series. y is broadcast to the shape of x if possible.
      **window** : int or float, optional
         The subsequence size, by default 5.

         - if float, a fraction of `y.shape[-1]`.
         - if int, the exact subsequence size.
      **dim** : int, optional
         The dim to compute the matrix profile for, by default 0.
      **exclude** : int or float, optional
         The size of the exclusion zone. The default exclusion zone is 0.2 for similarity self-join and 0.0 for similarity join.

         - if float, expressed as a fraction of the window size.
         - if int, exact size (0 <= exclude < window).
      **n_jobs** : int, optional
         The number of jobs to use when computing the profile.
      **return_index** : bool, optional
         Return the matrix profile index.

   :Returns:
      **mp** : ndarray of shape (profile_size, ) or (n_samples, profile_size)
         The matrix profile.
      **mpi** : ndarray of shape (profile_size, ) or (n_samples, profile_size), optional
         The matrix profile index.

   .. rubric:: Notes

   The `profile_size` depends on the input.

   - If `y` is `None`, `profile_size` is ``x.shape[-1] - window + 1``.
   - If `y` is not `None`, `profile_size` is ``y.shape[-1] - window + 1``.

   .. rubric:: References

   Yeh, C. C. M. et al. (2016). Matrix profile I: All pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In 2016 IEEE 16th International Conference on Data Mining (ICDM).
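For illustration, a minimal self-join sketch on a synthetic signal (the signal, window size and seed are arbitrary):

.. code-block:: python

   import numpy as np

   from wildboar.distance import matrix_profile

   # A noisy sine wave: the self-join profile is low where the pattern repeats.
   rng = np.random.default_rng(5)
   x = np.sin(np.linspace(0, 20 * np.pi, 1000)) + rng.normal(scale=0.1, size=1000)
   mp, mpi = matrix_profile(x, window=50, return_index=True)
   print(mp.shape)           # (951,) == x.shape[-1] - window + 1
   motif = np.argmin(mp)     # start of the best-matching subsequence
   print(motif, mpi[motif])  # motif position and its nearest neighbour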
.. py:function:: paired_distance(x, y, *, dim='warn', metric='euclidean', metric_params=None, n_jobs=None)

   Compute the distance between the i:th time series in x and the i:th time series in y.

   :Parameters:
      **x** : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The input data.
      **y** : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The input data. y will be broadcast to the shape of x.
      **dim** : int or {'mean', 'full'}, optional
         The dim to compute distance.
      **metric** : str or callable, optional
         The distance metric. See ``_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **n_jobs** : int, optional
         The number of parallel jobs.

   :Returns:
      ndarray
         The distances. Return depends on input:

         - if x.ndim == 1, return scalar.
         - if dim='full', return ndarray of shape (n_dims, n_samples).
         - if x.ndim > 1, return an ndarray of shape (n_samples, ).

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.
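A minimal sketch of paired distances on arbitrary synthetic data:

.. code-block:: python

   import numpy as np

   from wildboar.distance import paired_distance

   rng = np.random.default_rng(6)
   x = rng.normal(size=(10, 100))
   y = rng.normal(size=(10, 100))
   # The distance between x[i] and y[i] for every i.
   print(paired_distance(x, y).shape)                # (10,)
   print(paired_distance(x, y, metric="dtw").shape)  # (10,)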
.. py:function:: paired_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)

   Minimum subsequence distance between the i:th subsequence and time series.

   :Parameters:
      **y** : list or ndarray of shape (n_samples, m_timestep)
         Input time series.

         - if list, a list of array-like of shape (m_timestep, ).
      **x** : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The input data.
      **dim** : int, optional
         The dim to search for shapelets.
      **metric** : str or callable, optional
         The distance metric. See ``_SUBSEQUENCE_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **scale** : bool, optional
         If True, scale the subsequences before distance computation.

         .. versionadded:: 1.3
      **return_index** : bool, optional
         If True, return the index of the best match. If there are many equally good matches, the first match is returned.
      **n_jobs** : int, optional
         The number of parallel jobs to run.

   :Returns:
      **dist** : float, ndarray
         An array of shape (n_samples, ) with the minimum distance between the i:th subsequence and the i:th sample.
      **indices** : int, ndarray, optional
         An array of shape (n_samples, ) with the index of the best matching position of the i:th subsequence and the i:th sample.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.
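A minimal sketch on arbitrary synthetic data, pairing one candidate subsequence with each time series:

.. code-block:: python

   import numpy as np

   from wildboar.distance import paired_subsequence_distance

   rng = np.random.default_rng(7)
   x = rng.normal(size=(5, 100))  # five time series
   y = rng.normal(size=(5, 10))   # one candidate subsequence per series
   # The minimum distance between y[i] and any window of x[i], and its position.
   dist, idx = paired_subsequence_distance(y, x, return_index=True)
   print(dist.shape, idx.shape)   # (5,) (5,)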
.. py:function:: paired_subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, return_distance=False, n_jobs=None)

   Find matching subsequences.

   Find the positions where the distance is less than the threshold between the i:th subsequences and time series.

   - If a `threshold` is given, the default behaviour is to return all matching indices in the order of occurrence.
   - If no `threshold` is given, the default behaviour is to return the top 10 matching indices ordered by distance.
   - If both `threshold` and `max_matches` are given, the top matches are returned ordered by distance and time series.

   :Parameters:
      **y** : list or ndarray of shape (n_samples, n_timestep)
         Input time series.

         - if list, a list of array-like of shape (n_timestep, ) with length n_samples.
      **x** : ndarray of shape (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The input data.
      **threshold** : float, optional
         The distance threshold used to consider a subsequence matching. If no threshold is selected, `max_matches` defaults to 10.
      **dim** : int, optional
         The dim to search for shapelets.
      **metric** : str or callable, optional
         The distance metric. See ``_SUBSEQUENCE_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **scale** : bool, optional
         If True, scale the subsequences before distance computation.

         .. versionadded:: 1.3
      **max_matches** : int, optional
         Return the top `max_matches` matches below `threshold`.

         - If a `threshold` is given, the default behaviour is to return all matching indices in the order of occurrence.
         - If no `threshold` is given, the default behaviour is to return the top 10 matching indices ordered by distance.
         - If both `threshold` and `max_matches` are given, the top matches are returned ordered by distance.
      **return_distance** : bool, optional
         If True, return the distance of the match.
      **n_jobs** : int, optional
         The number of parallel jobs to run. Ignored.

   :Returns:
      **indices** : ndarray of shape (n_samples, )
         The start index of matching subsequences.
      **distance** : ndarray of shape (n_samples, ), optional
         The distances of matching subsequences.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.

.. py:function:: pairwise_distance(x, y=None, *, dim='warn', metric='euclidean', metric_params=None, n_jobs=None)

   Compute the pairwise distance between time series.

   :Parameters:
      **x** : ndarray of shape (n_timestep, ), (x_samples, n_timestep) or (x_samples, n_dims, n_timestep)
         The input data.
      **y** : ndarray of shape (n_timestep, ), (y_samples, n_timestep) or (y_samples, n_dims, n_timestep), optional
         The input data.
      **dim** : int or {'mean', 'full'}, optional
         The dim to compute distance.
      **metric** : str or callable, optional
         The distance metric. See ``_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **n_jobs** : int, optional
         The number of parallel jobs.

   :Returns:
      float or ndarray
         The distances. Return depends on input.

         - if x.ndim == 1 and y.ndim == 1, scalar.
         - if dim="full", array of shape (n_dims, x_samples, y_samples).
         - if dim="full" and y is None, array of shape (n_dims, x_samples, x_samples).
         - if x.ndim > 1 and y is None, array of shape (x_samples, x_samples).
         - if x.ndim > 1 and y.ndim > 1, array of shape (x_samples, y_samples).
         - if x.ndim == 1 and y.ndim > 1, array of shape (y_samples, ).
         - if y.ndim == 1 and x.ndim > 1, array of shape (x_samples, ).

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.
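A minimal sketch of the different output shapes, using arbitrary synthetic data (the ``r`` metric parameter is assumed to be accepted by the ``"dtw"`` metric):

.. code-block:: python

   import numpy as np

   from wildboar.distance import pairwise_distance

   rng = np.random.default_rng(8)
   x = rng.normal(size=(5, 100))
   y = rng.normal(size=(3, 100))
   print(pairwise_distance(x, y).shape)  # (5, 3)
   print(pairwise_distance(x).shape)     # (5, 5), self-distance matrix
   print(pairwise_distance(x, y, metric="dtw", metric_params={"r": 0.1}).shape)  # (5, 3)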
.. py:function:: pairwise_subsequence_distance(y, x, *, dim=0, metric='euclidean', metric_params=None, scale=False, return_index=False, n_jobs=None)

   Minimum subsequence distance between subsequences and time series.

   :Parameters:
      **y** : list or ndarray of shape (n_subsequences, n_timestep)
         Input time series.

         - if list, a list of array-like of shape (n_timestep, ).
      **x** : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The input data.
      **dim** : int, optional
         The dim to search for subsequences.
      **metric** : str or callable, optional
         The distance metric. See ``_SUBSEQUENCE_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **scale** : bool, optional
         If True, scale the subsequences before distance computation.

         .. versionadded:: 1.3
      **return_index** : bool, optional
         If True, return the index of the best match. If there are many equally good matches, the first match is returned.
      **n_jobs** : int, optional
         The number of parallel jobs.

   :Returns:
      **dist** : float, ndarray
         The minimum distance. Return depends on input:

         - if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
         - if len(y) == 1, return an array of shape (n_samples, ).
         - if x.ndim == 1, return an array of shape (n_subsequences, ).
         - if x.ndim == 1 and len(y) == 1, return scalar.
      **indices** : int, ndarray, optional
         The start index of the minimum distance. Return depends on input:

         - if len(y) > 1 and x.ndim > 1, return an array of shape (n_samples, n_subsequences).
         - if len(y) == 1, return an array of shape (n_samples, ).
         - if x.ndim == 1, return an array of shape (n_subsequences, ).
         - if x.ndim == 1 and len(y) == 1, return scalar.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.

.. py:function:: subsequence_match(y, x, threshold=None, *, dim=0, metric='euclidean', metric_params=None, scale=False, max_matches=None, exclude=None, return_distance=False, n_jobs=None)

   Find matching subsequences.

   Find the positions where the distance is less than the threshold between the subsequence and all time series.

   - If a `threshold` is given, the default behaviour is to return all matching indices in the order of occurrence.
   - If no `threshold` is given, the default behaviour is to return the top 10 matching indices ordered by distance.
   - If both `threshold` and `max_matches` are given, the top matches are returned ordered by distance.

   :Parameters:
      **y** : array-like of shape (yn_timestep, )
         The subsequence.
      **x** : ndarray of shape (n_timestep, ), (n_samples, n_timestep) or (n_samples, n_dims, n_timestep)
         The input data.
      **threshold** : {"auto"}, float or callable, optional
         The distance threshold used to consider a subsequence matching. If no threshold is selected, `max_matches` defaults to 10.

         - if float, return all matches closer than threshold.
         - if callable, return all matches closer than the threshold computed by the threshold function, given all distances to the subsequence.
         - if str, return all matches according to the named threshold.
      **dim** : int, optional
         The dim to search for shapelets.
      **metric** : str or callable, optional
         The distance metric. See ``_SUBSEQUENCE_METRICS.keys()`` for a list of supported metrics.
      **metric_params** : dict, optional
         Parameters to the metric. Read more about the parameters in the :ref:`User guide `.
      **scale** : bool, optional
         If True, scale the subsequences before distance computation.

         .. versionadded:: 1.3
      **max_matches** : int, optional
         Return the top `max_matches` matches below `threshold`.
      **exclude** : float or int, optional
         Exclude trivial matches in the vicinity of the match.

         - if float, the exclusion zone is computed as ``math.ceil(exclude * y.size)``.
         - if int, the exclusion zone is exact.

         A match is considered trivial if a match with lower distance is within `exclude` timesteps of another match with higher distance.
      **return_distance** : bool, optional
         If True, return the distance of the match.
      **n_jobs** : int, optional
         The number of parallel jobs to run.

   :Returns:
      **indices** : ndarray of shape (n_samples, ) or (n_matches, )
         The start index of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.
      **distance** : ndarray of shape (n_samples, ), optional
         The distances of matching subsequences. Returns a single array of n_matches if x.ndim == 1. If no matches are found for a sample, the array element is None.

   .. warning::
      Passing a callable to the `metric` parameter has a significant performance implication.
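For illustration, a minimal sketch on arbitrary synthetic data; the subsequence is copied from the first series, so the exact occurrence is always recovered:

.. code-block:: python

   import numpy as np

   from wildboar.distance import subsequence_match

   rng = np.random.default_rng(9)
   x = rng.normal(size=(3, 200))  # three time series
   y = x[0, 50:60]                # a subsequence taken from the first series
   # Start positions (and distances) where a window of each series is closer
   # to y than the threshold; position 50 of the first series matches exactly.
   idx, dist = subsequence_match(y, x, threshold=1.0, return_distance=True)
   print(idx[0], dist[0])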