Metrics
Wildboar supports both subsequence distance using pairwise_subsequence_distance and traditional distances using pairwise_distance. These and related functions support different metrics, specified by the metric argument, with metric parameters passed using the metric_params argument.
Distance metrics are functions \(d(x, y)\) such that \(d(x, y) < d(x, z)\) when time series \(x\) and \(y\) are “more similar” than \(x\) and \(z\). In Wildboar we use the term loosely to denote any function obeying the above inequality. A true metric must also satisfy the following:
The distance to itself is always zero, i.e., \(d(x, x) = 0\).
The distance between two distinct points is always positive, i.e., \(d(x, y) \gt 0\), provided that \(x\ne y\).
The distance is symmetrical, i.e., \(d(x, y) = d(y, x)\), which is to say that the distance between \(x\) and \(y\) is the same as the distance between \(y\) and \(x\).
The triangle inequality holds, \(d(x, z) \le d(x, y) + d(y, z)\), i.e., there can be no “shortcut” through \(y\) that makes the distance shorter.
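The four properties can be checked numerically on a handful of series. The following is a minimal sketch using only the Python standard library; the helper name check_metric_axioms and the sample series are made up for illustration and are not part of Wildboar. It shows that the Euclidean distance passes, while the squared Euclidean distance still orders pairs by similarity but violates the triangle inequality.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x, y):
    """Squared Euclidean distance: a distance in the loose sense only."""
    return euclidean(x, y) ** 2


def check_metric_axioms(d, points, tol=1e-12):
    """Return True if d satisfies the four axioms on the given points."""
    for x in points:
        if abs(d(x, x)) > tol:  # distance to itself is zero
            return False
    for x, y in itertools.permutations(points, 2):
        if x != y and d(x, y) <= 0:  # distinct points are a positive distance apart
            return False
        if abs(d(x, y) - d(y, x)) > tol:  # symmetry
            return False
    for x, y, z in itertools.permutations(points, 3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True


# Hypothetical sample series, chosen only for illustration.
series = [(0.0, 1.0, 2.0), (1.0, 1.0, 1.0), (2.0, 0.0, 1.0), (0.5, 0.5, 3.0)]
print(check_metric_axioms(euclidean, series))
print(check_metric_axioms(squared_euclidean, series))
```

A check like this can only refute the properties on the sampled points; it cannot prove that a function is a true metric.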
Similarity measures fall into three categories.
Non-elastic true metrics, such as the \(L_p\)-norm, that do not support time-shifts.
Elastic metrics that tolerate time-shifts but are not true metrics.
Elastic metrics that tolerate time-shifts and are also true metrics.
Wildboar distinguishes between subsequence and non-subsequence metrics:
Subsequence metrics compute \(\min_{t'\in t} d(s, t')\) with s.shape[-1] <= t.shape[-1] (i.e., \(s\) is no longer than \(t\)), where the notation represents “taking all subsequences of \(t\) of the same length as \(s\), computing the distance to each, and taking the minimum”. Subsequence metrics are never true metrics unless s.shape[-1] == t.shape[-1].
Metrics compute the distance \(d(s, t)\) with s.shape[-1] == t.shape[-1] unless the metric is elastic. Elastic metrics support distance computations between time series of unequal length, without computing the minimum distance between equal-length subsequences.
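The definition of a subsequence metric can be sketched in a few lines of plain Python. This is an illustration of the sliding-window minimum only, not Wildboar's implementation; the function name subsequence_distance is hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def subsequence_distance(s, t, d=euclidean):
    """Minimum of d(s, t') over all windows t' of t with len(t') == len(s)."""
    if len(s) > len(t):
        raise ValueError("s must not be longer than t")
    m = len(s)
    # Slide a window of length m over t and keep the smallest distance.
    return min(d(s, t[i:i + m]) for i in range(len(t) - m + 1))


s = [1.0, 2.0, 1.0]
t = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
print(subsequence_distance(s, t))  # 0.0, since s occurs in t at offset 2
print(subsequence_distance(t, t) == euclidean(t, t))  # equal lengths: one window
```

When both series have the same length there is exactly one window, so the subsequence distance reduces to the ordinary distance.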
Metric specification
In Wildboar, some estimators accept multiple parameterized metrics, for instance ProximityForestClassifier and the ElasticEnsemble.
A metric specification is a Python dict with the metric name as key and another dictionary with the parameters as value. The parameter dictionary must contain the keys min_<parameter_name> and max_<parameter_name> and can optionally contain the key num_<parameter_name>.
For example, to generate a metric specification with DTW and the r parameter in the range 0.01 to 0.1, we can specify it as follows:
metric = {"dtw": {"min_r": 0.01, "max_r": 0.1}}
This will generate a metric configuration with 10 DTW calculations with r set in the range from 0.01 to 0.1 (inclusive).
We can also specify multiple metrics:
metric = {
    "dtw": {"min_r": 0.01, "max_r": 0.1},
    "euclidean": None,
    "msm": {"min_c": 1, "max_c": 100},
}
Note that we can also specify metrics for which there exists only one instantiation. In the above example, euclidean does not have any parameters, so we set its value to None.
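To make the specification format concrete, the sketch below expands a specification into individual (metric, metric_params) pairs. It is not Wildboar's code: the helper expand_metric_spec is hypothetical, and it assumes that values are evenly spaced between the minimum and maximum (both inclusive), that num_<parameter_name> defaults to 10 as in the DTW example above, and that metrics with several parameters are expanded as a Cartesian product.

```python
import itertools


def expand_metric_spec(spec, default_num=10):
    """Expand a metric specification into (metric, metric_params) pairs.

    Assumptions (not confirmed by the Wildboar documentation): evenly
    spaced values, inclusive end points, Cartesian product over parameters.
    """
    configs = []
    for name, params in spec.items():
        if not params:  # None or {}: the metric has a single instantiation
            configs.append((name, {}))
            continue
        grids = {}
        for key in params:
            if not key.startswith("min_"):
                continue
            p = key[len("min_"):]
            lo, hi = params[f"min_{p}"], params[f"max_{p}"]
            num = params.get(f"num_{p}", default_num)
            step = (hi - lo) / (num - 1) if num > 1 else 0.0
            grids[p] = [lo + i * step for i in range(num)]
        for values in itertools.product(*grids.values()):
            configs.append((name, dict(zip(grids, values))))
    return configs


metric = {
    "dtw": {"min_r": 0.01, "max_r": 0.1},
    "euclidean": None,
    "msm": {"min_c": 1, "max_c": 100},
}
configs = expand_metric_spec(metric)
print(len(configs))  # 10 DTW + 1 Euclidean + 10 MSM configurations
```

Each resulting pair has the shape expected by the metric and metric_params arguments described at the top of this page.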
Subsequence metrics
Wildboar implements several subsequence metrics. The elastic subsequence metrics are, like all subsequence metrics, implemented as the minimum distance over a sliding window. As such, if we need the elastic distance between two series and not the minimum distance, we should use the non-subsequence metric. Moreover, subsequence metrics are only true metrics if we compute the distance between time series of equal length, which makes the minimum subsequence distance equal to the distance itself.
| Metric name | metric | metric_params | Comments |
|---|---|---|---|
| Euclidean | "euclidean" | {} | |
| Normalized Euclidean | "normalized_euclidean" | {} | Euclidean distance, where length has been scaled to have unit norm. Undefined cases result in 0. |
| Scaled Euclidean | "scaled_euclidean" or "mass" | {} | Scales each subsequence to have zero mean and unit variance. |
| Manhattan | "manhattan" | {} | |
| Minkowski | "minkowski" | {"p": float} | |
| Chebyshev | "chebyshev" | {} | |
| Cosine | "cosine" | {} | |
| Angular | "angular" | {} | |
| Dynamic time warping | "dtw" | {"r": float} | Window r in [0, 1]. |
| Weighted DTW | "wdtw" | {"r": float, "g": float} | Window r in [0, 1], default 1.0. Phase difference penalty g, default 0.05. |
| Derivative DTW | "ddtw" | {"r": float} | Window r in [0, 1], default 1.0. |
| Weighted Derivative DTW | "wddtw" | {"r": float, "g": float} | Window r in [0, 1], default 1.0. Phase difference penalty g, default 0.05. |
| Scaled DTW | "scaled_dtw" | {"r": float} | Window r in [0, 1]. |
| Longest common subsequence [2] | "lcss" | {"r": float, "epsilon": float} | Window r in [0, 1], default 1.0. Match epsilon, default 1.0. |
| Edit distance with real penalty [4] | "erp" | {"r": float, "g": float} | Window r in [0, 1], default 1.0. Gap penalty g, default 0. |
| Edit distance for real sequences [3] | "edr" | {"r": float, "epsilon": float} | Window r in [0, 1], default 1.0. Match epsilon, default 1/4*max(std(x), std(y)). |
| Move-split-merge [5] | "msm" | {"r": float, "c": float} | Window r in [0, 1], default 1.0. Split/merge cost c, default 1. |
| Time Warp Edit distance [6] | "twe" | {"r": float, "edit_penalty": float, "stiffness": float} | Window r in [0, 1]. Edit penalty (\(\lambda\)), default 1. Stiffness (\(\nu\)), default 0.001. |
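The effect of the scaled Euclidean subsequence metric can be illustrated as follows. This is a sketch under stated assumptions, not Wildboar's implementation: it assumes that the query is z-normalized as well as each window, that the population standard deviation is used, and that a constant window is mapped to all zeros. The function names are hypothetical.

```python
import math


def znorm(x):
    """Scale a sequence to zero mean and unit variance."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0.0:
        # Assumption: a constant sequence is mapped to all zeros.
        return [0.0] * len(x)
    return [(v - mean) / std for v in x]


def scaled_euclidean_subsequence(s, t):
    """Minimum Euclidean distance between z-normalized s and windows of t."""
    s = znorm(s)
    m = len(s)
    return min(math.dist(s, znorm(t[i:i + m])) for i in range(len(t) - m + 1))


def euclidean_subsequence(s, t):
    """Minimum Euclidean distance without any scaling, for comparison."""
    m = len(s)
    return min(math.dist(s, t[i:i + m]) for i in range(len(t) - m + 1))


s = [1.0, 2.0, 1.0]
t = [0.0, 10.0, 20.0, 10.0, 5.0]  # contains the shape of s at a larger scale
print(scaled_euclidean_subsequence(s, t))
print(euclidean_subsequence(s, t))
```

Because each window is rescaled, the scaled variant matches on shape and ignores offset and amplitude, whereas the plain Euclidean variant reports a large distance for the same pair.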
Elastic and non-elastic metrics
| Metric name | metric | metric_params | Comments |
|---|---|---|---|
| Euclidean | "euclidean" | {} | |
| Normalized Euclidean | "normalized_euclidean" | {} | Euclidean distance, where length has been scaled to have unit norm. Undefined cases result in 0. |
| Manhattan | "manhattan" | {} | |
| Minkowski | "minkowski" | {"p": float} | |
| Chebyshev | "chebyshev" | {} | |
| Cosine | "cosine" | {} | |
| Angular | "angular" | {} | |
| Longest common subsequence [2] | "lcss" | {"r": float, "epsilon": float} | Window r in [0, 1], default 1. Match epsilon, default 1. |
| Edit distance with real penalty [4] | "erp" | {"r": float, "g": float} | Window r in [0, 1]. Gap penalty g, default 0. |
| Edit distance for real sequences [3] | "edr" | {"r": float, "epsilon": float} | Window r in [0, 1]. Match epsilon, default 1/4*max(std(x), std(y)). |
| Move-split-merge [5] | "msm" | {"r": float, "c": float} | Window r in [0, 1]. Split/merge cost c, default 1. |
| Time Warp Edit distance [6] | "twe" | {"r": float, "edit_penalty": float, "stiffness": float} | Window r in [0, 1]. Edit penalty (\(\lambda\)), default 1. Stiffness (\(\nu\)), default 0.001. |
| Amercing dynamic time warping [1] | "adtw" | {"r": float, "p": float} | Window r in [0, 1]. Penalty p; larger values penalize warping more. |
| Dynamic time warping | "dtw" | {"r": float} | Window r in [0, 1]. |
| Weighted DTW | "wdtw" | {"r": float, "g": float} | Window r in [0, 1]. Phase difference penalty g, default 0.05. |
| Derivative DTW | "ddtw" | {"r": float} | Window r in [0, 1]. |
| Weighted Derivative DTW | "wddtw" | {"r": float, "g": float} | Window r in [0, 1]. Phase difference penalty g, default 0.05. |
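To illustrate what the window parameter r controls, here is a minimal dynamic time warping sketch in plain Python. It is not Wildboar's implementation and rests on assumptions: r is interpreted as the half-width of a Sakoe-Chiba band expressed as a fraction of the longer series, the band is widened when needed so that series of unequal length can still be aligned, and the result is the square root of the accumulated squared differences.

```python
import math


def dtw(x, y, r=1.0):
    """DTW distance with a warping window given as a fraction r in [0, 1]."""
    n, m = len(x), len(y)
    # Assumption: band half-width in samples, at least |n - m| so that the
    # last cell of the cost matrix stays reachable.
    w = max(math.floor(r * max(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0]  # x shifted one step to the right
print(dtw(x, y, r=0.0))  # 2.0, no warping allowed: same as Euclidean
print(dtw(x, y, r=1.0))  # 1.0, warping absorbs most of the shift
print(dtw([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))  # unequal lengths are fine
```

With r = 0 and equal-length series only the diagonal of the cost matrix is used, so the elastic metric degenerates to its non-elastic counterpart; larger r tolerates larger time-shifts.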