wildboar.metrics

Evaluation metrics.

Package Contents

Functions

compactness_score(x_factual, x_counterfactual, *[, ...])

Compute compactness score.

plausability_score(x_plausible, x_counterfactuals, *)

Compute plausibility score.

proximity_score(x_factual, x_counterfactual[, metric, ...])

Compute proximity score.

redudancy_score(estimator, x_factual, ...[, ...])

Compute the redundancy score.

relative_proximity_score(x_native, x_factual, ...[, ...])

Compute relative proximity score.

silhouette_samples(x, labels, *[, metric, metric_params])

Compute the Silhouette Coefficient of each sample.

silhouette_score(x, labels, *[, metric, ...])

Compute the mean Silhouette Coefficient of all samples.

validity_score(y_predicted, y_counterfactual[, ...])

Compute validity score.

wildboar.metrics.compactness_score(x_factual, x_counterfactual, *, window=None, n_bins=None, atol=1e-08, average=True)

Compute compactness score.

The compactness of counterfactuals, measured as the fraction of changed timesteps. The fewer timesteps that differ between the original and the counterfactual, the lower the score.

Parameters:
x_factual : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)

The true samples.

x_counterfactual : array-like of shape (n_samples, n_timesteps) or (n_samples, n_dims, n_timesteps)

The counterfactual samples.

window : int, optional

If set, evaluate the difference between windows of the specified size.

n_bins : int, optional

If set, evaluate the set overlap of SAX-transformed series.

atol : float, optional

The absolute tolerance used when comparing values.

average : bool, optional

Compute the average score over all dimensions.

Returns:
float

The compactness score. Lower score indicates more compact counterfactuals.

Notes

The samples in x_counterfactual and x_factual should be aligned such that the i-th counterfactual sample is derived from the i-th factual sample.

References

Karlsson, I., Rebane, J., Papapetrou, P., & Gionis, A. (2020).

Locally and globally explainable time series tweaking. Knowledge and Information Systems, 62(5), 1671-1700.
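
Examples

To make "fraction of changed timesteps" concrete, the following is a minimal plain-Python sketch of the default case (no window, no n_bins). The helper name changed_fraction is hypothetical and this is not the library's implementation; it only assumes that a timestep counts as changed when the absolute difference exceeds atol.

```python
def changed_fraction(x_factual, x_counterfactual, atol=1e-8):
    """Fraction of timesteps whose values differ by more than atol.

    Hypothetical helper for illustration; rows are assumed to be aligned,
    i.e. the i-th counterfactual is derived from the i-th factual sample.
    """
    changed = 0
    total = 0
    for factual, counterfactual in zip(x_factual, x_counterfactual):
        for a, b in zip(factual, counterfactual):
            if abs(a - b) > atol:
                changed += 1
            total += 1
    return changed / total


x_factual = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
x_counterfactual = [[0.0, 1.5, 2.0, 3.0], [1.0, 1.0, 2.0, 2.0]]

score = changed_fraction(x_factual, x_counterfactual)
print(score)  # 3 of 8 timesteps changed -> 0.375
```

A counterfactual identical to its factual sample would score 0.0 under this sketch, and one that changes every timestep would score 1.0.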

wildboar.metrics.plausability_score(x_plausible, x_counterfactuals, *, y_plausible=None, y_counterfactual=None, estimator=None, method='accuracy', average=True)

Compute plausibility score.

Parameters:
x_plausible : array-like of shape (n_samples, n_timesteps)

The plausible samples, typically the training or testing samples.

x_counterfactuals : array-like of shape (m_samples, n_timesteps)

The counterfactual samples.

y_plausible : array-like of shape (n_samples, ), optional

The labels of the plausible samples.

y_counterfactual : array-like of shape (m_samples, ), optional

The desired label of the counterfactuals.

estimator : estimator, optional

The outlier estimator, which must implement fit and predict. If None, LocalOutlierFactor is used.

  • if method='score', the estimator must also implement decision_function.

method : {'score', 'accuracy'}, optional

The score function.

average : bool, optional

If True, return the average score for all labels in y_counterfactual; otherwise, return the score for the individual labels (ordered as np.unique).

Returns:
ndarray or float

The plausibility.

  • if method='score', the mean score is returned, with a larger score indicating better performance.

  • if method='accuracy', the fraction of plausible counterfactuals is returned.

  • if y_counterfactual is not None and average=False, the score or accuracy for each counterfactual label is returned.

References

Delaney, E., Greene, D., & Keane, M. T. (2020).

Instance-based Counterfactual Explanations for Time Series Classification. arXiv, 2009.13211v2.
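
Examples

The method='accuracy' case can be pictured as: fit an outlier detector on the plausible samples, then report the fraction of counterfactuals it accepts as inliers. The sketch below is only an illustration of that idea. It swaps in a toy nearest-neighbour distance threshold for the outlier estimator (the documented default is LocalOutlierFactor), and every name in it is hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def nearest_distance(sample, others):
    return min(euclidean(sample, other) for other in others)


def toy_plausibility_accuracy(x_plausible, x_counterfactuals):
    """Fraction of counterfactuals accepted by a toy outlier rule.

    Assumption for illustration only: a sample is an inlier when its
    nearest plausible neighbour is no farther away than the largest
    leave-one-out nearest-neighbour distance among the plausible samples.
    """
    threshold = max(
        nearest_distance(sample, x_plausible[:i] + x_plausible[i + 1:])
        for i, sample in enumerate(x_plausible)
    )
    inliers = [
        nearest_distance(counterfactual, x_plausible) <= threshold
        for counterfactual in x_counterfactuals
    ]
    return sum(inliers) / len(inliers)


x_plausible = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
x_counterfactuals = [[0.5, 0.5], [5.0, 5.0]]

accuracy = toy_plausibility_accuracy(x_plausible, x_counterfactuals)
print(accuracy)  # the first counterfactual is accepted, the second is not -> 0.5
```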

wildboar.metrics.proximity_score(x_factual, x_counterfactual, metric='normalized_euclidean', metric_params=None)

Compute proximity score.

The closer the counterfactual is to the original, the lower the score.

Parameters:
x_factual : array-like of shape (n_samples, n_timesteps)

The true samples.

x_counterfactual : array-like of shape (n_samples, n_timesteps)

The counterfactual samples.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

Returns:
float

The mean proximity.

Notes

The samples in x_counterfactual and x_factual should be aligned such that the i-th counterfactual sample is derived from the i-th factual sample.

References

Delaney, E., Greene, D., & Keane, M. T. (2020).

Instance-based Counterfactual Explanations for Time Series Classification. arXiv, 2009.13211v2.

Karlsson, I., Rebane, J., Papapetrou, P., & Gionis, A. (2020).

Locally and globally explainable time series tweaking. Knowledge and Information Systems, 62(5), 1671-1700.
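
Examples

As a rough illustration of "mean proximity", the sketch below averages the distance between each factual sample and its aligned counterfactual. It uses the plain Euclidean distance as an assumption to keep the example dependency-free; the documented default metric is 'normalized_euclidean', so values from the library will generally differ. The helper name is hypothetical.

```python
import math


def mean_pairwise_distance(x_factual, x_counterfactual):
    """Mean Euclidean distance between aligned factual/counterfactual rows."""
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(factual, counterfactual)))
        for factual, counterfactual in zip(x_factual, x_counterfactual)
    ]
    return sum(distances) / len(distances)


x_factual = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
x_counterfactual = [[3.0, 4.0, 0.0], [1.0, 2.0, 3.0]]

proximity = mean_pairwise_distance(x_factual, x_counterfactual)
print(proximity)  # pair distances are 5.0 and 0.0 -> 2.5
```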

wildboar.metrics.redudancy_score(estimator, x_factual, x_counterfactual, y_counterfactual, *, n_intervals='sqrt', window=None, average=True)

Compute the redundancy score.

Redundancy is a measure of how much impact non-overlapping intervals have on the construction of the counterfactuals.

Parameters:
estimator : Estimator

The estimator for which the counterfactuals are computed.

x_factual : array-like of shape (n_samples, n_timesteps)

The factual samples, i.e., samples for which counterfactuals are computed.

x_counterfactual : array-like of shape (n_samples, n_timesteps)

The counterfactual samples.

y_counterfactual : array-like of shape (n_samples, )

The desired counterfactual label.

n_intervals : {"sqrt", "log2"}, int or float, optional

The number of intervals.

window : int, optional

The size of an interval. If set, n_intervals is ignored.

average : bool, optional

Return the average redundancy over all intervals.

Returns:
ndarray of shape (n_intervals, ) or float

The redundancy of each interval, expressed as the fraction of samples that have the same label if the interval is replaced with the corresponding interval of the factual sample. If average is True, return a single float.

Notes

The samples in x_counterfactual and x_factual should be aligned such that the i-th counterfactual sample is derived from the i-th factual sample.
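
Examples

The per-interval redundancy described under Returns can be sketched as follows: split the time axis into non-overlapping intervals, replace one interval of each counterfactual with the factual values, and record how often the prediction still equals the desired label. This is a sketch under stated assumptions, not the library's code: toy_predict is a hypothetical stand-in for a fitted estimator, and the interval boundaries are one possible equal-width split.

```python
def toy_predict(sample):
    """Hypothetical classifier: label 1 if the series sums to more than 1.5."""
    return int(sum(sample) > 1.5)


def interval_redundancy(predict, x_factual, x_counterfactual, y_counterfactual, n_intervals):
    """Fraction of counterfactuals that keep the desired label per interval."""
    n_timesteps = len(x_factual[0])
    # Contiguous, non-overlapping interval boundaries (assumed equal-width split).
    bounds = [i * n_timesteps // n_intervals for i in range(n_intervals + 1)]
    redundancy = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        unchanged = 0
        for factual, counterfactual, desired in zip(
            x_factual, x_counterfactual, y_counterfactual
        ):
            # Revert this interval to the factual values, keep the rest.
            reverted = counterfactual[:start] + factual[start:end] + counterfactual[end:]
            if predict(reverted) == desired:
                unchanged += 1
        redundancy.append(unchanged / len(x_counterfactual))
    return redundancy


x_factual = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
x_counterfactual = [[0.1, 0.1, 1.0, 1.0], [0.0, 0.3, 0.9, 0.9]]
y_counterfactual = [1, 1]

redundancy = interval_redundancy(
    toy_predict, x_factual, x_counterfactual, y_counterfactual, n_intervals=2
)
print(redundancy)  # [1.0, 0.0]
```

Here the changes in the first interval are fully redundant (reverting them never alters the prediction), while the changes in the second interval are required for every sample.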

wildboar.metrics.relative_proximity_score(x_native, x_factual, x_counterfactual, *, y_native=None, y_counterfactual=None, metric='euclidean', metric_params=None, average=True)

Compute relative proximity score.

The relative proximity score is the mean proximity between counterfactual and test-sample pairs divided by the mean proximity to the closest native counterfactual. The lower the score, the better.

Parameters:
x_native : array-like of shape (n_natives, n_timesteps)

The native counterfactual candidates. If y_counterfactual is None, the full array is considered as possible native counterfactuals. Typically, native counterfactual candidates correspond to samples which are labeled as the desired counterfactual label.

x_factual : array-like of shape (n_counterfactuals, n_timesteps)

The factual samples, i.e., the samples for which the counterfactuals were computed.

x_counterfactual : array-like of shape (n_counterfactuals, n_timesteps)

The counterfactual samples.

y_native : array-like of shape (n_natives, ), optional

The label of the native counterfactual candidates.

y_counterfactual : array-like of shape (n_counterfactuals, ), optional

The desired counterfactual label.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

average : bool, optional

Average the relative proximity of all labels in y_counterfactual.

Returns:
ndarray or float

The relative proximity. If average=False and y_counterfactual is not None, return the relative proximity for each counterfactual label.

Notes

The samples in x_counterfactual and x_factual should be aligned such that the i-th counterfactual sample is derived from the i-th factual sample.

References

Smyth, B., & Keane, M. T. (2021).

A Few Good Counterfactuals: Generating Interpretable, Plausible and Diverse Counterfactual Explanations. arXiv, 2101.09056v1.
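
Examples

One reading of the ratio described above, for the case without labels (all rows of x_native are candidates) and with the Euclidean metric, is sketched below. The helper names are hypothetical, and the library may handle labels and metrics differently.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def relative_proximity(x_native, x_factual, x_counterfactual):
    """Mean factual-to-counterfactual distance over mean distance to the
    closest native candidate (illustrative sketch, labels ignored)."""
    n = len(x_factual)
    counterfactual_distance = sum(
        euclidean(f, c) for f, c in zip(x_factual, x_counterfactual)
    ) / n
    native_distance = sum(
        min(euclidean(f, native) for native in x_native) for f in x_factual
    ) / n
    return counterfactual_distance / native_distance


x_native = [[2.0, 0.0], [0.0, 4.0]]
x_factual = [[0.0, 0.0], [0.0, 2.0]]
x_counterfactual = [[1.0, 0.0], [0.0, 3.0]]

score = relative_proximity(x_native, x_factual, x_counterfactual)
print(score)  # counterfactuals are at distance 1, closest natives at distance 2 -> 0.5
```

A score below 1 indicates that the generated counterfactuals are, on average, closer to the factual samples than the nearest existing candidates are.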

wildboar.metrics.silhouette_samples(x, labels, *, metric='euclidean', metric_params=None)

Compute the Silhouette Coefficient of each sample.

Parameters:
x : univariate time-series or multivariate time-series

The input time series.

labels : array-like of shape (n_samples,)

Predicted labels for each sample.

metric : str or callable, optional

The metric to use when calculating distance between time series.

metric_params : dict, optional

The metric parameters. Read more about the metrics and their parameters in the User guide.

Returns:
ndarray of shape (n_samples, )

Silhouette Coefficient for each sample.

Notes

This is a convenience wrapper around sklearn.metrics.silhouette_samples using Wildboar native metrics.
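
Examples

For reference, the Silhouette Coefficient of a sample is (b - a) / max(a, b), where a is the mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The following plain-Python sketch computes it with the Euclidean distance. It is an illustration of the definition, not the wrapped scikit-learn routine, and the helper names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def silhouette_values(x, labels):
    """Per-sample Silhouette Coefficient; assumes at least two distinct labels."""
    values = []
    for i, (sample, label) in enumerate(zip(x, labels)):
        # Distances from sample i to every other sample, grouped by cluster.
        by_cluster = {}
        for j, (other, other_label) in enumerate(zip(x, labels)):
            if i != j:
                by_cluster.setdefault(other_label, []).append(euclidean(sample, other))
        own = by_cluster.pop(label, [])
        if not own:
            # A singleton cluster gets a coefficient of 0 by convention.
            values.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        values.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return values


x = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]

values = silhouette_values(x, labels)
mean_value = sum(values) / len(values)
print(values)      # close to 1 for two well-separated clusters
print(mean_value)  # the mean over all samples
```

The mean of the per-sample values is the quantity that a mean Silhouette Coefficient summarizes.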

wildboar.metrics.silhouette_score(x, labels, *, metric='euclidean', metric_params=None, sample_size=None, random_state=None)

Compute the mean Silhouette Coefficient of all samples.

Parameters:
x : univariate time-series or multivariate time-series

The input time series.

labels : array-like of shape (n_samples,)

Predicted labels for each sample.

metric : str or callable, optional

The metric to use when calculating distance between time series.

metric_params : dict, optional

The metric parameters. Read more about the metrics and their parameters in the User guide.

sample_size : int, optional

The size of the sample to use when computing the Silhouette Coefficient on a random subset of the data. If sample_size is None, no sampling is used.

random_state : int or RandomState, optional

Determines random number generation for selecting a subset of samples. Used when sample_size is not None.

Returns:
float

Mean Silhouette Coefficient for all samples.

Notes

This is a convenience wrapper around sklearn.metrics.silhouette_score using Wildboar native metrics.

wildboar.metrics.validity_score(y_predicted, y_counterfactual, sample_weight=None)

Compute validity score.

The fraction of counterfactuals that have the desired label.

Parameters:
y_predicted : array-like of shape (n_samples, )

The predicted label.

y_counterfactual : array-like of shape (n_samples, )

The desired counterfactual label.

sample_weight : array-like of shape (n_samples, ), optional

The sample weight.

Returns:
float

The fraction of counterfactuals with the correct label. Larger is better.

References

Delaney, E., Greene, D., & Keane, M. T. (2020).

Instance-based Counterfactual Explanations for Time Series Classification. arXiv, 2009.13211v2.

Karlsson, I., Rebane, J., Papapetrou, P., & Gionis, A. (2020).

Locally and globally explainable time series tweaking. Knowledge and Information Systems, 62(5), 1671-1700.
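
Examples

The validity described above amounts to a (optionally weighted) accuracy between the labels predicted for the counterfactuals and the desired labels. A minimal plain-Python sketch, with a hypothetical helper name and without any claim about how the library computes it internally:

```python
def weighted_validity(y_predicted, y_counterfactual, sample_weight=None):
    """Weighted fraction of counterfactuals whose predicted label equals
    the desired label (illustrative sketch)."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_predicted)
    matched = sum(
        weight
        for predicted, desired, weight in zip(y_predicted, y_counterfactual, sample_weight)
        if predicted == desired
    )
    return matched / sum(sample_weight)


y_predicted = [1, 0, 1, 1]
y_counterfactual = [1, 1, 1, 1]

validity = weighted_validity(y_predicted, y_counterfactual)
weighted = weighted_validity(
    y_predicted, y_counterfactual, sample_weight=[1.0, 3.0, 1.0, 1.0]
)
print(validity)  # 3 of 4 counterfactuals reach the desired label -> 0.75
print(weighted)  # the failing sample carries weight 3 of 6 -> 0.5
```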