wildboar.linear_model

Linear methods for both classification and regression.

Package Contents

Classes

CastorClassifier

A dictionary-based method using dilated competing shapelets.

DilatedShapeletClassifier

A classifier that uses random dilated shapelets.

HydraClassifier

A dictionary-based method using convolutional kernels.

RandomShapeletClassifier

A classifier that uses random shapelets.

RandomShapeletRegressor

A regressor that uses random shapelets.

RocketClassifier

Implements the ROCKET classifier.

RocketRegressor

Implements the ROCKET regressor.

class wildboar.linear_model.CastorClassifier(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, order=1, soft_min=True, soft_max=False, soft_threshold=True, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', random_state=None, n_jobs=None)

A dictionary-based method using dilated competing shapelets.

Parameters:
n_groups : int, optional

The number of groups of dilated shapelets.

n_shapelets : int, optional

The number of dilated shapelets per group.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_prob : float, optional

The probability of standardizing a shapelet to zero mean and unit standard deviation.

shapelet_size : int, optional

The length of the dilated shapelet.

lower : float, optional

The lower percentile above which distance thresholds are drawn.

upper : float, optional

The upper percentile below which distance thresholds are drawn.

order : int or array-like, optional

The order of difference.

If int, half of the groups, with their corresponding shapelets, are convolved with the order-th discrete difference along the time dimension.

soft_min : bool, optional

If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.

soft_max : bool, optional

If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.

soft_threshold : bool, optional

If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.

alphas : array-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_intercept : bool, optional

Whether to calculate the intercept for this model.

scoring : str or callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cv : int, cross-validation generator or iterable, optional

Determines the cross-validation splitting strategy.

class_weight : dict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalize : “sparse” or bool, optional

Standardize before fitting. By default, use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable, or to True to use StandardScaler.

random_state : int or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobs : int, optional

The number of parallel jobs.
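The order parameter refers to the discrete difference of a time series. The following is a minimal, standard-library sketch of what an order-th discrete difference looks like; the helper name discrete_difference is hypothetical and this is not wildboar's implementation.

```python
def discrete_difference(series, order=1):
    """Return the order-th discrete difference of a sequence of numbers.

    Hypothetical helper for illustration only.
    """
    values = list(series)
    for _ in range(order):
        # Each pass replaces the series with its successive differences,
        # which shortens it by one time step.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


series = [1.0, 4.0, 9.0, 16.0, 25.0]
first = discrete_difference(series, order=1)
second = discrete_difference(series, order=2)
print(first)   # [3.0, 5.0, 7.0, 9.0]
print(second)  # [2.0, 2.0, 2.0]
```

Differencing removes slowly varying trends, so shapelets matched against the differenced series respond to changes in the signal rather than to its level.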

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires every label of each sample to be predicted correctly.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.
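As a rough illustration of the quantity that score returns, the sketch below computes a (weighted) mean accuracy in plain Python. The function mean_accuracy is a hypothetical stand-in, not the library's code, and the labels are made up. Representing each multi-label target as a tuple shows why subset accuracy is strict: a sample counts only if its whole label set matches.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of samples whose prediction equals the true label.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # 1.0 for a correct prediction, 0.0 otherwise.
    hits = [float(t == p) for t, p in zip(y_true, y_pred)]
    return sum(w * h for w, h in zip(sample_weight, hits)) / sum(sample_weight)


# Single-label case: three of four predictions are correct.
single = mean_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])

# Weighted case: the misclassified third sample carries twice the weight.
weighted = mean_accuracy(
    ["a", "b", "a", "c"], ["a", "b", "c", "c"], sample_weight=[1, 1, 2, 1]
)

# Multi-label case: the second sample has one wrong label, so it counts as a miss.
multi = mean_accuracy([(1, 0), (0, 1), (1, 1)], [(1, 0), (0, 0), (1, 1)])
```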

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.
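The <component>__<parameter> naming used by set_params can be pictured as splitting each key at the first double underscore. The sketch below shows that convention only; split_nested_params and the component name "clf" are hypothetical, and the real method additionally validates names and updates the objects.

```python
def split_nested_params(params):
    """Separate top-level parameters from <component>__<parameter> entries.

    Hypothetical helper for illustration only.
    """
    own, nested = {}, {}
    for key, value in params.items():
        # partition splits at the first "__" only, so deeper nesting such as
        # "a__b__c" stays intact in the remainder for the component to handle.
        name, separator, sub_key = key.partition("__")
        if separator:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"n_shapelets": 500, "clf__n_groups": 32, "clf__order": 2}
)
print(own)     # {'n_shapelets': 500}
print(nested)  # {'clf': {'n_groups': 32, 'order': 2}}
```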

class wildboar.linear_model.DilatedShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, normalize_prob=0.8, min_shapelet_size=None, max_shapelet_size=None, shapelet_size=None, lower=0.05, upper=0.1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, random_state=None, n_jobs=None)

A classifier that uses random dilated shapelets.

Parameters:
n_shapelets : int, optional

The number of dilated shapelets.

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_prob : float, optional

The probability of standardizing a shapelet to zero mean and unit standard deviation.

min_shapelet_size : float, optional

The minimum shapelet size. If None, use the discrete sizes in shapelet_size.

max_shapelet_size : float, optional

The maximum shapelet size. If None, use the discrete sizes in shapelet_size.

shapelet_size : array-like, optional

The size of shapelets.

lower : float, optional

The lower percentile above which distance thresholds are drawn.

upper : float, optional

The upper percentile below which distance thresholds are drawn.

alphas : array-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_intercept : bool, optional

Whether to calculate the intercept for this model.

scoring : str or callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cv : int, cross-validation generator or iterable, optional

Determines the cross-validation splitting strategy.

class_weight : dict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalize : bool, optional

Standardize before fitting.

random_state : int or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobs : int, optional

The number of parallel jobs.
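One way to read the lower and upper parameters is as quantile fractions of a set of observed distances, with a threshold drawn somewhere between the two. The sketch below follows that reading under stated assumptions: the uniform draw, the linear interpolation, and the helper names quantile and draw_threshold are all illustrative choices, not a description of how wildboar computes thresholds.

```python
import random


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    below = int(position)
    above = min(below + 1, len(sorted_values) - 1)
    fraction = position - below
    return sorted_values[below] * (1 - fraction) + sorted_values[above] * fraction


def draw_threshold(distances, lower=0.05, upper=0.1, rng=None):
    """Draw a threshold between the lower and upper quantiles (assumed uniform)."""
    if rng is None:
        rng = random.Random()
    ordered = sorted(distances)
    low = quantile(ordered, lower)
    high = quantile(ordered, upper)
    return rng.uniform(low, high)


# Made-up distances 0.0, 1.0, ..., 100.0, so the 5% and 10% quantiles are 5 and 10.
distances = [float(i) for i in range(101)]
threshold = draw_threshold(distances, rng=random.Random(0))
```

With the defaults lower=0.05 and upper=0.1, such a threshold sits among the smallest distances, so only close matches of a shapelet fall below it.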

References

Antoine Guillaume, Christel Vrain, Elloumi Wael

Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. Pattern Recognition and Artificial Intelligence, 2022

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires every label of each sample to be predicted correctly.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.linear_model.HydraClassifier(*, n_groups=64, n_kernels=8, kernel_size=9, sampling='normal', sampling_params=None, order=1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', n_jobs=None, random_state=None)

A dictionary-based method using convolutional kernels.

Parameters:
n_groups : int, optional

The number of groups of kernels.

n_kernels : int, optional

The number of kernels per group.

kernel_size : int, optional

The size of the kernel.

sampling : {“normal”}, optional

The strategy for sampling kernels. By default, kernel weights are sampled from a normal distribution with zero mean and unit standard deviation.

sampling_params : dict, optional

Parameters to the sampling approach. The “normal” sampler accepts two parameters: mean and scale.

order : int, optional

The order of difference. If set, half of the groups, with their corresponding kernels, are convolved with the order-th discrete difference along the time dimension.

alphas : array-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_intercept : bool, optional

Whether to calculate the intercept for this model.

scoring : str or callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cv : int, cross-validation generator or iterable, optional

Determines the cross-validation splitting strategy.

class_weight : dict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalize : “sparse” or bool, optional

Standardize before fitting. By default, use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable, or to True to use StandardScaler.

n_jobs : int, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_state : int or RandomState, optional

Controls the random sampling of kernels.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
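To make the interplay between the "normal" sampler (mean and scale) and a seeded random_state concrete, here is a small sketch that draws one group of kernel weights. It uses Python's random.Random as a stand-in for numpy.random.RandomState, and sample_kernel_group is a hypothetical helper, not part of wildboar.

```python
import random


def sample_kernel_group(n_kernels=8, kernel_size=9, mean=0.0, scale=1.0, seed=None):
    """Draw n_kernels weight vectors of length kernel_size from N(mean, scale).

    Hypothetical helper for illustration only; a fresh generator is created
    from the seed so that equal seeds give equal kernels.
    """
    rng = random.Random(seed)
    return [
        [rng.gauss(mean, scale) for _ in range(kernel_size)]
        for _ in range(n_kernels)
    ]


first_run = sample_kernel_group(seed=42)
second_run = sample_kernel_group(seed=42)
other_seed = sample_kernel_group(seed=7)

print(first_run == second_run)  # True: the same seed reproduces the kernels
print(first_run == other_seed)  # False: a different seed gives other kernels
```

Fixing the seed is what makes a randomly initialized transform repeatable across runs, which matters when comparing models or debugging.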

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2023).

Hydra: competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires every label of each sample to be predicted correctly.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.linear_model.RandomShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, n_jobs=None, random_state=None)

A classifier that uses random shapelets.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires every label of each sample to be predicted correctly.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.linear_model.RandomShapeletRegressor(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, n_jobs=None, random_state=None)

A regressor that uses random shapelets.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to stay consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
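The definition of \(R^2\) given for score can be checked by hand on a small example. The sketch below is a plain-Python rendering of that formula for a single output without sample weights; the function name r_squared and the data are illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination 1 - u / v for a single output.

    Illustrative helper; assumes y_true is not constant, so v is non-zero.
    """
    mean_true = sum(y_true) / len(y_true)
    # u: residual sum of squares, v: total sum of squares.
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - u / v


y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r_squared(y_true, y_true)                # exact predictions
constant = r_squared(y_true, [6.0] * 4)            # always predicts the mean
reversed_ = r_squared(y_true, [9.0, 7.0, 5.0, 3.0])  # worse than the mean

print(perfect, constant, reversed_)  # 1.0 0.0 -3.0
```

The three cases match the description: a perfect model scores 1.0, the constant mean predictor scores 0.0, and a model that is worse than the mean predictor scores below zero.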

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.linear_model.RocketClassifier(n_kernels=10000, *, kernel_size=None, sampling='normal', sampling_params=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, n_jobs=None, random_state=None)

Implements the ROCKET classifier.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy, a harsh metric because it requires every label of each sample to be predicted correctly.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

class wildboar.linear_model.RocketRegressor(n_kernels=10000, *, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, gcv_mode=None, n_jobs=None, random_state=None)

Implements the ROCKET regressor.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \(1 - \frac{u}{v}\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters:
X : array-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns:
score : float

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to stay consistent with the default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.