wildboar.linear_model

Linear methods for both classification and regression.

Classes

CastorClassifier

A dictionary-based method using dilated competing shapelets.

CastorRegressor

A dictionary-based method using dilated competing shapelets.

DilatedShapeletClassifier

A classifier that uses random dilated shapelets.

HydraClassifier

A dictionary-based method using convolutional kernels.

RandomShapeletClassifier

A classifier that uses random shapelets.

RandomShapeletRegressor

A regressor that uses random shapelets.

RocketClassifier

A classifier using Rocket transform.

RocketRegressor

A regressor using Rocket transform.


class wildboar.linear_model.CastorClassifier(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, order=1, soft_min=True, soft_max=False, soft_threshold=True, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', random_state=None, n_jobs=None)

A dictionary-based method using dilated competing shapelets.

Parameters:
n_groupsint, optional

The number of groups of dilated shapelets.

n_shapeletsint, optional

The number of dilated shapelets per group.

metricstr or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_probfloat, optional

The probability of standardizing a shapelet with zero mean and unit standard deviation.

shapelet_sizeint, optional

The length of the dilated shapelet.

lowerfloat, optional

The lower percentile to draw distance thresholds above.

upperfloat, optional

The upper percentile to draw distance thresholds below.

orderint or array-like, optional

The order of difference.

If int, half of the groups and their corresponding shapelets are convolved with the order-th discrete difference along the time dimension.

soft_minbool, optional

If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.

soft_maxbool, optional

If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.

soft_thresholdbool, optional

If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

class_weightdict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalize“sparse” or bool, optional

Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.

random_stateint or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of parallel jobs.

Notes

For better performance with multivariate datasets, scale n_shapelets by the number of dimensions (that is, use n_shapelets * n_dims) to ensure feature variability.
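As a rough illustration of how the shapelets in one group can compete, the sketch below starts from a precomputed matrix of distance profiles (one row per shapelet, one column per time step) and derives one feature per shapelet: the sum of the minimal distances it wins (in the spirit of soft_min=True) or the number of time steps it wins (soft_min=False). The helper name competing_min_features and the toy distances are hypothetical, and the actual implementation may differ in detail.

```python
import numpy as np


def competing_min_features(distances, soft_min=True):
    """Summarize which shapelet in a group is closest at each time step.

    distances: array of shape (n_shapelets, n_timesteps), hypothetical
    precomputed distance profiles for one group and one time series.
    """
    distances = np.asarray(distances, dtype=float)
    n_shapelets = distances.shape[0]
    winners = distances.argmin(axis=0)  # index of the closest shapelet per step
    min_values = distances.min(axis=0)  # the winning (minimal) distance per step
    if soft_min:
        # Sum the minimal distances attributed to each winning shapelet.
        return np.bincount(winners, weights=min_values, minlength=n_shapelets)
    # Count how many time steps each shapelet wins.
    return np.bincount(winners, minlength=n_shapelets).astype(float)


distances = np.array(
    [
        [0.2, 0.9, 0.4, 0.1],
        [0.5, 0.3, 0.6, 0.7],
    ]
)
counts = competing_min_features(distances, soft_min=False)
sums = competing_min_features(distances, soft_min=True)
```

Here the first shapelet wins three of the four time steps, so it receives the larger count, while the sums accumulate only the distances at the steps each shapelet wins.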

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.
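The subset accuracy mentioned above for multi-label targets can be sketched in a few lines of plain Python. The helper name subset_accuracy and the toy labels are hypothetical; the point is only that a sample counts as correct when its entire label set matches.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    exact = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return exact / len(y_true)


# Three samples with two binary labels each (toy data).
y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 0], [0, 0], [1, 1]]

# The second sample has one wrong label, so it counts as fully wrong.
score = subset_accuracy(y_true, y_pred)
```

Although five of the six individual labels are correct, only two of the three samples match exactly, which is why the metric is described as harsh.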

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.
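To make the <component>__<parameter> naming convention concrete, the sketch below separates top-level parameters from nested ones by splitting each key at the first double underscore. The helper split_nested_params and the component name "clf" are hypothetical and only illustrate the naming scheme; they are not part of the estimator API.

```python
def split_nested_params(params):
    """Group <component>__<parameter> keys by component name."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first "__"; keys without it belong to the object itself.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"verbose": True, "clf__alphas": (0.1, 1.0), "clf__fit_intercept": False}
)
```

With this input, verbose stays on the outer object, while alphas and fit_intercept are routed to the component named clf.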

class wildboar.linear_model.CastorRegressor(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, order=1, soft_min=True, soft_max=False, soft_threshold=True, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, normalize='sparse', random_state=None, n_jobs=None)

A dictionary-based method using dilated competing shapelets.

Parameters:
n_groupsint, optional

The number of groups of dilated shapelets.

n_shapeletsint, optional

The number of dilated shapelets per group.

metricstr or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_probfloat, optional

The probability of standardizing a shapelet with zero mean and unit standard deviation.

shapelet_sizeint, optional

The length of the dilated shapelet.

lowerfloat, optional

The lower percentile to draw distance thresholds above.

upperfloat, optional

The upper percentile to draw distance thresholds below.

orderint or array-like, optional

The order of difference.

If int, half of the groups and their corresponding shapelets are convolved with the order-th discrete difference along the time dimension.

soft_minbool, optional

If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.

soft_maxbool, optional

If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.

soft_thresholdbool, optional

If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

normalize“sparse” or bool, optional

Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.

random_stateint or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of parallel jobs.

Notes

For better performance with multivariate datasets, scale n_shapelets by the number of dimensions (that is, use n_shapelets * n_dims) to ensure feature variability.
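The order parameter refers to a discrete difference of the series along the time dimension. A minimal sketch of what a first- and second-order difference look like, assuming the usual forward difference as computed by numpy.diff (the library may compute it differently internally):

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

first_diff = np.diff(x, n=1)   # x[t + 1] - x[t]
second_diff = np.diff(x, n=2)  # difference of the first difference
```

Each additional order shortens the series by one time step and emphasizes changes in the series rather than its level.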

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
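The definition of \(R^2\) given for score can be checked with a small, dependency-free sketch. The function name r2 and the toy values are hypothetical; the three calls reproduce the cases named in the description: a perfect model, a constant model predicting the mean, and a model that is worse than the constant one.

```python
def r2(y_true, y_pred):
    """Coefficient of determination, 1 - u / v."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return 1.0 - u / v


y_true = [3.0, -0.5, 2.0, 7.0]

perfect = r2(y_true, y_true)
constant = r2(y_true, [sum(y_true) / len(y_true)] * len(y_true))
worse = r2(y_true, [10.0, 10.0, 10.0, 10.0])
```

The perfect model scores 1.0, the constant model scores 0.0, and the last model scores below zero because its residual sum of squares exceeds the total sum of squares.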

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.linear_model.DilatedShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, normalize_prob=0.8, min_shapelet_size=None, max_shapelet_size=None, shapelet_size=None, lower=0.05, upper=0.1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, random_state=None, n_jobs=None)

A classifier that uses random dilated shapelets.

Parameters:
n_shapeletsint, optional

The number of dilated shapelets.

metricstr or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_paramsdict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

normalize_probfloat, optional

The probability of standardizing a shapelet with zero mean and unit standard deviation.

min_shapelet_sizefloat, optional

The minimum shapelet size. If None, use the discrete sizes in shapelet_size.

max_shapelet_sizefloat, optional

The maximum shapelet size. If None, use the discrete sizes in shapelet_size.

shapelet_sizearray-like, optional

The size of shapelets.

lowerfloat, optional

The lower percentile to draw distance thresholds above.

upperfloat, optional

The upper percentile to draw distance thresholds below.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

class_weightdict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalizebool, optional

Standardize before fitting.

random_stateint or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of parallel jobs.

References

Antoine Guillaume, Christel Vrain, Elloumi Wael

Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. Pattern Recognition and Artificial Intelligence, 2022.
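The following sketch illustrates the idea behind a dilated shapelet and the lower and upper percentiles: the shapelet is compared against every dilation-th value of the series, and a distance threshold is drawn between two percentiles of the resulting distance profile. It assumes that lower=0.05 and upper=0.1 are fractions corresponding to the 5th and 10th percentiles and that the threshold is drawn uniformly between them. The helper dilated_distance_profile and the feature names are hypothetical and not the library's implementation.

```python
import numpy as np


def dilated_distance_profile(x, shapelet, dilation):
    """Euclidean distance between a shapelet and every dilated window of x."""
    x = np.asarray(x, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    span = (len(shapelet) - 1) * dilation + 1  # time steps covered by one window
    n_windows = len(x) - span + 1
    profile = np.empty(n_windows)
    for start in range(n_windows):
        window = x[start : start + span : dilation]  # every dilation-th value
        profile[start] = np.sqrt(np.sum((window - shapelet) ** 2))
    return profile


rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 4 * np.pi, 50))
shapelet = x[5:16:2].copy()  # 6 values taken with dilation 2

profile = dilated_distance_profile(x, shapelet, dilation=2)

# Draw a threshold between the lower and upper percentiles of the profile
# (assumed interpretation of lower=0.05 and upper=0.1).
low, high = np.percentile(profile, [5, 10])
threshold = rng.uniform(low, high)

features = {
    "min_distance": profile.min(),
    "argmin": int(profile.argmin()),
    "occurrences": int(np.sum(profile < threshold)),
}
```

Because the shapelet was cut from the series at position 5, the profile is exactly zero there, and the occurrence count reports how many windows fall below the sampled threshold.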

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.linear_model.HydraClassifier(*, n_groups=64, n_kernels=8, kernel_size=9, sampling='normal', sampling_params=None, order=1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', n_jobs=None, random_state=None)

A dictionary-based method using convolutional kernels.

Parameters:
n_groupsint, optional

The number of groups of kernels.

n_kernelsint, optional

The number of kernels per group.

kernel_sizeint, optional

The size of the kernel.

sampling{“normal”}, optional

The strategy for sampling kernels. By default kernel weights are sampled from a normal distribution with zero mean and unit standard deviation.

sampling_paramsdict, optional

Parameters to the sampling approach. The “normal” sampler accepts two parameters: mean and scale.

orderint, optional

The order of difference. If set, half of the groups and their corresponding kernels are convolved with the order-th discrete difference along the time dimension.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

class_weightdict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalize“sparse” or bool, optional

Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

random_stateint or RandomState, optional

Controls the random sampling of kernels.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2023).

Hydra: competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery
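A simplified sketch of the competing-kernels idea: a group of random kernels is applied to a series, and at every time step the kernel with the largest response wins. Counting the wins per kernel yields one feature per kernel. The sketch uses the default group shape (8 kernels of size 9) and standard normal weights, but it omits dilation and other details of the method, so it should be read as an illustration rather than the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_kernels, kernel_size = 8, 9

# One group of kernels with weights drawn from a standard normal distribution.
kernels = rng.normal(loc=0.0, scale=1.0, size=(n_kernels, kernel_size))

x = np.sin(np.linspace(0, 6 * np.pi, 100))

# Response of each kernel at every valid time step (cross-correlation).
responses = np.stack([np.correlate(x, k, mode="valid") for k in kernels])

# The kernel with the largest response wins each time step.
winners = responses.argmax(axis=0)
win_counts = np.bincount(winners, minlength=n_kernels)
```

The win counts always sum to the number of valid time steps, so they describe how the series is divided among the kernels of the group.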

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.linear_model.RandomShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, coverage_probability=None, variability=None, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, random_state=None, n_jobs=None)

A classifier that uses random shapelets.

Parameters:
n_shapeletsint or {“log2”, “sqrt”, “auto”}, optional

The number of shapelets in the resulting transform.

  • If “auto”, the number of shapelets depends on the value of strategy: 1 for “best” and 1000 for “random”.

  • If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.

  • If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.

metricstr or list, optional
  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

min_shapelet_sizefloat, optional

Minimum shapelet size.

max_shapelet_sizefloat, optional

Maximum shapelet size.

coverage_probabilityfloat, optional

The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.

  • For larger coverage_probability, we get longer shapelets.

  • For smaller coverage_probability, we get shorter shapelets.

variabilityfloat, optional

Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.

  • Higher variability creates more uniform intervals.

  • Lower variability creates more variable interval sizes.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

normalizebool, optional

Standardize before fitting.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

class_weightdict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

random_stateint or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of parallel jobs.

References

Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.

Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).
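The parameter grid specification described for metric, such as dict(min_r=0, max_r=1, num_r=10), can be read as a compact description of a range of candidate values. The sketch below expands such a specification under two assumptions that the description does not state: that the values are evenly spaced and that the num_ key is the optional one. The helper expand_grid_spec and its default_num argument are hypothetical.

```python
import numpy as np


def expand_grid_spec(spec, default_num=10):
    """Expand dict(min_r=..., max_r=..., num_r=...) into {"r": values}."""
    names = {key[4:] for key in spec if key.startswith("min_")}
    grid = {}
    for name in sorted(names):
        low = spec[f"min_{name}"]
        high = spec[f"max_{name}"]
        num = spec.get(f"num_{name}", default_num)  # assumed optional key
        grid[name] = np.linspace(low, high, num)  # assumed even spacing
    return grid


grid = expand_grid_spec(dict(min_r=0, max_r=1, num_r=10))
```

For the example specification this yields ten values for the argument r, starting at 0 and ending at 1.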

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.linear_model.RandomShapeletRegressor(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, coverage_probability=None, variability=None, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, random_state=None, n_jobs=None)

A regressor that uses random shapelets.

Parameters:
n_shapeletsint or {“log2”, “sqrt”, “auto”}, optional

The number of shapelets in the resulting transform.

  • If “auto”, the number of shapelets depends on the value of strategy: 1 for “best” and 1000 for “random”.

  • If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.

  • If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.

metricstr or list, optional
  • If str, the distance metric used to identify the best shapelet.

  • If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).

Read more about the metrics and their parameters in the User guide.

metric_paramsdict, optional

Parameters for the distance measure. Ignored unless metric is a string.

Read more about the parameters in the User guide.

min_shapelet_sizefloat, optional

Minimum shapelet size.

max_shapelet_sizefloat, optional

Maximum shapelet size.

coverage_probabilityfloat, optional

The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.

  • For larger coverage_probability, we get longer shapelets.

  • For smaller coverage_probability, we get shorter shapelets.

variabilityfloat, optional

Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.

  • Higher variability creates more uniform intervals.

  • Lower variability creates more variable interval sizes.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

normalizebool, optional

Standardize before fitting.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

gcv_mode{‘auto’, ‘svd’, ‘eigen’}, optional

Flag indicating which strategy to use when performing Leave-One-Out Cross-Validation. Options are:

  • 'auto': use 'svd' if n_samples > n_features, otherwise use 'eigen'.

  • 'svd': force use of singular value decomposition of X when X is dense, eigenvalue decomposition of X^T.X when X is sparse.

  • 'eigen': force computation via eigendecomposition of X.X^T.

The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending on the shape of the training data.

random_stateint or RandomState, optional

Controls the random sampling of shapelets.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of parallel jobs.

References

Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.

Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).
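The alphas and gcv_mode parameters both relate to choosing a regularization strength by leave-one-out cross-validation. As a minimal sketch of that idea, the code below evaluates ridge regression without an intercept for each candidate alpha, using the closed-form leave-one-out residuals of a linear smoother, and keeps the alpha with the lowest error. The random features stand in for a shapelet transform, and the helper loo_mse is hypothetical; this is not the estimator's actual code path.

```python
import numpy as np


def loo_mse(X, y, alpha):
    """Leave-one-out mean squared error of ridge regression (no intercept)."""
    n_features = X.shape[1]
    # Hat matrix H maps y to the fitted values: y_hat = H @ y.
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)
    residuals = y - H @ y
    # Closed-form leave-one-out residuals for linear smoothers.
    loo_residuals = residuals / (1.0 - np.diag(H))
    return float(np.mean(loo_residuals**2))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))  # stand-in for transformed features
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=30)

alphas = (0.1, 1.0, 10.0)
errors = {alpha: loo_mse(X, y, alpha) for alpha in alphas}
best_alpha = min(errors, key=errors.get)
```

The closed form avoids refitting the model once per held-out sample, which is what makes trying several alpha values cheap.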

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.

class wildboar.linear_model.RocketClassifier(n_kernels=10000, *, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, random_state=None, n_jobs=None)

A classifier using Rocket transform.

Parameters:
n_kernelsint, optional

The number of kernels to sample.

sampling{“normal”, “uniform”, “shapelet”}, optional

The sampling of convolutional filters.

  • If “normal”, sample filters according to a normal distribution with mean and scale.

  • If “uniform”, sample filters according to a uniform distribution with lower and upper.

  • If “shapelet”, sample filters as subsequences in the training data.

sampling_paramsdict, optional

Parameters for the sampling strategy.

  • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

  • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.

kernel_sizearray-like, optional

The kernel size, by default [7, 11, 13].

min_sizefloat, optional

The minimum timestep size used for generating kernel sizes. If set, kernel_size is ignored.

max_sizefloat, optional

The maximum timestep size used for generating kernel sizes. If set, kernel_size is ignored.

bias_probfloat, optional

The probability of using the bias term.

normalize_probfloat, optional

The probability of performing normalization.

padding_probfloat, optional

The probability of padding with zeros.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

class_weightdict or ‘balanced’, optional

Weights associated with classes in the form {class_label: weight}.

normalize“sparse” or bool, optional

Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.

random_stateint or RandomState, optional

Controls the random sampling of kernels.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
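The sampling and sampling_params options can be pictured with a small sketch that draws the weights of one kernel from either a normal or a uniform distribution, using the defaults listed above, and then summarizes the kernel's response with two Rocket-style features (the maximum and the proportion of positive values). The helper sample_kernel is hypothetical, the “shapelet” strategy is left out, and dilation, bias, and padding are omitted for brevity.

```python
import numpy as np


def sample_kernel(size, sampling="normal", sampling_params=None, rng=None):
    """Draw the weights of one convolutional kernel (illustrative only)."""
    rng = np.random.default_rng() if rng is None else rng
    params = dict(sampling_params or {})
    if sampling == "normal":
        return rng.normal(params.get("mean", 0.0), params.get("scale", 1.0), size)
    if sampling == "uniform":
        return rng.uniform(params.get("lower", -1.0), params.get("upper", 1.0), size)
    raise ValueError(f"unsupported sampling strategy: {sampling!r}")


rng = np.random.default_rng(0)
normal_kernel = sample_kernel(9, "normal", {"mean": 0.0, "scale": 1.0}, rng)
uniform_kernel = sample_kernel(9, "uniform", {"lower": -1.0, "upper": 1.0}, rng)

# Apply one kernel to a series and summarize the response with the maximum
# and the proportion of positive values.
x = np.sin(np.linspace(0, 4 * np.pi, 64))
response = np.correlate(x, normal_kernel, mode="valid")
max_value = response.max()
ppv = float(np.mean(response > 0))
```

Repeating this for many kernels gives a wide feature matrix, which is the kind of input a regularized linear model with several candidate alphas is suited to.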

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)[source]#

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

Mean accuracy of self.predict(X) w.r.t. y.
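To make the returned quantity concrete, here is a minimal pure-Python sketch of (optionally weighted) mean accuracy. The function mean_accuracy is illustrative only and is not the estimator's implementation. In the multi-label case each label is a tuple, so a sample counts as correct only when the whole label set matches, which is the subset accuracy described above.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Illustrative sketch: fraction of (weighted) samples predicted exactly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # A sample contributes its weight only if the prediction equals the label.
    # For multi-label data the labels are tuples, so every entry must match.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


# Single-label: three of four predictions are correct.
print(mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))

# Multi-label: the second sample gets one of two labels wrong, so it counts as a miss.
print(mean_accuracy([(1, 0), (0, 1)], [(1, 0), (1, 1)]))
```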

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.
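The <component>__<parameter> naming scheme can be illustrated with a small sketch. The function split_param_name and the step names "clf" and "pipe" are hypothetical; the sketch only shows how such a name separates into the component to address and the parameter to forward to it.

```python
def split_param_name(name):
    """Hypothetical helper: split 'component__parameter' at the first '__'."""
    component, separator, parameter = name.partition("__")
    if not separator:
        # No '__' present: the parameter belongs to the estimator itself.
        return None, name
    # Anything after the first '__' is forwarded to the named component,
    # which may itself be a nested object.
    return component, parameter


print(split_param_name("alphas"))                 # parameter of the estimator itself
print(split_param_name("clf__n_kernels"))         # parameter of the step named "clf"
print(split_param_name("pipe__clf__n_kernels"))   # two levels of nesting
```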

class wildboar.linear_model.RocketRegressor(n_kernels=10000, *, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, gcv_mode=None, normalize=True, random_state=None, n_jobs=None)[source]#

A regressor using Rocket transform.

Parameters:
n_kernelsint, optional

The number of kernels to sample.

sampling{“normal”, “uniform”, “shapelet”}, optional

The sampling of convolutional filters.

  • if “normal”, sample filter according to a normal distribution with mean and scale.

  • if “uniform”, sample filter according to a uniform distribution with lower and upper.

  • if “shapelet”, sample filters as subsequences in the training data.

sampling_paramsdict, optional

Parameters for the sampling strategy.

  • if “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.

  • if “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
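As a rough illustration of what the two parameter dictionaries mean, the sketch below draws the weights of a single kernel with the standard library. It is not wildboar's sampling code: the function sample_kernel, its seed argument, and the choice to merge a partial sampling_params dictionary with the documented defaults are all assumptions made for this example.

```python
import random

# Defaults as documented for the "normal" and "uniform" strategies.
DEFAULTS = {
    "normal": {"mean": 0.0, "scale": 1.0},
    "uniform": {"lower": -1.0, "upper": 1.0},
}


def sample_kernel(size, sampling="normal", sampling_params=None, seed=None):
    """Hypothetical sketch: draw `size` kernel weights for one strategy."""
    if sampling not in DEFAULTS:
        # "shapelet" samples subsequences of the training data and is not covered here.
        raise ValueError(f"unsupported sampling in this sketch: {sampling!r}")
    rng = random.Random(seed)
    # Assumption: user-supplied values override the defaults key by key.
    params = {**DEFAULTS[sampling], **(sampling_params or {})}
    if sampling == "normal":
        return [rng.gauss(params["mean"], params["scale"]) for _ in range(size)]
    return [rng.uniform(params["lower"], params["upper"]) for _ in range(size)]


normal_kernel = sample_kernel(7, seed=0)
shifted_kernel = sample_kernel(9, "uniform", {"lower": 2.0, "upper": 3.0}, seed=0)
print(len(normal_kernel), min(shifted_kernel), max(shifted_kernel))
```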

kernel_sizearray-like, optional

The kernel size, by default [7, 11, 13].

min_sizefloat, optional

The minimum timestep size used for generating kernel sizes. If set, kernel_size is ignored.

max_sizefloat, optional

The maximum timestep size used for generating kernel sizes. If set, kernel_size is ignored.

bias_probfloat, optional

The probability of using the bias term.

normalize_probfloat, optional

The probability of performing normalization.

padding_probfloat, optional

The probability of padding with zeros.

alphasarray-like of shape (n_alphas,), optional

Array of alpha values to try.

fit_interceptbool, optional

Whether to calculate the intercept for this model.

scoringstr, callable, optional

A string or a scorer callable object with signature scorer(estimator, X, y).

cvint, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy.

gcv_mode{‘auto’, ‘svd’, ‘eigen’}, optional

Flag indicating which strategy to use when performing Leave-One-Out Cross-Validation. Options are:

  • 'auto': use 'svd' if n_samples > n_features, otherwise use 'eigen'.

  • 'svd': force use of singular value decomposition of X when X is dense, eigenvalue decomposition of X^T.X when X is sparse.

  • 'eigen': force computation via eigendecomposition of X.X^T.

The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending on the shape of the training data.
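The selection rule for 'auto' can be written out as a short sketch. The function resolve_gcv_mode is hypothetical; treating None (the value in the signature above) the same as 'auto' is an assumption of this example.

```python
def resolve_gcv_mode(gcv_mode, n_samples, n_features):
    """Hypothetical helper: pick the decomposition used for leave-one-out CV."""
    if gcv_mode in (None, "auto"):  # assumption: None behaves like 'auto'
        # More samples than features: decompose X ('svd').
        # Otherwise: decompose X.X^T, which is n_samples by n_samples ('eigen').
        return "svd" if n_samples > n_features else "eigen"
    if gcv_mode in ("svd", "eigen"):
        return gcv_mode  # an explicit choice is used as given
    raise ValueError(f"unknown gcv_mode: {gcv_mode!r}")


print(resolve_gcv_mode("auto", n_samples=100, n_features=10))
print(resolve_gcv_mode(None, n_samples=50, n_features=10000))
```

With a large number of kernels the transformed data will often have more features than samples, in which case the rule above selects 'eigen'.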

normalize“sparse” or bool, optional

Standardize before fitting. Set to "sparse" to use datasets.preprocess.SparseScaler to standardize the attributes, True to use StandardScaler (the default in the signature above), or False to disable.

random_stateint or RandomState, optional

Controls the random resampling of the original dataset.

  • If int, random_state is the seed used by the random number generator.

  • If numpy.random.RandomState instance, random_state is the random number generator.

  • If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.

n_jobsint, optional

The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.

get_metadata_routing()[source]#

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:
routingMetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:
deepbool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
paramsdict

Parameter names mapped to their values.

score(X, y, sample_weight=None)[source]#

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an \(R^2\) score of 0.0.

Parameters:
Xarray-like of shape (n_samples, n_features)

Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

yarray-like of shape (n_samples,) or (n_samples, n_outputs)

True values for X.

sample_weightarray-like of shape (n_samples,), default=None

Sample weights.

Returns:
scorefloat

\(R^2\) of self.predict(X) w.r.t. y.

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
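The definition \(1 - \frac{u}{v}\) given for score can be checked by hand with a minimal pure-Python sketch. The function r2_sketch is illustrative only; it handles a single output without sample weights and is not the estimator's implementation.

```python
def r2_sketch(y_true, y_pred):
    """Illustrative sketch of R^2 = 1 - u / v for a single output."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]

print(r2_sketch(y_true, [1.0, 2.0, 3.0, 4.0]))  # perfect prediction
print(r2_sketch(y_true, [2.5, 2.5, 2.5, 2.5]))  # constant prediction of the mean
print(r2_sketch(y_true, [4.0, 3.0, 2.0, 1.0]))  # worse than predicting the mean
```

The three calls cover the cases named in the description: a perfect model, a constant model that predicts the mean of y, and a model that does worse than that constant and therefore scores below zero.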

set_params(**params)[source]#

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**paramsdict

Estimator parameters.

Returns:
selfestimator instance

Estimator instance.