wildboar.linear_model
Linear methods for both classification and regression.
Package Contents
Classes
CastorClassifier | A dictionary based method using dilated competing shapelets.
DilatedShapeletClassifier | A classifier that uses random dilated shapelets.
HydraClassifier | A dictionary based method using convolutional kernels.
RandomShapeletClassifier | A classifier that uses random shapelets.
RandomShapeletRegressor | A regressor that uses random shapelets.
RocketClassifier | Implements the ROCKET classifier.
RocketRegressor | Implements the ROCKET regressor.
- class wildboar.linear_model.CastorClassifier(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, order=1, soft_min=True, soft_max=False, soft_threshold=True, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', random_state=None, n_jobs=None)
A dictionary based method using dilated competing shapelets.
- Parameters:
- n_groups : int, optional
The number of groups of dilated shapelets.
- n_shapelets : int, optional
The number of dilated shapelets per group.
- metric : str or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_prob : float, optional
The probability of standardizing a shapelet with zero mean and unit standard deviation.
- shapelet_size : int, optional
The length of the dilated shapelet.
- lower : float, optional
The lower percentile to draw distance thresholds above.
- upper : float, optional
The upper percentile to draw distance thresholds below.
- order : int or array-like, optional
The order of difference.
If int, half the groups with corresponding shapelets will be convolved with the order discrete difference along the time dimension.
- soft_min : bool, optional
If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.
- soft_max : bool, optional
If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.
- soft_threshold : bool, optional
If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.
- alphas : array-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_intercept : bool, optional
Whether to calculate the intercept for this model.
- scoring : str, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weight : dict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalize : “sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- random_state : int or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobs : int, optional
The number of parallel jobs.
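The order parameter above refers to a discrete difference taken along the time dimension. As a rough illustration of what an order-k difference is, the sketch below repeatedly replaces a series with the differences of its consecutive values. The helper name discrete_difference is hypothetical and the code follows the usual textbook definition; it is not taken from wildboar and says nothing about how the groups are selected.

```python
def discrete_difference(series, order=1):
    """Return the order-th discrete difference of a sequence of numbers.

    Hypothetical helper for illustration only; it follows the usual
    definition of a discrete difference and is not wildboar code.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    values = list(series)
    for _ in range(order):
        # Each pass shortens the series by one time step.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


series = [1, 4, 9, 16, 25]
first = discrete_difference(series, order=1)
second = discrete_difference(series, order=2)
print(first)   # [3, 5, 7, 9]
print(second)  # [2, 2, 2]
```

With order=0 the series is returned unchanged, and each additional order removes one time step.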
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
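To make the difference between plain accuracy and subset accuracy concrete, here is a minimal unweighted sketch. The function name mean_accuracy is hypothetical, sample_weight is ignored, and multi-label targets are assumed to be given as tuples so that a sample counts as correct only when its whole label set matches.

```python
def mean_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction equals the true label.

    Illustrative helper, not the library implementation. For multi-label
    targets each label set is a tuple, so the comparison below yields the
    subset accuracy: one wrong label makes the whole sample wrong.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Single-label case: three of four predictions are correct.
single = mean_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
# Multi-label case: the second sample has one wrong label and counts as a miss.
multi = mean_accuracy([(1, 0), (0, 1), (1, 1)], [(1, 0), (0, 0), (1, 1)])
print(single)  # 0.75
print(multi)
```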
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
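The <component>__<parameter> naming convention can be pictured as splitting each key on its first double underscore. The sketch below is a simplified illustration of that convention only; split_nested_params and the component name "clf" are hypothetical, and this is not the code that set_params actually runs.

```python
def split_nested_params(params):
    """Group flat parameter names by the component they address.

    Keys without a double underscore belong to the estimator itself and
    are stored under None; a key such as "clf__alphas" is routed to the
    component named "clf". Hypothetical helper for illustration.
    """
    grouped = {}
    for key, value in params.items():
        # partition splits on the first "__" only, so deeper nesting
        # ("a__b__c") stays attached to the sub-parameter name.
        name, delim, sub_name = key.partition("__")
        if delim:
            grouped.setdefault(name, {})[sub_name] = value
        else:
            grouped.setdefault(None, {})[name] = value
    return grouped


grouped = split_nested_params(
    {"n_jobs": 2, "clf__n_kernels": 500, "clf__alphas": (0.1, 1.0)}
)
print(grouped[None])   # {'n_jobs': 2}
print(grouped["clf"])  # {'n_kernels': 500, 'alphas': (0.1, 1.0)}
```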
- class wildboar.linear_model.DilatedShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, normalize_prob=0.8, min_shapelet_size=None, max_shapelet_size=None, shapelet_size=None, lower=0.05, upper=0.1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, random_state=None, n_jobs=None)
A classifier that uses random dilated shapelets.
- Parameters:
- n_shapelets : int, optional
The number of dilated shapelets.
- metric : str or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_prob : float, optional
The probability of standardizing a shapelet with zero mean and unit standard deviation.
- min_shapelet_size : float, optional
The minimum shapelet size. If None, use the discrete sizes in shapelet_size.
- max_shapelet_size : float, optional
The maximum shapelet size. If None, use the discrete sizes in shapelet_size.
- shapelet_size : array-like, optional
The size of shapelets.
- lower : float, optional
The lower percentile to draw distance thresholds above.
- upper : float, optional
The upper percentile to draw distance thresholds below.
- alphas : array-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_intercept : bool, optional
Whether to calculate the intercept for this model.
- scoring : str, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weight : dict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalize : bool, optional
Standardize before fitting.
- random_state : int or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobs : int, optional
The number of parallel jobs.
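One way to read the lower and upper parameters is as two percentiles of a set of distances, with a threshold drawn somewhere between them. The sketch below works under two assumptions that the text above does not confirm: that the values are fractions in [0, 1], and that the threshold is drawn uniformly. The helpers percentile and draw_threshold are hypothetical and are not wildboar functions.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile, with q given as a fraction in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1 - fraction) + sorted_values[high] * fraction


def draw_threshold(distances, lower=0.05, upper=0.1, rng=None):
    """Draw a threshold between the lower and upper percentile (assumed uniform)."""
    if rng is None:
        rng = random.Random()
    ordered = sorted(distances)
    lo = percentile(ordered, lower)
    hi = percentile(ordered, upper)
    return rng.uniform(lo, hi)


rng = random.Random(0)
distances = [rng.random() for _ in range(200)]  # stand-in distance profile
threshold = draw_threshold(distances, rng=rng)
print(threshold)
```

Under these assumptions, small values of lower and upper yield thresholds that only the closest matches fall below.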
References
- Antoine Guillaume, Christel Vrain, Elloumi Wael.
Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. Pattern Recognition and Artificial Intelligence, 2022.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.linear_model.HydraClassifier(*, n_groups=64, n_kernels=8, kernel_size=9, sampling='normal', sampling_params=None, order=1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', n_jobs=None, random_state=None)
A dictionary based method using convolutional kernels.
- Parameters:
- n_groups : int, optional
The number of groups of kernels.
- n_kernels : int, optional
The number of kernels per group.
- kernel_size : int, optional
The size of the kernel.
- sampling : {“normal”}, optional
The strategy for sampling kernels. By default kernel weights are sampled from a normal distribution with zero mean and unit standard deviation.
- sampling_params : dict, optional
Parameters to the sampling approach. The “normal” sampler accepts two parameters: mean and scale.
- order : int, optional
The order of difference. If set, half the groups with corresponding kernels will be convolved with the order discrete difference along the time dimension.
- alphas : array-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_intercept : bool, optional
Whether to calculate the intercept for this model.
- scoring : str, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weight : dict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalize : “sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- n_jobs : int, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_state : int or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
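The parameters n_groups, n_kernels, kernel_size and the “normal” sampler with its mean and scale together describe a collection of randomly weighted kernels. The sketch below only illustrates that shape using the standard library. The function sample_normal_kernels and its seed argument are hypothetical, and any further processing that the real implementation applies to the kernels is not shown.

```python
import random


def sample_normal_kernels(n_groups=64, n_kernels=8, kernel_size=9,
                          mean=0.0, scale=1.0, seed=None):
    """Sample kernel weights from a normal distribution.

    Returns a nested list indexed as [group][kernel][weight].
    Hypothetical helper for illustration; not wildboar code.
    """
    rng = random.Random(seed)
    return [
        [
            [rng.gauss(mean, scale) for _ in range(kernel_size)]
            for _ in range(n_kernels)
        ]
        for _ in range(n_groups)
    ]


kernels = sample_normal_kernels(n_groups=4, n_kernels=2, kernel_size=3, seed=0)
print(len(kernels), len(kernels[0]), len(kernels[0][0]))  # 4 2 3
```

Passing the same seed reproduces the same weights, which is the role random_state plays for the estimator.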
References
- Dempster, A., Schmidt, D. F., & Webb, G. I. (2023).
Hydra: competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.linear_model.RandomShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, n_jobs=None, random_state=None)
A classifier that uses random shapelets.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.linear_model.RandomShapeletRegressor(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, n_jobs=None, random_state=None)
A regressor that uses random shapelets.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
\(R^2\) of self.predict(X) w.r.t. y.
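The definition of \(R^2\) given for score can be checked by hand with a few lines of plain Python. The sketch below is an unweighted, single-output illustration; the function name r_squared is hypothetical and sample_weight is not handled.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, for a single output.

    u is the residual sum of squares and v the total sum of squares.
    Illustrative helper only; it ignores sample weights.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean_true) ** 2 for t in y_true)
    if v == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)
constant = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean
reversed_fit = r_squared(y_true, [4.0, 3.0, 2.0, 1.0])
print(perfect)       # 1.0
print(constant)      # 0.0
print(reversed_fit)  # -3.0
```

The three cases match the description: a perfect fit scores 1.0, predicting the mean scores 0.0, and a sufficiently bad model scores below zero.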
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.linear_model.RocketClassifier(n_kernels=10000, *, kernel_size=None, sampling='normal', sampling_params=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, n_jobs=None, random_state=None)
Implements the ROCKET classifier.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- class wildboar.linear_model.RocketRegressor(n_kernels=10000, *, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, gcv_mode=None, n_jobs=None, random_state=None)
Implements the ROCKET regressor.
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- score : float
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.