wildboar.linear_model
Linear methods for both classification and regression.
Classes
CastorClassifier | A dictionary based method using dilated competing shapelets. |
CastorRegressor | A dictionary based method using dilated competing shapelets. |
DilatedShapeletClassifier | A classifier that uses random dilated shapelets. |
HydraClassifier | A dictionary based method using convolutional kernels. |
RandomShapeletClassifier | A classifier that uses random shapelets. |
RandomShapeletRegressor | A regressor that uses random shapelets. |
RocketClassifier | A classifier using Rocket transform. |
RocketRegressor | A regressor using Rocket transform. |
- class wildboar.linear_model.CastorClassifier(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, order=1, soft_min=True, soft_max=False, soft_threshold=True, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', random_state=None, n_jobs=None)[source]#
A dictionary based method using dilated competing shapelets.
- Parameters:
- n_groupsint, optional
The number of groups of dilated shapelets.
- n_shapeletsint, optional
The number of dilated shapelets per group.
- metricstr or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_probfloat, optional
The probability of standardizing a shapelet with zero mean and unit standard deviation.
- shapelet_sizeint, optional
The length of the dilated shapelet.
- lowerfloat, optional
The lower percentile to draw distance thresholds above.
- upperfloat, optional
The upper percentile to draw distance thresholds below.
- orderint or array-like, optional
The order of difference.
If int, half the groups with corresponding shapelets will be convolved with the order discrete difference along the time dimension.
- soft_minbool, optional
If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.
- soft_maxbool, optional
If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.
- soft_thresholdbool, optional
If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weightdict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalize“sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- random_stateint or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
Notes
For better performance with multivariate datasets, set n_shapelets to n_shapelets * n_dims to ensure feature variability.
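The order parameter above refers to the order-th discrete difference along the time dimension. The following is a minimal, dependency-free sketch of that operation on a single series. The helper name discrete_difference is hypothetical and not part of wildboar; it only illustrates what differencing does to the input before shapelets are applied.

```python
def discrete_difference(series, order=1):
    """Return the order-th discrete difference of a 1-D sequence.

    Hypothetical helper for illustration only; not a wildboar function.
    Each pass replaces the series with the differences of adjacent values,
    so the result is `order` elements shorter than the input.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    values = list(series)
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


series = [1.0, 4.0, 9.0, 16.0, 25.0]
first = discrete_difference(series, order=1)
second = discrete_difference(series, order=2)
```

With order=1 the series of squares becomes a linear ramp, and with order=2 it becomes constant, which shows how differencing emphasises changes rather than levels.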
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
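To make the subset accuracy mentioned above concrete, here is a small sketch in plain Python. The function name subset_accuracy is an assumption made for illustration; it is not the scikit-learn or wildboar implementation, only the definition spelled out.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose complete label set is predicted exactly.

    Illustrative helper, not a library function. Each element of y_true
    and y_pred is the full label vector of one sample.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    matches = sum(1 for t, p in zip(y_true, y_pred) if tuple(t) == tuple(p))
    return matches / len(y_true)


y_true = [(1, 0, 1), (0, 1, 0), (1, 1, 0)]
y_pred = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
acc = subset_accuracy(y_true, y_pred)
```

The second sample has two of three labels right but still counts as a miss, which is why the metric is described as harsh.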
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
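The <component>__<parameter> naming convention can be sketched without any library. The helper below, split_nested_params, is hypothetical and only shows how such keys separate into the estimator's own parameters and those forwarded to a named component; the component names used are made up for the example.

```python
def split_nested_params(params):
    """Split parameter names at the first double underscore.

    Illustrative only; not the scikit-learn implementation.
    Returns (own, nested), where nested maps a component name to the
    parameters that would be forwarded to that component.
    """
    own, nested = {}, {}
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


# "scaler" and "clf" are made-up component names.
own, nested = split_nested_params(
    {"n_jobs": 2, "scaler__with_mean": False, "clf__n_groups": 32}
)
```

Because only the first delimiter is consumed, a deeper key such as a__b__c is forwarded to component a as b__c, which lets each level resolve its own part.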
- class wildboar.linear_model.CastorRegressor(n_groups=64, n_shapelets=8, *, metric='euclidean', metric_params=None, normalize_prob=0.8, shapelet_size=11, lower=0.05, upper=0.1, order=1, soft_min=True, soft_max=False, soft_threshold=True, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, normalize='sparse', random_state=None, n_jobs=None)[source]#
A dictionary based method using dilated competing shapelets.
- Parameters:
- n_groupsint, optional
The number of groups of dilated shapelets.
- n_shapeletsint, optional
The number of dilated shapelets per group.
- metricstr or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_probfloat, optional
The probability of standardizing a shapelet with zero mean and unit standard deviation.
- shapelet_sizeint, optional
The length of the dilated shapelet.
- lowerfloat, optional
The lower percentile to draw distance thresholds above.
- upperfloat, optional
The upper percentile to draw distance thresholds below.
- orderint or array-like, optional
The order of difference.
If int, half the groups with corresponding shapelets will be convolved with the order discrete difference along the time dimension.
- soft_minbool, optional
If True, use the sum of minimal distances. Otherwise, use the count of minimal distances.
- soft_maxbool, optional
If True, use the sum of maximal distances. Otherwise, use the count of maximal distances.
- soft_thresholdbool, optional
If True, count the time steps below the threshold for all shapelets. Otherwise, count the time steps below the threshold for the shapelet with the minimal distance.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- normalize“sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- random_stateint or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
Notes
For better performance with multivariate datasets, set n_shapelets to n_shapelets * n_dims to ensure feature variability.
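The soft_min option above distinguishes between summing minimal distances and counting them. The sketch below shows one plausible reading for a single group of competing shapelets: at every time step the shapelet with the smallest distance wins, and its feature grows either by that distance or by one. This is an assumption made for illustration, not wildboar's implementation, and min_competition_features is a hypothetical name.

```python
def min_competition_features(distances, soft_min=True):
    """Accumulate per-shapelet features for one group.

    Illustrative sketch, not wildboar code. distances[s][t] is the
    distance of shapelet s at time step t. At each time step the shapelet
    with the minimal distance wins; its feature is increased by that
    distance (soft_min=True) or by 1 (soft_min=False).
    """
    n_shapelets = len(distances)
    n_steps = len(distances[0])
    features = [0.0] * n_shapelets
    for t in range(n_steps):
        winner = min(range(n_shapelets), key=lambda s: distances[s][t])
        features[winner] += distances[winner][t] if soft_min else 1.0
    return features


distances = [
    [0.2, 0.9, 0.4],  # shapelet 0 wins at time steps 0 and 2
    [0.5, 0.1, 0.7],  # shapelet 1 wins at time step 1
]
soft = min_competition_features(distances, soft_min=True)
hard = min_competition_features(distances, soft_min=False)
```

Under this reading, soft_max would work the same way with the maximal distance in place of the minimal one.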
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
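The definition of \(R^2\) given for score can be written out directly. The following is a dependency-free sketch for a single output; r2_sketch is a hypothetical name, and the library itself relies on scikit-learn's r2_score rather than this code.

```python
def r2_sketch(y_true, y_pred):
    """Coefficient of determination, 1 - u / v, for a single output.

    Illustrative only. u is the residual sum of squares and v is the
    total sum of squares around the mean of y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - mean_true) ** 2 for t in y_true)
    if v == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - u / v


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
r2 = r2_sketch(y_true, y_pred)

# A constant model predicting the mean of y_true scores exactly 0.
mean_model = [sum(y_true) / len(y_true)] * len(y_true)
r2_constant = r2_sketch(y_true, mean_model)
```

Here u = 1.5 and v = 29.1875, so the score is roughly 0.949, while the constant model lands at 0.0 as described above.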
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.linear_model.DilatedShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, normalize_prob=0.8, min_shapelet_size=None, max_shapelet_size=None, shapelet_size=None, lower=0.05, upper=0.1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, random_state=None, n_jobs=None)[source]#
A classifier that uses random dilated shapelets.
- Parameters:
- n_shapeletsint, optional
The number of dilated shapelets.
- metricstr or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_paramsdict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- normalize_probfloat, optional
The probability of standardizing a shapelet with zero mean and unit standard deviation.
- min_shapelet_sizefloat, optional
The minimum shapelet size. If None, use the discrete sizes in shapelet_size.
- max_shapelet_sizefloat, optional
The maximum shapelet size. If None, use the discrete sizes in shapelet_size.
- shapelet_sizearray-like, optional
The size of shapelets.
- lowerfloat, optional
The lower percentile to draw distance thresholds above.
- upperfloat, optional
The upper percentile to draw distance thresholds below.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weightdict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalizebool, optional
Standardize before fitting.
- random_stateint or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
References
- Antoine Guillaume, Christel Vrain, Elloumi Wael
Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. Pattern Recognition and Artificial Intelligence, 2022
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.linear_model.HydraClassifier(*, n_groups=64, n_kernels=8, kernel_size=9, sampling='normal', sampling_params=None, order=1, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize='sparse', n_jobs=None, random_state=None)[source]#
A dictionary based method using convolutional kernels.
- Parameters:
- n_groupsint, optional
The number of groups of kernels.
- n_kernelsint, optional
The number of kernels per group.
- kernel_sizeint, optional
The size of the kernel.
- sampling{“normal”}, optional
The strategy for sampling kernels. By default kernel weights are sampled from a normal distribution with zero mean and unit standard deviation.
- sampling_paramsdict, optional
Parameters to the sampling approach. The “normal” sampler accepts two parameters: mean and scale.
- orderint, optional
The order of difference. If set, half the groups with corresponding kernels will be convolved with the order discrete difference along the time dimension.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weightdict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalize“sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
References
- Dempster, A., Schmidt, D. F., & Webb, G. I. (2023).
Hydra: competing convolutional kernels for fast and accurate time series classification. Data Mining and Knowledge Discovery
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.linear_model.RandomShapeletClassifier(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, coverage_probability=None, variability=None, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, class_weight=None, random_state=None, n_jobs=None)[source]#
A classifier that uses random shapelets.
- Parameters:
- n_shapeletsint or {“log2”, “sqrt”, “auto”}, optional
The number of shapelets in the resulting transform.
If “auto”, the number of shapelets depends on the value of strategy. For “best” the number is 1; and for “random” it is 1000.
If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.
If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.
- metricstr or list, optional
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- min_shapelet_sizefloat, optional
Minimum shapelet size.
- max_shapelet_sizefloat, optional
Maximum shapelet size.
- coverage_probabilityfloat, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- normalizebool, optional
Standardize before fitting.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weightdict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- random_stateint or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
References
- Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.
Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).
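The parameter grid specification described for metric, for example dict(min_r=0, max_r=1, num_r=10), can be read as a request for evenly spaced values between the two bounds. The sketch below expands such a specification under that assumption. Both the helper expand_grid and the default_num fallback are hypothetical; how wildboar actually spaces the values or which default it uses is not stated here.

```python
def expand_grid(spec, default_num=10):
    """Expand min_<name>/max_<name>/num_<name> entries into value lists.

    Illustrative sketch, not wildboar code. Values are assumed to be
    evenly spaced with both bounds included; default_num is an assumed
    fallback for a missing num_<name> key.
    """
    grid = {}
    for key, lower in spec.items():
        if not key.startswith("min_"):
            continue
        name = key[len("min_"):]
        upper = spec["max_" + name]
        num = spec.get("num_" + name, default_num)
        if num < 1:
            raise ValueError("num must be at least 1")
        if num == 1:
            grid[name] = [float(lower)]
        else:
            span = upper - lower
            grid[name] = [lower + span * i / (num - 1) for i in range(num)]
    return grid


grid = expand_grid(dict(min_r=0, max_r=1, num_r=10))
```

Each metric in the list would then be evaluated once per value of r in grid["r"].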
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.linear_model.RandomShapeletRegressor(n_shapelets=1000, *, metric='euclidean', metric_params=None, min_shapelet_size=0.1, max_shapelet_size=1.0, coverage_probability=None, variability=None, alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, random_state=None, n_jobs=None)[source]#
A regressor that uses random shapelets.
- Parameters:
- n_shapeletsint or {“log2”, “sqrt”, “auto”}, optional
The number of shapelets in the resulting transform.
If “auto”, the number of shapelets depends on the value of strategy. For “best” the number is 1; and for “random” it is 1000.
If “log2”, the number of shapelets is the log2 of the total possible number of shapelets.
If “sqrt”, the number of shapelets is the square root of the total possible number of shapelets.
- metricstr or list, optional
If str, the distance metric used to identify the best shapelet.
If list, multiple metrics specified as a list of tuples, where the first element of the tuple is a metric name and the second element a dictionary with a parameter grid specification. A parameter grid specification is a dict with two mandatory and one optional key-value pairs defining the lower and upper bound on the values and number of values in the grid. For example, to specify a grid over the argument ‘r’ with 10 values in the range 0 to 1, we would give the following specification: dict(min_r=0, max_r=1, num_r=10).
Read more about the metrics and their parameters in the User guide.
- metric_paramsdict, optional
Parameters for the distance measure. Ignored unless metric is a string.
Read more about the parameters in the User guide.
- min_shapelet_sizefloat, optional
Minimum shapelet size.
- max_shapelet_sizefloat, optional
Maximum shapelet size.
- coverage_probabilityfloat, optional
The probability that a time step is covered by a shapelet, in the range 0 < coverage_probability <= 1.
For larger coverage_probability, we get larger shapelets.
For smaller coverage_probability, we get shorter shapelets.
- variabilityfloat, optional
Controls the shape of the Beta distribution used to sample shapelets. Defaults to 1.
Higher variability creates more uniform intervals.
Lower variability creates more variable interval sizes.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- normalizebool, optional
Standardize before fitting.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- gcv_mode{‘auto’, ‘svd’, ‘eigen’}, optional
Flag indicating which strategy to use when performing Leave-One-Out Cross-Validation. Options are:
'auto' : use 'svd' if n_samples > n_features, otherwise use 'eigen'
'svd' : force use of singular value decomposition of X when X is dense, eigenvalue decomposition of X^T.X when X is sparse.
'eigen' : force computation via eigendecomposition of X.X^T
The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending on the shape of the training data.
- random_stateint or RandomState, optional
Controls the random sampling of kernels.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
References
- Wistuba, Martin, Josif Grabocka, and Lars Schmidt-Thieme.
Ultra-fast shapelets for time series classification. arXiv preprint arXiv:1503.05018 (2015).
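The 'auto' rule described for gcv_mode depends only on the shape of the training data. A minimal sketch of that decision follows. The helper resolve_gcv_mode is hypothetical, and treating None like 'auto' is an assumption based on 'auto' being described as the default.

```python
def resolve_gcv_mode(gcv_mode, n_samples, n_features):
    """Pick the decomposition strategy from the training data shape.

    Illustrative only; mirrors the rule stated for gcv_mode. None is
    assumed to behave like 'auto'.
    """
    if gcv_mode is None or gcv_mode == "auto":
        return "svd" if n_samples > n_features else "eigen"
    if gcv_mode not in ("svd", "eigen"):
        raise ValueError(f"unknown gcv_mode: {gcv_mode!r}")
    return gcv_mode


# 200 series transformed with 1000 shapelets gives more features than samples.
mode_wide = resolve_gcv_mode(None, n_samples=200, n_features=1000)
# 5000 series with the same 1000 shapelets gives more samples than features.
mode_tall = resolve_gcv_mode("auto", n_samples=5000, n_features=1000)
```

With the default n_shapelets=1000, small training sets therefore fall on the 'eigen' side of the rule.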
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.linear_model.RocketClassifier(n_kernels=10000, *, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, class_weight=None, normalize=True, random_state=None, n_jobs=None)[source]#
A classifier using Rocket transform.
- Parameters:
- n_kernelsint, optional
The number of kernels to sample at each node.
- sampling{“normal”, “uniform”, “shapelet”}, optional
The sampling of convolutional filters.
If “normal”, sample filter according to a normal distribution with mean and scale.
If “uniform”, sample filter according to a uniform distribution with lower and upper.
If “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
Parameters for the sampling strategy.
If “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
If “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum timestep size used for generating kernel sizes. If set, kernel_size is ignored.
- max_sizefloat, optional
The maximum timestep size used for generating kernel sizes. If set, kernel_size is ignored.
- bias_probfloat, optional
The probability of using the bias term.
- normalize_probfloat, optional
The probability of performing normalization.
- padding_probfloat, optional
The probability of padding with zeros.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- class_weightdict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}.
- normalize“sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
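The sampling and sampling_params options above describe how kernel weights are drawn. As a rough illustration, the sketch below draws the weights of one kernel with Python's standard random module, using the documented defaults for the “normal” and “uniform” strategies. The function sample_kernel_weights is hypothetical and not wildboar's sampler; the “shapelet” strategy is left out because it needs training data.

```python
import random

# Defaults as documented for sampling_params.
DEFAULT_PARAMS = {
    "normal": {"mean": 0.0, "scale": 1.0},
    "uniform": {"lower": -1.0, "upper": 1.0},
}


def sample_kernel_weights(size, sampling="normal", sampling_params=None, rng=None):
    """Draw `size` kernel weights for the given sampling strategy.

    Illustrative sketch only. User-supplied sampling_params override the
    documented defaults key by key.
    """
    if sampling not in DEFAULT_PARAMS:
        raise ValueError(f"unsupported sampling in this sketch: {sampling!r}")
    if rng is None:
        rng = random.Random()
    params = {**DEFAULT_PARAMS[sampling], **(sampling_params or {})}
    if sampling == "normal":
        return [rng.gauss(params["mean"], params["scale"]) for _ in range(size)]
    return [rng.uniform(params["lower"], params["upper"]) for _ in range(size)]


uniform_kernel = sample_kernel_weights(9, "uniform", rng=random.Random(0))
shifted_kernel = sample_kernel_weights(
    3, "normal", {"mean": 2.0, "scale": 0.0}, rng=random.Random(0)
)
```

Passing a seeded generator plays the same role as random_state: the same seed reproduces the same kernels.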
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)[source]#
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
Mean accuracy of self.predict(X) w.r.t. y.
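As a minimal sketch of what the score above measures, the following computes the (optionally weighted) mean accuracy in plain Python. The function name mean_accuracy is hypothetical and the code is not the library's implementation. Because whole label rows are compared, a multi-label sample counts as correct only when its entire label set matches, which is the subset accuracy described above.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of samples whose prediction matches exactly."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    # For multi-label rows (tuples), == compares the whole label set.
    hits = [float(t == p) for t, p in zip(y_true, y_pred)]
    return sum(w * h for w, h in zip(sample_weight, hits)) / sum(sample_weight)


# Single-label case: three of four predictions are correct.
print(mean_accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))  # 0.75

# Multi-label case: the second row has one wrong label, so the row is a miss.
y_true = [(1, 0), (0, 1), (1, 1)]
y_pred = [(1, 0), (0, 0), (1, 1)]
print(mean_accuracy(y_true, y_pred))
```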
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
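To illustrate the <component>__<parameter> naming used by set_params, the sketch below splits such a name at its first double underscore. The helper split_param_name is hypothetical and only demonstrates the naming scheme. It makes no claim about how the estimator resolves names internally.

```python
def split_param_name(name):
    """Split '<component>__<parameter>' at the first double underscore."""
    component, sep, parameter = name.partition("__")
    if not sep:
        # No double underscore: a parameter of the estimator itself.
        return None, name
    return component, parameter


print(split_param_name("alphas"))             # (None, 'alphas')
print(split_param_name("scaler__with_mean"))  # ('scaler', 'with_mean')
# Deeper nesting leaves the remainder for the component to resolve.
print(split_param_name("pipe__clf__alpha"))   # ('pipe', 'clf__alpha')
```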
- class wildboar.linear_model.RocketRegressor(n_kernels=10000, *, sampling='normal', sampling_params=None, kernel_size=None, min_size=None, max_size=None, bias_prob=1.0, normalize_prob=1.0, padding_prob=0.5, alphas=(0.1, 1.0, 10.0), fit_intercept=True, scoring=None, cv=None, gcv_mode=None, normalize=True, random_state=None, n_jobs=None)[source]#
A regressor using Rocket transform.
- Parameters:
- n_kernelsint, optional
The number of kernels to sample.
- sampling{“normal”, “uniform”, “shapelet”}, optional
The sampling of convolutional filters.
If “normal”, sample filters according to a normal distribution with mean and scale.
If “uniform”, sample filters according to a uniform distribution with lower and upper.
If “shapelet”, sample filters as subsequences in the training data.
- sampling_paramsdict, optional
Parameters for the sampling strategy.
If “normal”, {"mean": float, "scale": float}, defaults to {"mean": 0, "scale": 1}.
If “uniform”, {"lower": float, "upper": float}, defaults to {"lower": -1, "upper": 1}.
- kernel_sizearray-like, optional
The kernel size, by default [7, 11, 13].
- min_sizefloat, optional
The minimum timestep size used for generating kernel sizes. If set, kernel_size is ignored.
- max_sizefloat, optional
The maximum timestep size used for generating kernel sizes. If set, kernel_size is ignored.
- bias_probfloat, optional
The probability of using the bias term.
- normalize_probfloat, optional
The probability of performing normalization.
- padding_probfloat, optional
The probability of padding with zeros.
- alphasarray-like of shape (n_alphas,), optional
Array of alpha values to try.
- fit_interceptbool, optional
Whether to calculate the intercept for this model.
- scoringstr, callable, optional
A string or a scorer callable object with signature scorer(estimator, X, y).
- cvint, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy.
- gcv_mode{‘auto’, ‘svd’, ‘eigen’}, optional
Flag indicating which strategy to use when performing Leave-One-Out Cross-Validation. Options are:
'auto' : use 'svd' if n_samples > n_features, otherwise use 'eigen'
'svd' : force use of singular value decomposition of X when X is dense, eigenvalue decomposition of X^T.X when X is sparse.
'eigen' : force computation via eigendecomposition of X.X^T
The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending on the shape of the training data.
- normalize“sparse” or bool, optional
Standardize before fitting. By default use datasets.preprocess.SparseScaler to standardize the attributes. Set to False to disable or True to use StandardScaler.
- random_stateint or RandomState, optional
Controls the random resampling of the original dataset.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of jobs to run in parallel. A value of None means using a single core and a value of -1 means using all cores. Positive integers mean the exact number of cores.
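The sampling and sampling_params options listed above can be pictured with a rough sketch that draws the weights of a single kernel. The function sample_kernel and its rng argument are hypothetical names introduced for illustration, and the code does not reproduce wildboar's internals. The “shapelet” strategy is left out because it needs training data.

```python
import random


def sample_kernel(size, sampling="normal", sampling_params=None, rng=None):
    """Draw `size` kernel weights (illustrative sketch only)."""
    if rng is None:
        rng = random.Random()
    params = dict(sampling_params or {})
    if sampling == "normal":
        # Documented defaults: {"mean": 0, "scale": 1}
        mean = params.get("mean", 0.0)
        scale = params.get("scale", 1.0)
        return [rng.gauss(mean, scale) for _ in range(size)]
    if sampling == "uniform":
        # Documented defaults: {"lower": -1, "upper": 1}
        lower = params.get("lower", -1.0)
        upper = params.get("upper", 1.0)
        return [rng.uniform(lower, upper) for _ in range(size)]
    raise ValueError(f"sampling strategy not covered by this sketch: {sampling!r}")


rng = random.Random(0)
weights = sample_kernel(7, sampling="uniform", rng=rng)
print(len(weights))  # 7
```

Passing a seeded random.Random instance makes the draw reproducible, which loosely mirrors the role random_state plays for the estimator.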
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
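The effect of deep can be sketched with two toy classes. Scaler and Model are hypothetical stand-ins, not wildboar or scikit-learn classes, and the sketch assumes that parameters of contained estimators are reported under <component>__<parameter> keys, the same naming that set_params accepts.

```python
class Scaler:
    """Toy sub-estimator with a single parameter."""

    def __init__(self, with_mean=True):
        self.with_mean = with_mean

    def get_params(self, deep=True):
        return {"with_mean": self.with_mean}


class Model:
    """Toy estimator that contains another estimator."""

    def __init__(self, scaler, alpha=1.0):
        self.scaler = scaler
        self.alpha = alpha

    def get_params(self, deep=True):
        params = {"scaler": self.scaler, "alpha": self.alpha}
        if deep:
            # Also report the parameters of contained estimators.
            for key, value in list(params.items()):
                if hasattr(value, "get_params"):
                    for sub_key, sub_value in value.get_params().items():
                        params[f"{key}__{sub_key}"] = sub_value
        return params


model = Model(Scaler())
print(sorted(model.get_params(deep=False)))  # ['alpha', 'scaler']
print(sorted(model.get_params(deep=True)))   # ['alpha', 'scaler', 'scaler__with_mean']
```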
- score(X, y, sample_weight=None)[source]#
Return the coefficient of determination of the prediction.
The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred) ** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs)
True values for X.
- sample_weightarray-like of shape (n_samples,), default=None
Sample weights.
- Returns:
- scorefloat
\(R^2\) of self.predict(X) w.r.t. y.
Notes
The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score. This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
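The definition of \(R^2\) given for score can be checked with a few lines of plain Python. The function r_squared below is a hypothetical, unweighted, single-output sketch of the formula and not the library's implementation. It also assumes that y_true is not constant, since \(v\) would otherwise be zero.

```python
def r_squared(y_true, y_pred):
    """Compute 1 - u / v for a single output without sample weights."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, [1.0, 2.0, 3.0, 4.0]))  # 1.0, a perfect prediction
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # 0.0, always predicting the mean
print(r_squared(y_true, [4.0, 3.0, 2.0, 1.0]))  # -3.0, worse than the mean
```

The three cases match the statements above: the best score is 1.0, a constant model predicting the mean scores 0.0, and a sufficiently poor model scores below zero.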
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.