wildboar.dimension_selection
Select a subset of dimensions
Classes
| DistanceVarianceThreshold | Distance selector that removes dimensions with low variance. |
| ECSSelector | ElbowClassSum (ECS) dimension selector. |
| SelectDimensionPercentile | Select the fraction of dimensions with largest score. |
| SelectDimensionSignificance | Select dimensions with a p-value below alpha. |
| SelectDimensionTopK | Select the dimensions with the k highest scores. |
| SequentialDimensionSelector | Sequentially select a subset of dimensions. |
- class wildboar.dimension_selection.DistanceVarianceThreshold(threshold=0, *, sample=None, metric='euclidean', metric_params=None, random_state=None, n_jobs=None)[source]#
Distance selector that removes dimensions with low variance.
This dimension selector is suitable for unsupervised learning since it only considers the input data and not the labels.
For each dimension, the pairwise distances between time series are computed, and dimensions whose distance variance is below the specified threshold are removed.
- Parameters:
- thresholdfloat, optional
The variance threshold.
- sampleint or float, optional
Draw a sample of time series for the pairwise distance calculation.
If None, use all samples.
If float, use the specified fraction of samples.
If int, use the specified number of samples.
- metricstr, optional
The distance metric.
- metric_paramsdict, optional
Optional parameters to the distance metric.
Read more about the metrics and their parameters in the User guide.
- random_stateint or RandomState, optional
Controls the random sampling of time series.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
Examples
>>> from wildboar.datasets import load_ering
>>> from wildboar.dimension_selection import DistanceVarianceThreshold
>>> X, y = load_ering()
>>> dv = DistanceVarianceThreshold(threshold=9)
>>> dv.fit(X, y)
DistanceVarianceThreshold(threshold=9)
>>> dv.get_dimensions()
array([ True, False, True, True])
>>> dv.transform(X).shape
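The selection rule described for DistanceVarianceThreshold can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `distance_variance_mask` is hypothetical, the metric is fixed to Euclidean, no sampling is done, and the strict `>` comparison against the threshold is an assumption.

```python
import numpy as np


def distance_variance_mask(X, threshold=0.0):
    """Return a boolean mask over dimensions (hypothetical helper).

    X has shape (n_samples, n_dims, n_timestep). A dimension is kept when the
    variance of its pairwise Euclidean distances exceeds `threshold`
    (strict comparison is an assumption of this sketch).
    """
    n_samples, n_dims, _ = X.shape
    # Index pairs (i, j) with i < j, so each pair is counted once.
    rows, cols = np.triu_indices(n_samples, k=1)
    variances = np.empty(n_dims)
    for d in range(n_dims):
        diff = X[rows, d, :] - X[cols, d, :]
        distances = np.sqrt((diff ** 2).sum(axis=1))
        variances[d] = distances.var()
    return variances > threshold


# Toy data: dimension 0 is identical for all samples, dimension 1 varies.
X_demo = np.zeros((4, 2, 3))
X_demo[:, 0, :] = 1.0
X_demo[:, 1, 0] = [0.0, 1.0, 3.0, 7.0]
mask_demo = distance_variance_mask(X_demo, threshold=0.0)
print(mask_demo)
```

In the toy data every pairwise distance in dimension 0 is zero, so its distance variance is zero and the dimension is dropped.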
- fit(X, y=None)[source]#
Learn the dimensions to select.
- Parameters:
- Xarray-like of shape (n_samples, n_dims, n_timestep)
The training samples.
- yarray-like of shape (n_samples, ), optional
Ignored.
- Returns:
- object
The instance itself.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_dimensions(indices=False)[source]#
Get a boolean mask with the selected dimensions.
- Parameters:
- indicesbool, optional
If True, return the indices instead of a boolean mask.
- Returns:
- ndarray of shape (n_dims, ) or (n_selected_dims, )
A boolean mask over all dimensions or, if indices is True, the integer indices of the retained dimensions.
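The two return forms of get_dimensions are interchangeable for indexing. The snippet below is a NumPy-only sketch that assumes `indices=True` yields the positions of the True entries of the mask; the arrays are made up for illustration.

```python
import numpy as np

# A mask such as the one a fitted selector might report for four dimensions.
mask = np.array([True, False, True, True])

# The equivalent integer index: positions where the mask is True.
indices = np.flatnonzero(mask)  # array([0, 2, 3])

# Either form selects the same dimensions from (n_samples, n_dims, n_timestep).
X = np.arange(5 * 4 * 10, dtype=float).reshape(5, 4, 10)
X_by_mask = X[:, mask, :]
X_by_index = X[:, indices, :]
```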
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X)[source]#
Reverse the transformation.
- Parameters:
- Xarray-like of shape (n_samples, n_selected_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep)
The samples with zeros inserted where dimensions would have been removed by transform.
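The zero-filling behaviour described for inverse_transform can be illustrated with NumPy. This is a sketch only: `inverse_select` is a hypothetical helper, not part of the library, and it takes the mask explicitly instead of reading it from a fitted selector.

```python
import numpy as np


def inverse_select(X_selected, mask):
    """Rebuild an array of shape (n_samples, n_dims, n_timestep).

    Retained dimensions are copied back to their original positions;
    removed dimensions are filled with zeros.
    """
    mask = np.asarray(mask, dtype=bool)
    n_samples, _, n_timestep = X_selected.shape
    X_full = np.zeros((n_samples, mask.size, n_timestep), dtype=X_selected.dtype)
    X_full[:, mask, :] = X_selected
    return X_full


X_orig = np.arange(1, 2 * 4 * 3 + 1, dtype=float).reshape(2, 4, 3)
mask = np.array([True, False, True, True])
X_sel = X_orig[:, mask, :]           # what a selector's transform would keep
X_back = inverse_select(X_sel, mask)
```

Note that the round trip is lossy: the values of the removed dimension are not recovered, only its position.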
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.dimension_selection.ECSSelector(prototype='mean', r=None, metric='euclidean', metric_params=None)[source]#
ElbowClassSum (ECS) dimension selector.
Select time series dimensions based on the sum of distances between pairs of classes.
- Parameters:
- prototype{“mean”, “median”, “dtw”}, optional
The method for computing the class prototypes.
- rfloat, optional
The warping width if prototype is “dtw”.
- metricstr, optional
The distance metric.
- metric_paramsdict, optional
Optional parameters to the distance metric.
Read more about the metrics and their parameters in the User guide.
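The idea of scoring dimensions by the sum of distances between class pairs can be sketched as follows. This is a rough illustration, not the library's code: it assumes mean prototypes and Euclidean distance, the helper names are hypothetical, and the cut-off rule (keep everything before the largest drop in the sorted scores) is a simple stand-in for the elbow step, whose exact form is not specified here.

```python
from itertools import combinations

import numpy as np


def class_sum_scores(X, y):
    """Per-dimension sum of distances between all pairs of class prototypes."""
    y = np.asarray(y)
    classes = np.unique(y)
    # Mean prototype per class: shape (n_classes, n_dims, n_timestep).
    prototypes = np.stack([X[y == c].mean(axis=0) for c in classes])
    scores = np.zeros(X.shape[1])
    for a, b in combinations(range(len(classes)), 2):
        scores += np.linalg.norm(prototypes[a] - prototypes[b], axis=1)
    return scores


def select_by_largest_drop(scores):
    """Keep the dimensions ranked before the largest drop (assumed elbow rule)."""
    scores = np.asarray(scores, dtype=float)
    mask = np.ones(scores.size, dtype=bool)
    if scores.size < 2:
        return mask
    order = np.argsort(scores)[::-1]          # best dimension first
    drops = scores[order][:-1] - scores[order][1:]
    cut = int(np.argmax(drops)) + 1           # number of dimensions to keep
    mask[:] = False
    mask[order[:cut]] = True
    return mask


# Toy data: the classes differ strongly in dimension 0, slightly in 1, not in 2.
X_toy = np.zeros((4, 3, 2))
y_toy = np.array([0, 0, 1, 1])
X_toy[2:, 0, :] = 10.0
X_toy[2:, 1, :] = 1.0
ecs_scores = class_sum_scores(X_toy, y_toy)
ecs_mask = select_by_largest_drop(ecs_scores)
```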
- fit(X, y)[source]#
Learn the dimensions to select.
- Parameters:
- Xarray-like of shape (n_samples, n_dims, n_timestep)
The training samples.
- yarray-like of shape (n_samples, )
The training labels.
- Returns:
- object
The instance itself.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_dimensions(indices=False)[source]#
Get a boolean mask with the selected dimensions.
- Parameters:
- indicesbool, optional
If True, return the indices instead of a boolean mask.
- Returns:
- ndarray of shape (n_dims, ) or (n_selected_dims, )
A boolean mask over all dimensions or, if indices is True, the integer indices of the retained dimensions.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X)[source]#
Reverse the transformation.
- Parameters:
- Xarray-like of shape (n_samples, n_selected_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep)
The samples with zeros inserted where dimensions would have been removed by transform.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.dimension_selection.SelectDimensionPercentile(score_func=f_classif, *, sample=None, percentile=10, metric='euclidean', metric_params=None, n_jobs=None, random_state=None)[source]#
Select the fraction of dimensions with largest score.
For each dimension, the pairwise distance between time series is computed and dimensions with the lowest scores are removed.
- Parameters:
- score_funccallable, optional
Function taking two arrays X and y and returning scores and optionally p-values. The default is f_classif.
- sampleint or float, optional
Draw a sample of time series for the pairwise distance calculation.
If None, use all samples.
If float, use the specified fraction of samples.
If int, use the specified number of samples.
- percentilefloat, optional
Percent of dimensions to retain.
- metricstr, optional
The distance metric.
- metric_paramsdict, optional
Optional parameters to the distance metric.
Read more about the metrics and their parameters in the User guide.
- random_stateint or RandomState, optional
Controls the random sampling of time series.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
Examples
>>> from wildboar.datasets import load_ering
>>> from wildboar.dimension_selection import SelectDimensionPercentile
>>> X, y = load_ering()
>>> sdp = SelectDimensionPercentile(percentile=50)
>>> sdp.fit(X, y)
SelectDimensionPercentile(percentile=50)
>>> sdp.get_dimensions()
array([False, False, True, True])
>>> sdp.transform(X).shape
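Once each dimension has a score, keeping a percentile of them is a thresholding step. The sketch below shows one way to do it with NumPy; `top_percentile_mask` is a hypothetical helper, the scores are made up, and the treatment of ties (kept via `>=`) is an assumption, not necessarily what the library does.

```python
import numpy as np


def top_percentile_mask(scores, percentile=10):
    """Keep the dimensions whose score lies in the top `percentile` percent."""
    scores = np.asarray(scores, dtype=float)
    # Scores at or above this cut-off fall in the requested upper percentile.
    cutoff = np.percentile(scores, 100 - percentile)
    return scores >= cutoff


dim_scores = np.array([0.2, 1.5, 9.0, 4.0])   # one made-up score per dimension
half_mask = top_percentile_mask(dim_scores, percentile=50)
all_mask = top_percentile_mask(dim_scores, percentile=100)
```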
- fit(X, y=None)[source]#
Learn the dimensions to select.
- Parameters:
- Xarray-like of shape (n_samples, n_dims, n_timestep)
The training samples.
- yarray-like of shape (n_samples, ), optional
The training labels, passed to score_func.
- Returns:
- object
The instance itself.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_dimensions(indices=False)[source]#
Get a boolean mask with the selected dimensions.
- Parameters:
- indicesbool, optional
If True, return the indices instead of a boolean mask.
- Returns:
- ndarray of shape (n_dims, ) or (n_selected_dims, )
A boolean mask over all dimensions or, if indices is True, the integer indices of the retained dimensions.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X)[source]#
Reverse the transformation.
- Parameters:
- Xarray-like of shape (n_samples, n_selected_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep)
The samples with zeros inserted where dimensions would have been removed by transform.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.dimension_selection.SelectDimensionSignificance(score_func=f_classif, *, sample=None, alpha=0.05, method='fpr', metric='euclidean', metric_params=None, random_state=None, n_jobs=None)[source]#
Select dimensions with a p-value below alpha.
For each dimension, the pairwise distance between time series is computed and dimensions with p-values above alpha are removed.
- Parameters:
- score_funccallable, optional
Function taking two arrays X and y and returning scores and optionally p-values. The default is f_classif.
- sampleint or float, optional
Draw a sample of time series for the pairwise distance calculation.
If None, use all samples.
If float, use the specified fraction of samples.
If int, use the specified number of samples.
- alphafloat, optional
The significance level. Dimensions with p-values above alpha are removed.
- method{“fpr”, “fdr”, “fwe”}, optional
The method for correcting the alpha value.
If “fpr”, false positive rate, apply no correction.
If “fdr”, false discovery rate, apply the Benjamini-Hochberg procedure.
If “fwe”, family-wise error rate, apply the Bonferroni procedure.
- metricstr, optional
The distance metric.
- metric_paramsdict, optional
Optional parameters to the distance metric.
Read more about the metrics and their parameters in the User guide.
- random_stateint or RandomState, optional
Controls the random sampling of time series.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
Examples
>>> from wildboar.datasets import load_basic_motions
>>> from wildboar.dimension_selection import SelectDimensionSignificance
>>> X, y = load_basic_motions()
>>> sds = SelectDimensionSignificance(alpha=0.01)
>>> sds.fit(X, y)
SelectDimensionSignificance(alpha=0.01)
>>> sds.get_dimensions()
array([ True, True, True, True, True, True])
>>> sds.transform(X).shape
(80, 6, 100)
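The three correction methods differ only in how the p-values are compared against alpha. The following sketch applies the textbook procedures to a made-up vector of p-values; `significant_mask` is a hypothetical helper, and the exact comparison operators the library uses may differ.

```python
import numpy as np


def significant_mask(pvalues, alpha=0.05, method="fpr"):
    """Boolean mask of p-values deemed significant under the given method."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if method == "fpr":
        # No correction: compare each p-value with alpha directly.
        return p < alpha
    if method == "fwe":
        # Bonferroni: divide alpha by the number of tests.
        return p < alpha / m
    if method == "fdr":
        # Benjamini-Hochberg: find the largest rank i with p_(i) <= alpha * i / m
        # and keep every p-value ranked at or below it.
        order = np.argsort(p)
        below = p[order] <= alpha * np.arange(1, m + 1) / m
        mask = np.zeros(m, dtype=bool)
        if below.any():
            cutoff = np.flatnonzero(below).max()
            mask[order[: cutoff + 1]] = True
        return mask
    raise ValueError(f"unknown method: {method!r}")


pvals = [0.001, 0.02, 0.04, 0.2]   # made-up p-values, one per dimension
fpr_mask = significant_mask(pvals, alpha=0.05, method="fpr")
fdr_mask = significant_mask(pvals, alpha=0.05, method="fdr")
fwe_mask = significant_mask(pvals, alpha=0.05, method="fwe")
```

On this input the uncorrected rule keeps three dimensions, Benjamini-Hochberg keeps two, and Bonferroni keeps one, which shows the increasing strictness of the corrections.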
- fit(X, y=None)[source]#
Learn the dimensions to select.
- Parameters:
- Xarray-like of shape (n_samples, n_dims, n_timestep)
The training samples.
- yarray-like of shape (n_samples, ), optional
The training labels, passed to score_func.
- Returns:
- object
The instance itself.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_dimensions(indices=False)[source]#
Get a boolean mask with the selected dimensions.
- Parameters:
- indicesbool, optional
If True, return the indices instead of a boolean mask.
- Returns:
- ndarray of shape (n_dims, ) or (n_selected_dims, )
A boolean mask over all dimensions or, if indices is True, the integer indices of the retained dimensions.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X)[source]#
Reverse the transformation.
- Parameters:
- Xarray-like of shape (n_samples, n_selected_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep)
The samples with zeros inserted where dimensions would have been removed by transform.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.dimension_selection.SelectDimensionTopK(score_func=f_classif, *, sample=None, k=None, metric='euclidean', metric_params=None, random_state=None, n_jobs=None)[source]#
Select the dimensions with the k highest scores.
For each dimension, the pairwise distance between time series is computed and dimensions with the lowest scores are removed.
- Parameters:
- score_funccallable, optional
Function taking two arrays X and y and returning scores and optionally p-values. The default is f_classif.
- sampleint or float, optional
Draw a sample of time series for the pairwise distance calculation.
If None, use all samples.
If float, use the specified fraction of samples.
If int, use the specified number of samples.
- kint, optional
The number of top dimensions to select.
- metricstr, optional
The distance metric.
- metric_paramsdict, optional
Optional parameters to the distance metric.
Read more about the metrics and their parameters in the User guide.
- random_stateint or RandomState, optional
Controls the random sampling of time series.
If int, random_state is the seed used by the random number generator.
If numpy.random.RandomState instance, random_state is the random number generator.
If None, the random number generator is the numpy.random.RandomState instance used by numpy.random.
- n_jobsint, optional
The number of parallel jobs.
Examples
>>> from wildboar.datasets import load_ering
>>> from wildboar.dimension_selection import SelectDimensionTopK
>>> X, y = load_ering()
>>> sdt = SelectDimensionTopK(k=1)
>>> sdt.fit(X, y)
SelectDimensionTopK(k=1)
>>> sdt.get_dimensions()
array([False, False, True, False])
>>> sdt.transform(X).shape
(300, 65)
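Selecting the k best dimensions from a vector of scores amounts to a sort. The following NumPy sketch uses a hypothetical helper `top_k_mask` and made-up scores; how the library breaks ties between equal scores is not covered here.

```python
import numpy as np


def top_k_mask(scores, k):
    """Boolean mask marking the k dimensions with the highest scores."""
    scores = np.asarray(scores, dtype=float)
    mask = np.zeros(scores.size, dtype=bool)
    # argsort is ascending, so the last k positions hold the best dimensions.
    best = np.argsort(scores)[scores.size - k:]
    mask[best] = True
    return mask


dim_scores = np.array([0.2, 1.5, 9.0, 4.0])   # one made-up score per dimension
top1 = top_k_mask(dim_scores, k=1)
top2 = top_k_mask(dim_scores, k=2)
```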
- fit(X, y=None)[source]#
Learn the dimensions to select.
- Parameters:
- Xarray-like of shape (n_samples, n_dims, n_timestep)
The training samples.
- yarray-like of shape (n_samples, ), optional
The training labels, passed to score_func.
- Returns:
- object
The instance itself.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_dimensions(indices=False)[source]#
Get a boolean mask with the selected dimensions.
- Parameters:
- indicesbool, optional
If True, return the indices instead of a boolean mask.
- Returns:
- ndarray of shape (n_dims, ) or (n_selected_dims, )
A boolean mask over all dimensions or, if indices is True, the integer indices of the retained dimensions.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X)[source]#
Reverse the transformation.
- Parameters:
- Xarray-like of shape (n_samples, n_selected_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep)
The samples with zeros inserted where dimensions would have been removed by transform.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.
- class wildboar.dimension_selection.SequentialDimensionSelector(estimator, *, n_dims='auto', cv=5, scoring=None, direction='forward', tol=None)[source]#
Sequentially select a subset of dimensions.
Sequentially select a set of dimensions by adding (forward) or removing (backward) dimensions to greedily form a subset. At each iteration, the algorithm chooses the best dimension to add or remove based on the cross validation score.
- Parameters:
- estimatorestimator
An unfitted estimator.
- n_dims{“auto”} or int, optional
The number of dimensions to select.
If “auto”, the behavior depends on tol:
if tol is not None, dimensions are selected as long as the increase in performance is larger than tol.
otherwise, we select half of the dimensions.
If integer, n_dims is the number of dimensions to select.
- cvint, cross-validation generator or an iterable, optional
The cross-validation splitting strategy.
- scoringstr or callable, optional
A str (see: The scoring parameter: defining model evaluation rules) or callable to evaluate the predictions on the test set.
- direction{“forward”, “backward”}, optional
Whether to perform forward or backward selection.
- tolfloat, optional
The tolerance. If the score does not increase by at least tol between two consecutive iterations, the selection stops.
If direction=”backward”, tol can be negative to reduce the number of dimensions.
Examples
>>> from wildboar.datasets import load_ering
>>> from wildboar.dimension_selection import SequentialDimensionSelector
>>> from wildboar.distance import KNeighborsClassifier
>>> X, y = load_ering()
>>> clf = KNeighborsClassifier()
>>> sds = SequentialDimensionSelector(clf, n_dims=2)
>>> sds.fit(X, y)
SequentialDimensionSelector(estimator=KNeighborsClassifier(), n_dims=2)
>>> sds.get_dimensions()
array([ True, False, False, True])
>>> sds.transform(X).shape
(300, 2, 65)
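The greedy forward procedure can be sketched independently of any estimator. In the sketch below, `forward_select` and `toy_score` are hypothetical: `score` stands in for the cross-validation score of an estimator trained on a subset of dimensions, and the stopping rule follows the description of tol given in the parameter list.

```python
def forward_select(n_dims, score, n_select, tol=None):
    """Greedily add the dimension that gives the best score.

    `score(subset)` is a stand-in for a cross-validated score on the given
    list of dimension indices. Selection stops after `n_select` dimensions
    or, if `tol` is set, when the improvement falls below `tol`.
    """
    selected = []
    remaining = list(range(n_dims))
    current = float("-inf")
    while remaining and len(selected) < n_select:
        # Evaluate every candidate extension of the current subset.
        candidates = {d: score(selected + [d]) for d in remaining}
        best = max(candidates, key=candidates.get)
        if tol is not None and candidates[best] - current < tol:
            break
        selected.append(best)
        remaining.remove(best)
        current = candidates[best]
    return sorted(selected)


# Toy score: each dimension contributes a fixed, made-up amount.
weights = [0.3, 0.1, 0.05, 0.4]


def toy_score(subset):
    return sum(weights[d] for d in subset)


fixed = forward_select(4, toy_score, n_select=2)
by_tol = forward_select(4, toy_score, n_select=4, tol=0.2)
```

Backward selection is the mirror image: start from all dimensions and repeatedly drop the one whose removal hurts the score least.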
- fit(X, y)[source]#
Learn the dimensions to select.
- Parameters:
- Xarray-like of shape (n_samples, n_dims, n_timestep)
The training samples.
- yarray-like of shape (n_samples, )
The training labels.
- Returns:
- object
The instance itself.
- fit_transform(X, y=None, **fit_params)[source]#
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- Xarray-like of shape (n_samples, n_features)
Input samples.
- yarray-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_paramsdict
Additional fit parameters.
- Returns:
- X_newndarray array of shape (n_samples, n_features_new)
Transformed array.
- get_dimensions(indices=False)[source]#
Get a boolean mask with the selected dimensions.
- Parameters:
- indicesbool, optional
If True, return the indices instead of a boolean mask.
- Returns:
- ndarray of shape (n_dims, ) or (n_selected_dims, )
A boolean mask over all dimensions or, if indices is True, the integer indices of the retained dimensions.
- get_metadata_routing()[source]#
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
- Returns:
- routingMetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)[source]#
Get parameters for this estimator.
- Parameters:
- deepbool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- paramsdict
Parameter names mapped to their values.
- inverse_transform(X)[source]#
Reverse the transformation.
- Parameters:
- Xarray-like of shape (n_samples, n_selected_dims, n_timestep)
The samples.
- Returns:
- ndarray of shape (n_samples, n_dims, n_timestep)
The samples with zeros inserted where dimensions would have been removed by transform.
- set_output(*, transform=None)[source]#
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform{“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
- selfestimator instance
Estimator instance.
- set_params(**params)[source]#
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **paramsdict
Estimator parameters.
- Returns:
- selfestimator instance
Estimator instance.