wildboar.segment
Segment time series into regions.
Classes
- FlussSegmenter: Segmenter using the MatrixProfile and corrected ARC curve.
- class wildboar.segment.FlussSegmenter(n_segments=1, *, window=1.0, exclude=0.2, boundary=0.1, metric='euclidean', metric_params=None, n_jobs=None)
Segmenter using the MatrixProfile and corrected ARC curve.
Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) as described by Gharghabi et al. (2017).
The algorithm works by analyzing similarity relationships in time series data:
For each position in the time series:
- It finds its nearest neighbor (most similar subsequence).
- It creates an “arc” connecting these two positions.
The arc curve is computed by:
- Counting how many arcs pass over each position (including all positions between the start and end points of each arc).
- Normalizing the counts to account for edge effects.
The resulting curve is used to find segment boundaries:
- Low points (valleys) in the arc curve indicate natural boundaries.
- These are positions with few similarity relationships crossing them.
- High arc counts suggest positions within coherent segments.
The intuition is that segment boundaries occur where the time series behavior changes, which is reflected by fewer similarity relationships (arcs) crossing these points.
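The steps above can be sketched in plain Python. This is a minimal illustration, not wildboar's implementation: the helper names `arc_counts` and `corrected_arc_curve` are hypothetical, the nearest-neighbor indices are supplied by hand instead of coming from a matrix profile, and the idealized curve used for normalization (a parabola peaking at half the series length) is an assumption of this sketch.

```python
def arc_counts(nearest):
    """Count, for each position, how many arcs pass over it.

    `nearest[i]` is the index of the nearest neighbor of position i.
    In this sketch an arc between i and j covers the positions from
    min(i, j) up to, but not including, max(i, j).
    """
    n = len(nearest)
    marks = [0] * n
    for i, j in enumerate(nearest):
        lo, hi = min(i, j), max(i, j)
        marks[lo] += 1  # an arc starts here
        marks[hi] -= 1  # and ends here
    counts, running = [], 0
    for mark in marks:  # a running sum turns start/end marks into counts
        running += mark
        counts.append(running)
    return counts


def corrected_arc_curve(nearest):
    """Divide the arc counts by an idealized curve and clip to [0, 1]."""
    n = len(nearest)
    curve = []
    for i, count in enumerate(arc_counts(nearest)):
        # Assumed idealized curve: a parabola with its peak at n / 2 arcs.
        ideal = 2.0 * i * (n - i) / n
        curve.append(min(count / ideal, 1.0) if ideal > 0 else 1.0)
    return curve


# Toy nearest-neighbor indices: positions 0-4 only match each other,
# and so do positions 5-9.
nearest = [2, 3, 0, 1, 2, 7, 8, 5, 6, 7]
counts = arc_counts(nearest)
cac = corrected_arc_curve(nearest)
```

In this toy input no arc links the two halves, so the count at position 4 drops to zero and the corrected curve has a valley there. The last position is also zero, which is an edge effect rather than a change in behavior.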
- Parameters:
- n_segments : int, optional
The number of segments.
- window : int or float, optional
The window size.
If int, the exact window size.
If float, the window size expressed as a fraction of the time series length.
- exclude : int or float, optional
The exclusion zone.
If float, the exclusion zone expressed as a fraction of the window size.
If int, the exact size of the exclusion zone.
- boundary : float, optional
The boundary of the ignored region around each segment expressed as a fraction of the window size.
- metric : str or callable, optional
The distance metric. See _METRICS.keys() for a list of supported metrics.
- metric_params : dict, optional
Parameters to the metric.
Read more about the parameters in the User guide.
- n_jobs : int, optional
The number of parallel jobs to compute the matrix profile.
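How the int-or-float size parameters relate to each other can be sketched as follows. The helper `resolve_size` is hypothetical, and rounding to the nearest integer is an assumption; wildboar may resolve fractional sizes differently.

```python
def resolve_size(value, reference):
    """Turn an int-or-float size into an absolute number of points.

    Hypothetical helper: an int is used unchanged, a float is read as
    a fraction of `reference`. Rounding to the nearest integer is an
    assumption of this sketch.
    """
    if isinstance(value, int):
        return value
    return round(value * reference)


n_timesteps = 200
window = resolve_size(0.1, n_timesteps)       # float: fraction of the series length
exclude = resolve_size(0.2, window)           # float: fraction of the window size
boundary = resolve_size(0.1, window)          # fraction of the window size
exact_window = resolve_size(25, n_timesteps)  # int: used unchanged
```

With a series of 200 time steps, `window=0.1` gives a window of 20 points, and the default `exclude=0.2` and `boundary=0.1` are then measured against that window, not against the series length.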
- Attributes:
- labels_ : list of shape (n_samples,)
A list of n_samples lists, each holding the start indices of the segments found in the corresponding sample.
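One way start indices like these could be picked from a corrected arc curve is to take the lowest point repeatedly while ignoring a region around every point already chosen. The sketch below assumes that strategy; `find_boundaries`, `n_boundaries`, and `ignore` are hypothetical names, and the selection wildboar actually performs may differ.

```python
def find_boundaries(curve, n_boundaries, ignore):
    """Pick the lowest points of `curve` as boundaries.

    After each pick, `ignore` positions on either side of it are
    masked so that the next pick lands in a different valley.
    """
    work = list(curve)  # copy, so the caller's curve stays untouched
    boundaries = []
    for _ in range(n_boundaries):
        idx = min(range(len(work)), key=work.__getitem__)
        boundaries.append(idx)
        lo = max(0, idx - ignore)
        hi = min(len(work), idx + ignore + 1)
        for k in range(lo, hi):
            work[k] = float("inf")  # never pick inside the ignored region
    return sorted(boundaries)


# Made-up curve with valleys around positions 3 and 8.
curve = [1.0, 0.9, 0.2, 0.1, 0.3, 0.9, 1.0, 0.8, 0.25, 0.6, 1.0]
with_region = find_boundaries(curve, n_boundaries=2, ignore=2)
without_region = find_boundaries(curve, n_boundaries=2, ignore=0)
```

Without an ignored region the second pick falls right next to the first one, inside the same valley. With a region of two positions it moves to the second valley.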
References
- Gharghabi, Shaghayegh, et al. (2017). Matrix Profile VIII: Domain Agnostic Online Semantic Segmentation at Superhuman Performance Levels. In Proceedings of the IEEE International Conference on Data Mining (ICDM).
- fit(X, y=None)
Fit the segmenter.
- Parameters:
- X : array-like of shape (n_samples, n_timesteps)
The samples.
- y : ignored, optional
Ignored.
- Returns:
- self
The estimator.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params : dict
Additional fit parameters.
- Returns:
- X_new : ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
- params : dict
Parameter names mapped to their values.
- predict(X)
Predict the positions of the change points.
The predicted segmentation is based on the closest sample from the training data.
- Parameters:
- X : array-like of shape (n_samples, n_timesteps)
The input data.
- Returns:
- csr_array of shape (n_samples, n_timesteps)
A boolean array with the start of the change point set to True.
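The relation between a list of segment start indices and one row of such a boolean array can be sketched as follows. The helpers `starts_to_mask` and `mask_to_starts` are hypothetical, and the sketch uses dense Python lists to stay dependency-free, whereas the method itself is documented to return a sparse csr_array.

```python
def starts_to_mask(starts, n_timesteps):
    """Build a dense boolean row with True at every start index."""
    mask = [False] * n_timesteps
    for start in starts:
        mask[start] = True
    return mask


def mask_to_starts(mask):
    """Recover the start indices from a boolean row."""
    return [i for i, flag in enumerate(mask) if flag]


starts = [3, 8]
mask = starts_to_mask(starts, n_timesteps=10)
recovered = mask_to_starts(mask)
```

Only two of the ten entries are True here, which illustrates why a sparse representation suits this kind of output.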
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
“default”: Default output format of a transformer.
“pandas”: DataFrame output.
“polars”: Polars output.
None: Transform configuration is unchanged.
Added in version 1.4: “polars” option was added.
- Returns:
- self : estimator instance
Estimator instance.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params : dict
Estimator parameters.
- Returns:
- self : estimator instance
Estimator instance.
- transform(X)
Transform X such that each segment is labeled with a unique label.
The predicted segmentation is based on the closest sample from the training data.
- Parameters:
- X : array-like of shape (n_samples, n_timesteps)
The input data.
- Returns:
- ndarray of shape (n_samples, n_timesteps)
An array with the segments annotated with a label.
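What such a per-time-step labeling looks like for a single sample can be sketched from its segment start indices. The helper `label_segments` is hypothetical, and the choice of consecutive integer labels starting at 0 is an assumption; the label values wildboar assigns may differ.

```python
from bisect import bisect_right


def label_segments(starts, n_timesteps):
    """Give every time step the number of segment starts at or before it.

    Time steps before the first start index get label 0, and the label
    increases by one at each start index.
    """
    ordered = sorted(starts)
    return [bisect_right(ordered, t) for t in range(n_timesteps)]


labels = label_segments([3, 8], n_timesteps=10)
```

Two start indices split the ten time steps into three runs, labeled 0, 1, and 2 in order.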