wildboar.segment

Segment time series into regions.

Classes

FlussSegmenter

Segmenter using the MatrixProfile and corrected ARC curve.


class wildboar.segment.FlussSegmenter(n_segments=1, *, window=1.0, exclude=0.2, boundary=0.1, metric='euclidean', metric_params=None, n_jobs=None)

Segmenter using the MatrixProfile and corrected ARC curve.

Compute the Fast Low-cost Unipotent Semantic Segmentation (FLUSS) as described by Gharghabi et al. (2017).

The algorithm works by analyzing similarity relationships in time series data:

  1. For each position in the time series:

    • It finds its nearest neighbor (most similar subsequence)

    • Creates an “arc” connecting these two positions

  2. The arc curve is computed by:

    • Counting how many arcs pass over each position (including all positions between the start and end points of each arc)

    • Normalizing the counts to account for edge effects

  3. The resulting curve is used to find segment boundaries:

    • Low points (valleys) in the arc curve indicate natural boundaries

    • These are positions with few similarity relationships crossing them

    • High arc counts suggest positions within coherent segments

The intuition is that segment boundaries occur where the time series behavior changes, which is reflected by fewer similarity relationships (arcs) crossing these points.
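Steps 1 and 2 can be sketched in a few lines of plain Python. This is a minimal illustration, not wildboar's implementation. The helper name `corrected_arc_curve` is hypothetical, and the nearest-neighbor indices are assumed to be given (for example, from a matrix profile index). The sketch also assumes that an arc from i to j covers the half-open range [min(i, j), max(i, j)), and that the edge correction divides each count by 2i(n - i)/n, the count expected if the arcs were placed uniformly at random.

```python
def corrected_arc_curve(nearest):
    """Return a normalized arc curve for a list of nearest-neighbor indices.

    nearest[i] is the position most similar to position i (step 1).
    Hypothetical helper for illustration only.
    """
    n = len(nearest)

    # Difference array: +1 where an arc starts, -1 where it ends.
    delta = [0] * (n + 1)
    for i, j in enumerate(nearest):
        lo, hi = min(i, j), max(i, j)
        delta[lo] += 1
        delta[hi] -= 1

    # The running sum is the number of arcs covering each position (step 2).
    counts, running = [], 0
    for d in delta[:n]:
        running += d
        counts.append(running)

    corrected = []
    for i, count in enumerate(counts):
        if i == 0 or i == n - 1:
            # Assumption: the edges carry no information, so they are
            # pinned to 1.0 and never look like valleys.
            corrected.append(1.0)
        else:
            expected = 2.0 * i * (n - i) / n  # count under random arcs
            corrected.append(min(count / expected, 1.0))
    return corrected


# Two regimes: positions 0-4 only match each other, as do positions 5-9.
nearest = [3, 4, 0, 1, 0, 8, 9, 5, 6, 5]
curve = corrected_arc_curve(nearest)
valley = min(range(len(curve)), key=curve.__getitem__)  # lowest point
```

In this toy input no arc connects the two halves, so the curve drops to zero at position 4, the last step before the second regime begins.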

Parameters:
n_segments : int, optional

The number of segments.

window : int or float, optional

The window size.

  • if int, the exact window size.

  • if float, the window size expressed as a fraction of the time series length.

exclude : int or float, optional

The exclusion zone.

  • if float, expressed as a fraction of the window size.

  • if int, exact size.
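The int-or-float convention used by `window` and `exclude` can be illustrated with a small helper. This is only a sketch: `resolve_size` is a hypothetical name, and rounding to the nearest step with a minimum of 1 is an assumption, since the text above does not say how fractional sizes are rounded.

```python
def resolve_size(value, reference):
    """Turn an int-or-float size into a whole number of time steps.

    Hypothetical helper: an int is taken as an exact size, a float as a
    fraction of `reference`.
    """
    if isinstance(value, bool):
        raise TypeError("size must be an int or a float, not a bool")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        # Assumption: round to the nearest step, but never below 1.
        return max(1, round(value * reference))
    raise TypeError("size must be an int or a float")


n_timesteps = 200
window = resolve_size(0.25, n_timesteps)  # fraction of the series length
exclude = resolve_size(0.2, window)       # fraction of the window size
```

Note that the two parameters are measured against different references: `window` against the time series length and `exclude` against the resolved window size.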

boundary : float, optional

The boundary of the ignored region around each segment expressed as a fraction of the window size.
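One way to picture the ignored region is a greedy valley search over an arc curve: take the lowest point, mask its neighborhood, and repeat. The sketch below is an assumption about how such a region can be applied, not a description of wildboar's internals. `find_boundaries`, `n_boundaries`, and `ignore` are hypothetical names, and `ignore` is the region already converted to a number of time steps on each side of a boundary.

```python
def find_boundaries(arc_curve, n_boundaries, ignore):
    """Greedily pick the lowest valleys, masking `ignore` steps around each.

    Hypothetical helper for illustration only.
    """
    curve = list(arc_curve)  # work on a copy so the input is untouched
    if not curve:
        return []

    masked = float("inf")
    boundaries = []
    for _ in range(n_boundaries):
        pos = min(range(len(curve)), key=curve.__getitem__)
        if curve[pos] == masked:
            break  # every position is already masked
        boundaries.append(pos)
        # Mask the region around the chosen valley so that the next
        # pick cannot land right next to it.
        lo = max(0, pos - ignore)
        hi = min(len(curve), pos + ignore + 1)
        for k in range(lo, hi):
            curve[k] = masked
    return sorted(boundaries)


arc_curve = [1.0, 0.9, 0.2, 0.25, 0.8, 1.0, 0.7, 0.3, 0.9, 1.0]
with_region = find_boundaries(arc_curve, n_boundaries=2, ignore=2)
without_region = find_boundaries(arc_curve, n_boundaries=2, ignore=0)
```

With the region in place the two picks fall in separate valleys (positions 2 and 7). Without it, the second pick lands at position 3, directly beside the first.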

metric : str or callable, optional

The distance metric.

See _METRICS.keys() for a list of supported metrics.

metric_params : dict, optional

Parameters to the metric.

Read more about the parameters in the User guide.

n_jobs : int, optional

The number of parallel jobs to compute the matrix profile.

Attributes:
labels_ : list of shape (n_samples,)

A list of n_samples lists, each holding the start indices of the segments found in the corresponding sample.

References

Gharghabi, Shaghayegh, et al. (2017)

Matrix profile VIII: domain agnostic online semantic segmentation at superhuman performance levels. In Proceedings of the IEEE International Conference on Data Mining (ICDM).

fit(X, y=None)

Fit the segmenter.

Parameters:
X : array-like of shape (n_samples, n_timesteps)

The samples.

y : ignored, optional

Ignored.

Returns:
self

The estimator.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : array-like of shape (n_samples, n_features)

Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params : dict

Additional fit parameters.

Returns:
X_new : ndarray of shape (n_samples, n_features_new)

Transformed array.

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for how the routing mechanism works.

Returns:
routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : dict

Parameter names mapped to their values.

predict(X)

Predict the positions of the change points.

The predicted segmentation is based on the closest sample from the training data.

Parameters:
X : array-like of shape (n_samples, n_timesteps)

The input data.

Returns:
csr_array of shape (n_samples, n_timesteps)

A sparse boolean array in which the start of each change point is set to True.

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:
transform : {“default”, “pandas”, “polars”}, default=None

Configure output of transform and fit_transform.

  • “default”: Default output format of a transformer

  • “pandas”: DataFrame output

  • “polars”: Polars output

  • None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:
self : estimator instance

Estimator instance.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
**params : dict

Estimator parameters.

Returns:
self : estimator instance

Estimator instance.

transform(X)

Transform X such that each segment is labeled with a unique label.

The predicted segmentation is based on the closest sample from the training data.

Parameters:
X : array-like of shape (n_samples, n_timesteps)

The input data.

Returns:
ndarray of shape (n_samples, n_timesteps)

An array with the segments annotated with a label.
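The relationship between a row of change-point markers and a row of segment labels can be sketched with a running sum. This is only an illustration under stated assumptions. `labels_from_starts` is a hypothetical helper, the input is a dense row of booleans (a sparse row would have to be made dense first), and the labels are assumed to be consecutive integers starting at 0. The description above only guarantees that each segment gets a unique label.

```python
from itertools import accumulate


def labels_from_starts(is_start):
    """Label each time step with the index of the segment it belongs to.

    `is_start` is a sequence of booleans that is True where a new segment
    begins. Hypothetical helper for illustration only.
    """
    # Each True increments the running label by one; False keeps it.
    return list(accumulate(int(flag) for flag in is_start))


# One time series with change points starting at positions 3 and 6.
is_start = [False, False, False, True, False, False, True, False]
labels = labels_from_starts(is_start)
```

Every time step before the first change point gets label 0, and each later change point opens a new label.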