.. currentmodule:: wildboar

###################
Ensemble estimators
###################

.. _shapelet_forest:

****************
Shapelet forests
****************

Shapelet forests, implemented in :class:`ensemble.ShapeletForestClassifier`
and :class:`ensemble.ShapeletForestRegressor`, construct ensembles of shapelet
tree classifiers or regressors respectively. For a large variety of tasks,
these estimators are excellent baseline methods.

.. execute::
   :context:
   :show-return:

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import ShapeletForestClassifier

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
   clf = ShapeletForestClassifier(random_state=1)
   clf.fit(X_train, y_train)

The :class:`~wildboar.ensemble.ShapeletForestClassifier` class includes the
`n_jobs` parameter, which determines the number of processor cores allocated
for model fitting and prediction. It is advisable to set `n_jobs` to ``-1``
to utilize all available cores.
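As a minimal sketch (changing `n_jobs` should only affect the fitting and
prediction time, not the resulting model):

.. code-block:: python

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import ShapeletForestClassifier

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)

   # n_jobs=-1 spreads tree construction over all available cores.
   clf = ShapeletForestClassifier(n_jobs=-1, random_state=1)
   clf.fit(X_train, y_train)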
We can get the predictions by using the `predict` function (or the
`predict_proba` function):

.. execute::
   :context:
   :show-return:

   clf.predict(X_test)

The accuracy of the model is given by the `score` function.

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

.. _proximity_forest:

*****************
Proximity forests
*****************

:class:`ensemble.ProximityForestClassifier` is an ensemble of highly
randomized Proximity Trees. Whereas conventional decision trees branch on
attribute values and shapelet trees branch on distance thresholds, a Proximity
Tree is a `k`-branching tree that branches on the proximity of the time series
to one of `k` pivot time series.

.. execute::
   :context: reset
   :show-return:

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import ProximityForestClassifier

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
   clf = ProximityForestClassifier(random_state=1)
   clf.fit(X_train, y_train)

By default, :class:`~wildboar.ensemble.ProximityForestClassifier` uses the
distance measures suggested in the original paper [#lucas]_. Using these
distance measures, we get the following accuracy:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

We can specify only a single metric:

.. execute::
   :context:
   :show-return:

   clf = ProximityForestClassifier(metric="euclidean", random_state=1)
   clf.fit(X_train, y_train)

This configuration gives the following accuracy:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

We can also specify more complex configurations by passing a ``dict`` or
``list`` to the `metric` parameter. You can :ref:`read more about metric
specification ` in the corresponding section.

.. execute::
   :context:
   :show-return:

   clf = ProximityForestClassifier(
       metric={
           "ddtw": {"min_r": 0.01, "max_r": 0.1},
           "msm": {"min_c": 0.1, "max_c": 100},
       },
       random_state=1,
   )
   clf.fit(X_train, y_train)

This configuration gives the following accuracy:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)
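Since Wildboar estimators follow the scikit-learn estimator interface, they
also compose with standard model-selection utilities. As a minimal sketch
(assuming scikit-learn is installed), we can estimate the accuracy with
cross-validation instead of a single train/test split:

.. code-block:: python

   from sklearn.model_selection import cross_val_score

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import ProximityForestClassifier

   # Load the merged dataset and cross-validate instead of using the
   # predefined train/test split.
   X, y = load_gun_point()
   scores = cross_val_score(
       ProximityForestClassifier(metric="euclidean", random_state=1),
       X,
       y,
       cv=5,
   )
   print(scores.mean())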
****************
Elastic Ensemble
****************

The Elastic Ensemble is a classifier first described by Lines and Bagnall
(2015) [#lines]_. The ensemble consists of one `k`-nearest neighbors
classifier per distance metric, with the parameters of each metric optimized
through leave-one-out cross-validation.

.. execute::
   :context: reset
   :show-return:

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import ElasticEnsembleClassifier

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
   clf = ElasticEnsembleClassifier()
   clf.fit(X_train, y_train)

The default configuration uses all elastic distance measures available in
Wildboar, which corresponds to a superset of the elastic metrics used by
Lines and Bagnall (2015) [#lines]_ but with a smaller grid of metric
parameters. The result of the default configuration is:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

Similar to the :ref:`Proximity Forest <proximity_forest>`, we can specify a
custom metric:

.. execute::
   :context:
   :show-return:

   clf = ElasticEnsembleClassifier(
       metric={
           "ddtw": {"min_r": 0.01, "max_r": 0.1},
           "msm": {"min_c": 0.1, "max_c": 100},
       },
   )
   clf.fit(X_train, y_train)

This smaller configuration has an accuracy of:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

***************
Interval Forest
***************

The interval forest was first introduced by Deng et al. [#deng]_ and is
implemented in the class :class:`~wildboar.ensemble.IntervalForestClassifier`.
It constructs a forest of interval-based decision trees where each node is
constructed using a value aggregated over a (possibly overlapping) interval.
In the default formulation, a node uses either the `mean`, `variance`, or
`slope` of the interval, but it is possible to consider other aggregation
functions (in Wildboar, we call these summarization functions).

.. execute::
   :context: reset
   :show-return:

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import IntervalForestClassifier

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
   clf = IntervalForestClassifier(min_size=0.1, max_size=0.3, random_state=1)
   clf.fit(X_train, y_train)

By default, the interval forest uses the summarization functions mentioned
above and `sqrt(n_timestep)` randomly selected, possibly overlapping
intervals. The accuracy is:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

We can also use non-overlapping intervals by setting the `intervals`
parameter to ``"fixed"``, and sample a smaller set of intervals by setting
the `sample_size` parameter to a float.

.. warning::

   ``intervals="sample"`` was deprecated in version 1.3 and will be removed
   in version 1.4. The equivalent functionality can be achieved by setting
   ``intervals="fixed"`` and specifying `sample_size` as a float.

.. execute::
   :context:
   :show-return:

   from wildboar.datasets import load_gun_point
   from wildboar.ensemble import IntervalForestClassifier

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
   clf = IntervalForestClassifier(
       intervals="fixed", n_intervals=30, sample_size=0.2, random_state=1
   )
   clf.fit(X_train, y_train)

At each node in each tree, we sample 20% of the intervals. The accuracy is:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

We can also change the summarizer. By setting the `summarizer` parameter to
``"catch22"``, we can sample from the full set of Catch22 [#lubba]_ features.

.. execute::
   :context:
   :show-return:

   X_train, X_test, y_train, y_test = load_gun_point(merge_train_test=False)
   clf = IntervalForestClassifier(
       summarizer="catch22",
       intervals="random",
       n_intervals=30,
       random_state=1,
   )
   clf.fit(X_train, y_train)

Here, we sample 30 possibly overlapping intervals at each node and randomly
select one of the Catch22 features to split the node. The accuracy for this
configuration is:

.. execute::
   :context:
   :show-return:

   clf.score(X_test, y_test)

**********
References
**********

.. [#lucas] Lucas, B., Shifaz, A., Pelletier, C., O'Neill, L., Zaidi, N.,
   Goethals, B., Petitjean, F. and Webb, G.I., 2019. Proximity Forest: an
   effective and scalable distance-based classifier for time series. Data
   Mining and Knowledge Discovery.

.. [#lines] Lines, J. and Bagnall, A., 2015. Time series classification with
   ensembles of elastic distance measures. Data Mining and Knowledge
   Discovery, 29(3).

.. [#lubba] Lubba, C.H., Sethi, S.S., Knaute, P., Schultz, S.R., Fulcher,
   B.D. and Jones, N.S., 2019. catch22: CAnonical Time-series
   CHaracteristics: Selected through highly comparative time-series analysis.
   Data Mining and Knowledge Discovery, 33(6), pp.1821-1852.

.. [#deng] Deng, H., Runger, G., Tuv, E. and Vladimir, M., 2013. A time
   series forest for classification and feature extraction. Information
   Sciences, 239, pp.142-153.