Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -102,3 +102,4 @@ node_modules/
*.ipr
*.iws
.idea/
/AGENTS.md
16 changes: 12 additions & 4 deletions docs/API/data_preparation/aggregation_function.md
@@ -1,13 +1,21 @@
# Aggregation Function for Longitudinal Data
# Aggregation Function

??? tip "What is the AggrFunc module?"
The `AggrFunc` module facilitates the application of aggregation functions to feature groups within a longitudinal
dataset, enabling the use of temporal information before applying traditional machine learning algorithms.

We highly recommend reviewing the `Temporal Dependency` page in the documentation for a deeper understanding of
feature groups and the `AggrFunc` module's usage before exploring its API.
??? question "What are features_group and non_longitudinal_features?"
Two key attributes, `features_group` and `non_longitudinal_features`, enable algorithms to interpret the
temporal structure of longitudinal data.

[See The Temporal Dependency Guide ](../../tutorials/temporal_dependency.md){ .md-button }
- **features_group**: A list of lists where each sublist contains indices of a longitudinal attribute's
waves, ordered from oldest to most recent. This captures temporal dependencies.
- **non_longitudinal_features**: A list of indices for static, non-temporal features excluded from the
temporal matrix.

Proper setup of these attributes is critical for leveraging temporal patterns effectively.

[See More In Temporal Dependency Guide :fontawesome-solid-timeline:](../../tutorials/temporal_dependency.md){ .md-button }
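
To make the two attributes concrete, here is a minimal sketch in plain Python. The column names, the sample row, and the `aggregate_row` helper are hypothetical and are not part of the library; the output ordering (aggregated values first, static features last) is also an assumption made only for illustration.

```python
from statistics import mean

# Hypothetical column layout: two longitudinal attributes with three waves each,
# followed by two static features.
columns = ["smoke_w1", "smoke_w2", "smoke_w3",
           "chol_w1", "chol_w2", "chol_w3",
           "age", "sex"]

# One sublist per longitudinal attribute, indices ordered oldest -> most recent.
features_group = [[0, 1, 2], [3, 4, 5]]

# Indices of static features that are not part of the temporal matrix.
non_longitudinal_features = [6, 7]


def aggregate_row(row, features_group, non_longitudinal_features, func):
    """Collapse each feature group to one value with `func`; keep static features as-is."""
    aggregated = [func([row[i] for i in group]) for group in features_group]
    static = [row[i] for i in non_longitudinal_features]
    return aggregated + static


row = [0.0, 1.0, 1.0, 5.0, 6.0, 7.0, 63.0, 1.0]
result = aggregate_row(row, features_group, non_longitudinal_features, mean)
```

Each sublist of `features_group` collapses to a single value, so the eight input columns become four: one mean per longitudinal attribute plus the two untouched static features.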

## ::: scikit_longitudinal.data_preparation.aggregation_function.AggrFunc
options:
11 changes: 0 additions & 11 deletions docs/API/data_preparation/longitudinal_dataset.md
@@ -1,16 +1,5 @@
# Longitudinal Dataset

??? tip "What is the LongitudinalDataset module?"
The `LongitudinalDataset` module is a comprehensive container designed for managing and preparing longitudinal datasets.
It provides essential data management and transformation capabilities, facilitating the development and application
of machine learning algorithms tailored to longitudinal data classification tasks. Built around a `pandas` DataFrame,
it enhances functionality while maintaining a familiar interface.

We highly recommend reviewing the `Temporal Dependency` page in the documentation for a deeper understanding
of feature groups and the `LongitudinalDataset` module's usage before exploring its API.

[See the Temporal Dependency Guide](../../tutorials/temporal_dependency.md){ .md-button }

## ::: scikit_longitudinal.data_preparation.longitudinal_dataset.LongitudinalDataset
options:
heading: "LongitudinalDataset"
16 changes: 12 additions & 4 deletions docs/API/data_preparation/merwav_time_minus.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,23 @@
# Merging Waves and Discarding Time Indices for Longitudinal Data
# Merging Waves and Discarding Time Indices

??? tip "What is the MerWavTimeMinus module?"
The `MerWavTimeMinus` module transforms longitudinal data by merging all features across waves into a single set,
discarding temporal information. This simplifies the dataset for traditional machine learning algorithms but loses
temporal dependencies. It provides methods for data preparation and transformation, including `prepare_data` and
`transform`.

We highly recommend reviewing the `Temporal Dependency` page in the documentation for a deeper understanding of
feature groups and the `MerWavTimeMinus` module's usage before exploring its API.
??? question "What are features_group and non_longitudinal_features?"
Two key attributes, `features_group` and `non_longitudinal_features`, enable algorithms to interpret the
temporal structure of longitudinal data.

[See The Temporal Dependency Guide ](../../tutorials/temporal_dependency.md){ .md-button }
- **features_group**: A list of lists where each sublist contains indices of a longitudinal attribute's
waves, ordered from oldest to most recent. This captures temporal dependencies.
- **non_longitudinal_features**: A list of indices for static, non-temporal features excluded from the
temporal matrix.

Proper setup of these attributes is critical for leveraging temporal patterns effectively.

[See More In Temporal Dependency Guide :fontawesome-solid-timeline:](../../tutorials/temporal_dependency.md){ .md-button }

## ::: scikit_longitudinal.data_preparation.merwav_time_minus.MerWavTimeMinus
options:
16 changes: 12 additions & 4 deletions docs/API/data_preparation/merwav_time_plus.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,23 @@
# Merging Waves and Keeping Time Indices for Longitudinal Data
# Merging Waves and Keeping Time Indices

??? tip "What is the MerWavTimePlus module?"
The MerWavTimePlus module transforms longitudinal data by merging all features across waves into a single set while
preserving their time indices. This maintains the temporal structure, enabling longitudinal machine learning methods to
leverage temporal dependencies and patterns. It provides methods for data preparation and transformation, including
prepare_data and transform.

We highly recommend reviewing the `Temporal Dependency` page in the documentation for a deeper understanding of
feature groups and the `MerWavTimePlus` module's usage before exploring its API.
??? question "What are features_group and non_longitudinal_features?"
Two key attributes, `features_group` and `non_longitudinal_features`, enable algorithms to interpret the
temporal structure of longitudinal data.

[See The Temporal Dependency Guide ](../../tutorials/temporal_dependency.md){ .md-button }
- **features_group**: A list of lists where each sublist contains indices of a longitudinal attribute's
waves, ordered from oldest to most recent. This captures temporal dependencies.
- **non_longitudinal_features**: A list of indices for static, non-temporal features excluded from the
temporal matrix.

Proper setup of these attributes is critical for leveraging temporal patterns effectively.

[See More In Temporal Dependency Guide :fontawesome-solid-timeline:](../../tutorials/temporal_dependency.md){ .md-button }

## ::: scikit_longitudinal.data_preparation.merwav_time_plus.MerWavTimePlus
options:
53 changes: 45 additions & 8 deletions docs/API/data_preparation/sepwav.md
Original file line number Diff line number Diff line change
@@ -1,21 +1,58 @@
# Separate Waves Classifier for Longitudinal Data
# Separate Waves Classifier

??? tip "What is the SepWav module?"
The `SepWav` module implements the Separate Waves strategy for longitudinal data analysis.
It trains individual classifiers on each wave (time point) and combines their predictions using ensemble methods
like voting or stacking. This approach leverages temporal information for improved model performance.
??? tip "Abstract of Separate Waves (SepWav)"
*Extracted from "A New Longitudinal Classification Method Based on Stacking Predictions for Separate Time Points" (BCS SGAI AI-2025).*

We highly recommend reviewing the `Temporal Dependency` page in the documentation for a deeper understanding of
feature groups and the `SepWav` module's usage before exploring its API.
Biomedical research often uses longitudinal data with repeated measurements of variables across time (e.g. cholesterol measured across time), which is challenging for standard machine learning algorithms due to intrinsic temporal dependencies. The Separate Waves (SepWav) data-transformation method trains a base classifier for each time point ("wave") and aggregates their predictions via voting. However, the simplicity of the voting mechanism may not be enough to capture complex patterns of time-dependent interactions involving the base classifiers' predictions. Hence, we propose a novel SepWav method where the simple voting mechanism is replaced by a stacking-based meta-classifier that integrates the base classifiers' wave-specific predictions into a final predicted class label, aiming at improving predictive performance. Experiments with 20 datasets of ageing-related diseases have shown that, overall, the proposed Stacking-based SepWav method achieved significantly better predictive performance than two other methods for longitudinal classification in most cases, when using class-weight adjustment as a class-balancing method.

[See The Temporal Dependency Guide ](../../tutorials/temporal_dependency.md){ .md-button }
[See More In References :fontawesome-solid-book:](../../publications.md){ .md-button }

## ::: scikit_longitudinal.data_preparation.separate_waves.SepWav
options:
heading: "SepWav"
inherited_members: true
members:
- get_params
- fit
- predict
- predict_proba
- predict_wave

---

## SepWav ensemble back-ends

`SepWav` delegates the final aggregation of per-wave predictions to one of the
two classifiers below.

### Longitudinal Voting Classifier

Aggregates per-wave predictions with a configurable voting rule: simple
majority, linear or exponential recency decay, or cross-validation-weighted
voting.
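
The recency-decay rules can be sketched as follows. This is an illustration in plain Python, not the library's implementation: the function names and the exact weight formulas (weights proportional to the wave position, or to its exponential) are assumptions, and cross-validation-weighted voting is left out.

```python
import math
from collections import defaultdict


def wave_weights(n_waves, rule="majority"):
    """Return one normalised weight per wave, ordered oldest -> most recent."""
    if rule == "majority":
        raw = [1.0] * n_waves                             # every wave counts equally
    elif rule == "linear":
        raw = [float(i + 1) for i in range(n_waves)]      # weight grows linearly with recency
    elif rule == "exponential":
        raw = [math.exp(i + 1) for i in range(n_waves)]   # weight grows exponentially with recency
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(raw)
    return [w / total for w in raw]


def weighted_vote(wave_predictions, weights):
    """Sum the weights behind each predicted label and return the heaviest label."""
    scores = defaultdict(float)
    for label, weight in zip(wave_predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Predictions of four per-wave classifiers for one sample, oldest wave first.
predictions = ["no", "no", "no", "yes"]
by_majority = weighted_vote(predictions, wave_weights(4, "majority"))
by_linear = weighted_vote(predictions, wave_weights(4, "linear"))
by_exponential = weighted_vote(predictions, wave_weights(4, "exponential"))
```

Under these assumed formulas, the single most recent wave outweighs the three older ones only with exponential decay, which shows how strongly that rule favours recent measurements.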

#### ::: scikit_longitudinal.estimators.ensemble.longitudinal_voting.longitudinal_voting.LongitudinalVotingClassifier
options:
heading: "LongitudinalVotingClassifier"
inherited_members: true
members:
- fit
- predict
- predict_proba

#### ::: scikit_longitudinal.estimators.ensemble.longitudinal_voting.longitudinal_voting.LongitudinalEnsemblingStrategy

### Longitudinal Stacking Classifier

Trains a meta-learner on the class probabilities emitted by the per-wave
classifiers fitted by `SepWav`.
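
As a rough sketch of what the meta-learner sees, the snippet below concatenates per-wave class probabilities into one meta-feature row per sample. The `build_meta_features` helper and the probability values are hypothetical, and the column layout actually used by the library may differ.

```python
def build_meta_features(per_wave_probas):
    """Concatenate each wave's class-probability vector into one row per sample.

    per_wave_probas[w][i] is the probability vector that the classifier
    trained on wave w assigns to sample i.
    """
    n_samples = len(per_wave_probas[0])
    meta = []
    for i in range(n_samples):
        row = []
        for wave in per_wave_probas:
            row.extend(wave[i])   # append this wave's probabilities for sample i
        meta.append(row)
    return meta


# Two waves, two samples, two classes (illustrative values only).
wave_1 = [[0.9, 0.1], [0.2, 0.8]]
wave_2 = [[0.4, 0.6], [0.3, 0.7]]
meta_X = build_meta_features([wave_1, wave_2])
```

A meta-learner would then be fitted on `meta_X` and the true labels, so it can learn how much to trust each wave's classifier rather than applying a fixed voting rule.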

#### ::: scikit_longitudinal.estimators.ensemble.longitudinal_stacking.longitudinal_stacking.LongitudinalStackingClassifier
options:
heading: "LongitudinalStackingClassifier"
inherited_members: true
members:
- fit
- predict
- predict_proba

39 changes: 24 additions & 15 deletions docs/API/estimators/ensemble/lexico_deep_forest.md
Original file line number Diff line number Diff line change
@@ -1,26 +1,35 @@
# Lexico Deep Forest Classifier

??? tip "What is the Lexico Deep Forest Classifier?"
The Lexico Deep Forest Classifier is an advanced ensemble learning model designed for longitudinal data analysis.
It extends the Deep Forest framework by incorporating longitudinal-adapted base estimators that capture temporal
complexities and interdependencies inherent in longitudinal data. The classifier combines accurate learners
(longitudinal base estimators) and weak learners (diversity non-longitudinal estimators) to improve robustness
and generalization, making it ideal for applications like medical studies or time-series classification.
??? tip "Abstract of LexicoDeepForestClassifier"
*Extracted from Ribeiro & Freitas (2024), "Lexicographical random forests for longitudinal data classification".*

The classifier uses Lexico Random Forest classifiers as base estimators, which are specialized to handle
the temporal structure of longitudinal data.
Standard supervised machine learning methods often ignore the temporal information represented in longitudinal data, but that information can lead to more precise predictions in classification tasks. Data preprocessing techniques and classification algorithms can be adapted to cope directly with longitudinal data inputs, making use of temporal information such as the time-index of features and previous measurements of the class variable. In this article, we propose two changes to the classification task of predicting age-related diseases in a real-world dataset created from the English Longitudinal Study of Ageing. First, we explore the addition of previous measurements of the class variable, and estimating the missing data in those added features using intermediate classifiers. Second, we propose a new split-feature selection procedure for a random forest's decision trees, which considers the candidate features' time-indexes, in addition to the information gain ratio. Our experiments compared the proposed approaches to baseline approaches, in 3 prediction scenarios, varying the "time gap" for the prediction - how many years in advance the class (occurrence of an age-related disease) is predicted. The experiments were performed on 10 datasets varying the class variable, and showed that the proposed approaches increased the random forest's predictive accuracy.

Adapted and integrated into a Deep Forest cascade, this estimator stacks layers of `LexicoRandomForestClassifier`s (and optional diversity learners) so that each layer applies the lexicographic split-selection procedure above while propagating wave-aware predictions through the cascade.

[See More In References :fontawesome-solid-book:](../../../publications.md){ .md-button }
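
The idea of weighing a candidate feature's time index alongside its gain ratio can be illustrated with a small sketch. The selection rule shown here (prefer the most recent wave among candidates whose gain ratio is within a threshold of the best), the `lexicographic_split` helper, and the threshold value are assumptions made for illustration, not the estimator's actual code.

```python
def lexicographic_split(candidates, threshold_gain=0.0015):
    """Pick a split feature from (feature_index, gain_ratio, wave_index) tuples.

    Primary criterion: gain ratio. Candidates within `threshold_gain` of the
    best gain ratio are treated as equivalent, and the one from the most
    recent wave wins (remaining ties go to the higher gain ratio).
    """
    best_gain = max(gain for _, gain, _ in candidates)
    near_best = [c for c in candidates if best_gain - c[1] <= threshold_gain]
    chosen = max(near_best, key=lambda c: (c[2], c[1]))
    return chosen[0]


# (feature_index, gain_ratio, wave_index), values are illustrative only.
candidates = [(0, 0.3000, 1), (1, 0.2990, 3), (2, 0.2500, 3)]
with_tolerance = lexicographic_split(candidates, threshold_gain=0.0015)
without_tolerance = lexicographic_split(candidates, threshold_gain=0.0)
```

With the tolerance, feature 1 is chosen because its gain ratio is nearly as good as feature 0's and it comes from a later wave; with a zero tolerance the choice reduces to the plain best gain ratio.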

??? question "What are features_group and non_longitudinal_features?"
Two key attributes, `features_group` and `non_longitudinal_features`, enable algorithms to interpret the
temporal structure of longitudinal data.

- **features_group**: A list of lists where each sublist contains indices of a longitudinal attribute's
waves, ordered from oldest to most recent. This captures temporal dependencies.
- **non_longitudinal_features**: A list of indices for static, non-temporal features excluded from the
temporal matrix.

Proper setup of these attributes is critical for leveraging temporal patterns effectively.

[See More In Temporal Dependency Guide :fontawesome-solid-timeline:](../../../tutorials/temporal_dependency.md){ .md-button }

## ::: scikit_longitudinal.estimators.ensemble.lexicographical.lexico_deep_forest.LexicoDeepForestClassifier
options:
heading: "LexicoDeepForestClassifier"
inherited_members: true
members:
- _fit
- _predict
- _predict_proba

!!! note "Use of underscore in method names"
`_predict` should be called via `predict` we handle the call to `_predict` in the `predict` method.
The same applies to `_predict_proba` and `predict_proba`.
- fit
- predict
- predict_proba

## ::: scikit_longitudinal.estimators.ensemble.lexicographical.lexico_deep_forest.LongitudinalClassifierType

38 changes: 24 additions & 14 deletions docs/API/estimators/ensemble/lexico_gradient_boosting.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,33 @@
# Lexico Gradient Boosting Classifier

??? tip "What is the Lexico Gradient Boosting Classifier?"
The Lexico Gradient Boosting Classifier is an advanced ensemble learning model tailored for longitudinal data analysis.
It combines the power of gradient boosting with a lexicographic optimization approach to prioritize more recent
data points (waves) in its decision-making process. This makes it particularly effective for datasets where temporal
recency is crucial, such as medical studies or time-series classification.
??? tip "Abstract of LexicoGradientBoostingClassifier"
*Extracted from Ribeiro & Freitas (2024), "Lexicographical random forests for longitudinal data classification".*

The classifier uses Lexico Decision Tree Regressors as base estimators, which are specialized to handle
the temporal structure of longitudinal data.
Standard supervised machine learning methods often ignore the temporal information represented in longitudinal data, but that information can lead to more precise predictions in classification tasks. Data preprocessing techniques and classification algorithms can be adapted to cope directly with longitudinal data inputs, making use of temporal information such as the time-index of features and previous measurements of the class variable. In this article, we propose two changes to the classification task of predicting age-related diseases in a real-world dataset created from the English Longitudinal Study of Ageing. First, we explore the addition of previous measurements of the class variable, and estimating the missing data in those added features using intermediate classifiers. Second, we propose a new split-feature selection procedure for a random forest's decision trees, which considers the candidate features' time-indexes, in addition to the information gain ratio. Our experiments compared the proposed approaches to baseline approaches, in 3 prediction scenarios, varying the "time gap" for the prediction - how many years in advance the class (occurrence of an age-related disease) is predicted. The experiments were performed on 10 datasets varying the class variable, and showed that the proposed approaches increased the random forest's predictive accuracy.

Adapted and integrated into a Gradient Boosting framework, this estimator boosts `LexicoDecisionTreeRegressor`s as base learners, so each successive tree applies the lexicographic split-selection procedure above while fitting the residuals of the previous iterations.

[See More In References :fontawesome-solid-book:](../../../publications.md){ .md-button }

??? question "What are features_group and non_longitudinal_features?"
Two key attributes, `features_group` and `non_longitudinal_features`, enable algorithms to interpret the
temporal structure of longitudinal data.

- **features_group**: A list of lists where each sublist contains indices of a longitudinal attribute's
waves, ordered from oldest to most recent. This captures temporal dependencies.
- **non_longitudinal_features**: A list of indices for static, non-temporal features excluded from the
temporal matrix.

Proper setup of these attributes is critical for leveraging temporal patterns effectively.

[See More In Temporal Dependency Guide :fontawesome-solid-timeline:](../../../tutorials/temporal_dependency.md){ .md-button }

## ::: scikit_longitudinal.estimators.ensemble.lexicographical.lexico_gradient_boosting.LexicoGradientBoostingClassifier
options:
heading: "LexicoGradientBoostingClassifier"
inherited_members: true
members:
- _fit
- _predict
- _predict_proba
- fit
- predict
- predict_proba
- feature_importances_

!!! note "Use of underscore in method names"
`_predict` should be called via `predict` we handle the call to `_predict` in the `predict` method.
The same applies to `_predict_proba` and `predict_proba`.