Merged
3 changes: 3 additions & 0 deletions .gitignore
@@ -103,3 +103,6 @@ node_modules/
*.iws
.idea/
/AGENTS.md

# Manim animation sources
_manim_animations/
Binary file not shown.
8 changes: 8 additions & 0 deletions docs/tutorials/advanced_temporal_setup.md
@@ -14,6 +14,14 @@ others have 4). In `Sklong`, the recommended approach is:

This tutorial shows how to do that in practice.

The animation below summarises the end state you are aiming for: NaN cells for visits that did not take place, and `-1` padding inside `features_group` only when an entire wave column is absent from the schedule.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Uneven waves: NaN vs. `-1` padding](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPadding.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPadding.avif){ .expandable-media__trigger }
[![Uneven waves: NaN vs. `-1` padding](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPaddingDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPaddingDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>
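
As a rough illustration of that end state, the sketch below builds a tiny wide table in plain pandas. The column names (`smoke_w1`, `bmi_w3`, ...) and the `group_for` helper are hypothetical and not part of `Sklong`; the only conventions taken from the text above are NaN for a visit that did not take place and `-1` where a whole wave column is absent.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: `bmi` was never scheduled at wave 2,
# so there is no `bmi_w2` column at all.
wide = pd.DataFrame({
    "smoke_w1": [0, 1, 0],
    "smoke_w2": [0, np.nan, 1],    # NaN: subject 2 missed this visit
    "smoke_w3": [1, 1, 1],
    "bmi_w1": [22.1, 27.4, 30.0],
    "bmi_w3": [23.0, 27.9, np.nan],  # NaN: subject 3 missed this visit
})

waves = ["w1", "w2", "w3"]
columns = list(wide.columns)


def group_for(attribute):
    """Column indices ordered by wave, with -1 where the column does not exist."""
    return [
        columns.index(f"{attribute}_{wave}") if f"{attribute}_{wave}" in columns else -1
        for wave in waves
    ]


features_group = [group_for("smoke"), group_for("bmi")]
print(features_group)
```

Here every inner list has the same length, one slot per wave in the schedule, while individual missed visits stay inside the data matrix as NaN.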

## Step 1: Start from a long table (uneven observations) —— Optional

A long-format dataset makes it easy to describe variable visit counts, but it is not what `Sklong` consumes directly.
8 changes: 8 additions & 0 deletions docs/tutorials/binary_vs_multiclass.md
@@ -25,6 +25,14 @@ The estimators below support both binary and multiclass targets:
- `NestedTreesClassifier`
- `SepWav` with voting or stacking

The animation below summarises what actually changes between the two settings: the fitting workflow is identical, `classes_` simply lists every observed label, and `predict_proba` grows one extra column per added class.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Binary vs. multiclass: same workflow, wider predict_proba](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlass.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlass.avif){ .expandable-media__trigger }
[![Binary vs. multiclass: same workflow, wider predict_proba](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlassDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlassDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>
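
The same behaviour can be seen with any scikit-learn classifier. The sketch below uses `DecisionTreeClassifier` on random data purely as a stand-in; it assumes the `Sklong` estimators listed above expose `classes_` and `predict_proba` in the same way, which is not verified here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))

# Same features, two different targets: 2 labels vs. 3 labels.
y_binary = np.tile([0, 1], 30)
y_multi = np.tile([0, 1, 2], 20)

# The fitting workflow is identical in both settings.
binary_clf = DecisionTreeClassifier(random_state=0).fit(X, y_binary)
multi_clf = DecisionTreeClassifier(random_state=0).fit(X, y_multi)

binary_shape = binary_clf.predict_proba(X).shape
multi_shape = multi_clf.predict_proba(X).shape

print(binary_clf.classes_, binary_shape)
print(multi_clf.classes_, multi_shape)
```

The only visible difference is the width of the probability matrix: one column per entry in `classes_`.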

## Step 1: Binary classification

This first example uses the original `stroke_w2` target from the tutorial dataset.
6 changes: 4 additions & 2 deletions docs/tutorials/long_wide_reshape.md
@@ -41,7 +41,8 @@ long_df = pd.DataFrame({
The animation below walks through every long-format row and shows where each value lands in the wide matrix.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Long to wide reshape, animated](../assets/images/tutorials/long_wide_reshape/LongToWide.gif){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/LongToWide.gif){ .expandable-media__trigger }
[![Long to wide reshape, animated](../assets/images/tutorials/long_wide_reshape/LongToWide.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/LongToWide.avif){ .expandable-media__trigger }
[![Long to wide reshape, animated](../assets/images/tutorials/long_wide_reshape/LongToWideDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/LongToWideDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>

@@ -99,7 +100,8 @@ A few things to notice:
The other direction looks as follows:

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Wide to long reshape, animated](../assets/images/tutorials/long_wide_reshape/WideToLong.gif){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/WideToLong.gif){ .expandable-media__trigger }
[![Wide to long reshape, animated](../assets/images/tutorials/long_wide_reshape/WideToLong.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/WideToLong.avif){ .expandable-media__trigger }
[![Wide to long reshape, animated](../assets/images/tutorials/long_wide_reshape/WideToLongDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/WideToLongDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>

16 changes: 8 additions & 8 deletions docs/tutorials/overview.md
@@ -28,14 +28,6 @@ In order to visualise what the library delivers, the figure below shows the high

[Read the tutorial](temporal_dependency.md)

- __Advanced Feature Group (Temporal) Setup__

---

Handle uneven numbers of observations per subject, including missing waves and padded feature groups.

[Read the tutorial](advanced_temporal_setup.md)

- __Longitudinal Data Format__

---
@@ -52,6 +44,14 @@ In order to visualise what the library delivers, the figure below shows the high

[Read the tutorial](long_wide_reshape.md)

- __Advanced Feature Group (Temporal) Setup__

---

Handle uneven numbers of observations per subject, including missing waves and padded feature groups.

[Read the tutorial](advanced_temporal_setup.md)

- __Data Preparation: Flatten Temporal Dependency for Scikit-Learn Estimators__

---
8 changes: 8 additions & 0 deletions docs/tutorials/sklong_data_preparation_first_exploration.md
@@ -9,6 +9,14 @@ icon: lucide/database-zap

Data-preparation workflows flatten longitudinal structure so you can plug the output into standard `scikit-learn` estimators. Follow this step-by-step path with [`AggrFunc`](../API/data_preparation/aggregation_function.md) (mean aggregation) and `LogisticRegression`—no longitudinal-specific pipeline required.

The animation below shows the intuition: each longitudinal group (e.g. all `smoke_*` columns) collapses into a single static column, so the output is a plain tabular matrix ready for any scikit-learn estimator.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![AggrFunc flattens each longitudinal group](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlatten.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlatten.avif){ .expandable-media__trigger }
[![AggrFunc flattens each longitudinal group](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlattenDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlattenDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>
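
To make the intuition concrete, here is a plain-pandas sketch of mean aggregation per longitudinal group. It does not call `AggrFunc` and is not its implementation; the column names and the `mean_` prefix on the output columns are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical wide table: two longitudinal attributes plus one static column.
wide = pd.DataFrame({
    "smoke_w1": [0, 1],
    "smoke_w2": [1, 1],
    "chol_w1": [180.0, 210.0],
    "chol_w2": [190.0, 230.0],
    "age": [54, 61],
})

groups = {
    "smoke": ["smoke_w1", "smoke_w2"],
    "chol": ["chol_w1", "chol_w2"],
}

# Each group of wave columns collapses into one static column (row-wise mean).
flat = pd.DataFrame({
    f"mean_{name}": wide[cols].mean(axis=1) for name, cols in groups.items()
})
flat["age"] = wide["age"]  # static covariates pass through unchanged

print(flat)
```

The result has one row per subject and no remaining wave structure, which is why a standard estimator such as `LogisticRegression` can consume it directly.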

## Step 1: Load data and define temporal dependencies

```python
8 changes: 8 additions & 0 deletions docs/tutorials/sklong_explore_your_first_estimator.md
@@ -9,6 +9,14 @@ icon: lucide/activity

Algorithm-adaptation workflows keep temporal structure intact. This walkthrough uses [`LexicoDecisionTreeClassifier`](../API/estimators/trees/lexico_decision_tree_classifier.md), which prioritises recent waves while respecting the full sequence.

The animation below gives the intuition behind the split rule: when several candidate waves of the same attribute yield near-identical information gains (within `threshold_gain`), the lexicographic tree picks the **most recent** one rather than applying the classical "largest gain wins" rule.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Lexicographic split: recency breaks ties](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecency.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecency.avif){ .expandable-media__trigger }
[![Lexicographic split: recency breaks ties](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecencyDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecencyDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>
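
The rule can be sketched in a few lines of plain Python. This is a simplified reading of the description above, not the library's code: `pick_wave` is a hypothetical helper, and treating "near-identical" as "within `threshold_gain` of the best gain" is an assumption.

```python
def pick_wave(gains, threshold_gain):
    """Pick a wave index for one attribute.

    `gains` holds one information gain per wave, ordered oldest -> most recent.
    Waves whose gain is within `threshold_gain` of the best gain count as tied,
    and the most recent tied wave wins.
    """
    best = max(gains)
    tied = [i for i, gain in enumerate(gains) if best - gain <= threshold_gain]
    return max(tied)  # largest index = most recent wave


gains = [0.300, 0.120, 0.299]  # waves 1, 2, 3 of the same attribute

recency_choice = pick_wave(gains, threshold_gain=0.0015)
classical_choice = pick_wave(gains, threshold_gain=0.0)

print(recency_choice, classical_choice)
```

With a zero threshold the helper falls back to the classical choice (wave 1, the largest gain); with a small positive threshold, wave 3 wins because its gain is close enough and it is more recent.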

## Step 1: Load and prepare data

```python
8 changes: 8 additions & 0 deletions docs/tutorials/sklong_explore_your_first_pipeline.md
@@ -9,6 +9,14 @@ icon: lucide/workflow

Pipelines let you chain longitudinal transformations, preprocessing, and estimation in one interface.

The animation below illustrates the key contract that makes this work: `features_group` travels alongside `X` through every step, and `update_feature_groups_callback` rewrites those indices whenever a step reshapes the matrix — so the final estimator still sees a coherent temporal structure.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![LongitudinalPipeline propagates features_group between steps](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChain.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChain.avif){ .expandable-media__trigger }
[![LongitudinalPipeline propagates features_group between steps](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChainDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChainDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>
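
The index rewriting can be pictured with a small standalone sketch. `remap_features_group` below is a hypothetical helper written for this illustration; it is not the signature of `update_feature_groups_callback`, and it only covers the simple case where a step drops columns.

```python
def remap_features_group(features_group, kept_columns):
    """Rewrite group indices after a step keeps only `kept_columns`.

    `kept_columns` lists the original column indices that survive, in their
    new order. Indices of dropped columns are removed from their group.
    """
    new_index = {old: new for new, old in enumerate(kept_columns)}
    return [
        [new_index[i] for i in group if i in new_index]
        for group in features_group
    ]


# Before the step: two attributes with three waves each, plus one static column (6).
features_group = [[0, 1, 2], [3, 4, 5]]

# Suppose a feature-selection step keeps columns 0, 2, 3, 5 and 6.
kept_columns = [0, 2, 3, 5, 6]

updated = remap_features_group(features_group, kept_columns)
print(updated)
```

Without this kind of rewrite, the original indices would point at the wrong columns (or past the end of the matrix) once the step has reshaped `X`.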

## Load and prepare the dataset

Using `extended_stroke_longitudinal.csv`:
8 changes: 8 additions & 0 deletions docs/tutorials/sklong_hyperparameter_tuning.md
@@ -9,6 +9,14 @@ icon: lucide/sliders-horizontal

Tune longitudinal-aware models to squeeze out extra performance. This guide compares grid search and random search for `LexicoRandomForestClassifier`, focusing on `threshold_gain` plus common random-forest hyperparameters.

The animation below summarises the contrast: grid search sweeps a regular lattice of hyperparameter combinations (thorough but expensive), while random search scatters samples across the same plane and, in practice, often lands inside high-performing regions that a coarse grid would miss.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Grid search vs. random search on the same plane](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandom.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandom.avif){ .expandable-media__trigger }
[![Grid search vs. random search on the same plane](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandomDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandomDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>
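
The two strategies map onto scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The sketch below uses a plain `RandomForestClassifier` on synthetic data as a stand-in, so it tunes `max_depth` and `min_samples_leaf` only; swapping in `LexicoRandomForestClassifier` and adding `threshold_gain` to the search space is assumed to work the same way but is not shown.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=120, n_features=8, random_state=0)
forest = RandomForestClassifier(n_estimators=10, random_state=0)

# Grid search: every combination on a regular lattice (3 x 2 = 6 candidates).
grid = GridSearchCV(
    forest,
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
    cv=3,
).fit(X, y)

# Random search: the same budget of 6 candidates, sampled from ranges.
rand = RandomizedSearchCV(
    forest,
    param_distributions={"max_depth": randint(2, 9), "min_samples_leaf": randint(1, 6)},
    n_iter=6,
    cv=3,
    random_state=0,
).fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(n_grid, grid.best_params_)
print(n_rand, rand.best_params_)
```

Both searches evaluate six candidates here, but the random search can try values such as `max_depth=5` that the lattice never visits.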

## Step 1: Load data and define temporal dependencies

```python
14 changes: 11 additions & 3 deletions docs/tutorials/sklong_longitudinal_data_format.md
@@ -11,6 +11,14 @@ This tutorial introduces the data formats supported by Scikit-Longitudinal (`Skl

## Wide vs. Long Format

The animation below contrasts the two layouts at split time: a long-format split can cut through a single subject's rows and leak wave-level information across train/test, while a wide-format split always keeps one subject on one side.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![Wide vs. long format: leakage at split time](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakage.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakage.avif){ .expandable-media__trigger }
[![Wide vs. long format: leakage at split time](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakageDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakageDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>

Longitudinal data can be represented in two main formats:

- **Long Format**: Each observation for a subject is in a separate row, with a time indicator column. This can lead to data leakage during splitting if rows for the same subject are separated.
@@ -19,9 +27,6 @@ Longitudinal data can be represented in two main formats:

`Sklong` focuses on wide format for safety and simplicity in machine learning workflows.
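
The leakage argument can be checked on a toy example with pandas and scikit-learn alone. The table and column names below are made up for illustration; the sketch compares a row-wise split of a long table with the same split applied after pivoting to wide.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical long table: 4 subjects, 3 waves each.
long_df = pd.DataFrame({
    "subject_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "wave": [1, 2, 3] * 4,
    "bmi": [22.0, 22.5, 23.1, 27.0, 27.4, 27.9, 30.2, 30.0, 29.5, 24.8, 25.0, 25.3],
})

# Row-wise split of the long table: rows of one subject can land on both sides.
long_train, long_test = train_test_split(long_df, test_size=4, random_state=0)
leaked = set(long_train["subject_id"]) & set(long_test["subject_id"])

# Same data in wide format: one row per subject, so a split cannot separate a subject.
wide = long_df.pivot(index="subject_id", columns="wave", values="bmi")
wide_train, wide_test = train_test_split(wide, test_size=1, random_state=0)
shared = set(wide_train.index) & set(wide_test.index)

print("subjects on both sides (long):", leaked)
print("subjects on both sides (wide):", shared)
```

In this toy setup the long split always leaks, because four test rows cannot be made up of complete three-row subjects, whereas the wide split never shares a subject between train and test.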

!!! tip "Converting Formats"
If your data is in long format, pivot it to wide using pandas: `df.pivot(index='subject_id', columns='time', values='feature')`. Open an issue if you need built-in support.

## Synthetic Dataset Example

The dataset used is synthetic, mimicking health data for illustration.
@@ -45,3 +50,6 @@ print(dataset.data.head())
```

This wide format ensures safe splitting and temporal integrity.

!!! tip "Need to convert from long to wide?"
See the dedicated [Long ⇄ Wide Reshape](long_wide_reshape.md) tutorial for a step-by-step walkthrough — including the recommended pandas pivot, handling uneven waves, and going back the other way.
8 changes: 8 additions & 0 deletions docs/tutorials/temporal_dependency.md
@@ -50,6 +50,14 @@ without worrying about temporal dependencies. The primitives will handle this ba

## Understanding `features_group`

Before the formal definition, here is the same idea in one pass: a wide-format table, with each longitudinal attribute's columns collapsed into one inner list of indices (oldest → most recent), and static covariates kept aside in `non_longitudinal_features`.

<figure class="expandable-media" markdown="span" style="text-align: center;">
[![features_group, in one picture](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilder.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilder.avif){ .expandable-media__trigger }
[![features_group, in one picture](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilderDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilderDark.avif){ .expandable-media__trigger }
<figcaption>Click the image to expand it.</figcaption>
</figure>

`features_group` is a list of lists of integers, with each inner list representing a group of features for a specific
longitudinal variable. The inner lists' indices are ordered by wave/time-point sequence, capturing the
temporal dependencies required for longitudinal data algorithms.
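
As a minimal sketch of that structure, the snippet below derives `features_group` and `non_longitudinal_features` from column names. It assumes a hypothetical `<attribute>_w<number>` naming scheme, which the library itself does not require; with other naming conventions the indices would be written by hand.

```python
import re
from collections import defaultdict

# Hypothetical wide-format columns, deliberately not in wave order.
columns = ["age", "smoke_w2", "smoke_w1", "cholesterol_w1", "cholesterol_w2", "gender"]

waves_by_attribute = defaultdict(list)
non_longitudinal_features = []

for index, name in enumerate(columns):
    match = re.fullmatch(r"(.+)_w(\d+)", name)
    if match:
        attribute, wave = match.group(1), int(match.group(2))
        waves_by_attribute[attribute].append((wave, index))
    else:
        non_longitudinal_features.append(index)  # static covariate

# One inner list per attribute, indices sorted oldest -> most recent wave.
features_group = [
    [index for _, index in sorted(pairs)] for pairs in waves_by_attribute.values()
]

print(features_group)
print(non_longitudinal_features)
```

Note that the inner list for `smoke` is `[2, 1]`, not `[1, 2]`: the order follows the waves, not the column positions.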
2 changes: 1 addition & 1 deletion zensical.toml
@@ -23,9 +23,9 @@ nav = [
{ "Tutorials" = [
{ "Overview" = "tutorials/overview.md" },
{ "Temporal Dependency" = "tutorials/temporal_dependency.md" },
{ "Uneven Temporal Setup" = "tutorials/advanced_temporal_setup.md" },
{ "Data Format" = "tutorials/sklong_longitudinal_data_format.md" },
{ "Long ⇄ Wide Reshape" = "tutorials/long_wide_reshape.md" },
{ "Uneven Temporal Setup" = "tutorials/advanced_temporal_setup.md" },
{ "Data Preparation" = "tutorials/sklong_data_preparation_first_exploration.md" },
{ "Algorithm Adaptation" = "tutorials/sklong_explore_your_first_estimator.md" },
{ "Binary & Multiclass" = "tutorials/binary_vs_multiclass.md" },