simonprovost · Apr 21, 2026 · Apr 20, 2026
diff --git a/.gitignore b/.gitignore
@@ -103,3 +103,6 @@ node_modules/
 *.iws
 .idea/
 /AGENTS.md
+
+# Manim animation sources
+_manim_animations/
diff --git a/docs/assets/images/tutorials/advanced_temporal_setup/UnevenWavesPadding.avif b/docs/assets/images/tutorials/advanced_temporal_setup/UnevenWavesPadding.avif
diff --git a/docs/assets/images/tutorials/advanced_temporal_setup/UnevenWavesPaddingDark.avif b/docs/assets/images/tutorials/advanced_temporal_setup/UnevenWavesPaddingDark.avif
diff --git a/docs/assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlass.avif b/docs/assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlass.avif
diff --git a/docs/assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlassDark.avif b/docs/assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlassDark.avif
diff --git a/docs/assets/images/tutorials/long_wide_reshape/LongToWide.avif b/docs/assets/images/tutorials/long_wide_reshape/LongToWide.avif
diff --git a/docs/assets/images/tutorials/long_wide_reshape/LongToWide.gif b/docs/assets/images/tutorials/long_wide_reshape/LongToWide.gif
diff --git a/docs/assets/images/tutorials/long_wide_reshape/LongToWideDark.avif b/docs/assets/images/tutorials/long_wide_reshape/LongToWideDark.avif
diff --git a/docs/assets/images/tutorials/long_wide_reshape/WideToLong.avif b/docs/assets/images/tutorials/long_wide_reshape/WideToLong.avif
diff --git a/docs/assets/images/tutorials/long_wide_reshape/WideToLong.gif b/docs/assets/images/tutorials/long_wide_reshape/WideToLong.gif
diff --git a/docs/assets/images/tutorials/long_wide_reshape/WideToLongDark.avif b/docs/assets/images/tutorials/long_wide_reshape/WideToLongDark.avif
diff --git a/docs/assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlatten.avif b/docs/assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlatten.avif
diff --git a/...ssets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlattenDark.avif b/...ssets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlattenDark.avif
diff --git a/docs/assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecency.avif b/docs/assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecency.avif
diff --git a/docs/assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecencyDark.avif b/docs/assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecencyDark.avif
diff --git a/docs/assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChain.avif b/docs/assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChain.avif
diff --git a/docs/assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChainDark.avif b/docs/assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChainDark.avif
diff --git a/docs/assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandom.avif b/docs/assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandom.avif
diff --git a/docs/assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandomDark.avif b/docs/assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandomDark.avif
diff --git a/docs/assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakage.avif b/docs/assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakage.avif
diff --git a/docs/assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakageDark.avif b/docs/assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakageDark.avif
diff --git a/docs/assets/images/tutorials/temporal_dependency/FeaturesGroupBuilder.avif b/docs/assets/images/tutorials/temporal_dependency/FeaturesGroupBuilder.avif
diff --git a/docs/assets/images/tutorials/temporal_dependency/FeaturesGroupBuilderDark.avif b/docs/assets/images/tutorials/temporal_dependency/FeaturesGroupBuilderDark.avif
diff --git a/docs/tutorials/advanced_temporal_setup.md b/docs/tutorials/advanced_temporal_setup.md
@@ -14,6 +14,14 @@ others have 4). In `Sklong`, the recommended approach is:
 
 This tutorial shows how to do that in practice.
 
+The animation below summarises the end state you are aiming for: NaN cells for visits that did not take place, and `-1` padding inside `features_group` only when an entire wave column is absent from the schedule.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![Uneven waves: NaN vs. `-1` padding](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPadding.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPadding.avif){ .expandable-media__trigger }
+ [![Uneven waves: NaN vs. `-1` padding](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPaddingDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/advanced_temporal_setup/UnevenWavesPaddingDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 ## Step 1: Start from a long table (uneven observations) —— Optional
 
 A long-format dataset makes it easy to describe variable visit counts, but it is not what `Sklong` consumes directly.
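
The padding convention described in the hunk above can be sketched in plain Python. This is an illustration only, not code from the patch: the column names, the `_w<k>` suffix, the `build_features_group` helper, and the choice to place `-1` at the position of the absent wave are all assumptions.

```python
import math

# Hypothetical wide-format schema: bmi was never scheduled at wave 3,
# so there is no bmi_w3 column at all.
columns = ["smoke_w1", "smoke_w2", "smoke_w3", "smoke_w4",
           "bmi_w1", "bmi_w2", "bmi_w4"]
waves = [1, 2, 3, 4]


def build_features_group(columns, attributes, waves, pad=-1):
    """Return one index list per attribute, ordered by wave.

    A wave with no column in the schema gets `pad` instead of an index.
    """
    index_of = {name: i for i, name in enumerate(columns)}
    return [
        [index_of.get(f"{attr}_w{w}", pad) for w in waves]
        for attr in attributes
    ]


features_group = build_features_group(columns, ["smoke", "bmi"], waves)

# A subject who skipped wave 2: the columns exist, so the cells hold NaN.
row = [1.0, math.nan, 0.0, 0.0, 24.1, math.nan, 25.3]
```

The two mechanisms stay separate: NaN marks a missing value in a column that exists, while `-1` marks a column that does not exist.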

diff --git a/docs/tutorials/binary_vs_multiclass.md b/docs/tutorials/binary_vs_multiclass.md
@@ -25,6 +25,14 @@ The estimators below support both binary and multiclass targets:
 - `NestedTreesClassifier`
 - `SepWav` with voting or stacking
 
+The animation below summarises what actually changes between the two settings: the fitting workflow is identical, `classes_` simply lists every observed label, and `predict_proba` grows one extra column per added class.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![Binary vs. multiclass: same workflow, wider predict_proba](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlass.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlass.avif){ .expandable-media__trigger }
+ [![Binary vs. multiclass: same workflow, wider predict_proba](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlassDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/binary_vs_multiclass/BinaryVsMulticlassDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 ## Step 1: Binary classification
 
 This first example uses the original `stroke_w2` target from the tutorial dataset.
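
The point about `classes_` and `predict_proba` can be checked with a toy stand-in. `PriorClassifier` below is hypothetical and is not a `Sklong` estimator; it only mimics the scikit-learn convention of sorted unique labels in `classes_` and one probability column per class.

```python
from collections import Counter


class PriorClassifier:
    """Toy classifier that predicts the class frequencies seen in fit."""

    def fit(self, X, y):
        counts = Counter(y)
        # Same convention as scikit-learn: sorted unique labels.
        self.classes_ = sorted(counts)
        self._prior = [counts[c] / len(y) for c in self.classes_]
        return self

    def predict_proba(self, X):
        # One row per sample, one column per entry in classes_.
        return [list(self._prior) for _ in X]


X = [[0], [1], [2], [3]]
binary = PriorClassifier().fit(X, [0, 1, 0, 1])
multi = PriorClassifier().fit(X, [0, 1, 2, 1])

binary_width = len(binary.predict_proba(X)[0])
multi_width = len(multi.predict_proba(X)[0])
```

The calling code is identical in both cases; only the label set, and therefore the width of each probability row, differs.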

diff --git a/docs/tutorials/long_wide_reshape.md b/docs/tutorials/long_wide_reshape.md
@@ -41,7 +41,8 @@ long_df = pd.DataFrame({
 The animation below walks through every long-format row and shows where each value lands in the wide matrix.
 
 <figure class="expandable-media" markdown="span" style="text-align: center;">
- [![Long to wide reshape, animated](../assets/images/tutorials/long_wide_reshape/LongToWide.gif){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/LongToWide.gif){ .expandable-media__trigger }
+ [![Long to wide reshape, animated](../assets/images/tutorials/long_wide_reshape/LongToWide.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/LongToWide.avif){ .expandable-media__trigger }
+ [![Long to wide reshape, animated](../assets/images/tutorials/long_wide_reshape/LongToWideDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/LongToWideDark.avif){ .expandable-media__trigger }
  <figcaption>Click the image to expand it.</figcaption>
 </figure>
 
@@ -99,7 +100,8 @@ A few things to notice:
 The other direction, displays as follows:
 
 <figure class="expandable-media" markdown="span" style="text-align: center;">
- [![Wide to long reshape, animated](../assets/images/tutorials/long_wide_reshape/WideToLong.gif){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/WideToLong.gif){ .expandable-media__trigger }
+ [![Wide to long reshape, animated](../assets/images/tutorials/long_wide_reshape/WideToLong.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/WideToLong.avif){ .expandable-media__trigger }
+ [![Wide to long reshape, animated](../assets/images/tutorials/long_wide_reshape/WideToLongDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/long_wide_reshape/WideToLongDark.avif){ .expandable-media__trigger }
  <figcaption>Click the image to expand it.</figcaption>
 </figure>
 

diff --git a/docs/tutorials/overview.md b/docs/tutorials/overview.md
@@ -28,14 +28,6 @@ In order to visualise what the library delivers, the figure below shows the high
 
     [Read the tutorial](temporal_dependency.md)
 
-- __Advanced Feature Group (Temporal) Setup__
-
-    ---
-
-    Handle uneven numbers of observations per subject, including missing waves and padded feature groups.
-
-    [Read the tutorial](advanced_temporal_setup.md)
-
 - __Longitudinal Data Format__
 
     ---
@@ -52,6 +44,14 @@ In order to visualise what the library delivers, the figure below shows the high
 
     [Read the tutorial](long_wide_reshape.md)
 
+- __Advanced Feature Group (Temporal) Setup__
+
+    ---
+
+    Handle uneven numbers of observations per subject, including missing waves and padded feature groups.
+
+    [Read the tutorial](advanced_temporal_setup.md)
+
 - __Data Preparation: Flatten Temporal Dependency for Scikit-Learn Estimators__
 
     ---

diff --git a/docs/tutorials/sklong_data_preparation_first_exploration.md b/docs/tutorials/sklong_data_preparation_first_exploration.md
@@ -9,6 +9,14 @@ icon: lucide/database-zap
 
 Data-preparation workflows flatten longitudinal structure so you can plug the output into standard `scikit-learn` estimators. Follow this step-by-step path with [`AggrFunc`](../API/data_preparation/aggregation_function.md) (mean aggregation) and `LogisticRegression`—no longitudinal-specific pipeline required.
 
+The animation below shows the intuition: each longitudinal group (e.g. all `smoke_*` columns) collapses into a single static column, so the output is a plain tabular matrix ready for any scikit-learn estimator.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![AggrFunc flattens each longitudinal group](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlatten.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlatten.avif){ .expandable-media__trigger }
+ [![AggrFunc flattens each longitudinal group](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlattenDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_data_preparation_first_exploration/AggrFuncFlattenDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 ## Step 1: Load data and define temporal dependencies
 
 ```python

diff --git a/docs/tutorials/sklong_explore_your_first_estimator.md b/docs/tutorials/sklong_explore_your_first_estimator.md
@@ -9,6 +9,14 @@ icon: lucide/activity
 
 Algorithm-adaptation workflows keep temporal structure intact. This walkthrough uses [`LexicoDecisionTreeClassifier`](../API/estimators/trees/lexico_decision_tree_classifier.md), which prioritises recent waves while respecting the full sequence.
 
+The animation below gives the intuition behind the split rule: when several candidate waves of the same attribute yield near-identical information gains (within `threshold_gain`), the lexicographic tree picks the **most recent** one rather than the classical "largest gain wins" tie-break.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![Lexicographic split: recency breaks ties](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecency.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecency.avif){ .expandable-media__trigger }
+ [![Lexicographic split: recency breaks ties](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecencyDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_estimator/LexicoRecencyDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 ## Step 1: Load and prepare data
 
 ```python

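The recency tie-break described in the hunk above can be sketched as a small selection rule. This is an illustration under stated assumptions, not the library's implementation: `pick_wave` is a hypothetical helper, and "within `threshold_gain`" is read here as "no more than `threshold_gain` below the best gain".

```python
def pick_wave(gains, threshold_gain):
    """Pick a wave index from gains ordered oldest to most recent.

    Any wave whose gain is within `threshold_gain` of the best gain is
    treated as tied, and the most recent tied wave wins.
    """
    best = max(gains)
    near_best = [i for i, g in enumerate(gains) if best - g <= threshold_gain]
    return near_best[-1]


# Information gains for three waves of the same attribute (made-up values).
gains = [0.20, 0.31, 0.305]

lexico_choice = pick_wave(gains, threshold_gain=0.01)
classic_choice = pick_wave(gains, threshold_gain=0.0)
```

With a threshold of zero the rule reduces to plain "largest gain wins"; a positive threshold lets the last wave win even though its gain is slightly lower.
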
diff --git a/docs/tutorials/sklong_explore_your_first_pipeline.md b/docs/tutorials/sklong_explore_your_first_pipeline.md
@@ -9,6 +9,14 @@ icon: lucide/workflow
 
 Pipelines let you chain longitudinal transformations, preprocessing, and estimation in one interface.
 
+The animation below illustrates the key contract that makes this work: `features_group` travels alongside `X` through every step, and `update_feature_groups_callback` rewrites those indices whenever a step reshapes the matrix — so the final estimator still sees a coherent temporal structure.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![LongitudinalPipeline propagates features_group between steps](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChain.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChain.avif){ .expandable-media__trigger }
+ [![LongitudinalPipeline propagates features_group between steps](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChainDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_explore_your_first_pipeline/PipelineChainDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 ## Load and prepare the dataset
 
 Using `extended_stroke_longitudinal.csv`:

diff --git a/docs/tutorials/sklong_hyperparameter_tuning.md b/docs/tutorials/sklong_hyperparameter_tuning.md
@@ -9,6 +9,14 @@ icon: lucide/sliders-horizontal
 
 Tune longitudinal-aware models to squeeze out extra performance. This guide compares grid search and random search for `LexicoRandomForestClassifier`, focusing on `threshold_gain` plus common random-forest hyperparameters.
 
+The animation below summarises the contrast: grid search sweeps a regular lattice of hyperparameter combinations (thorough but expensive), while random search scatters samples across the same plane and, in practice, often lands inside high-performing regions that a coarse grid would miss.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![Grid search vs. random search on the same plane](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandom.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandom.avif){ .expandable-media__trigger }
+ [![Grid search vs. random search on the same plane](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandomDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_hyperparameter_tuning/GridVsRandomDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 ## Step 1: Load data and define temporal dependencies
 
 ```python

diff --git a/docs/tutorials/sklong_longitudinal_data_format.md b/docs/tutorials/sklong_longitudinal_data_format.md
@@ -11,6 +11,14 @@ This tutorial introduces the data formats supported by Scikit-Longitudinal (`Skl
 
 ## Wide vs. Long Format
 
+The animation below contrasts the two layouts at split time: a long-format split can cut through a single subject's rows and leak wave-level information across train/test, while a wide-format split always keeps one subject on one side.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![Wide vs. long format: leakage at split time](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakage.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakage.avif){ .expandable-media__trigger }
+ [![Wide vs. long format: leakage at split time](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakageDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/sklong_longitudinal_data_format/WideVsLongLeakageDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 Longitudinal data can be represented in two main formats:
 
 - **Long Format**: Each observation for a subject is in a separate row, with a time indicator column. This can lead to data leakage during splitting if rows for the same subject are separated.
@@ -19,9 +27,6 @@ Longitudinal data can be represented in two main formats:
 
 `Sklong` focuses on wide format for safety and simplicity in machine learning workflows.
 
-!!! tip "Converting Formats"
-    If your data is in long format, pivot it to wide using pandas: `df.pivot(index='subject_id', columns='time', values='feature')`. Open an issue if you need built-in support.
-
 ## Synthetic Dataset Example
 
 The dataset used is synthetic, mimicking health data for illustration.
@@ -45,3 +50,6 @@ print(dataset.data.head())
 ```
 
 This wide format ensures safe splitting and temporal integrity.
+
+!!! tip "Need to convert from long to wide?"
+    See the dedicated [Long ⇄ Wide Reshape](long_wide_reshape.md) tutorial for a step-by-step walkthrough — including the recommended pandas pivot, handling uneven waves, and going back the other way.
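
The leakage argument made in this file's diff can be reproduced with a few rows of made-up data. The subject ids, the `feature_w<k>` naming, and the naive positional split are assumptions chosen for illustration.

```python
# Long format: one row per (subject, wave) observation.
long_rows = [
    ("s1", 1, 0.2), ("s1", 2, 0.4),
    ("s2", 1, 0.1), ("s2", 2, 0.3),
    ("s3", 1, 0.9), ("s3", 2, 0.7),
]


def subjects(rows):
    """Set of subject ids present in a list of long-format rows."""
    return {row[0] for row in rows}


# A row-level split cuts between the two waves of subject s2.
train_rows, test_rows = long_rows[:3], long_rows[3:]
leaked = subjects(train_rows) & subjects(test_rows)

# Wide format: one row per subject, one column per wave.
wide = {}
for sid, wave, value in long_rows:
    wide.setdefault(sid, {})[f"feature_w{wave}"] = value

wide_rows = sorted(wide.items())
train_wide, test_wide = wide_rows[:2], wide_rows[2:]
overlap = {sid for sid, _ in train_wide} & {sid for sid, _ in test_wide}
```

In the long layout `s2` ends up on both sides of the split; in the wide layout each subject is a single row, so no subject can be shared.
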
diff --git a/docs/tutorials/temporal_dependency.md b/docs/tutorials/temporal_dependency.md
@@ -50,6 +50,14 @@ without worrying about temporal dependencies. The primitives will handle this ba
 
 ## Understanding `features_group`
 
+Before the formal definition, here is the same idea in one pass: a wide-format table, with each longitudinal attribute's columns collapsed into one inner list of indices (oldest → most recent), and static covariates kept aside in `non_longitudinal_features`.
+
+<figure class="expandable-media" markdown="span" style="text-align: center;">
+ [![features_group, in one picture](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilder.avif#only-light){ width="100%" loading="lazy" }](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilder.avif){ .expandable-media__trigger }
+ [![features_group, in one picture](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilderDark.avif#only-dark){ width="100%" loading="lazy" }](../assets/images/tutorials/temporal_dependency/FeaturesGroupBuilderDark.avif){ .expandable-media__trigger }
+ <figcaption>Click the image to expand it.</figcaption>
+</figure>
+
 `features_group` is a list of lists of integers, with each inner list representing a group of features for a specific
 longitudinal variable. The inner lists' indices are ordered by wave/time-point sequence, capturing the
 temporal dependencies required for longitudinal data algorithms.
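
The structure described in the hunk above can be derived from column names alone. This is a sketch, not library code: the `split_columns` helper and the `_w<k>` suffix convention are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical wide-format header; smoke columns are deliberately out of order.
columns = ["age", "smoke_w2", "smoke_w1",
           "cholesterol_w1", "cholesterol_w2", "gender"]


def split_columns(columns):
    """Return (features_group, non_longitudinal_features) as index lists."""
    pattern = re.compile(r"^(?P<attr>.+)_w(?P<wave>\d+)$")
    grouped = defaultdict(list)
    static = []
    for idx, name in enumerate(columns):
        match = pattern.match(name)
        if match:
            grouped[match["attr"]].append((int(match["wave"]), idx))
        else:
            static.append(idx)
    # Sort each group by wave number so indices run oldest to most recent.
    features_group = [[idx for _, idx in sorted(pairs)]
                      for pairs in grouped.values()]
    return features_group, static


features_group, non_longitudinal_features = split_columns(columns)
```

Each inner list holds column positions rather than names, and the order inside a list follows the wave number, not the order in which the columns appear.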

diff --git a/zensical.toml b/zensical.toml
@@ -23,9 +23,9 @@ nav = [
     { "Tutorials" = [
         { "Overview" = "tutorials/overview.md" },
         { "Temporal Dependency" = "tutorials/temporal_dependency.md" },
-        { "Uneven Temporal Setup" = "tutorials/advanced_temporal_setup.md" },
         { "Data Format" = "tutorials/sklong_longitudinal_data_format.md" },
         { "Long ⇄ Wide Reshape" = "tutorials/long_wide_reshape.md" },
+        { "Uneven Temporal Setup" = "tutorials/advanced_temporal_setup.md" },
         { "Data Preparation" = "tutorials/sklong_data_preparation_first_exploration.md" },
         { "Algorithm Adaptation" = "tutorials/sklong_explore_your_first_estimator.md" },
         { "Binary & Multiclass" = "tutorials/binary_vs_multiclass.md" },