
Minimal loop + DTW: update data_loader; add smoke test #17

Merged
AmitMY merged 23 commits into sign-language-processing:main from ImYangYun:feat/minimal-loop-dtw on Dec 31, 2025

Conversation

@ImYangYun
Contributor

This PR adds a minimal training/validation loop that uses masked MSE and a validation-time DTW sanity metric.

  • The loop keeps tensors in [B,T,J,C] for loss/metrics and only permutes to [B,J,C,T] at the model boundary.
  • The data loader now filters by split, uses a safe pivot in [1, total_frames-1] so past/future are non-empty, and returns masked tensors (a sketch follows this list).
  • A small smoke test (with a DummyModel) runs training_step and validation_step without heavy dependencies.
  • I ran the minimal loop locally for ~500–1000 steps; it trains and validates successfully, and the curves look stable. The smoke test also passes.
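
A hypothetical sketch of that safe-pivot split (function names are illustrative, not the actual data_loader code):

```python
# Choose a pivot strictly inside the sequence so that both the past and the
# future windows are non-empty.
import random
import torch

def split_past_future(poses: torch.Tensor):
    """poses: [T, J, C] -> (past [pivot, J, C], future [T - pivot, J, C])."""
    total_frames = poses.shape[0]
    assert total_frames >= 2, "need at least one past and one future frame"
    pivot = random.randint(1, total_frames - 1)  # inclusive range [1, total_frames - 1]
    return poses[:pivot], poses[pivot:]
```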

I also ran into some issues while implementing these tasks:
In the loop I convert BTJC → BJCT before calling the model, and then convert the model output BJCT → BTJC after the forward pass. The model was implemented around BJCT (time last), while our masks, zero_pad_collator, and loss/metrics (masked MSE + DTW) are written for BTJC (time at dim=1). So inputs/past are permuted to match the model, and predictions are permuted back so losses operate over [B, T, J, C].
I’m not sure if this is the best place to keep the conversions. Right now it works, but the double permute feels a bit clunky. Can you give some suggestions on this? Thanks!
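
For concreteness, roughly what the double permute + masked MSE path looks like in one step (function and argument names are illustrative, not the exact minimal_loop.py code):

```python
import torch

def step(model, past_btjc, future_btjc, mask_btjc):
    past_bjct = past_btjc.permute(0, 2, 3, 1)            # BTJC -> BJCT at the model boundary
    pred_bjct = model(past_bjct)                          # the model works in time-last layout
    pred_btjc = pred_bjct.permute(0, 3, 1, 2)             # BJCT -> back to BTJC for loss/metrics
    sq_err = (pred_btjc - future_btjc) ** 2 * mask_btjc   # masked MSE over [B, T, J, C]
    return sq_err.sum() / mask_btjc.sum().clamp(min=1)
```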

[figure: minimal_curves]


@AmitMY AmitMY left a comment


looks ok, left a few comments.

7 outdated comment threads on signwriting_animation/scripts/minimal_loop.py
@ImYangYun
Contributor Author

ImYangYun commented Oct 14, 2025

I looked into the predicted sequences, but they are still almost static: mean|Δpred| ≈ 0 while GT > 0, and free-run DTW is ~1.18–1.45. At inference I also see that the encoder output has time-std ≈ 0.0 (no temporal variation).

  • To address this, I added time conditioning for the future tokens (linear ramp → small MLP) and injected it into the future motion embeddings before the Transformer (sketched right after this list).
  • Switched the loop to full-sequence prediction (Tf inferred from the mask).
  • Trained with masked position loss + 0.5× masked velocity loss (see the loss sketch below).
  • Added small debug prints (time-std of the future/encoder outputs) plus a free-run visualization/CSV writer.
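
A minimal sketch of that time conditioning, assuming future embeddings of shape [B, Tf, D] (module name and dimensions are illustrative):

```python
# A linear ramp over the future steps, pushed through a small MLP and added to the
# future motion embeddings before the Transformer.
import torch
import torch.nn as nn

class TimeRampConditioning(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, future_emb: torch.Tensor) -> torch.Tensor:
        """future_emb: [B, Tf, D] -> same shape, with a per-step time signal added."""
        b, tf, _ = future_emb.shape
        ramp = torch.linspace(0.0, 1.0, tf, device=future_emb.device).view(1, tf, 1)
        return future_emb + self.mlp(ramp.expand(b, tf, 1))
```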

Current outcome: future embeddings now have non-zero time-std, but the encoder output over time remains ~0, and free-run predictions are still nearly static (mean|Δpred|≈0 vs GT≈0.03). So I’m unsure whether the time signal should be injected earlier/later (e.g., pre-pos-enc vs inside the Transformer), and whether we should keep next-frame/full-sequence + (later) autoregressive sampling, or move to a true diffusion sampling loop.
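
And a rough sketch of the masked position + 0.5× velocity loss from the list above (names are illustrative):

```python
# Masked MSE on positions plus a down-weighted masked MSE on frame-to-frame
# velocities. Tensors follow the BTJC convention used elsewhere in this PR.
import torch

def masked_pos_vel_loss(pred, target, mask, vel_weight: float = 0.5):
    """pred/target/mask: [B, T, J, C]."""
    pos = ((pred - target) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    d_pred = pred[:, 1:] - pred[:, :-1]
    d_target = target[:, 1:] - target[:, :-1]
    v_mask = mask[:, 1:] * mask[:, :-1]        # a velocity term is valid only if both frames are
    vel = ((d_pred - d_target) ** 2 * v_mask).sum() / v_mask.sum().clamp(min=1)
    return pos + vel_weight * vel
```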

[figures: screenshot 2025-10-14 14:23:35, minimal_curves]

@ImYangYun
Contributor Author

[figure: free_run_anim]

@AmitMY
Contributor

AmitMY commented Oct 14, 2025

So what I understand is that you are unable to overfit to a small dataset.
For the visualization, it would be more useful if you unnormalize the data before visualizing, and visualize using pose-format, which has nice colors etc., so we can see how the body, face, and hands move.
See this video for example: https://rotem-shalev.github.io/ham-to-pose/static/videos/vid_1.mp4
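
A rough sketch of that suggestion, assuming pred_tjc, mean, and std are numpy arrays in [T, J, C] and header is the matching pose header; Pose, NumPyPoseBody, and PoseVisualizer come from the pose-format package, but the exact call pattern here is an assumption:

```python
# Unnormalize the prediction, wrap it in a pose-format Pose with the training
# header, and render a video with PoseVisualizer.
import numpy as np
from pose_format import Pose
from pose_format.numpy import NumPyPoseBody
from pose_format.pose_visualizer import PoseVisualizer

def visualize_prediction(pred_tjc, mean, std, header, fps=25.0, out_path="free_run.mp4"):
    data = pred_tjc * std + mean                          # undo normalization first
    data = np.ma.masked_array(data[:, np.newaxis])        # [T, J, C] -> [T, people=1, J, C]
    confidence = np.ones(data.shape[:3])                  # [T, people, J]
    body = NumPyPoseBody(fps=fps, data=data, confidence=confidence)
    pose = Pose(header, body)
    viz = PoseVisualizer(pose)
    viz.save_video(out_path, viz.draw())
```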

@ImYangYun
Contributor Author

In this update I focused on stabilizing the training–inference pipeline and diagnosing the long-standing visualization issues. Over the past month I repeatedly encountered inconsistencies between the 586-joint source format and the 178-joint reduced skeleton used for training: GT visualizations were sometimes shifted, flipped, or distorted depending on whether reduce_holistic() was applied, and several mapping mismatches (e.g., wrong component order, inconsistent mean/std shape, and mismatched header joints) caused predicted skeletons to appear stretched or collapsed.
After unifying preprocessing, re-computing the 178-joint (reduced-holistic) mean/std, and enforcing a consistent BTJC pipeline in the Lightning module, the reduced-skeleton GT now visualizes correctly, confirming that visualization must use the same 178-joint header as training.
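
A rough sketch of that unified preprocessing (reduce first, then compute mean/std on the same reduced skeleton); reduce_holistic is the pose-format helper mentioned above, while the surrounding stats code is illustrative:

```python
# Reduce every pose to the holistic subset first, then compute mean/std over that
# same reduced skeleton, so training, normalization, and visualization all share
# one header.
import numpy as np
from pose_format import Pose
from pose_format.utils.generic import reduce_holistic

def load_reduced(path: str) -> np.ndarray:
    with open(path, "rb") as f:
        pose = reduce_holistic(Pose.read(f.read()))       # 586-joint source -> reduced skeleton
    return np.asarray(pose.body.data)[:, 0]               # [T, J, C], first person only

def compute_stats(paths):
    frames = np.concatenate([load_reduced(p) for p in paths], axis=0)   # [sum_T, J, C]
    return frames.mean(axis=0), frames.std(axis=0) + 1e-8
```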

At this stage I still have several open questions regarding the remaining prediction distortions:

  1. Should all visualization/evaluation be done on the reduced 178-joint skeleton, or are there scenarios where full 586-joint output is preferred?
  2. Is my current mapping pipeline (computing mean/std after reduction and visualizing with the reduced header) set up properly?
  3. CAMDM includes GaussianDiffusion utilities such as timestep scaling and p_sample_loop. Am I expected to implement conditioning interfaces myself (SignWriting + past motion), or are there existing CAMDM components I can safely reuse here?

@ImYangYun
Contributor Author

Changes

  • Updated the model with PositionalEncoding + TimestepEmbedder (a sketch follows this list)
  • Achieved stable motion generation (disp_ratio: 0.30 → 1.02)
  • Fixed pylint warnings and improved code quality
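
A minimal sketch of what those two modules typically look like in MDM/CAMDM-style models; the actual implementation in this PR may differ in details:

```python
# Hypothetical versions of the two added modules: a sinusoidal PositionalEncoding,
# and a TimestepEmbedder that maps diffusion timesteps into the same embedding space.
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: [B, T, D]
        return x + self.pe[: x.size(1)].unsqueeze(0)

class TimestepEmbedder(nn.Module):
    def __init__(self, d_model: int, pos_enc: PositionalEncoding):
        super().__init__()
        self.pos_enc = pos_enc
        self.mlp = nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, timesteps: torch.Tensor) -> torch.Tensor:   # timesteps: [B]
        return self.mlp(self.pos_enc.pe[timesteps])         # [B, D]
```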

Results (4-sample test; metric definitions sketched below the figure)

  • Displacement Ratio: 1.02
  • MPJPE: 0.019
  • PCK@0.1: 100%
[figure: train_curve]
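
For reference, rough definitions of the two reported metrics, assuming predictions and targets in [T, J, C] normalized coordinates:

```python
import torch

def mpjpe(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return (pred - target).norm(dim=-1).mean()              # mean per-joint position error

def pck(pred: torch.Tensor, target: torch.Tensor, threshold: float = 0.1) -> torch.Tensor:
    return ((pred - target).norm(dim=-1) < threshold).float().mean()   # PCK@threshold
```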

@AmitMY
Contributor

AmitMY commented Dec 17, 2025

Great! Now it is time to run a full training for the entire dataset

@ImYangYun ImYangYun requested a review from AmitMY December 29, 2025 22:48
@ImYangYun
Contributor Author

Lint warnings are about "too many arguments" (R0913/R0917) in __init__ methods.
Updated the models with a freeze_clip option to allow unfreezing CLIP for SignWriting-specific fine-tuning.
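
A hypothetical sketch of what a freeze_clip option like this usually does (names are illustrative, not the actual model code):

```python
# When freeze_clip is True the CLIP encoder's weights are frozen; when False they
# stay trainable so CLIP can be fine-tuned on SignWriting.
import torch.nn as nn

def configure_clip(clip_model: nn.Module, freeze_clip: bool = True) -> nn.Module:
    for param in clip_model.parameters():
        param.requires_grad = not freeze_clip
    if freeze_clip:
        clip_model.eval()                                   # also freeze dropout/BatchNorm behavior
    return clip_model
```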

@AmitMY
Contributor

AmitMY commented Dec 30, 2025

Fix the lint issues, and I'd merge

************* Module signwriting_animation.translation.translate
signwriting_animation/translation/translate.py:7:0: E0401: Unable to import 'sockeye.inference' (import-error)
signwriting_animation/translation/translate.py:24:4: W0621: Redefining name 'args' from outer scope (line 73) (redefined-outer-name)
signwriting_animation/translation/translate.py:32:4: W0621: Redefining name 'translator' from outer scope (line 75) (redefined-outer-name)
signwriting_animation/translation/translate.py:18:8: C0415: Import outside toplevel (huggingface_hub.snapshot_download) (import-outside-toplevel)
signwriting_animation/translation/translate.py:21:4: E0401: Unable to import 'sockeye.translate' (import-error)
signwriting_animation/translation/translate.py:21:4: C0415: Import outside toplevel (sockeye.translate.parse_translation_arguments, sockeye.translate.load_translator_from_args) (import-outside-toplevel)
signwriting_animation/translation/translate.py:15:28: W0613: Unused argument 'temperature' (unused-argument)
signwriting_animation/translation/translate.py:46:14: W0621: Redefining name 'translator' from outer scope (line 75) (redefined-outer-name)
signwriting_animation/translation/translate.py:46:26: W0621: Redefining name 'texts' from outer scope (line 79) (redefined-outer-name)
signwriting_animation/translation/translate.py:58:4: W0621: Redefining name 'outputs' from outer scope (line 86) (redefined-outer-name)
signwriting_animation/translation/translate.py:47:4: E0401: Unable to import 'sockeye.inference' (import-error)
signwriting_animation/translation/translate.py:47:4: C0415: Import outside toplevel (sockeye.inference.make_input_from_factored_string) (import-outside-toplevel)
************* Module signwriting_animation.translation.pretraining.create_pretraining_data
signwriting_animation/translation/pretraining/create_pretraining_data.py:19:25: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
signwriting_animation/translation/pretraining/create_pretraining_data.py:32:37: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
************* Module signwriting_animation.diffusion.lightning_module
signwriting_animation/diffusion/lightning_module.py:201:4: R0913: Too many arguments (11/5) (too-many-arguments)
signwriting_animation/diffusion/lightning_module.py:201:4: R0917: Too many positional arguments (11/5) (too-many-positional-arguments)
************* Module signwriting_animation.data.data_loader
signwriting_animation/data/data_loader.py:59:4: R0917: Too many positional arguments (9/5) (too-many-positional-arguments)

You may suppress R0917 in the pyproject if you'd like.

@ImYangYun ImYangYun closed this Dec 30, 2025
@ImYangYun
Contributor Author

Fixed lint: suppressed R0913 and R0917 in pyproject.toml and added ignore-paths for translation (a sketch of the config is below).
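
Roughly what such a pyproject.toml section can look like (section names and paths here are an assumption, not the actual commit):

```toml
[tool.pylint.main]
# skip the sockeye-dependent translation code entirely
ignore-paths = ["signwriting_animation/translation"]

[tool.pylint."messages control"]
# too-many-arguments / too-many-positional-arguments
disable = ["R0913", "R0917"]
```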

@ImYangYun ImYangYun reopened this Dec 30, 2025
@AmitMY AmitMY merged commit d9c8384 into sign-language-processing:main Dec 31, 2025
4 checks passed