NMFW-50: Colocated correctness tests #2

Draft
yashaswikarnati wants to merge 1 commit into ykarnati/nmfw-17-colocated-colocated-bridge-communicator from ykarnati/nmfw-50-colocated-correctness-tests

@yashaswikarnati
Owner

Summary

Correctness tests for colocated MIMO training. Each test compares a heterogeneous TP/DP configuration against a TP=1 baseline.

  • 9 comparisons per iteration (encoder output, post-communicate, LLM output, loss, input grads, encoder param grads, LLM param grads, encoder weights, LLM weights)
  • 3 training iterations with SGD lr=1e-6
  • 3 configs: fan-in (TP2/DP4→TP4/DP2), fan-out (TP4/DP2→TP2/DP4), equal (TP4/DP2→TP4/DP2)
  • Full determinism stack (NVTE flags, CUBLAS, torch.use_deterministic_algorithms)
  • FP32
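
The determinism stack in the list above is named but not spelled out. Below is a minimal sketch of what such a setup might look like. It assumes that `NVTE_ALLOW_NONDETERMINISTIC_ALGO` and `CUBLAS_WORKSPACE_CONFIG` are the environment variables meant by "NVTE flags" and "CUBLAS". The helper name `enable_determinism` and the seed value are hypothetical and not taken from the PR.

```python
import os
import random


def enable_determinism(seed: int = 1234) -> None:
    """Hypothetical helper: configure deterministic execution.

    The environment variables must be set before the first CUDA/cuBLAS call.
    """
    # Assumed Transformer Engine flag: disallow non-deterministic kernels.
    os.environ["NVTE_ALLOW_NONDETERMINISTIC_ALGO"] = "0"
    # cuBLAS needs a fixed workspace size for reproducible GEMMs.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # torch not installed: only env vars and the Python RNG are set

    torch.manual_seed(seed)
    # Raise an error if an op without a deterministic implementation is used.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


enable_determinism()
```

Calling the helper once at the top of the test module, before any model is built, keeps the baseline and the colocated run under the same settings.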

Test command

uv run python -m torch.distributed.run --nproc_per_node=8 \
    -m pytest "tests/unit_tests/models/test_mimo_colocated_correctness.py::TestColocatedCorrectness::test_correctness[fan_in]" -v
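
The command launches 8 processes, which matches TP × DP = 8 on both sides of every configuration in the summary. The sketch below shows that sanity check, assuming the arrow in each config reads encoder layout → LLM layout. `Layout`, `CONFIGS`, and `check_layouts` are illustrative names. Only the `fan_in` id appears in the command above, so the other two ids are assumed.

```python
from typing import Dict, NamedTuple, Tuple

WORLD_SIZE = 8  # matches --nproc_per_node=8


class Layout(NamedTuple):
    tp: int  # tensor-parallel size
    dp: int  # data-parallel size

    @property
    def ranks(self) -> int:
        return self.tp * self.dp


# (source layout, target layout) per config, taken from the PR summary.
CONFIGS: Dict[str, Tuple[Layout, Layout]] = {
    "fan_in": (Layout(tp=2, dp=4), Layout(tp=4, dp=2)),
    "fan_out": (Layout(tp=4, dp=2), Layout(tp=2, dp=4)),  # id assumed
    "equal": (Layout(tp=4, dp=2), Layout(tp=4, dp=2)),  # id assumed
}


def check_layouts(configs: Dict[str, Tuple[Layout, Layout]], world_size: int) -> None:
    """Raise ValueError if any layout does not occupy exactly world_size ranks."""
    for name, (src, dst) in configs.items():
        for side, layout in (("source", src), ("target", dst)):
            if layout.ranks != world_size:
                raise ValueError(
                    f"{name} {side}: TP{layout.tp} x DP{layout.dp} "
                    f"= {layout.ranks} ranks, expected {world_size}"
                )


check_layouts(CONFIGS, WORLD_SIZE)
```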

Linear: NMFW-50
Targets: PR #1

🤖 Generated with Claude Code

Multi-iteration correctness test comparing colocated (heterogeneous TP/DP)
against TP=1 baseline. 9 checks per iteration x 3 iterations x 3 configs
(fan-in, fan-out, equal-DP). Full determinism stack, FP32.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>