Draft
Changes from all commits
44 commits
73a31da
NMFW-464 phase 1 hypercomm expert groups
yashaswikarnati May 9, 2026
34ea260
NMFW-464 phase 2 hetero mock training loop
yashaswikarnati May 9, 2026
7722020
NMFW-464 add Nemotron 20L hetero mock workflow
yashaswikarnati May 10, 2026
7cfb2da
NMFW-464 modularize hetero MIMO training loop
yashaswikarnati May 10, 2026
c619762
NMFW-464 address hetero MIMO PR cleanup
yashaswikarnati May 10, 2026
3cf88ab
NMFW-464 add hetero Energon training path
yashaswikarnati May 11, 2026
76b0d7f
NMFW-464 document e2e parity plan
yashaswikarnati May 11, 2026
74f06cd
NMFW-464 simplify hetero Energon dataloader
yashaswikarnati May 11, 2026
7312481
NMFW-464 address core cleanup comments
yashaswikarnati May 11, 2026
434ee4e
NMFW-464 remove vendored Energon artifacts
yashaswikarnati May 11, 2026
67df146
NMFW-464 avoid partial hybrid logging groups
yashaswikarnati May 11, 2026
7423b4d
NMFW-464 remove hetero runtime wrapper
yashaswikarnati May 11, 2026
532acaa
NMFW-464 keep encoder grad overlap disabled
yashaswikarnati May 11, 2026
db85eaa
NMFW-464 simplify embedding group lifecycle
yashaswikarnati May 11, 2026
8db3fc5
NMFW-464 align hetero logging reductions
yashaswikarnati May 11, 2026
9a30ad8
NMFW-464 organize hetero training modules
yashaswikarnati May 11, 2026
60e07e7
NMFW-464 fix token-count grad scaling
yashaswikarnati May 11, 2026
c608209
NMFW-464 remove eager param broadcast
yashaswikarnati May 11, 2026
f80ccbd
NMFW-464 keep embedding groups language-only
yashaswikarnati May 11, 2026
226dcc3
NMFW-464 guard hetero parallel state ownership
yashaswikarnati May 11, 2026
de2ca39
NMFW-464 simplify hetero loss contract
yashaswikarnati May 11, 2026
8856092
NMFW-464 clarify hetero loss function signature
yashaswikarnati May 11, 2026
69fa21f
NMFW-464 clarify hetero runtime setup
yashaswikarnati May 11, 2026
4eb48fb
NMFW-464 support Energon encoder DP fan-out
yashaswikarnati May 12, 2026
9318746
NMFW-464 simplify MIMO partition layout
yashaswikarnati May 12, 2026
3d6da4a
NMFW-464 address MIMO base review comments
yashaswikarnati May 12, 2026
ad7a517
NMFW-464 clarify text-only encoder bridge payload
yashaswikarnati May 12, 2026
e61d7ea
Pass attention mask through MIMO language model
yashaswikarnati May 12, 2026
65026c7
Add 54L Nemotron MoE VLM provider (#20)
yashaswikarnati May 13, 2026
ba52765
Guard Energon alignment validation (#21)
yashaswikarnati May 13, 2026
e032dd7
Add hetero pipeline timeline tracing (#22)
yashaswikarnati May 14, 2026
e03f8aa
Add HEL MIMO launch scripts (#23)
yashaswikarnati May 14, 2026
23bf04a
NMFW-464: Distributed checkpoint save/load for the hetero MIMO traini…
yashaswikarnati May 15, 2026
edc6037
NMFW-464: Skip MIMO optimizer build for all-frozen modules; scope _ge…
yashaswikarnati May 16, 2026
2491b43
NMFW-464: Add LLM-only hetero MIMO launch path
yashaswikarnati May 15, 2026
2f6b2c0
NMFW-478: Wire hetero MIMO training parity with pre-vlm-05 VLM recipe
yashaswikarnati May 16, 2026
6f3deb3
Hetero MIMO: arg-parity + correctness fixes (Sanjeev-202967) (#29)
yashaswikarnati May 18, 2026
aef90f1
NMFW-464: route encoder samples through one Energon iterator per enco…
yashaswikarnati May 19, 2026
7e3223d
NMFW-464: dynamic-resolution + RADIO final_layernorm parity for heter…
yashaswikarnati May 19, 2026
7e0834d
Add tensorboard logging to hetero train loop (#33)
yashaswikarnati May 19, 2026
1e24f4d
NMFW-464: emit seq_load_balancing_loss to TB on the hetero side (#34)
yashaswikarnati May 20, 2026
01e5491
NMFW-464: fix routed encoder iterator merge for dynres PackedSeqParam…
yashaswikarnati May 20, 2026
b759183
NMFW-464: explicit per-key schema for routed-iter modality_inputs mer…
yashaswikarnati May 21, 2026
59155fd
NMFW-464: production hetero scaling sbatches (33n / 68n / 100n, EP=8,…
yashaswikarnati May 23, 2026
6 changes: 6 additions & 0 deletions examples/mimo/blend_files/1t_phase1var_moresft_wrapper.yaml
@@ -0,0 +1,6 @@
# RKarimi 3B-nano SOTA 1T text subset blend.
# The 3B-nano baseline uses TRAIN_SAMPLES=122070313 and SEQ_LEN=8192,
# which is 1,000,000,004,096 tokens.
__module__: megatron.energon
__class__: McoreBlend
mcore_json: /scratch/fsw/portfolios/llmservice/projects/llmservice_fm_text/users/rkarimimahab/workspace/blends/1T-phase1var-moresft.json
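The token count in the comment above can be checked directly. This is a minimal sketch that assumes each training sample contributes exactly `SEQ_LEN` tokens, which is how the comment's arithmetic treats it. It also shows that 122,070,313 is the smallest sample count reaching 10^12 tokens at this sequence length, which is consistent with the figure overshooting 1T by half a sequence (4,096 tokens).

```python
# Values copied from the comment in 1t_phase1var_moresft_wrapper.yaml.
TRAIN_SAMPLES = 122_070_313
SEQ_LEN = 8192
TARGET_TOKENS = 10**12  # the "1T" in the blend name

# Assumption: every sample is one full SEQ_LEN-token sequence,
# so the token budget is simply the product of the two values.
total_tokens = TRAIN_SAMPLES * SEQ_LEN

# Smallest integer sample count whose token total reaches TARGET_TOKENS
# (integer ceiling division, avoiding floating point).
min_samples = -(-TARGET_TOKENS // SEQ_LEN)

print(f"{total_tokens:,}")             # 1,000,000,004,096
print(min_samples == TRAIN_SAMPLES)    # True
print(total_tokens - TARGET_TOKENS)    # 4096
```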
365 changes: 365 additions & 0 deletions examples/mimo/blend_files/text_omnicorpus_blend_10_90_hel.yaml

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions examples/mimo/blend_files/text_only_1t_hel.yaml
@@ -0,0 +1,8 @@
# HEL text-only Energon blend for MIMO jitter isolation.
__module__: megatron.energon
__class__: MetadatasetV2
splits:
  train:
    blend:
      - weight: 1.0
        path: __MEGATRON_ROOT__/examples/mimo/blend_files/1t_phase1var_moresft_wrapper.yaml
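The `path` entry uses a `__MEGATRON_ROOT__` token rather than an absolute path, presumably so the blend file works from any checkout. This diff does not show where the token is expanded. The sketch below assumes it is a plain string placeholder replaced with the repository root before Energon reads the file; `resolve_blend_text` and the example root directory are hypothetical, not part of Megatron or Energon.

```python
from pathlib import PurePosixPath

# Placeholder token as it appears in text_only_1t_hel.yaml.
PLACEHOLDER = "__MEGATRON_ROOT__"


def resolve_blend_text(yaml_text: str, megatron_root: str) -> str:
    """Return the blend YAML with the repo-root placeholder expanded.

    Hypothetical helper: assumes __MEGATRON_ROOT__ is a literal token
    substituted before parsing; the real launch scripts may differ.
    """
    # PurePosixPath drops a trailing slash, so "root/" and "root"
    # both produce "root/examples/..." without a doubled separator.
    root = str(PurePosixPath(megatron_root))
    return yaml_text.replace(PLACEHOLDER, root)


blend = """\
splits:
  train:
    blend:
      - weight: 1.0
        path: __MEGATRON_ROOT__/examples/mimo/blend_files/1t_phase1var_moresft_wrapper.yaml
"""

# Example root directory is made up for illustration.
print(resolve_blend_text(blend, "/workspace/megatron-lm/"))
```

Because the single blend entry has weight 1.0, the `train` split draws only from the 1T text wrapper, which matches the file's stated purpose of isolating text-only behavior.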
14 changes: 10 additions & 4 deletions examples/mimo/data/__init__.py
@@ -1,5 +1,11 @@
-from .energon_avlm_task_encoder import VisionAudioQASample
+"""MIMO data providers and task encoders."""
 
-all = [
-    VisionAudioQASample,
-]
+__all__ = ["VisionAudioQASample"]
+
+
+def __getattr__(name):
+    if name == "VisionAudioQASample":
+        from .energon_avlm_task_encoder import VisionAudioQASample
+
+        return VisionAudioQASample
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
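The new `__init__.py` replaces the eager import with a module-level `__getattr__` (PEP 562), so importing the `data` package no longer imports `energon_avlm_task_encoder` until `VisionAudioQASample` is first accessed. It also replaces the old `all = [...]`, which only bound a variable named `all` (shadowing the builtin) and never defined `__all__`. The self-contained sketch below reproduces the lazy-lookup pattern on an in-memory module. The module name, the `load_calls` counter, and the placeholder class are hypothetical, and `functools.lru_cache` stands in for the `sys.modules` cache that makes repeated real imports cheap.

```python
import functools
import types

# Records each time the deferred "import" body actually runs.
load_calls = []


@functools.lru_cache(maxsize=None)
def _load_sample_class():
    # Stand-in for `from .energon_avlm_task_encoder import VisionAudioQASample`.
    # lru_cache mimics sys.modules: the body runs once, later calls reuse it.
    load_calls.append("VisionAudioQASample")

    class VisionAudioQASample:  # hypothetical placeholder class
        pass

    return VisionAudioQASample


demo = types.ModuleType("mimo_data_demo")
demo.__all__ = ["VisionAudioQASample"]


def _module_getattr(name):
    # PEP 562: called only when normal attribute lookup on the module fails.
    if name == "VisionAudioQASample":
        return _load_sample_class()
    raise AttributeError(f"module {demo.__name__!r} has no attribute {name!r}")


# Assigning on the module stores the hook in its __dict__, where PEP 562 looks.
demo.__getattr__ = _module_getattr

assert load_calls == []            # building the module loaded nothing
cls = demo.VisionAudioQASample     # first access runs the deferred import
again = demo.VisionAudioQASample   # second access reuses the cached result
assert cls is again and load_calls == ["VisionAudioQASample"]

try:
    demo.NotExported
except AttributeError as exc:
    print(exc)  # module 'mimo_data_demo' has no attribute 'NotExported'
```

Because `from package import name` falls back to attribute lookup on the package, a `from`-import of `VisionAudioQASample` from this package keeps working; only the import cost moves to first use.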
382 changes: 382 additions & 0 deletions examples/mimo/data/energon_multimodal_provider.py

Large diffs are not rendered by default.
