Add hetero MIMO DDP performance flags #24

Draft
yashaswikarnati wants to merge 1 commit into ykarnati/nmfw-464-nemotron-vlm-with-hetero-parallel from ykarnati/nmfw-464-ddp-perf-flags

@yashaswikarnati
Owner

Summary

  • Add hetero training CLI flags for the language module's DDP: parameter-gather overlap, bucket count, and bucket padding for high NCCL bus bandwidth
  • Apply the new DDP knobs to the language module only; the vision encoder keeps its conservative DDP settings
  • Wire the 54L HEL run script to pass the new flags from environment variables
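
As a rough illustration of the third bullet, the sketch below shows one way a run script could turn environment variables into trainer flags. The environment variable names come from the review snippets in this PR; the flag spellings are assumptions modeled on Megatron-LM's DDP options and may not match what `args.py` actually defines, and `build_ddp_flags` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: flag names below are ASSUMED, not taken from this PR's args.py.

build_ddp_flags() {
    local flags=()

    # Boolean switches: emit the flag only when the variable is exactly "1".
    if [[ "${OVERLAP_GRAD_REDUCE:-0}" == "1" ]]; then
        flags+=("--overlap-grad-reduce")
    fi
    if [[ "${OVERLAP_PARAM_GATHER:-0}" == "1" ]]; then
        flags+=("--overlap-param-gather")
    fi
    if [[ "${DDP_PAD_BUCKETS_FOR_HIGH_NCCL_BUSBW:-0}" == "1" ]]; then
        flags+=("--ddp-pad-buckets-for-high-nccl-busbw")
    fi

    # Valued option: skip it entirely when the variable is empty or unset,
    # so no "--ddp-num-buckets" with a blank value reaches the trainer.
    if [[ -n "${DDP_NUM_BUCKETS:-}" ]]; then
        flags+=("--ddp-num-buckets" "${DDP_NUM_BUCKETS}")
    fi

    # Print the flags space-separated on one line.
    echo "${flags[*]}"
}

# Example: enable gradient-reduce overlap and request 8 buckets.
OVERLAP_GRAD_REDUCE=1 DDP_NUM_BUCKETS=8 build_ddp_flags
```

Building the list conditionally keeps a disabled knob out of the command line altogether, which avoids passing an empty value for options that expect an integer.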

Testing

  • python3 -m py_compile examples/mimo/training/hetero/args.py examples/mimo/training/hetero/runtime.py examples/mimo/training/hetero/optimizer.py
  • bash -n examples/mimo/scripts/run_hetero_nemotron_54l_hel_train.sh
  • git diff --check

LLM_EXPT_TP="${LLM_EXPT_TP:-1}"
ENABLE_EXPERIMENTAL="${ENABLE_EXPERIMENTAL:-1}"
MOE_ROUTER_FORCE_LOAD_BALANCING="${MOE_ROUTER_FORCE_LOAD_BALANCING:-0}"
OVERLAP_GRAD_REDUCE="${OVERLAP_GRAD_REDUCE:-0}"
Owner Author

Can we default to 1? We need these for perf.

Owner Author

Done: defaulted OVERLAP_GRAD_REDUCE to 1 in the HEL 54L run script.

MOE_ROUTER_FORCE_LOAD_BALANCING="${MOE_ROUTER_FORCE_LOAD_BALANCING:-0}"
OVERLAP_GRAD_REDUCE="${OVERLAP_GRAD_REDUCE:-0}"
OVERLAP_PARAM_GATHER="${OVERLAP_PARAM_GATHER:-0}"
DDP_NUM_BUCKETS="${DDP_NUM_BUCKETS:-}"
Owner Author

Can we use the same defaults that gave us good perf?

Owner Author

Done: set the run-script perf defaults to DDP_NUM_BUCKETS=8 with DDP_BUCKET_SIZE unset, matching the final tested config.
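
For readers less familiar with the `${VAR:-default}` idiom used in the run script, here is a minimal sketch of how the defaults named in the reply above would resolve. The exact lines in the final script are not shown in this thread, so the two assignments below are an assumption about their shape, and `resolve_bucket_defaults` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the script keeps the ${VAR:-default} pattern from the diff.

resolve_bucket_defaults() {
    # ":-" substitutes the default when the variable is unset OR empty.
    local num_buckets="${DDP_NUM_BUCKETS:-8}"
    # An empty default leaves the bucket size blank unless the caller exports one.
    local bucket_size="${DDP_BUCKET_SIZE:-}"
    echo "num_buckets=${num_buckets} bucket_size=${bucket_size:-<unset>}"
}

resolve_bucket_defaults                    # nothing exported: default applies
DDP_NUM_BUCKETS=4 resolve_bucket_defaults  # explicit override wins
DDP_NUM_BUCKETS= resolve_bucket_defaults   # empty counts as unset with ":-"
```

One consequence of `:-` is that exporting an empty `DDP_NUM_BUCKETS` does not turn bucketing off; the default of 8 still applies, so disabling it would need a different mechanism.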

OVERLAP_PARAM_GATHER="${OVERLAP_PARAM_GATHER:-0}"
DDP_NUM_BUCKETS="${DDP_NUM_BUCKETS:-}"
DDP_BUCKET_SIZE="${DDP_BUCKET_SIZE:-0}"
DDP_PAD_BUCKETS_FOR_HIGH_NCCL_BUSBW="${DDP_PAD_BUCKETS_FOR_HIGH_NCCL_BUSBW:-0}"
Owner Author

Enable by default?

Owner Author

Done: enabled DDP_PAD_BUCKETS_FOR_HIGH_NCCL_BUSBW by default in the HEL 54L run script.

@yashaswikarnati force-pushed the ykarnati/nmfw-464-ddp-perf-flags branch from 6b73a4c to ab585d7 on May 14, 2026 19:52