Merge `Release/v26.4` back to main by GeneDer · Pull Request #759 · AMD-AGI/Primus

GeneDer · 2026-06-09T16:23:56Z

No description provided.

…a3.1_8b_enable

1. Adds a complete native SFT (Supervised Fine-Tuning) training stack to Primus on the Megatron backend, parallel to the existing pretrain path.
2. Implements a custom dataset, sequence packing, forward_step, LoRA/PEFT, multi-turn conversation support, and offline JSONL/JSON loaders without depending on Megatron-Bridge at runtime, while keeping megatron_bridge_adapter.py for users who still want the Bridge path.
3. Performance: with memory alignment, llama3_8b and llama2_70b outperform the third-party Megatron-Bridge library by 4%, and deepseek_v2_lite and qwen3-30b-a3b outperform it by 6%; results are also comparable to mlperf_llama2_70b_lora.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xiaoming-AMD <198007710+Xiaoming-AMD@users.noreply.github.com>
Co-authored-by: Xiaoming <xiaoming@primus.dev>
Co-authored-by: WangLingxun <linxwang@amd.com>
Co-authored-by: botahu_qle <botahu@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Botao Hu <botahu@smc300x-ccs-aus-a16-19.prov.aus.ccs.cpe.ice.amd.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR merges Release/v26.4 back into main, bringing in Megatron-native SFT (datasets/formatters/runtime wiring), stage-based trainer registration in BackendRegistry, and supporting scripts/configs for SFT runs and diagnostics.
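A minimal sketch of what stage-based trainer lookup can look like, assuming a registry keyed by `(backend, stage)`. The class `StageTrainerRegistry`, its `register_trainer`/`get_trainer_class` methods, and the `"pretrain"` default are hypothetical and do not reproduce the signature of Primus's `BackendRegistry`; the trainer classes are empty placeholders. Raising `RuntimeError` on a missing entry mirrors what the per-file summary says `megatron_adapter.py` now does.

```python
from typing import Dict, List, Tuple, Type


class StageTrainerRegistry:
    """Hypothetical registry mapping (backend, stage) to a trainer class."""

    def __init__(self) -> None:
        self._trainers: Dict[Tuple[str, str], Type] = {}

    def register_trainer(self, backend: str, stage: str, trainer_cls: Type) -> None:
        key = (backend, stage)
        existing = self._trainers.get(key)
        # Re-registering the same class is harmless; a different class is a conflict.
        if existing is not None and existing is not trainer_cls:
            raise ValueError(f"{key} already registered to {existing.__name__}")
        self._trainers[key] = trainer_cls

    def get_trainer_class(self, backend: str, stage: str = "pretrain") -> Type:
        try:
            return self._trainers[(backend, stage)]
        except KeyError:
            # List what *is* registered for this backend to make the error actionable.
            known: List[str] = sorted(s for b, s in self._trainers if b == backend)
            raise RuntimeError(
                f"no trainer for backend={backend!r}, stage={stage!r}; "
                f"registered stages: {known}"
            ) from None


# Placeholder stand-ins for real trainer classes.
class PretrainTrainer: ...


class SFTTrainer: ...


registry = StageTrainerRegistry()
registry.register_trainer("megatron", "pretrain", PretrainTrainer)
registry.register_trainer("megatron", "sft", SFTTrainer)
```

Keying on the stage as well as the backend lets one backend expose several trainers (pretrain, SFT) without the adapter hard-coding which class to import.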

Changes:

Add Megatron-native SFT stack (schema/formatters/tokenization/datasets/forward_step/runtime) plus unit tests and example configs for SFT + packed sequences.
Introduce stage-aware trainer registration/lookup in BackendRegistry and update adapters/backends (megatron, megatron_bridge, torchtitan) + related tests.
Add operational hooks/tools: HF→Megatron checkpoint conversion hook, diagnostics scripts, and various training launch examples/config updates (including FP4/Turbo knobs).
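Packed-sequence SFT concatenates several short samples into one fixed-length training sequence. The sketch below assumes a simple first-fit-decreasing bin packing over per-sample token counts and emits cumulative boundaries (`cu_seqlens`) that variable-length attention kernels can use to keep attention within each sample. The function names are hypothetical, and this is not necessarily the packing strategy used by Primus's `SFTDataset`.

```python
from typing import List


def pack_first_fit(lengths: List[int], seq_length: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count is <= seq_length.

    First-fit decreasing: visit samples longest-first and put each one into the
    first bin that still has room. Over-long samples are rejected, not truncated.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    room: List[int] = []  # remaining capacity of each bin
    for i in order:
        n = lengths[i]
        if n > seq_length:
            raise ValueError(f"sample {i} has {n} tokens > seq_length={seq_length}")
        for b, free in enumerate(room):
            if n <= free:
                bins[b].append(i)
                room[b] -= n
                break
        else:  # no existing bin fits: open a new one
            bins.append([i])
            room.append(seq_length - n)
    return bins


def cu_seqlens(bin_indices: List[int], lengths: List[int]) -> List[int]:
    """Cumulative sample boundaries inside one packed bin, starting at 0."""
    out = [0]
    for i in bin_indices:
        out.append(out[-1] + lengths[i])
    return out
```

For example, `pack_first_fit([5, 3, 8, 2, 6], 10)` yields `[[2, 3], [4, 1], [0]]`, and the first bin's boundaries are `[0, 8, 10]`.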

Reviewed changes

Copilot reviewed 83 out of 83 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/unit_tests/backends/megatron/test_sft_dataset_offline.py	Unit tests for offline JSON/JSONL SFT dataset loading.
tests/unit_tests/backends/megatron/test_sft_abstractions.py	Tests for SFT normalization/tokenization and forward_step behavior.
tests/unit_tests/backends/megatron/test_messages_format.py	Tests for OpenAI messages format and formatter selection.
tests/unit_tests/backends/megatron/test_megatron_sft_trainer.py	Tests for MegatronSFTTrainer wiring into runtime factories/entrypoints.
tests/unit_tests/backends/megatron/test_megatron_registration.py	Verifies megatron adapter + stage trainers are registered in registry.
tests/unit_tests/backends/megatron/test_megatron_adapter.py	Adapter tests updated for stage-based trainer lookup and errors.
runner/helpers/hooks/train/posttrain/megatron/01_convert_checkpoints.py	Hook to convert HF checkpoints to Megatron format via Megatron-Bridge.
runner/helpers/hooks/train/posttrain/megatron/00_install_requirements.sh	Installs dependencies needed for checkpoint conversion hook.
run.sh	Convenience launcher script with quieter Torch/NCCL logging defaults.
primus/tools/diag/verify_mlp_merge_fix.py	GPU allocator diagnostic for MLP shard-merge fragmentation fix.
primus/tools/diag/verify_mlp_merge_fix_realistic.py	“Realistic” scale allocator diagnostic with large context prealloc.
primus/tools/diag/inspect_sft_data.py	Diagnostic comparing Bridge packed parquet vs Native packed cache.
primus/tools/diag/__init__.py	Declares diag utilities package.
primus/core/launcher/config.py	Adds pre_trainer/post_trainer aliasing for SFT configs.
primus/core/config/primus_config.py	Adds module-name aliasing in `get_module_config`.
primus/core/backend/backend_registry.py	Implements stage-based trainer registry and debug dump.
primus/core/backend/backend_adapter.py	Minor doc/log string normalization (`->`).
primus/configs/modules/megatron/sft_trainer.yaml	New Megatron SFT trainer module config (packing, bridge parity flags, LoRA).
primus/configs/modules/megatron/primus_turbo.yaml	Adds `use_turbo_fp4_autocast` flag.
primus/configs/models/megatron/qwen3_235B_A22B_4layer.yaml	Adds 4-layer smoke-test model variant.
primus/configs/models/megatron/llama3_8B.yaml	Switches tokenizer_type to HuggingFaceTokenizer.
primus/configs/models/megatron_bridge/qwen3_30b_a3b.yaml	Adds Bridge model config for Qwen3-30B-A3B.
primus/configs/models/megatron_bridge/llama3_8b.yaml	Adds Bridge model config for Llama3-8B.
primus/backends/torchtitan/torchtitan_adapter.py	Uses `BackendRegistry.get_trainer_class` for trainer loading.
primus/backends/torchtitan/__init__.py	Registers torchtitan pretrain trainer in stage registry.
primus/backends/megatron/training/evaluator.py	Handles `[loss, num_tokens]` tensor metric shape for averaging.
primus/backends/megatron/sft/schema.py	Defines normalized SFT sample/message schema and formatted spans.
primus/backends/megatron/sft/runtime.py	Provides dataset provider + pretrain entrypoint wrapper with signature probing.
primus/backends/megatron/sft/preprocessing.py	Adds local record loading + tokenization + loss mask/label shifting.
primus/backends/megatron/sft/formatters.py	Adds Alpaca/ChatML/OpenAI-messages/SQuAD formatters + selector.
primus/backends/megatron/sft/dataset.py	Implements `SFTDataset` and dataset-builder with packed/mlperf dispatch.
primus/backends/megatron/sft/__init__.py	Exposes SFT public API surface.
primus/backends/megatron/peft/recompute.py	Adds adapter-only recompute grad fix hook for PP=1 cases.
primus/backends/megatron/peft/module_matcher.py	PEFT module matcher utility (ported/adjusted).
primus/backends/megatron/peft/lora.py	LoRA implementation/transformations (ported/adjusted).
primus/backends/megatron/peft/import_utils.py	Safe import helpers for optional dependencies.
primus/backends/megatron/peft/base.py	Base PEFT API + freeze/walk + adapter save filtering.
primus/backends/megatron/peft/adapter_wrapper.py	Adapter wrapper state_dict/sharded_state_dict handling.
primus/backends/megatron/peft/__init__.py	PEFT package exports.
primus/backends/megatron/patches/turbo/fp4_patches.py	Changes FP4 patch gating condition to `fp4` enabled.
primus/backends/megatron/patches/sft_grad_sanitize_patches.py	Adds optional NaN/Inf grad sanitization patch for benchmark configs.
primus/backends/megatron/patches/checkpoint_patches.py	Adds tolerant factory merge + torch_dist load_checkpoint fixes.
primus/backends/megatron/megatron_adapter.py	Uses stage-based trainer registry (raises RuntimeError on missing).
primus/backends/megatron/core/transformer/moe/router.py	Removes inconsistent force-LB routing_map override; adds rationale.
primus/backends/megatron/core/fp4_utils.py	Lazier Turbo imports; TE fallback autocast; improved recipe handling.
primus/backends/megatron/core/datasets/sft_dataset.py	Compatibility shim re-exporting new SFT dataset APIs.
primus/backends/megatron/__init__.py	Registers megatron pretrain + sft trainers in stage registry.
primus/backends/megatron_bridge/__init__.py	Registers bridge pretrain + sft trainers in stage registry.
examples/moe_package/start_training_qwen_30B_a3B.sh	Example pretrain launch script for Qwen3-30B-A3B.
examples/moe_package/start_training_dsv2_lite.sh	Example pretrain launch script for DeepSeek-V2-Lite.
examples/megatron/prepare.py	Skips pretrain dataset tokenization for stage=sft; handles empty submodule.
examples/megatron/convert_to_jsonl.py	Utility to export HF/CSV datasets into JSONL for offline SFT.
examples/megatron/configs/MI355X/qwen3_235B_A22B-BF16-sft.yaml	Example native SFT config for Qwen3-235B-A22B.
examples/megatron/configs/MI355X/qwen3_235B_A22B_4layer-BF16-sft.yaml	Smoke-test SFT config for 4-layer Qwen3-235B-A22B.
examples/megatron/configs/MI355X/llama3.1_8B-MXFP8-pretrain.yaml	Adds Llama3.1 MXFP8 pretrain config.
examples/megatron/configs/MI355X/llama3.1_8B-MXFP4-pretrain.yaml	Adds Llama3.1 MXFP4 pretrain config.
examples/megatron/configs/MI355X/llama3_8B-BF16-sft.yaml	Example native SFT config for Llama3-8B.
examples/megatron/configs/MI355X/llama3_8B-BF16-sft-packed.yaml	Example packed-sequence SFT config for Llama3-8B.
examples/megatron/configs/MI355X/llama3_8B-BF16-sft-packed-squad.yaml	Packed SFT SQuAD config for Bridge-vs-Native benchmarking.
examples/megatron/configs/MI355X/llama3_8B-BF16-sft-packed-bridge_aligned.yaml	Bridge-aligned packed SFT benchmark config (native path).
examples/megatron/configs/MI355X/llama3_8B-BF16-lora-sft.yaml	LoRA-focused SFT config variant for Llama3-8B.
examples/megatron/configs/MI355X/llama2_70B-FP8-sft-packed-perf.yaml	FP8 performance-oriented packed SFT config for Llama2-70B.
examples/megatron/configs/MI355X/deepseek_v2_lite-BF16-sft.yaml	Example SFT config for DeepSeek-V2-Lite.
examples/megatron/configs/MI355X/deepseek_v2_lite-BF16-sft-packed.yaml	Packed SFT config for DeepSeek-V2-Lite with extensive perf notes.
examples/megatron_bridge/configs/MI355X/qwen3_30b_a3b_lora_posttrain_packed.yaml	Bridge packed LoRA SFT benchmark config for Qwen3-30B-A3B.
examples/megatron_bridge/configs/MI355X/llama3_8b_lora_posttrain_packed.yaml	Bridge packed LoRA SFT config for Llama3-8B.
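`preprocessing.py` is described as handling tokenization, loss masks, and label shifting. A minimal sketch of that idea, assuming prompt and response are already tokenized: labels are the inputs shifted left by one, and the loss mask keeps only positions whose *target* is a response token, so the model is not trained to reproduce the prompt. `build_sft_example` and its right-padding convention are hypothetical.

```python
from typing import Dict, List


def build_sft_example(
    prompt_ids: List[int],
    response_ids: List[int],
    seq_length: int,
    pad_id: int = 0,
) -> Dict[str, List[int]]:
    """Next-token SFT example: inputs, shifted labels, and a response-only loss mask."""
    ids = prompt_ids + response_ids
    # 1 where the token belongs to the response, 0 for prompt tokens.
    is_response = [0] * len(prompt_ids) + [1] * len(response_ids)

    # Shift by one: position t predicts token t+1, so the mask follows the label.
    tokens = ids[:-1][:seq_length]
    labels = ids[1:][:seq_length]
    loss_mask = is_response[1:][:seq_length]

    # Right-pad to seq_length; padded positions get mask 0, so their label value is irrelevant.
    pad = seq_length - len(tokens)
    tokens += [pad_id] * pad
    labels += [pad_id] * pad
    loss_mask += [0] * pad
    return {"tokens": tokens, "labels": labels, "loss_mask": loss_mask}
```

With prompt `[10, 11, 12]`, response `[20, 21]`, and `seq_length=6`, the position holding token `12` predicts `20` and is the first one with mask 1.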

+
+        _pre_forward_canary(model)
+


+    while not done_file.exists() and elapsed < timeout:
+        if not lock_file.exists() and not done_file.exists():
+            time.sleep(2)
+        else:
+            time.sleep(5)
+        elapsed += 5
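As quoted, this loop adds 5 to `elapsed` on every iteration even when it only slept 2 seconds, so while `lock_file` is absent it gives up after roughly 40% of `timeout` in wall-clock time. One way to keep the budget accurate, sketched here with a hypothetical `wait_for_done` helper, is to measure elapsed time with `time.monotonic()` instead of counting sleeps. The shorter interval while `lock_file` is absent follows the quoted loop.

```python
import time
from pathlib import Path


def wait_for_done(
    done_file: Path,
    lock_file: Path,
    timeout: float,
    fast_poll: float = 2.0,
    slow_poll: float = 5.0,
) -> bool:
    """Return True once done_file exists, or False after `timeout` seconds of wall time."""
    deadline = time.monotonic() + timeout
    while not done_file.exists():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Poll more often while lock_file is absent, as the original loop intends.
        interval = fast_poll if not lock_file.exists() else slow_poll
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
    return True
```

A monotonic clock is unaffected by system clock adjustments, and capping each sleep at the remaining budget keeps overshoot small.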


+        except Exception as e:
+            # If torch or datasets not available, skip
+            if "No module named" in str(e):
+                self.skipTest(f"Required module not available: {e}")
+            raise


+        except Exception as e:
+            # If torch or datasets not available, skip
+            if "No module named" in str(e):
+                self.skipTest(f"Required module not available: {e}")
+            raise


+        except Exception as e:
+            if "No module named" in str(e):
+                self.skipTest(f"Required module not available: {e}")
+            raise


+        except Exception as e:
+            if "No module named" in str(e):
+                self.skipTest(f"Required module not available: {e}")
+            raise
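These handlers catch every `Exception` and skip only when the message contains "No module named", which depends on the exact wording of the error text. A narrower sketch, assuming the optional dependencies are `torch` and `datasets` as the comments state, checks for them up front with `importlib.util.find_spec` and skips in `setUp`, so any other error raised by the code under test still fails the test. `missing_modules` and `SFTDatasetTestSketch` are hypothetical names; catching `ModuleNotFoundError` directly would be an equally narrow alternative.

```python
import importlib.util
import unittest
from typing import List


def missing_modules(*names: str) -> List[str]:
    """Top-level module names that cannot be found (nothing is actually imported)."""
    return [n for n in names if importlib.util.find_spec(n) is None]


class SFTDatasetTestSketch(unittest.TestCase):
    def setUp(self) -> None:
        # Skip only for absent optional dependencies, not for arbitrary failures.
        missing = missing_modules("torch", "datasets")
        if missing:
            self.skipTest(f"Required module(s) not available: {', '.join(missing)}")

    def test_loads_offline_jsonl(self) -> None:
        # The real test body would go here; it runs only when the deps exist.
        pass
```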


Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.

+# Default config if no argument is provided
+CONFIG_FILE=${1:-"./examples/megatron/configs/MI355X/llama3_8B-BF16-sft.yaml"}
+


+echo "Starting training with config: $CONFIG_FILE"
+echo "Experiment Name: $PRIMUS_EXP_NAME"
+
+PRIMUS_TRAIN_RUNTIME=core ./primus-cli --debug direct -- train posttrain --config "$CONFIG_FILE"


+# Primus Native SFT LoRA — Quick Start
+
+> **Branch**: `feat/megatron/support-sft-native` (PR701)
+> **Backend**: Megatron-LM **native** (no Megatron-Bridge runtime dependency)
+> **Hardware**: AMD MI355X / MI300X
+> **Models verified**: Llama2-70B, Llama3-8B, Llama3-70B, Qwen3-30B-A3B, Qwen3-235B-A22B, DeepSeek-V2-Lite
+
+This README walks through how to launch training on Primus's **native SFT LoRA** path, and explains exactly which fields to change when switching from BF16 / FP8 to FP4 (NVFP4 / MXFP4).


+  -e EXP=examples/megatron/configs/MI355X/llama2_70B-BF16-sft-packed-mlperf_aligned.yaml \
+  sft_primus_0507_native \
+  bash -c 'cd /workspace/Primus && bash examples/run_pretrain.sh' \
+  2>&1 | tee /home/botahu/llama2_70b_500iter_runs/${EXP_NAME}.log


+  modules:
+    pre_trainer:
+      framework: megatron
+      config: sft_trainer.yaml
+      model: llama2_70B.yaml


+# Recommended invocation:
+#   export PRIMUS_EXP_NAME=native_llama2_70b_fp4_perf_$(date +%Y%m%d_%H%M%S)
+#   EXP=examples/megatron/configs/MI355X/llama2_70B-FP4-sft-packed-perf.yaml \
+#       bash examples/run_pretrain.sh
+# =============================================================================


+modules:
+  pre_trainer:
+    framework: megatron
+    config: sft_trainer.yaml
+    model: llama2_70B.yaml


+    overrides:
+      data_path: null
+      sft_dataset_name: rajpurkar/squad
+      sft_dataset_formatter: squad
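With `sft_dataset_formatter: squad`, each `rajpurkar/squad` record (fields `context`, `question`, and `answers` with a `text` list) has to become a prompt plus a supervised response. A minimal sketch, assuming a plain "Context/Question/Answer" template and the first reference answer as the target; `format_squad` and the template are hypothetical and may differ from the SQuAD formatter in `formatters.py`.

```python
from typing import Any, Dict


def format_squad(record: Dict[str, Any]) -> Dict[str, str]:
    """Map one SQuAD-style record to a prompt/response pair (template is an assumption)."""
    answers = record["answers"]["text"]
    if not answers:
        raise ValueError(f"record {record.get('id')!r} has no reference answer")
    prompt = (
        f"Context: {record['context']}\n"
        f"Question: {record['question']}\n"
        "Answer:"
    )
    # Supervise only the first reference answer; the leading space separates it from "Answer:".
    return {"prompt": prompt, "response": " " + answers[0]}
```

Keeping prompt and response separate lets the tokenization step mask the prompt out of the loss.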


+  -e EXP=examples/megatron/configs/MI355X/llama2_70B-FP4-sft-packed-perf.yaml \
+  sft_primus_0507_native \
+  bash -c 'cd /workspace/Primus && bash examples/run_pretrain.sh' \
+  2>&1 | tee /home/botahu/llama2_70b_500iter_runs/${EXP_NAME}.log


+modules:
+  sft_trainer:
+    framework: megatron
+    config: sft_trainer.yaml
+    model: llama3_8B.yaml
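This config keys the module as `sft_trainer`, while the snippets above use `pre_trainer`; the PR summary mentions module-name aliasing in `get_module_config` and `pre_trainer`/`post_trainer` aliasing in the launcher config. A sketch of alias-based lookup, assuming a fixed fallback order per requested name; `MODULE_ALIASES` and `resolve_module_config` are hypothetical and not the Primus API.

```python
from typing import Any, Dict, Mapping, Tuple

# Hypothetical lookup order for each requested module name.
MODULE_ALIASES: Mapping[str, Tuple[str, ...]] = {
    "pre_trainer": ("pre_trainer", "sft_trainer", "post_trainer"),
    "post_trainer": ("post_trainer", "sft_trainer", "pre_trainer"),
    "sft_trainer": ("sft_trainer", "post_trainer", "pre_trainer"),
}


def resolve_module_config(modules: Dict[str, Any], name: str) -> Any:
    """Return modules[name], falling back to its aliases in order."""
    candidates = MODULE_ALIASES.get(name, (name,))
    for candidate in candidates:
        if candidate in modules:
            return modules[candidate]
    raise KeyError(f"module {name!r} not found (tried {list(candidates)})")
```

An explicit order makes the result deterministic when a config happens to define more than one of the aliased keys.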


GeneDer · 2026-06-10T15:40:24Z

These files are not needed

Vidushi Goyal and others added 7 commits June 3, 2026 20:14

add example of mxfp8 recipe

cf2300a

Merge branch 'main' into dev/vidgoyal/mxfp8_enable

292eb30

Merge branch 'dev/vidgoyal/mxfp8_enable' into dev/vidgoyal/mxfp4_llama3.1_8b_enable

2af89ff

add support for TE mxfp4 recipe

342fb6b

add mxfp4 yaml

8b3fbc6

Merge branch 'main' into dev/vidgoyal/mxfp4_llama3.1_8b_enable

01213d1

Copilot AI review requested due to automatic review settings June 9, 2026 16:23

Merge branch 'main' into release/v26.4

dab4dfc

Copilot started reviewing on behalf of GeneDer June 9, 2026 16:24

github-code-quality Bot found potential problems Jun 9, 2026


Copilot AI reviewed Jun 9, 2026


GeneDer added 2 commits June 9, 2026 19:59

Merge branch 'main' into release/v26.4

08e80ef

Merge branch 'main' into release/v26.4

9ae01c4

Copilot AI review requested due to automatic review settings June 10, 2026 15:27

Copilot started reviewing on behalf of GeneDer June 10, 2026 15:27

Copilot AI reviewed Jun 10, 2026


GeneDer closed this Jun 10, 2026

GeneDer deleted the release/v26.4 branch June 10, 2026 15:40

GeneDer wants to merge 10 commits into main from release/v26.4


3 participants
