Releases: NVIDIA/dgxc-benchmarking
v26.05
Added
- Kimi K2 MXFP8 pretrain support.
- Nemotron 3 Nano (30B) and Super (120B) pretrain recipes.
- Slurm topology checks and CPU governor reporting in the system info microbenchmark.
- `llmb-run` job history and log handling.
- `llmb-run` flags: `--env` for container env overrides, additional Slurm pass-through flags, and `dump-env` Megatron-Bridge mode.
Changed
- Updated recipes to NeMo 26.04.00 where applicable.
- Refreshed DeepSeek V3, Nemotron 3, and Qwen3 configurations.
Fixed
- Legacy-parser grad-norm NaN handling.
- Archive exclusion for `nsys_profile` and PyTorch profiling output directories.
- Torchtitan container compatibility.
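The archive exclusion fix can be pictured with a short sketch. This is not the `llmb-run archive` implementation, only an illustration of directory exclusion using Python's standard `tarfile` filter hook. The `nsys_profile` name comes from the note above. `torch_profile` is a hypothetical stand-in for the PyTorch profiling output directory, and the function names are assumptions.

```python
import tarfile
from pathlib import Path

# Directory names to leave out of the archive. "nsys_profile" comes from the
# release note; "torch_profile" is a hypothetical stand-in for the PyTorch
# profiler output directory.
EXCLUDED_DIRS = {"nsys_profile", "torch_profile"}


def skip_profiling_dirs(member: tarfile.TarInfo):
    """Return None (drop the member) if any path component is excluded."""
    if EXCLUDED_DIRS.intersection(Path(member.name).parts):
        return None
    return member


def archive_experiment(experiment_dir: str, tarball_path: str) -> list[str]:
    """Pack experiment_dir into a gzip tarball and return the archived names."""
    root = Path(experiment_dir)
    with tarfile.open(tarball_path, "w:gz") as tar:
        # Returning None for a directory also skips everything beneath it.
        tar.add(root, arcname=root.name, filter=skip_profiling_dirs)
    with tarfile.open(tarball_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Filtering by path component rather than by full path keeps the exclusion working no matter how deep the profiling directory sits inside the experiment tree.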
Removed
- Deprecated Grok1 and Nemotron4 recipes.
- Legacy `setup_script` installer path and Conda support.
- Deprecated `llmb-run` commands.
Known Issues
- DeepSeek V3 Megatron-Bridge on H100 requires `uv <=0.9.28` during setup.
- EFA limitations remain for DeepSeek V3 (Megatron-Bridge H100, TorchTitan) and Qwen3 (30B H100, 235B H100); see the Known Issues section of the README for details.
- Optional PCT fixed-core CPU binding may improve select workloads on Granite Rapids systems where PCT is enabled. See the README Known Issues section before applying the patch.
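The `uv <=0.9.28` constraint for DeepSeek V3 Megatron-Bridge on H100 can be checked before running setup. The sketch below is a minimal illustration, not part of the installer. It assumes the version text looks like the output of `uv --version` (for example `uv 0.9.28 (...)`), and the function names are hypothetical.

```python
import re

MAX_UV_VERSION = (0, 9, 28)  # upper bound taken from the Known Issues entry


def parse_uv_version(version_output: str) -> tuple[int, ...]:
    """Extract the first dotted three-part version found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    return tuple(int(part) for part in match.groups())


def uv_is_compatible(version_output: str) -> bool:
    """True when the reported uv version is at or below the documented bound."""
    return parse_uv_version(version_output) <= MAX_UV_VERSION
```

Comparing integer tuples avoids the string-comparison trap where `0.10.0` would sort before `0.9.28`.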
End of Support
- LLMB `v25.12.x` and earlier are no longer supported as of `v26.05.00`. These release lines will not receive further updates, fixes, or support.
v26.02.01
Added
- Llama3 LoRA finetuning support for B300 and B200.
- PyTorch Profiler support for selected Megatron-Bridge recipes, including DeepSeek V3, GPT-OSS 120B, Llama3.1, Nemotron-H, Qwen3, and Llama3 LoRA finetuning.
Changed
- Updated recipes to NeMo 26.02.01 where applicable.
- Refreshed Blackwell recipe configurations, including GPT-OSS 120B, Qwen3, and Llama3.1.
Fixed
- Improved `llmb-install` reliability when resuming failed installs, creating virtual environments, and auto-detecting SLURM GRES on heterogeneous partitions.
- Improved `llmb-run submit` validation and error messages for explicit workload selections.
Known Issues
- Qwen3 on select B300 Granite Rapids systems may benefit from the optional `qwen3/pretrain/b300_numa_cpu_pinning.patch` workaround when PCT is available and enabled.
- EFA incompatibility for certain recipes; see the Known Issues section of the README for more details.
v26.02
Added
- B300 support
- Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
- NCCL benchmark
- CPU overhead microbenchmark
- GPT-OSS pretrain recipe.
- DeepSeek V3 Torchtitan FP8 support for GB300 and GB200.
- DeepSeek V3 proxy models for 64 GB300/GB200 GPUs.
- System info script for IB, container, and enroot diagnostics.
- `llmb-run archive` command to package experiment logs into a tarball.
- Exemplar program documentation and tooling.
Changed
- Updated recipes to NeMo 26.02.00 where applicable.
- Llama3 LoRA finetuning ported to Megatron Bridge.
- Torchtitan optimizations for DeepSeek V3.
- Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
- Removed FP8 support for Qwen3 235B on GB200.
Removed
- Run:ai support.
Known Issues
- Recipes using the NeMo 26.02.00 container will not work with EFA; see the Known Issues section of the README for a workaround.
- DeepSeek V3 on EFA clusters may encounter connectivity issues.
v25.12.02
v25.10.02
v25.08.02
v25.12.01
[v25.12.01] - 2026-02-05
Changed
- For Megatron Bridge models, download model configs in addition to tokenizers.
- Add `--container-writable` flag to Megatron Bridge SLURM job scripts.
- Use the passthrough packager for Megatron Bridge recipes.
- Standardize Torchtitan log location and naming.
- DSV3 B200 scales to match tested configurations.
Fixed
- Inference and microbenchmark job submission.
- Headless installation.
- Ensure Qwen handles custom mounts correctly.
- Resolve `llmb-install` Transformers version issues.
- Llama3.1 70b scale documentation for H100.
Known Issues
- Qwen3 requires internet connectivity and may encounter Hugging Face Hub access or rate limit errors during benchmark runs.
v25.12
v25.10.01
Added
- NVCF support to inference recipes deployable via Helm Charts.
- Offline mode support for Grok1 and Nemotron4 (15B and 340B) pretrain recipes on SLURM clusters. Tokenizers are pre-downloaded during installation and mounted into containers at runtime, eliminating the need for HuggingFace API access during workload execution.
Fixed
- Fixed Nemotron 340B runtime failures caused by rate limiting (HTTP 429 errors) when connecting to HuggingFace Hub. The workload now operates in offline mode using pre-downloaded tokenizer files, preventing API rate limit exhaustion during training runs.
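As a rough illustration of the offline approach, the sketch below builds a job environment that tells the Hugging Face libraries not to contact the Hub. `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are documented Hugging Face environment variables. Whether the recipes set exactly these is an assumption, and the function name and the directory check are hypothetical.

```python
import os
from pathlib import Path


def offline_tokenizer_env(tokenizer_dir: str, base_env=None) -> dict:
    """Return a copy of the environment with Hugging Face offline mode enabled.

    Raises FileNotFoundError when the pre-downloaded tokenizer directory is
    missing, since offline mode cannot fetch it at runtime.
    """
    if not Path(tokenizer_dir).is_dir():
        raise FileNotFoundError(f"tokenizer directory not found: {tokenizer_dir}")
    env = dict(os.environ if base_env is None else base_env)
    env["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: make no network calls
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use local files only
    return env
```

Failing early on a missing directory surfaces an incomplete installation at submit time rather than partway through a training run.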
v25.10
[v25.10] - 2025-12-03
Added
- GB300 support
- Pretrain recipes: Nemotron4, Llama3.1, DS V3, Grok1 and Nemotron-H
- Micro-benchmark for measuring CPU overhead
- NCCL benchmark
- Inference recipes deployable via Helm Charts for K8s platform
- GPT OSS inference recipes for Dynamo K8s platform
- Llama3 LoRA finetuning recipe
Changed
- Updated DS V3, Grok1, Llama 3.1, Nemotron4 and Nemotron-H pretrain and finetune recipes to reduce install footprint
- Updated to NeMo 25.09.00 where applicable
Removed
- DeepSeek R1 NIM inference recipe
- RAG Blueprint inference recipe
- Llama4 pretrain, fine tuning, inference recipes