Releases: NVIDIA/dgxc-benchmarking

v26.05

22 May 18:32
ef2f6c2

Added

  • Kimi K2 MXFP8 pretrain support.
  • Nemotron 3 Nano (30B) and Super (120B) pretrain recipes.
  • Slurm topology checks and CPU governor reporting in the system info microbenchmark.
  • llmb-run job history and log handling.
  • llmb-run flags: --env for container env overrides, additional Slurm pass-through flags, and dump-env Megatron-Bridge mode.

Changed

  • Updated recipes to NeMo 26.04.00 where applicable.
  • Refreshed DeepSeek V3, Nemotron 3, and Qwen3 configurations.

Fixed

  • Legacy-parser grad-norm NaN handling.
  • Archive exclusion for nsys_profile and PyTorch profiling output directories.
  • Torchtitan container compatibility.

Removed

  • Deprecated Grok1 and Nemotron4 recipes.
  • Legacy setup_script installer path and Conda support.
  • Deprecated llmb-run commands.

Known Issues

  • DeepSeek V3 Megatron-Bridge on H100 requires uv <=0.9.28 during setup.
  • EFA limitations remain for DeepSeek V3 (Megatron-Bridge H100, Torchtitan) and Qwen3 (30B H100, 235B H100); see the Known Issues section of the README for details.
  • Optional PCT fixed-core CPU binding may improve select workloads on Granite Rapids systems where PCT is enabled. See the README Known Issues section before applying the patch.

End of Support

  • LLMB v25.12.x and earlier are no longer supported as of v26.05.00. These release lines will not receive further updates, fixes, or support.

v26.02.01

24 Apr 20:46
b8f2b1a

Added

  • Llama3 LoRA finetuning support for B300 and B200.
  • PyTorch Profiler support for selected Megatron-Bridge recipes, including DeepSeek V3, GPT-OSS 120B, Llama3.1, Nemotron-H, Qwen3, and Llama3 LoRA finetuning.

Changed

  • Updated recipes to NeMo 26.02.01 where applicable.
  • Refreshed Blackwell recipe configurations, including GPT-OSS 120B, Qwen3, and Llama3.1.

Fixed

  • Improved llmb-install reliability when resuming failed installs, creating virtual environments, and auto-detecting SLURM GRES on heterogeneous partitions.
  • Improved llmb-run submit validation and error messages for explicit workload selections.

Known Issues

  • Qwen3 on select B300 Granite Rapids systems may benefit from the optional qwen3/pretrain/b300_numa_cpu_pinning.patch workaround when PCT is available and enabled.
  • EFA incompatibility for certain recipes; see the Known Issues section of the README for more details.

v26.02

24 Mar 16:23
1b172ed

Added

  • B300 support
    • Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
    • NCCL benchmark
    • CPU overhead microbenchmark
  • GPT-OSS pretrain recipe.
  • DeepSeek V3 Torchtitan FP8 support for GB300 and GB200.
  • DeepSeek V3 proxy models for 64 GB300/GB200 GPUs.
  • System info script for IB, container, and enroot diagnostics.
  • llmb-run archive command to package experiment logs into a tarball.
  • Exemplar program documentation and tooling.

Changed

  • Updated recipes to NeMo 26.02.00 where applicable.
  • Llama3 LoRA finetuning ported to Megatron Bridge.
  • Torchtitan optimizations for DeepSeek V3.
  • Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
  • Removed FP8 support for Qwen3 235B on GB200.

Removed

  • Run:ai support.

Known Issues

  • Recipes using the NeMo 26.02.00 container do not work with EFA; see the Known Issues section of the README for a workaround.
  • DeepSeek V3 on EFA clusters may encounter connectivity issues.

v25.12.02

12 Feb 17:07
42ff42a

Fixed

  • Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.
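A minimal Python sketch of a pre-flight check that an installed uv satisfies the <=0.9.28 pin described above. It assumes `uv --version` prints a line containing the version as `X.Y.Z` (for example `uv 0.9.28`); the helper names `parse_uv_version` and `uv_satisfies_pin` are hypothetical and not part of the repository, and the script only reports. It does not modify install.sh or reinstall uv.

```python
import re
import shutil
import subprocess

# Upper bound from the release notes: uv 0.9.29+ fails to install pinned nemo_run commits.
MAX_UV_VERSION = (0, 9, 28)


def parse_uv_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from output such as 'uv 0.9.28 (abc123 2025-01-01)'.

    Assumption: the first dotted X.Y.Z triple in the text is the uv version.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"could not find a version number in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def uv_satisfies_pin(version: tuple[int, int, int], maximum: tuple[int, int, int] = MAX_UV_VERSION) -> bool:
    """Return True if version <= maximum; tuples compare element by element."""
    return version <= maximum


def main() -> None:
    # Hypothetical pre-flight check: report whether the uv on PATH respects the pin.
    if shutil.which("uv") is None:
        print("uv not found on PATH")
        return
    result = subprocess.run(["uv", "--version"], capture_output=True, text=True, check=True)
    version = parse_uv_version(result.stdout)
    shown = ".".join(map(str, version))
    if uv_satisfies_pin(version):
        print(f"uv {shown} satisfies <=0.9.28")
    else:
        print(f"uv {shown} is newer than 0.9.28; e.g. pip install 'uv<=0.9.28'")


if __name__ == "__main__":
    main()
```

Comparing integer tuples keeps the check correct across multi-digit components: `(0, 10, 0)` is correctly treated as newer than `(0, 9, 28)`, whereas a plain string comparison of `"0.10.0"` and `"0.9.28"` would get it wrong.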

v25.10.02

12 Feb 17:06
e9153ec

Fixed

  • Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.

v25.08.02

12 Feb 17:05
4eca801

Fixed

  • Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.

v25.12.01

05 Feb 21:46
cf259e0

[v25.12.01] - 2026-02-05

Changed

  • For Megatron Bridge models, download model configs in addition to tokenizers.
  • Add --container-writable flag to Megatron Bridge SLURM job scripts.
  • Use the passthrough packager for Megatron Bridge recipes.
  • Standardize Torchtitan log location and naming.
  • Updated DeepSeek V3 B200 scales to match tested configurations.

Fixed

  • Inference and microbenchmark job submission.
  • Headless installation.
  • Ensure Qwen handles custom mounts correctly.
  • Resolve llmb-install Transformers version issues.
  • Llama3.1 70B scale documentation for H100.

Known Issues

  • Qwen3 requires internet connectivity and may encounter Hugging Face Hub access or rate limit errors during benchmark runs.

v25.12

08 Jan 00:02
db1b14c

Added

  • Qwen3 pretrain recipes 30B-A3B and 235B-A22B.
  • DeepSeek V3 Torchtitan pretrain recipe.

Changed

  • Updated recipes to NeMo 25.11.01 where applicable.
  • Consolidated llmb-run submit commands (see cli/llmb-run/CHANGELOG.md for details).

v25.10.01

06 Jan 18:09
90f511b

Added

  • NVCF support for inference recipes deployable via Helm charts.
  • Offline mode support for Grok1 and Nemotron4 (15B and 340B) pretrain recipes on SLURM clusters. Tokenizers are pre-downloaded during installation and mounted into containers at runtime, eliminating the need for Hugging Face API access during workload execution.

Fixed

  • Fixed Nemotron 340B runtime failures caused by rate limiting (HTTP 429 errors) when connecting to the Hugging Face Hub. The workload now runs in offline mode using pre-downloaded tokenizer files, preventing API rate-limit exhaustion during training runs.

v25.10

04 Dec 18:54
8e77d60

[v25.10] - 2025-12-03

Added

  • GB300 support
    • Pretrain recipes: Nemotron4, Llama 3.1, DeepSeek V3, Grok1, and Nemotron-H
  • Micro-benchmark for measuring CPU overhead
  • NCCL benchmark
  • Inference recipes deployable via Helm Charts for K8s platform
  • GPT-OSS inference recipes for Dynamo K8s platform
  • Llama3 LoRA finetuning recipe

Changed

  • Updated DeepSeek V3, Grok1, Llama 3.1, Nemotron4, and Nemotron-H pretrain and finetune recipes to reduce their install footprint
  • Updated to NeMo 25.09.00 where applicable

Removed

  • DeepSeek R1 NIM inference recipe
  • RAG Blueprint inference recipe
  • Llama4 pretrain, finetuning, and inference recipes