Releases · NVIDIA/dgxc-benchmarking · GitHub

22 May 18:32

nvmarnold

v26.05 Latest

Added

Kimi K2 MXFP8 pretrain support.
Nemotron 3 Nano (30B) and Super (120B) pretrain recipes.
Slurm topology checks and CPU governor reporting in the system info microbenchmark.
llmb-run job history and log handling.
llmb-run flags: --env for container env overrides, additional Slurm pass-through flags, and dump-env Megatron-Bridge mode.

Changed

Updated recipes to NeMo 26.04.00 where applicable.
Refreshed DeepSeek V3, Nemotron 3, and Qwen3 configurations.

Fixed

Legacy-parser grad-norm NaN handling.
Archive exclusion for nsys_profile and PyTorch profiling output directories.
Torchtitan container compatibility.

Removed

Deprecated Grok1 and Nemotron4 recipes.
Legacy setup_script installer path and Conda support.
Deprecated llmb-run commands.

Known Issues

DeepSeek V3 Megatron-Bridge on H100 requires uv <=0.9.28 during setup.
EFA limitations remain for DeepSeek V3 (Megatron-Bridge H100, TorchTitan) and Qwen3 (30B H100, 235B H100); see Known Issues section of README for details.
Optional PCT fixed-core CPU binding may improve select workloads on Granite Rapids systems where PCT is enabled. See the README Known Issues section before applying the patch.

End of Support

LLMB v25.12.x and earlier are no longer supported as of v26.05.00. These release lines will not receive further updates, fixes, or support.

24 Apr 20:46

sudostock

v26.02.01

Added

Llama3 LoRA finetuning support for B300 and B200.
PyTorch Profiler support for selected Megatron-Bridge recipes, including DeepSeek V3, GPT-OSS 120B, Llama3.1, Nemotron-H, Qwen3, and Llama3 LoRA finetuning.

Changed

Updated recipes to NeMo 26.02.01 where applicable.
Refreshed Blackwell recipe configurations, including GPT-OSS 120B, Qwen3, and Llama3.1.

Fixed

Improved llmb-install reliability when resuming failed installs, creating virtual environments, and auto-detecting SLURM GRES on heterogeneous partitions.
Improved llmb-run submit validation and error messages for explicit workload selections.

Known Issues

Qwen3 on select B300 Granite Rapids systems may benefit from the optional qwen3/pretrain/b300_numa_cpu_pinning.patch workaround when PCT is available and enabled.
EFA incompatibility for certain recipes; see the Known Issues section of the README for more details.

24 Mar 16:23

sudostock

v26.02

Added

B300 support
- Pretrain recipes: Llama 3.1, DeepSeek V3, Nemotron-H, Qwen3
- NCCL benchmark
- CPU overhead microbenchmark
GPT-OSS pretrain recipe.
DeepSeek V3 Torchtitan FP8 support for GB300 and GB200.
DeepSeek V3 proxy models for 64 GB300/GB200 GPUs.
System info script for IB, container, and enroot diagnostics.
llmb-run archive command to package experiment logs into a tarball.
Exemplar program documentation and tooling.

Changed

Updated recipes to NeMo 26.02.00 where applicable.
Llama3 LoRA finetuning ported to Megatron Bridge.
Torchtitan optimizations for DeepSeek V3.
Centralized peak throughput (TFLOP/GPU) as primary performance metric in READMEs.
Removed FP8 support for Qwen3 235B on GB200.

Removed

Run:ai support.

Known Issues

Recipes using the NeMo 26.02.00 container will not work with EFA; see the Known Issues section of the README for a workaround.
DeepSeek V3 on EFA clusters may encounter connectivity issues.

12 Feb 17:07

sudostock

v25.12.02

Fixed

Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.
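
The version ceiling above can be checked before running an install. The following is a minimal sketch, not the project's own check: the function names are hypothetical, and it assumes the version string comes from something like the output of `uv --version`, with the version appearing as the first `MAJOR.MINOR.PATCH` triple.

```python
import re

# Ceiling taken from the release note: uv 0.9.29+ fails on pinned nemo_run commits.
MAX_UV_VERSION = (0, 9, 28)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a string as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def uv_version_ok(text, ceiling=MAX_UV_VERSION):
    """Return True when the reported uv version is at or below the ceiling."""
    # Tuples compare element by element, so (0, 10, 0) is correctly
    # treated as newer than (0, 9, 28).
    return parse_version(text) <= ceiling
```

Comparing integer tuples rather than raw strings avoids the lexical trap where "0.10.0" would sort before "0.9.28".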

12 Feb 17:06

sudostock

v25.10.02

Fixed

Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.

12 Feb 17:05

sudostock

v25.08.02

Fixed

Pin uv to <=0.9.28 in install.sh to avoid strict parsing failures when installing pinned nemo_run commits with uv 0.9.29+.

05 Feb 21:46

sudostock

v25.12.01

[v25.12.01] - 2026-02-05

Changed

For Megatron Bridge models, download model configs in addition to tokenizers.
Add --container-writable flag to Megatron Bridge SLURM job scripts.
Use the passthrough packager for Megatron Bridge recipes.
Standardize Torchtitan log location and naming.
Updated DeepSeek V3 B200 scales to match tested configurations.

Fixed

Inference and microbenchmark job submission.
Headless installation.
Ensure Qwen handles custom mounts correctly.
Resolve llmb-install Transformers version issues.
Llama3.1 70B scale documentation for H100.

Known Issues

Qwen3 requires internet connectivity and may encounter Hugging Face Hub access or rate limit errors during benchmark runs.

08 Jan 00:02

sudostock

v25.12

Added

Qwen3 pretrain recipes 30B-A3B and 235B-A22B.
DeepSeek V3 Torchtitan pretrain recipe.

Changed

Updated recipes to NeMo 25.11.01 where applicable.
Consolidated llmb-run submit commands (see cli/llmb-run/CHANGELOG.md for details).

06 Jan 18:09

sudostock

v25.10.01

Added

NVCF support for inference recipes deployable via Helm Charts.
Offline mode support for Grok1 and Nemotron4 (15B and 340B) pretrain recipes on SLURM clusters. Tokenizers are pre-downloaded during installation and mounted into containers at runtime, eliminating the need for HuggingFace API access during workload execution.

Fixed

Fixed Nemotron 340B runtime failures caused by rate limiting (HTTP 429 errors) when connecting to HuggingFace Hub. The workload now operates in offline mode using pre-downloaded tokenizer files, preventing API rate limit exhaustion during training runs.
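
As an illustration of the offline pattern described above, the sketch below prepares an environment for a job that should never contact the Hub. It is an assumption about how such a setup could look, not the recipes' actual implementation: `build_offline_env` and the tokenizer directory argument are hypothetical, while `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are the standard Hugging Face environment switches.

```python
import os
from pathlib import Path


def build_offline_env(tokenizer_dir, base_env=None):
    """Return a copy of the environment with Hugging Face offline switches set.

    Raises FileNotFoundError if the tokenizer directory is missing or empty,
    so a missing pre-download fails early instead of during training.
    """
    path = Path(tokenizer_dir)
    if not path.is_dir() or not any(path.iterdir()):
        raise FileNotFoundError(f"no pre-downloaded tokenizer files in {path}")
    # Copy so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: make no network calls
    env["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use local files only
    return env
```

The returned mapping could then be passed as the `env` argument of a `subprocess.run` call that launches the workload.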

04 Dec 18:54

sudostock

v25.10

[v25.10] - 2025-12-03

Added

GB300 support
- Pretrain recipes: Nemotron4, Llama3.1, DS V3, Grok1 and Nemotron-H
Micro-benchmark for measuring CPU overhead
NCCL benchmark
Inference recipes deployable via Helm Charts for K8s platform
GPT OSS inference recipes for Dynamo K8s platform
Llama3 LoRA finetuning recipe

Changed

Updated DS V3, Grok1, Llama 3.1, Nemotron4 and Nemotron-H pretrain and finetune recipes to reduce install footprint
Updated to NeMo 25.09.00 where applicable

Removed

DeepSeek R1 NIM inference recipe
RAG Blueprint inference recipe
Llama4 pretrain, finetuning, and inference recipes
