This guide is for cloud partners using this repo to work toward NVIDIA Exemplar Cloud status.
Benchmarking your cloud's performance across a suite of real AI workloads ensures consistent, high-quality GPU performance for end users.
Validating the performance of your platform is a transparent way to demonstrate that your unique cloud delivers cutting-edge throughput and provides excellent perf/TCO value.
There are two common usage patterns for this repo:
- Achieving NVIDIA-validated Exemplar performance by demonstrating more than 95% of baseline performance on every workload in a given test suite.
- Using Recipes ad hoc for hardware, cluster, or software performance checks, e.g. during maintenance windows or new hardware bring-up / NPI.
- Exemplar performance is validated per chip, with test suites available for GB300, GB200, B300, B200, and H100 chips.
- Benchmarking scripts ("Recipes") are released every ~2 months, evolving as common AI workload patterns evolve and new models ship. You can find Recipe versions as tags/releases in this repo.
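The "> 95% of baseline on every workload" criterion can be sketched as a simple check. This is a minimal illustration, not the validation tooling: the function name, the dictionary layout, the workload names, and the sample numbers are all hypothetical, and it assumes a higher-is-better throughput metric reported in the same units as the baseline.

```python
def exemplar_pass(measured, baseline, threshold=0.95):
    """Return (passed, ratios) for a hypothetical suite result.

    measured and baseline map a workload name to a higher-is-better
    throughput figure in the same units. The suite passes only if every
    workload in baseline has a measured value strictly above
    threshold * baseline. A workload with no measurement counts as 0.
    """
    ratios = {
        name: measured.get(name, 0.0) / base
        for name, base in baseline.items()
    }
    passed = all(ratio > threshold for ratio in ratios.values())
    return passed, ratios


# Made-up numbers for illustration only; real baselines come from NVIDIA.
baseline = {"llama3.1-70b-fp8": 1000.0, "qwen3-235b-bf16": 400.0}
measured = {"llama3.1-70b-fp8": 981.0, "qwen3-235b-bf16": 372.0}

passed, ratios = exemplar_pass(measured, baseline)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of baseline")
print("suite passes:", passed)
```

In this made-up example the second workload lands at 93% of its baseline, so the suite as a whole does not pass even though the first workload clears the bar.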
While the benchmarks can be run independently, we recommend looping in your NVIDIA account team before you begin. We have performance experts available to help with questions about the test suite and to help investigate tuning opportunities as needed.
- At this time, Exemplar tests require Slurm clusters. System and cluster requirements are described in the main README of this repo.
- You must use the latest version of the Recipes available when you begin testing, and continue to use that same version for the entirety of your Exemplar certification process.
- During install, you will be prompted for workload selection. If you select 'Exemplar Cloud', the full Exemplar test suite for your selected GPU type will be installed.
- Validate the system by running the prescreen test.
Run `llmb-run exemplar`. This launches the full Exemplar test suite for the installed GPU type. `llmb-run` is the recommended tool for executing the suite; the `exemplar` subcommand is a convenience that launches every required workload in one go.
If individual workloads fail, you can re-run them on their own — Exemplar requires a passing run for each workload in the suite, not a single end-to-end execution. See the llmb-run README for repeat and profiling behavior.
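Because each workload needs its own passing run, it can help to track which ones are still outstanding across re-runs. The sketch below is bookkeeping only: the record format and workload names are hypothetical, and it does not read or parse any llmb-run output.

```python
def outstanding_workloads(required, runs):
    """Return the required workloads that have no passing run yet.

    required: list of workload names in the suite.
    runs: iterable of (workload, passed) tuples, one per attempt.
    """
    passed = {name for name, ok in runs if ok}
    return [name for name in required if name not in passed]


# Hypothetical names and attempt history, for illustration only.
required = ["deepseek-v3-fp8", "llama3.1-70b-fp8", "qwen3-235b-bf16"]
runs = [
    ("deepseek-v3-fp8", True),
    ("llama3.1-70b-fp8", False),  # failed during the full-suite launch
    ("qwen3-235b-bf16", True),
    ("llama3.1-70b-fp8", True),   # passed when re-run on its own
]

print(outstanding_workloads(required, runs))
```

A workload that failed once and later passed is treated as done, which matches the per-workload requirement described above.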
- Package your results for submission using `llmb-run archive`. This creates a compressed `.tar.zst` file under `$LLMB_INSTALL/` containing all experiment logs and configuration metadata. Profiling data is excluded to keep the archive compact; share profiles separately if requested. See the llmb-run README for options.
- Submit the archive to your NVIDIA account team for review.
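Before submitting, you may want to confirm which archive was produced. The following is a minimal sketch, assuming the archive sits directly under the directory named by `LLMB_INSTALL` and ends in `.tar.zst`; the helper name is hypothetical and the archive's exact file name is not assumed.

```python
import os
from pathlib import Path


def latest_archive(install_dir):
    """Return the most recently modified *.tar.zst directly under install_dir, or None."""
    candidates = sorted(
        Path(install_dir).glob("*.tar.zst"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


install_dir = os.environ.get("LLMB_INSTALL")
if install_dir is None:
    print("LLMB_INSTALL is not set")
else:
    archive = latest_archive(install_dir)
    if archive is None:
        print("no .tar.zst archive found under", install_dir)
    else:
        print(f"{archive} ({archive.stat().st_size} bytes)")
```

If several archives exist, this picks the newest by modification time; check that it corresponds to the run you intend to submit.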
Work with your NVIDIA account team to investigate any tuning opportunities with NVIDIA performance experts.
If approved, your cloud is recognized as an NVIDIA Exemplar Cloud for the selected platform(s). NVIDIA is happy to collaborate to support downstream efforts highlighting your achievement.
- Periodically re-run recipes and maintain performance vs. updated baselines to ensure the platform is delivering optimal perf/value for end users.
- An Exemplar validation from NVIDIA is valid for 12 months.
To start, contact your NVIDIA account team and reference this DGX Cloud Benchmarking repo.
Scale: 512 GPUs | Repeats: 1 | Profiling: disabled
| Model | Size | Dtypes |
|---|---|---|
| DeepSeek-V3 | 671B | BF16, FP8, NVFP4 |
| GPT (OSS) | 120B | BF16 |
| Kimi-K2 | 1T | FP8 |
| Llama 3.1 | 405B | FP8, NVFP4 |
| Llama 3.1 | 70B | FP8, NVFP4 |
| Nemotron-H | 56B | FP8 |
| Nemotron 3 | 120B | BF16, FP8, NVFP4 |
| Qwen3 | 235B | BF16 |
| Model | Size | Dtypes |
|---|---|---|
| DeepSeek-V3 | 671B | BF16, FP8, NVFP4 |
| GPT (OSS) | 120B | BF16 |
| Kimi-K2 | 1T | FP8 |
| Llama 3.1 | 405B | FP8, NVFP4 |
| Llama 3.1 | 70B | FP8, NVFP4 |
| Nemotron-H | 56B | FP8 |
| Qwen3 | 235B | BF16 |
| Model | Size | Dtypes |
|---|---|---|
| DeepSeek-V3 | 671B | BF16 |
| GPT (OSS) | 120B | BF16 |
| Llama 3.1 | 405B | FP8, NVFP4 |
| Llama 3.1 | 70B | FP8, NVFP4 |
| Nemotron-H | 56B | FP8 |
| Nemotron 3 | 120B | BF16 |
| Qwen3 | 235B | BF16 |
| Model | Size | Dtypes |
|---|---|---|
| DeepSeek-V3 | 671B | BF16, FP8 |
| GPT (OSS) | 120B | BF16 |
| Kimi-K2 | 1T | FP8 |
| Llama 3.1 | 405B | FP8, NVFP4 |
| Llama 3.1 | 70B | FP8, NVFP4 |
| Nemotron-H | 56B | FP8 |
| Nemotron 3 | 120B | BF16, FP8 |
| Qwen3 | 235B | BF16 |
| Model | Size | Dtypes |
|---|---|---|
| DeepSeek-V3 | 671B | FP8 |
| GPT (OSS) | 120B | BF16 |
| Llama 3.1 | 70B | BF16, FP8 |
| Nemotron-H | 56B | FP8 |
| Qwen3 | 235B | BF16 |