Commit fe1e20b

feat(blog): GB300 vs GB200 NVL72 on DSv4-Pro — up to 2.83x throughput/GPU (#391)
* feat(blog): GB300 vs GB200 NVL72 on DSv4-Pro — up to 2.83x throughput/GPU

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(blog): use semianalysis.com/ai-cloud-tco-model link for TCO citation

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(blog): update date to 2026-05-27

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 7fbbd12 commit fe1e20b

6 files changed

Lines changed: 182 additions & 1 deletion

.claude/skills/write-inferencex-blog/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ Common gotchas:
 - **Workload mismatch**: chart headers can mislead. Verify ISL/OSL from the data itself — 1k/1k and 8k/1k give wildly different `tok/s/GPU` and `$/M tokens` numbers. The blog title, lede, tables, and chart caption must all use the same ISL/OSL.
 - **Latest run only**: filter to the highest `run_attempt` per `github_run_id`, then take the latest `date` per `(config_id, conc, isl, osl)`. See the `inferencex-data` skill for the exact filter.
 - **Model spec verification**: never invent parameter counts. Always `WebSearch` the model's released specs (total params, active params, expert count, attention type) before writing the architecture paragraph. Cite sources. GLM-5 is _not_ GLM-4.5 — the numbers changed.
-- **TCO values**: pull from the [SemiAnalysis AI Cloud TCO Model](https://newsletter.semianalysis.com/p/ai-cloud-economics). Current values (verify if older than a quarter):
+- **TCO values**: pull from the [SemiAnalysis AI Cloud TCO Model](https://semianalysis.com/ai-cloud-tco-model/). Current values (verify if older than a quarter):
 - H100 $1.30, H200 $1.41, B200 $1.95, B300 $2.34, GB200 $2.21, GB300 $2.652
 - MI300X $1.12, MI325X $1.28, MI355X $1.48
 - **Cost per million tokens formula**: `$/M tok = TCO_$/GPU/hr * 1e6 / (3600 * tput_per_gpu)`. Equivalently in Python: `cost = tco / (3600 * tput / 1e6)`. Throughput is per-GPU, so GPU count cancels out for aggregated configs.
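
The "Workload mismatch" and "Latest run only" gotchas in the hunk above can be sketched in plain Python. This is a minimal sketch, not the `inferencex-data` skill's actual filter: the `Row` shape, the ISO-8601 `date` strings, and the helper names `latest_runs` and `check_workload` are hypothetical, chosen only to mirror the column names the SKILL.md mentions.

```python
from dataclasses import dataclass


# Hypothetical row shape; the real InferenceX schema may differ.
@dataclass(frozen=True)
class Row:
    github_run_id: int
    run_attempt: int
    date: str            # assumed "YYYY-MM-DD", so string order == chronological order
    config_id: str
    conc: int
    isl: int
    osl: int
    tput_per_gpu: float  # tok/s/GPU


def latest_runs(rows: list[Row]) -> list[Row]:
    """Keep the highest run_attempt per github_run_id, then the
    latest date per (config_id, conc, isl, osl)."""
    # Pass 1: find the highest attempt for each workflow run.
    max_attempt: dict[int, int] = {}
    for r in rows:
        prev = max_attempt.get(r.github_run_id, r.run_attempt)
        max_attempt[r.github_run_id] = max(prev, r.run_attempt)
    final = [r for r in rows if r.run_attempt == max_attempt[r.github_run_id]]

    # Pass 2: among surviving rows, keep the newest one per benchmark point.
    latest: dict[tuple[str, int, int, int], Row] = {}
    for r in final:
        key = (r.config_id, r.conc, r.isl, r.osl)
        if key not in latest or r.date > latest[key].date:
            latest[key] = r
    return list(latest.values())


def check_workload(rows: list[Row], isl: int, osl: int) -> None:
    """Raise unless every row matches the ISL/OSL the post claims."""
    seen = {(r.isl, r.osl) for r in rows}
    if seen != {(isl, osl)}:
        raise ValueError(f"expected ISL/OSL {isl}/{osl}, data contains {sorted(seen)}")
```

Resolving attempts before dates keeps a retried run from being shadowed by its own failed first attempt. Calling `check_workload` on the filtered rows makes a stray 1k/1k point in an 8k/1k chart fail loudly instead of silently skewing `tok/s/GPU`.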

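The cost-per-million-tokens formula from the hunk can be checked with a small helper. This is a minimal sketch, assuming TCO in $/GPU/hr and throughput in tok/s/GPU; `TCO_PER_GPU_HR` and `cost_per_million_tokens` are hypothetical names, and the dictionary only copies the values listed in the diff (re-verify them against the TCO model before reuse).

```python
# $/GPU/hr values copied from the SKILL.md diff; verify before reuse.
TCO_PER_GPU_HR = {
    "H100": 1.30, "H200": 1.41, "B200": 1.95, "B300": 2.34,
    "GB200": 2.21, "GB300": 2.652,
    "MI300X": 1.12, "MI325X": 1.28, "MI355X": 1.48,
}


def cost_per_million_tokens(tco_per_gpu_hr: float, tput_per_gpu: float) -> float:
    """$/M tok = TCO_$/GPU/hr * 1e6 / (3600 * tput_per_gpu).

    3600 * tput_per_gpu is tokens produced per GPU-hour, so dividing the
    hourly cost by it gives $/token; multiplying by 1e6 gives $/M tokens.
    """
    if tput_per_gpu <= 0:
        raise ValueError("tput_per_gpu must be positive")
    return tco_per_gpu_hr * 1e6 / (3600 * tput_per_gpu)


if __name__ == "__main__":
    # Hypothetical throughput, for illustration only.
    print(f"{cost_per_million_tokens(TCO_PER_GPU_HR['GB300'], 1000.0):.3f}")
```

For example, a hypothetical 1,000 tok/s/GPU on GB300 gives 2.652 * 1e6 / (3600 * 1000), about $0.737 per million tokens. Multiplying both TCO and throughput by the GPU count (72 for an NVL72 rack) leaves the result unchanged, which is the cancellation the SKILL.md relies on.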