From 34387e386d72caa544d324599430b3e844c939f4 Mon Sep 17 00:00:00 2001
From: akrishnakanth
Date: Tue, 21 Apr 2026 13:13:24 +0530
Subject: [PATCH 1/2] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 871237e2..4defadb7 100644
--- a/README.md
+++ b/README.md
@@ -95,7 +95,8 @@ Models | GPU Machine Type
 | **Qwen3 32B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | vLLM | Inference | GCE | [Link](./inference/g4/single-host-serving/vllm/README.md)
 | **Llama3.1 70B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/llama3_1_70b/single-host-serving/tensorrt-llm/README.md)
 | **DeepSeek R1** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/deepseek_r1/single-host-serving/tensorrt-llm/README.md)
-
+| **Qwen3 235B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/qwen3_235b/single-host-serving/tensorrt-llm/README.md)
+| **Wan2.2 14B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/wan2.2/sglang/README.md)

 ### Checkpointing benchmarks
 Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe

From 3e6a8cb7f2ef3ee412242082359645a23b01c711 Mon Sep 17 00:00:00 2001
From: akrishnakanth
Date: Tue, 21 Apr 2026 14:08:09 +0530
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4defadb7..35cea74e 100644
--- a/README.md
+++ b/README.md
@@ -96,7 +96,7 @@ Models | GPU Machine Type
 | **Llama3.1 70B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/llama3_1_70b/single-host-serving/tensorrt-llm/README.md)
 | **DeepSeek R1** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/deepseek_r1/single-host-serving/tensorrt-llm/README.md)
 | **Qwen3 235B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/qwen3_235b/single-host-serving/tensorrt-llm/README.md)
-| **Wan2.2 14B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | TensorRT-LLM | Inference | GCE | [Link](./inference/g4/wan2.2/sglang/README.md)
+| **Wan2.2 14B** | [G4 (NVIDIA RTX PRO 6000 Blackwell)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#g4-series) | SGLang | Inference | GCE | [Link](./inference/g4/wan2.2/sglang/README.md)

 ### Checkpointing benchmarks
 Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe