Commit 79dca63

JulianCloudNTH authored and facebook-github-bot committed
Refresh backend README with progress timeline (#20115)
Summary:
Update the WebGPU backend README to reflect the current state of the backend:
- Add a Progress section listing milestones landed on `main` (#18808, #19963, #19964, #19981, #20036) and work in review (#20079, #20080), each linking its pull request.
- Update the operator support table to include `rms_norm` and refresh the planned/roadmap list toward end-to-end LLM inference.
- Update the directory structure to match the current layout.

Docs-only change; no code or build impact.

Differential Revision: D107742574
1 parent 189ffaa commit 79dca63

1 file changed

Lines changed: 41 additions & 11 deletions

backends/webgpu/README.md

@@ -2,7 +2,26 @@

 Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux/Windows).

-> **Status: Prototype.** The backend supports a single operator today and is under active development. See [TODO.md](TODO.md) for the roadmap.
+> **Status: Prototype.** The backend supports `add` and `rms_norm` today and is under active development. See [Progress](#progress) for shipped milestones.
+
+## Progress
+
+Milestones landed on `main`:
+
+| Date | Milestone | Pull Request |
+|---|---|---|
+| 2026-04 | Made it possible to run ExecuTorch models on the GPU through WebGPU — built the backend from the ground up, including the runtime delegate that builds the GPU graph (buffers, pipelines, bind groups) and runs the model on Metal and Vulkan | [#18808](https://github.com/pytorch/executorch/pull/18808) |
+| 2026-06 | Grew model support beyond element-wise operators — added the root-mean-square normalization operator (`rms_norm`) and named-data weight loading | [#19963](https://github.com/pytorch/executorch/pull/19963) |
+| 2026-06 | Made sure every change is automatically tested — added WebGPU to ExecuTorch's standard backend test suite, running on Linux/x86 in CI | [#19964](https://github.com/pytorch/executorch/pull/19964) |
+| 2026-06 | Removed a class of bugs and manual upkeep — the WGSL shaders are now generated automatically, with a build-time check that fails the build on shader/source drift | [#19981](https://github.com/pytorch/executorch/pull/19981) |
+| 2026-06 | Got the test suite to actually run work on the GPU — added operator-allowlist delegation (unsupported operations fall back to the CPU) and a process-wide GPU device context, so models execute on the GPU during testing | [#20036](https://github.com/pytorch/executorch/pull/20036) |
+
+In review:
+
+| Milestone | Pull Request |
+|---|---|
+| Makes testing match the WebGPU standard exactly — switches the tests to Google's Dawn shader compiler (Tint, the source-of-truth WGSL implementation) running on SwiftShader for headless GPU execution | [#20079](https://github.com/pytorch/executorch/pull/20079) |
+| Strengthens correctness for models that run in several GPU passes — adds dispatch-ordering and scratch-buffer (temporary GPU memory) tests | [#20080](https://github.com/pytorch/executorch/pull/20080) |

 ## Architecture

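The #19981 row in the Progress table describes WGSL headers that are generated automatically, with a build-time check that fails on shader/source drift. The sketch below illustrates one way such a check could work; it is not the repository's `gen_wgsl_headers.py`. The function names, the `kBinaryAddWgsl`-style symbol naming, and the header layout are all assumptions made for illustration. Only the `<name>.wgsl` to `<name>_wgsl.h` pairing comes from the directory listing in this diff.

```python
from pathlib import Path


def header_path_for(wgsl_path: Path) -> Path:
    """Map ops/add/binary_add.wgsl to ops/add/binary_add_wgsl.h."""
    return wgsl_path.with_name(wgsl_path.stem + "_wgsl.h")


def render_header(wgsl_path: Path) -> str:
    """Render a C++ header that embeds the shader as a raw string literal.

    The symbol naming and header layout are hypothetical, not the
    format used by the real generator.
    """
    parts = wgsl_path.stem.split("_")
    symbol = "k" + "".join(p.capitalize() for p in parts) + "Wgsl"
    source = wgsl_path.read_text()
    return (
        "// Generated file; do not edit by hand.\n"
        "#pragma once\n\n"
        f'inline constexpr const char* {symbol} = R"WGSL(\n{source})WGSL";\n'
    )


def find_drift(root: Path) -> list[Path]:
    """Return headers that are missing or differ from a fresh render."""
    stale = []
    for wgsl in sorted(root.rglob("*.wgsl")):
        header = header_path_for(wgsl)
        if not header.exists() or header.read_text() != render_header(wgsl):
            stale.append(header)
    return stale
```

A build step could call `find_drift` on the `runtime/ops/` directory and exit non-zero when the returned list is non-empty, so that a shader edit without a regenerated header stops the build.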
@@ -36,8 +55,9 @@ Key design choices:
 | Operator | WGSL Shader | Notes |
 |---|---|---|
 | `aten.add.Tensor` | `binary_add.wgsl` | Element-wise with alpha: `out = in1 + alpha * in2` |
+| `et_vk.rms_norm.default` | `rms_norm.wgsl` | Root-mean-square normalization |

-**Planned:** `sub`, `mul`, `relu`, `linear` (matmul), `softmax`, `layer_norm`
+**Planned:** scaled-dot-product attention (KV cache), quantized linear (4-bit weight-only and 8da4w post-training quantization), quantized embedding, RoPE, `mul`, `sigmoid`, and shape ops (`view`, `permute`, `slice`, `select`, `cat`, `squeeze`/`unsqueeze`).

 ## Quick Start

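The operator table in the hunk above gives the formula for `aten.add.Tensor` and names `rms_norm` only as root-mean-square normalization. As a CPU-side reference for what the two shaders are expected to compute, a minimal pure-Python sketch follows. The function names are hypothetical. For `rms_norm`, the per-element `weight`, the `eps` default, and the placement of `eps` inside the square root follow the common definition of the operator and are assumptions; the diff does not specify them.

```python
import math


def add_with_alpha(in1, in2, alpha=1.0):
    """Element-wise out = in1 + alpha * in2, as stated in the operator table."""
    if len(in1) != len(in2):
        raise ValueError("inputs must have the same length")
    return [a + alpha * b for a, b in zip(in1, in2)]


def rms_norm(x, weight, eps=1e-6):
    """Normalize x by its root mean square, then scale by weight.

    Assumed form: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


print(add_with_alpha([1.0, 2.0], [3.0, 4.0], alpha=2.0))  # [7.0, 10.0]
```

A reference like this is useful mainly for comparing GPU output against expected values within a floating-point tolerance.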
@@ -83,27 +103,37 @@ This runs Python export tests, exports a .pte, builds the native runtime, and va
 backends/webgpu/
 ├── CMakeLists.txt
 ├── README.md
-├── TODO.md
 ├── runtime/
 │   ├── WebGPUBackend.h/cpp # BackendInterface (init/execute)
 │   ├── WebGPUGraph.h/cpp # GPU graph: buffers, pipelines, dispatch
 │   ├── WebGPUDelegateHeader.h/cpp # VH00 header parser
 │   ├── WebGPUDevice.h/cpp # wgpu-native device abstraction
+│   ├── WebGPUUtils.h # Workgroup-size helpers
 │   └── ops/
 │       ├── OperatorRegistry.h/cpp # Op dispatch table
-│       └── add/
-│           ├── BinaryOp.cpp # aten.add.Tensor implementation
-│           ├── binary_add.wgsl # WGSL shader source
-│           └── binary_add_wgsl.h # Shader as C++ string constant
+│       ├── add/
+│       │   ├── BinaryOp.cpp # aten.add.Tensor implementation
+│       │   ├── binary_add.wgsl # WGSL shader source
+│       │   └── binary_add_wgsl.h # Shader as C++ string constant
+│       └── rms_norm/
+│           ├── RmsNorm.cpp # et_vk.rms_norm implementation
+│           ├── rms_norm.wgsl # WGSL shader source
+│           └── rms_norm_wgsl.h # Shader as C++ string constant
 ├── scripts/
-│   └── setup-wgpu-native.sh # Download wgpu-native binaries
+│   ├── setup-wgpu-native.sh # Download wgpu-native binaries
+│   └── gen_wgsl_headers.py # Generate the embedded *_wgsl.h shader headers
 └── test/
     ├── conftest.py
+    ├── tester.py # Partitioner stages + supported-op list
     ├── test_build_webgpu.sh # End-to-end build + test
     ├── test_webgpu_native.cpp # C++ native test runner
-    └── ops/
-        └── add/
-            └── test_add.py # Python export tests
+    ├── test_wgsl_codegen.py # Shader codegen check
+    ├── native/ # C++ operator tests
+    └── ops/ # Python export tests
+        ├── add/
+        │   └── test_add.py # add export tests
+        └── rms_norm/
+            └── test_rms_norm.py # rms_norm export tests
 ```

 ## Requirements
