Enable backend test suite + x86 CI (#19964) by JulianCloudNTH · Pull Request #19986 · pytorch/executorch

JulianCloudNTH · 2026-06-03T20:56:39Z

Summary:

Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter, via `wgpu-native`, minimal dependencies, no GPU).

Two fixes were needed for SwiftShader's downlevel limits: request the adapter's full `requiredLimits` at device creation (software adapters default storage-buffer limits to 0), and make the `add` op's workgroup size dynamic instead of a hardcoded constant. The WGSL now declares a pipeline-overridable `override wg_size: u32 = 256` and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader), so SwiftShader's 128-invocation cap no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device/algorithm-optimal sizes later.

The `add` op also validates its 1D dispatch count before allocating any GPU objects, against the device's queried `maxComputeWorkgroupsPerDimension` (falling back to the WebGPU spec-default floor of 65535 only when the limit query fails). Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`) so the other ops can share them rather than re-inlining the logic.

The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc.
ghstack-source-id: 389222646
exported-using-ghexport

Differential Revision: D107288999
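
For readers who want a concrete picture of the two helpers named above, here is a minimal sketch. The names `clamp_workgroup_size` and `compute_1d_workgroup_count` come from the summary, but the signatures, parameter names, and error handling below are assumptions for illustration and are not taken from `runtime/WebGPUUtils.h`.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>

// Assumed signature: clamp the shader's preferred workgroup size to the
// device's maxComputeInvocationsPerWorkgroup limit.
inline uint32_t clamp_workgroup_size(uint32_t requested, uint32_t max_invocations) {
  return std::min(requested, max_invocations);
}

// Assumed signature: ceil-divide the element count by the workgroup size and
// reject dispatches that exceed maxComputeWorkgroupsPerDimension.
inline uint32_t compute_1d_workgroup_count(
    uint64_t numel, uint32_t wg_size, uint32_t max_workgroups_per_dim) {
  if (wg_size == 0) {
    throw std::invalid_argument("workgroup size must be nonzero");
  }
  // Overflow-safe ceiling division (avoids computing numel + wg_size - 1).
  const uint64_t count = numel / wg_size + (numel % wg_size != 0 ? 1 : 0);
  if (count > max_workgroups_per_dim) {
    throw std::runtime_error("1D dispatch exceeds the per-dimension workgroup limit");
  }
  return static_cast<uint32_t>(count);
}
```

Under these assumptions, a device capped at 128 invocations would run a 1000-element `add` as 8 workgroups of 128, while a 256-invocation device would use 4 workgroups of 256.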

Summary: The Vulkan serializer that the WebGPU backend reuses stores every non-empty constant in the PTE's named-data map with `offset == UINT64_MAX` and a `named_key`, rather than inline in the VK00 blob. `WebGPUGraph::build` previously handled only inline constants, so a delegated op's constant weights were never uploaded and the op produced all zeros. `build` now also fetches named-data constants via `NamedDataMap::get_data`, mirroring the path `VulkanBackend` already uses. `aten.add` was unaffected since it has no constant tensors; the first consumer is the `rms_norm` op in the child diff.

ghstack-source-id: 389182397
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D107288998
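
The inline versus named-data decision described in this commit can be sketched as follows. `ConstantRef`, `NamedData`, and `resolve_constant` are hypothetical stand-ins invented for this example; the real code goes through `NamedDataMap::get_data` and the VK00 flatbuffer, whose exact types are not shown in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a serialized constant entry.
struct ConstantRef {
  uint64_t offset;        // UINT64_MAX marks "stored in the named-data map"
  uint64_t length;        // byte length, used for inline constants only
  std::string named_key;  // lookup key, used for named-data constants only
};

// Hypothetical stand-in for the PTE's named-data map.
using NamedData = std::map<std::string, std::vector<uint8_t>>;

constexpr uint64_t kNamedDataSentinel = std::numeric_limits<uint64_t>::max();

// Returns the constant's bytes from whichever location the entry points at.
inline std::vector<uint8_t> resolve_constant(
    const ConstantRef& ref,
    const std::vector<uint8_t>& inline_blob,
    const NamedData& named) {
  if (ref.offset == kNamedDataSentinel) {
    const auto it = named.find(ref.named_key);
    if (it == named.end()) {
      throw std::runtime_error("missing named constant: " + ref.named_key);
    }
    return it->second;
  }
  // Inline path: bounds-check before slicing the blob.
  if (ref.offset > inline_blob.size() ||
      ref.length > inline_blob.size() - ref.offset) {
    throw std::out_of_range("inline constant exceeds blob bounds");
  }
  const auto first = inline_blob.begin() + static_cast<std::ptrdiff_t>(ref.offset);
  return std::vector<uint8_t>(first, first + static_cast<std::ptrdiff_t>(ref.length));
}
```

A loader that only implements the inline branch would skip every sentinel-offset entry, which matches the all-zeros symptom the commit describes.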

Summary: Adds the `et_vk.rms_norm.default` operator to the WebGPU backend: a WGSL compute shader using a cooperative tree reduction, one workgroup per row. The shader mirrors the Vulkan implementation (`backends/vulkan/runtime/graph/ops/impl/RmsNorm.cpp`, `backends/vulkan/runtime/graph/ops/glsl/rms_norm_buffer.glsl`); indexing assumes contiguous fp32 inputs. The handler fails loud (throws, mirroring Vulkan's `VK_CHECK_COND`) on invalid shape/dtype/dispatch-limit conditions, and defaults `eps` to the float32 machine epsilon. The weight constant is uploaded via the named-data path added in the parent diff.

ghstack-source-id: 389206169
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D106887028
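
As a rough CPU reference for the reduction pattern described in this commit, the sketch below simulates one workgroup per row with a shared-memory tree reduction. It is not the WGSL shader from this PR: the function name `rms_norm_rows`, the default `wg_size` of 64, and the power-of-two workgroup assumption are all choices made for this example. It uses the usual RMS norm definition, `x / sqrt(mean(x^2) + eps) * weight`.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference; wg_size is assumed to be a power of two.
inline std::vector<float> rms_norm_rows(
    const std::vector<float>& x,       // contiguous fp32, rows * width elements
    const std::vector<float>& weight,  // width elements
    std::size_t rows,
    std::size_t width,
    std::size_t wg_size = 64,
    float eps = std::numeric_limits<float>::epsilon()) {
  std::vector<float> out(x.size());
  std::vector<float> shared(wg_size);  // stands in for workgroup shared memory
  for (std::size_t r = 0; r < rows; ++r) {  // one "workgroup" per row
    const float* row = x.data() + r * width;
    // Phase 1: each simulated thread sums squares over a strided slice.
    for (std::size_t t = 0; t < wg_size; ++t) {
      float acc = 0.0f;
      for (std::size_t i = t; i < width; i += wg_size) {
        acc += row[i] * row[i];
      }
      shared[t] = acc;
    }
    // Phase 2: tree reduction, halving the active threads each step.
    for (std::size_t stride = wg_size / 2; stride > 0; stride /= 2) {
      for (std::size_t t = 0; t < stride; ++t) {
        shared[t] += shared[t + stride];
      }
    }
    const float mean_sq = shared[0] / static_cast<float>(width);
    const float inv_rms = 1.0f / std::sqrt(mean_sq + eps);
    // Phase 3: normalize and apply the per-column weight.
    for (std::size_t i = 0; i < width; ++i) {
      out[r * width + i] = row[i] * inv_rms * weight[i];
    }
  }
  return out;
}
```

On a GPU, the inner loops over `t` run in parallel with a workgroup barrier between reduction steps; the serial loops here only reproduce the arithmetic order.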

pytorch-bot · 2026-06-03T20:56:44Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19986

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ You can merge normally! (1 Unrelated Failure), 2 Unclassified Failures

As of commit 2310c6d with merge base 22a2daf:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, models) / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t 69fd10082d8df717b86f19fc4de9baa738133808de8394a51c2bade1df2ed0ad /exec failed with exit code 1
Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, operators) / linux-job (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command docker exec -t 9db1d29c566252c208ac1b9a76d6e1430266066462cfaf2bf67bcbe847411c99 /exec failed with exit code 1

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

trunk / test-arm-backend-ethos-u (test_smaller_stories_llama) / linux-job (gh) (trunk failure)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-03T20:57:36Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH added 3 commits June 3, 2026 13:55

JulianCloudNTH force-pushed the export-D107288999 branch from eac1473 to 2310c6d Compare June 3, 2026 20:56

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 3, 2026 20:56

meta-cla Bot added the CLA Signed label Jun 3, 2026

JulianCloudNTH closed this Jun 4, 2026

JulianCloudNTH wants to merge 3 commits into pytorch:main from JulianCloudNTH:export-D107288999
