
Enable backend test suite + x86 CI (#19964) #19986

Closed
JulianCloudNTH wants to merge 3 commits into pytorch:main from JulianCloudNTH:export-D107288999

@JulianCloudNTH

Contributor

Summary:

Wires the WebGPU backend into the standard ExecuTorch backend test suite and adds an x86 Linux CI job, mirroring the Vulkan delegate: `backends/test/suite/flows/webgpu.py` plus a `WebGPUTester`, run by `oss/.github/workflows/test-backend-webgpu.yml` on SwiftShader (a software Vulkan adapter driven through `wgpu-native`; it has minimal dependencies and needs no GPU).

Two fixes were needed for SwiftShader's downlevel limits. First, device creation now requests the adapter's full supported limits as `requiredLimits`, because software adapters otherwise default storage-buffer limits to 0. Second, the `add` op's workgroup size is now dynamic instead of a hardcoded constant: the WGSL declares a pipeline-overridable `override wg_size: u32 = 256`, and the host clamps it to the device's `maxComputeInvocationsPerWorkgroup` (256 on real GPUs and lavapipe, 128 on SwiftShader). SwiftShader's 128-invocation cap therefore no longer forces a smaller workgroup size on real hardware. This mirrors the dynamic-workgroup-sizing approach in D107259348 and opens the door to selecting device- or algorithm-optimal sizes later.

The `add` op also validates its 1D dispatch count against the device's queried `maxComputeWorkgroupsPerDimension` before allocating any GPU objects, falling back to the WebGPU spec-default floor of 65535 only when the limit query fails. Per Stephen's review, the workgroup-size clamp and the dispatch-count computation are factored into reusable `inline` helpers in `runtime/WebGPUUtils.h` (`clamp_workgroup_size` and `compute_1d_workgroup_count`, mirroring the Vulkan delegate's `utils::div_up`), so other ops can share them rather than re-inlining the logic.

The editable CMake build additionally marks the `vulkan_schema` subdirectory `EXCLUDE_FROM_ALL` so the WebGPU `ALL` build does not pull in targets that need glslc.
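The workgroup-size clamp and the validated 1D dispatch count described above can be sketched as two small inline helpers. This is a minimal sketch, not the contents of `runtime/WebGPUUtils.h`: the parameter lists, the use of `std::optional` to represent a failed limit query, and the `webgpu_sketch` namespace are all assumptions for illustration. It only shows clamping a preferred size to `maxComputeInvocationsPerWorkgroup` and rejecting an oversized dispatch before any GPU objects would be created.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

namespace webgpu_sketch {  // hypothetical namespace

// WebGPU spec default for maxComputeWorkgroupsPerDimension; used only
// when the device limit query fails.
constexpr uint32_t kSpecDefaultMaxWorkgroupsPerDim = 65535;

// Keep the shader's preferred size (e.g. 256) unless the device caps
// invocations per workgroup lower (e.g. 128 on SwiftShader).
inline uint32_t clamp_workgroup_size(uint32_t preferred, uint32_t max_invocations) {
  return std::min(preferred, max_invocations);
}

// Overflow-safe ceiling division, in the spirit of Vulkan's utils::div_up.
inline uint64_t div_up(uint64_t n, uint64_t d) {
  return n / d + (n % d != 0 ? 1 : 0);
}

// Number of workgroups needed to cover `numel` elements, checked against
// the device limit. `queried_max` is empty when the limit query failed.
// Throws before the caller allocates buffers, bind groups, or pipelines.
inline uint32_t compute_1d_workgroup_count(uint64_t numel, uint32_t wg_size,
                                           std::optional<uint32_t> queried_max) {
  if (wg_size == 0) {
    throw std::invalid_argument("workgroup size must be > 0");
  }
  const uint32_t max_groups = queried_max.value_or(kSpecDefaultMaxWorkgroupsPerDim);
  const uint64_t groups = div_up(numel, wg_size);
  if (groups > max_groups) {
    throw std::runtime_error("1D dispatch exceeds maxComputeWorkgroupsPerDimension");
  }
  return static_cast<uint32_t>(groups);
}

}  // namespace webgpu_sketch
```

For a 1000-element `add`, a device capped at 128 invocations clamps 256 down to 128 and dispatches 8 workgroups, while a device reporting 256 keeps 256 and dispatches 4.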
ghstack-source-id: 389222646
exported-using-ghexport

Differential Revision: D107288999

Summary:

The Vulkan serializer that the WebGPU backend reuses stores every non-empty constant in the PTE's named-data map with `offset == UINT64_MAX` and a `named_key`, rather than inline in the VK00 blob. `WebGPUGraph::build` previously handled only inline constants, so a delegated op's constant weights were never uploaded and the op produced all zeros. `build` now also fetches named-data constants via `NamedDataMap::get_data`, mirroring the path `VulkanBackend` already uses. `aten.add` was unaffected since it has no constant tensors; the first consumer is the `rms_norm` op in the child diff.
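The inline-versus-named-data branch described above can be illustrated with a self-contained sketch. `ConstantEntry`, `FakeNamedDataMap`, and `resolve_constant` are hypothetical stand-ins invented for this example; the real code calls ExecuTorch's `NamedDataMap::get_data`, whose signature and return type differ from the simplified map used here. The point is only the dispatch on the `offset == UINT64_MAX` sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one serialized constant entry.
struct ConstantEntry {
  uint64_t offset;        // byte offset into the inline blob, or UINT64_MAX
  uint64_t length;        // byte length for inline constants
  std::string named_key;  // lookup key when offset == UINT64_MAX
};

// Hypothetical stand-in for a named-data map (NOT the ExecuTorch API).
class FakeNamedDataMap {
 public:
  void put(std::string key, std::vector<uint8_t> bytes) {
    data_[std::move(key)] = std::move(bytes);
  }
  std::optional<std::vector<uint8_t>> get_data(const std::string& key) const {
    auto it = data_.find(key);
    if (it == data_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, std::vector<uint8_t>> data_;
};

constexpr uint64_t kNamedDataSentinel = UINT64_MAX;

// Returns the bytes to upload for one constant: from the named-data map
// when the sentinel offset is set, otherwise from the inline blob.
std::vector<uint8_t> resolve_constant(const ConstantEntry& c,
                                      const std::vector<uint8_t>& inline_blob,
                                      const FakeNamedDataMap* named) {
  if (c.offset == kNamedDataSentinel) {
    if (named == nullptr) {
      throw std::runtime_error("named-data constant but no map provided");
    }
    auto bytes = named->get_data(c.named_key);
    if (!bytes) {
      throw std::runtime_error("missing named-data key: " + c.named_key);
    }
    return *bytes;
  }
  // Bounds check written to avoid overflow in offset + length.
  if (c.offset > inline_blob.size() || c.length > inline_blob.size() - c.offset) {
    throw std::out_of_range("inline constant out of range");
  }
  auto first = inline_blob.begin() + static_cast<std::ptrdiff_t>(c.offset);
  return std::vector<uint8_t>(first, first + static_cast<std::ptrdiff_t>(c.length));
}
```

Before this fix, only the second branch existed, which is why named-data weights were silently skipped rather than uploaded.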
ghstack-source-id: 389182397
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D107288998

Summary:

Adds the `et_vk.rms_norm.default` operator to the WebGPU backend: a WGSL compute shader that uses a cooperative tree reduction, with one workgroup per row. The shader mirrors the Vulkan implementation (`backends/vulkan/runtime/graph/ops/impl/RmsNorm.cpp`, `backends/vulkan/runtime/graph/ops/glsl/rms_norm_buffer.glsl`); indexing assumes contiguous fp32 inputs. The handler fails loudly (throws, mirroring Vulkan's `VK_CHECK_COND`) on invalid shape, dtype, or dispatch-limit conditions, and defaults `eps` to the float32 machine epsilon.

The weight constant is uploaded via the named-data path added in the parent diff.
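To show the shape of the cooperative tree reduction, here is a CPU emulation in C++ of one workgroup per row. It is not the WGSL shader: `rms_norm_rows` and its `wg_size` parameter are hypothetical, the input is assumed to be contiguous fp32 with a power-of-two workgroup size, and `eps` defaults to `std::numeric_limits<float>::epsilon()` to match the default described above. Each emulated invocation first accumulates a strided partial sum of squares; the partials are then halved pairwise until one value remains.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// y[r][i] = x[r][i] / sqrt(mean_j(x[r][j]^2) + eps) * weight[i]
std::vector<float> rms_norm_rows(const std::vector<float>& x,
                                 const std::vector<float>& weight,
                                 size_t rows, size_t cols, size_t wg_size,
                                 float eps = std::numeric_limits<float>::epsilon()) {
  if (x.size() != rows * cols || weight.size() != cols) {
    throw std::invalid_argument("shape mismatch");
  }
  if (wg_size == 0 || (wg_size & (wg_size - 1)) != 0) {
    throw std::invalid_argument("wg_size must be a power of two");
  }
  std::vector<float> y(x.size());
  std::vector<float> shared(wg_size);  // emulates workgroup shared memory

  for (size_t r = 0; r < rows; ++r) {  // one "workgroup" per row
    const float* row = x.data() + r * cols;

    // Phase 1: invocation t sums squares of elements t, t + wg_size, ...
    for (size_t t = 0; t < wg_size; ++t) {
      float acc = 0.f;
      for (size_t i = t; i < cols; i += wg_size) acc += row[i] * row[i];
      shared[t] = acc;
    }

    // Phase 2: tree reduction. On the GPU a barrier separates each step;
    // serially this is safe because step `stride` only reads indices
    // >= stride and only writes indices < stride.
    for (size_t stride = wg_size / 2; stride > 0; stride /= 2) {
      for (size_t t = 0; t < stride; ++t) shared[t] += shared[t + stride];
    }

    const float inv_rms = 1.f / std::sqrt(shared[0] / static_cast<float>(cols) + eps);
    for (size_t i = 0; i < cols; ++i) {
      y[r * cols + i] = row[i] * inv_rms * weight[i];
    }
  }
  return y;
}
```

The power-of-two restriction is a simplification of this sketch that keeps every halving step exact; idle invocations (when `wg_size` exceeds `cols`) simply contribute zero partial sums.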
ghstack-source-id: 389206169
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D106887028
@pytorch-bot

pytorch-bot (bot) commented Jun 3, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19986

Note: Links to docs will display an error until the docs builds have been completed.

❌ You can merge normally! (1 Unrelated Failure), 2 Unclassified Failures

As of commit 2310c6d with merge base 22a2daf:

UNCLASSIFIED FAILURES - DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla (bot) added the CLA Signed label Jun 3, 2026
@github-actions

github-actions (bot) commented Jun 3, 2026


This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example:
`@pytorchbot label "release notes: none"`

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
