[ExecuTorch][WebGPU] Add mul op with full broadcast (aten.mul.Tensor) by JulianCloudNTH · Pull Request #20358 · pytorch/executorch

JulianCloudNTH · 2026-06-17T23:59:29Z

Stack from ghstack (oldest at bottom):

-> [ExecuTorch][WebGPU] Add mul op with full broadcast (aten.mul.Tensor) #20358
[ExecuTorch][WebGPU] Consolidate landed-op tests into the cases.py op-test framework #20357
[ExecuTorch][WebGPU] Op-test codegen framework (cases.py -> generated .pte+golden -> gtest driver) #20339

Adds `aten.mul.Tensor` to the WebGPU delegate with full PyTorch broadcast, plus the shared `runtime/ops/TensorMeta.h` per-tensor uniform that broadcast ops reuse. Mul is on the Llama critical path: `F.silu` decomposes to `sigmoid` + `mul`, and SwiGLU multiplies two same-shape activations (the fast path).
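To make the critical-path claim concrete, here is a minimal CPU reference sketch of the two patterns named above. It is not code from this PR: the helper names (`sigmoid_ref`, `mul_same_shape`, `silu_ref`, `swiglu_ref`) are hypothetical, and plain `std::vector<float>` stands in for delegate tensors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helpers for illustration only; not part of the PR.

// Scalar sigmoid in fp32.
inline float sigmoid_ref(float x) {
  return 1.0f / (1.0f + std::exp(-x));
}

// Elementwise multiply of two same-shape buffers (the fast-path case).
inline std::vector<float> mul_same_shape(
    const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] * b[i];
  }
  return out;
}

// silu(x) = x * sigmoid(x): one sigmoid followed by one mul.
inline std::vector<float> silu_ref(const std::vector<float>& x) {
  std::vector<float> s(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    s[i] = sigmoid_ref(x[i]);
  }
  return mul_same_shape(x, s);
}

// SwiGLU gating: silu(gate) * up, with gate and up of identical shape.
inline std::vector<float> swiglu_ref(
    const std::vector<float>& gate, const std::vector<float>& up) {
  return mul_same_shape(silu_ref(gate), up);
}
```

Both multiplies in this sketch take operands of identical shape, which is why the same-shape path matters most for this workload.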

Composition (single dispatch):

- `TensorMeta.h` (new): a 48-byte std140 `{ndim, numel, sizes[4], strides[4]}` UBO mirroring Vulkan's per-tensor `BufferMetadata`. `fill_tensor_meta_broadcast` right-aligns operand dims and throws for rank > 4; `static_assert(sizeof==48)` guards the layout.
- `mul/BinaryOp.cpp`: builds three `TensorMeta` UBOs (out/in1/in2 at bindings 3/4/5), guards fp32 and rank ≤ 4, dispatches in 1D over `compute_1d_workgroup_count(numel)`, and releases all uniforms after the bind group is created.
- `mul/binary_mul.wgsl`: a same-shape fast path plus a broadcast path (delinearize the output index, clamp each input coordinate per dimension to at most `size - 1` so that a size-1 dimension always reads index 0, then relinearize on the input strides).
- `WebGPUUtils.h`: adds the shared `utils::make_uniform` helper (first use).
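The pieces above can be sketched on the CPU as follows. This is an illustrative approximation, not the PR's code: `TensorMetaSketch`, `fill_meta_right_aligned`, `broadcast_input_index`, and `workgroup_count_1d` are hypothetical names, and three details are assumptions: the position of the two padding words that bring the struct to 48 bytes, padding every tensor to rank 4 with leading 1s, and the workgroup size of 64.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint32_t kMaxDims = 4;

// Assumed layout: two scalars, two padding words, then two 16-byte arrays.
struct TensorMetaSketch {
  std::uint32_t ndim;
  std::uint32_t numel;
  std::uint32_t pad[2];  // assumption: keeps sizes at a 16-byte offset
  std::uint32_t sizes[kMaxDims];
  std::uint32_t strides[kMaxDims];
};
static_assert(sizeof(TensorMetaSketch) == 48, "layout must stay 48 bytes");

// Right-align the shape into 4 dims (leading dims become 1) and compute
// contiguous strides. Rank > 4 throws, matching the guard described above.
TensorMetaSketch fill_meta_right_aligned(const std::vector<std::uint32_t>& shape) {
  if (shape.size() > kMaxDims) {
    throw std::invalid_argument("rank > 4 is not supported");
  }
  TensorMetaSketch m{};
  m.ndim = kMaxDims;
  const std::size_t offset = kMaxDims - shape.size();
  for (std::size_t d = 0; d < kMaxDims; ++d) {
    m.sizes[d] = d < offset ? 1u : shape[d - offset];
  }
  std::uint32_t stride = 1;
  for (int d = static_cast<int>(kMaxDims) - 1; d >= 0; --d) {
    m.strides[d] = stride;
    stride *= m.sizes[d];
  }
  m.numel = stride;
  return m;
}

// Delinearize the output index, clamp each coordinate to the input's
// size - 1 (so broadcast dims read index 0), relinearize on input strides.
std::uint32_t broadcast_input_index(
    std::uint32_t out_index,
    const TensorMetaSketch& out,
    const TensorMetaSketch& in) {
  std::uint32_t in_index = 0;
  for (std::uint32_t d = 0; d < kMaxDims; ++d) {
    const std::uint32_t coord = (out_index / out.strides[d]) % out.sizes[d];
    const std::uint32_t clamped = std::min(coord, in.sizes[d] - 1u);
    in_index += clamped * in.strides[d];
  }
  return in_index;
}

// Ceiling division of the element count by an assumed workgroup size.
std::uint32_t workgroup_count_1d(std::uint32_t numel, std::uint32_t workgroup_size = 64) {
  return (numel + workgroup_size - 1) / workgroup_size;
}
```

For a `[2, 3]` output, an input of shape `[3]` is treated as `[1, 1, 1, 3]`, so only the innermost coordinate survives the clamp. An input with the output's own shape maps every index to itself, which is the case a same-shape fast path can skip.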
@exported-using-ghexport

Differential Revision: D108793167

[ghstack-poisoned]

pytorch-bot · 2026-06-17T23:59:33Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20358

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 6 Pending, 2 Unrelated Failures

As of commit b2df1c9 with merge base eb7473b:

NEW FAILURES - The following jobs have failed:

- Propose to merge ghstack orig PRs to main / Try to create a PR with ghstack /orig branch (gh)
  Process completed with exit code 1.
- pull / android / run-emulator (gh)
  The process '/usr/bin/sh' failed with exit code 1
- pull / test-arm-backend-no-driver (test_pytest_models_tosa) / linux-job (gh)
  RuntimeError: Command docker exec -t 87e3eca68f37b25d53b605730ad71a24ca258f68017fc4a66d0bedb53d80627d /exec failed with exit code 1
- pull / test-voxtral-realtime-xnnpack-linux / linux-job (gh)
  RuntimeError: Command docker exec -t e316b415060de10bea4d15d193523435e3e7b2d8978a407d27b21b73068296c3 /exec failed with exit code 1
- pull / unittest / linux / linux-job (gh)
  examples/models/test/test_export.py::ExportTest::test_efficient_sam_export_to_executorch

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

- pull / test-moshi-linux / linux-job (gh) (matched linux rule in flaky-rules.json)
  E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/m/mesa/mesa-vdpau-drivers_23.2.1-1ubuntu3.1%7e22.04.3_amd64.deb 404 Not Found [IP: 104.20.28.246 80]
- pull / test-qnn-testsuite-linux / test-backend-linux (qnn, models) / linux-job (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-18T00:05:14Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

[ghstack-poisoned]

Pull Request resolved: #20358

Adds `aten.mul.Tensor` to the WebGPU delegate with full PyTorch broadcast, plus the shared `runtime/ops/TensorMeta.h` per-tensor uniform that broadcast ops reuse. Mul is on the Llama critical path: `F.silu` decomposes to `sigmoid` + `mul`, and SwiGLU multiplies two same-shape activations (the fast path).

Composition (single dispatch):

- `TensorMeta.h` (NEW): 48-byte std140 `{ndim, numel, sizes[4], strides[4]}` UBO mirroring Vulkan's per-tensor `BufferMetadata`; `fill_tensor_meta_broadcast` right-aligns operand dims (rank>4 throws); `static_assert(sizeof==48)`.
- `mul/BinaryOp.cpp`: builds 3 `TensorMeta` UBOs (out/in1/in2 at bindings 3/4/5), guards fp32 + rank≤4, 1D-dispatches over `compute_1d_workgroup_count(numel)`, releases all uniforms after the bind group.
- `mul/binary_mul.wgsl`: same-shape fast path + a broadcast path (delinearize output index, clamp each input coord per-dim to size-1, relinearize on input strides).
- `WebGPUUtils.h`: adds the shared `utils::make_uniform` helper (first use).

ghstack-source-id: 394848336
@exported-using-ghexport

Differential Revision: [D108793167](https://our.internmc.facebook.com/intern/diff/D108793167/)

meta-codesync · 2026-06-18T21:16:23Z

This pull request has been merged in 0e65ba6.

Update

a226873

[ghstack-poisoned]

JulianCloudNTH requested review from kirklandsign and larryliu0820 as code owners June 17, 2026 23:59

JulianCloudNTH temporarily deployed to cadence June 17, 2026 23:59 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed label Jun 17, 2026

JulianCloudNTH requested a review from psiddh June 18, 2026 15:54

Update

b2df1c9

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 18, 2026 16:07 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 18, 2026

psiddh approved these changes Jun 18, 2026

JulianCloudNTH mentioned this pull request Jun 18, 2026

WebGPU op-test framework + mul op manual merge #20389

Merged

SS-JIA closed this in 0e65ba6 Jun 18, 2026

SS-JIA had a problem deploying to cherry-pick-bot June 18, 2026 21:15 — with GitHub Actions Failure

meta-codesync Bot added the Merged label Jun 18, 2026

JulianCloudNTH wants to merge 2 commits into gh/JulianCloudNTH/33/base from gh/JulianCloudNTH/33/head

4 participants
