[ET-VK][patterns] Fuse torchao 4-bit quantized embedding to embedding_q4gsw #20381

Open: SS-JIA wants to merge 1 commit into gh/SS-JIA/560/base from gh/SS-JIA/560/head

@SS-JIA SS-JIA commented Jun 18, 2026

Contributor

Stack from ghstack (oldest at bottom):

TISO and other torchao-quantized models emit a torchao.dequantize_affine -> aten.embedding subgraph for their weight-only int4 quantized embedding. The existing QuantizedEmbeddingMatch only matches the quantized_decomposed.embedding_4bit.dtype fused op, so the torchao embedding was never fused. Instead, its dequantize_affine was const-folded to an fp32 weight, the resulting aten.embedding exceeded the buffer-element limit and fell back to CPU, and the fp32 constant bloated the serialized model.
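To illustrate why const-folding the dequantize step inflates the model, here is a minimal plain-Python sketch of group-wise symmetric dequantization and of the storage cost at each bit width. The function names and the example shapes are hypothetical and are not torchao or ExecuTorch APIs; the sketch only assumes one scale per group of `group_size` elements within a row and a zero point of 0.

```python
def dequantize_groupwise(q_row, scales, group_size):
    """Reference symmetric dequantization of one embedding row.

    q_row: ints in [-8, 7], one per element (hypothetical unpacked int4 row).
    scales: one float per group of `group_size` consecutive elements.
    """
    assert len(q_row) % group_size == 0
    assert len(scales) == len(q_row) // group_size
    # zero_point is assumed to be 0, so each value is just q * scale.
    return [q * scales[i // group_size] for i, q in enumerate(q_row)]


def weight_bytes(num_rows, dim, bits_per_element):
    """Bytes needed to store a num_rows x dim weight (scales excluded)."""
    return num_rows * dim * bits_per_element // 8


# Hypothetical 1000 x 64 embedding table, chosen only for the arithmetic.
fp32_size = weight_bytes(1000, 64, 32)  # what const-folding materializes
int4_size = weight_bytes(1000, 64, 4)   # what a packed 4-bit weight needs
```

Ignoring the per-group scales, the folded fp32 constant is eight times the size of the packed 4-bit weight, which is the kind of bloat the paragraph above describes.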

This adds a separate TorchAOQuantizedEmbeddingMatch matcher that recognizes the torchao int4 dequantize_affine -> aten.embedding shape (qmin=-8/qmax=7, per-row group block_size [1, G]) and rewrites it to the existing et_vk.embedding_q4gsw.default op, repacking the unpacked int8 weight into the packed 4-bit layout. It asserts symmetric quantization (zero_point == 0, which the shader assumes) and guards against repacking a shared/tied weight more than once via an et_vk_embedding_q4gsw_packed meta flag. It is kept as a separate class from QuantizedEmbeddingMatch because the two dialects produce different graph shapes (one fused op vs a split dequant+gather), so a single class would only co-locate two disjoint parse paths.
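The repacking step and the tied-weight guard can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the matcher's code: the nibble order (even index in the low nibble, odd index in the high nibble, values offset by 8) is an assumed layout and may differ from what embedding_q4gsw actually expects, the helper names are hypothetical, and a plain dict stands in for the weight node's meta. Only the flag name and the zero_point == 0 requirement come from the description.

```python
PACKED_FLAG = "et_vk_embedding_q4gsw_packed"  # flag name from the PR description


def pack_int4_row(q_row):
    """Pack signed int4 values (one per int8 slot) two per byte.

    Assumed layout, for illustration only: add 8 to map [-8, 7] to [0, 15],
    then store the even-index value in the low nibble and the odd-index
    value in the high nibble.
    """
    assert len(q_row) % 2 == 0
    assert all(-8 <= q <= 7 for q in q_row)
    packed = bytearray()
    for lo, hi in zip(q_row[0::2], q_row[1::2]):
        packed.append(((hi + 8) << 4) | (lo + 8))
    return bytes(packed)


def unpack_int4_row(packed):
    """Inverse of pack_int4_row under the same assumed layout."""
    out = []
    for byte in packed:
        out.append((byte & 0x0F) - 8)
        out.append((byte >> 4) - 8)
    return out


def repack_once(weight_meta, q_rows, zero_points):
    """Repack a weight unless it was already repacked (shared/tied weight)."""
    if weight_meta.get(PACKED_FLAG):
        return weight_meta["packed_rows"]
    # Symmetric quantization is required: any nonzero zero point is rejected.
    if any(z != 0 for z in zero_points):
        raise ValueError("expected symmetric quantization (zero_point == 0)")
    weight_meta["packed_rows"] = [pack_int4_row(row) for row in q_rows]
    weight_meta[PACKED_FLAG] = True
    return weight_meta["packed_rows"]
```

Calling `repack_once` a second time with the same `weight_meta` returns the already packed rows without touching the data again, which is the behavior the meta flag is meant to guarantee for a weight shared by several embedding nodes.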

On the en_US TISO backbone the embedding now delegates to Vulkan instead of falling back to CPU, and the serialized .pte drops from 418 MiB to 348 MiB.

This change was authored with Claude.

Differential Revision: D108457797

[ghstack-poisoned]

pytorch-bot Bot commented Jun 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20381

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 3 Unrelated Failures

As of commit 7783e2f with merge base 23f9021:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 18, 2026
@linux-foundation-easycla

CLA Missing ID

  • ❌ The email address for the commit (7783e2f) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please visit our EasyCLA portal and chat with our support bot.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Labels

CLA Signed, meta-exported
