[ET-VK] Add dynamic-shape resize to q8ta ops #20312
🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20312

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 1 Unrelated Failure

As of commit 8530ff7 with merge base 0eb8247:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull Request resolved: #20312

The q8ta (quantized int8) op `DynamicDispatchNode`s were constructed with an empty resize-args list and no resize function, so their output tensors were never `virtual_resize`d on `trigger_resize()`. On a dynamic-shape graph this froze the q8ta outputs at the build-time upper-bound shape, the same failure mode the fp32 ops already avoid. Concretely, in a quantized Vulkan-delegated graph the terminal pointwise conv produces the graph output, so a smaller input (e.g. 238 rows fed into a graph allocated at the 241-row upper bound) left stale rows that propagate downstream, where GroupNorm's global per-group statistics smear them across the whole tensor.

Add resize functions across the q8ta op family, each matching that op's output-shape semantics (mirroring the corresponding fp32 op's resize):

- `q8ta_conv2d` / `q8ta_conv2d_dw`: output H/W recomputed from the input via `calc_out_sizes_hw`.
- `q8ta_conv2d_pw`: 1x1 conv preserves spatial dims (out H/W == in H/W).
- `q8ta_conv2d_transposed`: transposed output formula via `calc_out_sizes_hw(transposed=true)` (threads `output_padding` through the dispatch, which was previously dropped).
- `q8ta` im2col scratch: flattened-window `K` from channels/kernel/groups, `H_out`/`W_out` from the current input.
- `q8ta_linear`: `[*input.shape[:-1], out_features]`.
- `q8ta` binary: `broadcast(in_a, in_b)`.
- `q8ta` quantize / dequantize: elementwise, output shape == input shape.

The quantized conv/quant path now honors dynamic input shapes like the fp32 path.

ghstack-source-id: 394207699
@exported-using-ghexport

Differential Revision: [D108788845](https://our.internmc.facebook.com/intern/diff/D108788845/)
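The output-shape rules listed in this description can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard convolution and transposed-convolution output-size formulas documented for PyTorch, and every helper name below (`conv_out_hw`, `conv_transposed_out_hw`, `linear_out_shape`, `broadcast_shape`) is hypothetical. It is not the C++ `calc_out_sizes_hw` from the Vulkan backend, and the kernel/stride/padding values in the example are made up.

```python
from typing import List, Sequence, Tuple


def conv_out_hw(in_hw: Sequence[int], kernel: Sequence[int], stride: Sequence[int],
                padding: Sequence[int], dilation: Sequence[int]) -> Tuple[int, ...]:
    """Output H/W of a regular conv2d (standard formula, floor division)."""
    return tuple(
        (i + 2 * p - d * (k - 1) - 1) // s + 1
        for i, k, s, p, d in zip(in_hw, kernel, stride, padding, dilation)
    )


def conv_transposed_out_hw(in_hw: Sequence[int], kernel: Sequence[int],
                           stride: Sequence[int], padding: Sequence[int],
                           dilation: Sequence[int],
                           output_padding: Sequence[int]) -> Tuple[int, ...]:
    """Output H/W of a transposed conv2d; note that output_padding matters."""
    return tuple(
        (i - 1) * s - 2 * p + d * (k - 1) + op + 1
        for i, k, s, p, d, op in zip(in_hw, kernel, stride, padding,
                                     dilation, output_padding)
    )


def linear_out_shape(in_shape: Sequence[int], out_features: int) -> List[int]:
    """Linear keeps all leading dims and replaces the last one."""
    return [*in_shape[:-1], out_features]


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Right-aligned broadcasting of two shapes (NumPy/PyTorch-style rules)."""
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + list(a)
    b = [1] * (rank - len(b)) + list(b)
    out = []
    for x, y in zip(a, b):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(y if x == 1 else x)
    return out


# Hypothetical example: the graph was built for 241 rows, but 238 arrive.
upper_bound_hw = (241, 64)
actual_hw = (238, 64)

# 3x3 conv, stride 1, padding 1: the output should follow the actual input.
same_conv = conv_out_hw(actual_hw, (3, 3), (1, 1), (1, 1), (1, 1))
# 1x1 pointwise conv: spatial dims are preserved.
pointwise = conv_out_hw(actual_hw, (1, 1), (1, 1), (0, 0), (1, 1))
# 4x4 transposed conv, stride 2, padding 1: spatial dims double.
upsampled = conv_transposed_out_hw(actual_hw, (4, 4), (2, 2), (1, 1), (1, 1), (0, 0))
```

Without a resize function, an output tensor would keep the shape computed from `upper_bound_hw` instead of `actual_hw`, which is the stale-rows situation this description refers to.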
@claude Review this PR
Claude finished @SS-JIA's task in 8m 59s

PR Review: Add dynamic-shape resize to q8ta ops

Reviewed against … I verified the supporting infra and ArgGroup layouts, and the resize logic checks out.

A few minor, non-blocking notes:

1. The …
2. Resize-function duplication.
3. …
4. CI failures appear unrelated. The Dr. CI failures (arm-tosa, qnn-llama, lora-multimethod, voxtral) are infra/docker exit-code failures and SQNR jobs on other backends; none touch Vulkan q8ta. Confirm the Vulkan custom_ops job specifically passed.

No correctness issues found. Nice fix.
Merged commit e751675 into gh/SS-JIA/558/base