Description
Describe the issue
As the title says, when a Conv node runs on WebGPU with strides > 1 and auto_pad set to e.g. "SAME_UPPER", it produces values that are far off the expected ones. The Wasm and WebNN backends match the Python values well. My guess is that the padding goes wrong in WebGPU.
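For reference, here is a minimal sketch of how the ONNX operator spec describes the SAME_UPPER / SAME_LOWER padding for one spatial axis. The function name `same_auto_pad` and the example sizes (input 8, kernel 3) are hypothetical and are not taken from the repro; this only illustrates the expected arithmetic and is not a diagnosis of what the WebGPU backend actually does.

```python
def same_auto_pad(in_size, kernel, stride, dilation=1, mode="SAME_UPPER"):
    """Return (pad_begin, pad_end, out_size) for one spatial axis.

    Follows the ONNX Conv description of auto_pad: the output size is
    ceil(in_size / stride), and an odd total padding puts the extra
    element at the end for SAME_UPPER and at the beginning for SAME_LOWER.
    """
    effective_kernel = (kernel - 1) * dilation + 1
    out_size = -(-in_size // stride)  # integer ceil(in_size / stride)
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small = total // 2
    large = total - small
    if mode == "SAME_UPPER":
        return small, large, out_size
    if mode == "SAME_LOWER":
        return large, small, out_size
    raise ValueError(f"unsupported auto_pad mode: {mode}")


# Hypothetical sizes: input 8, kernel 3.
stride1 = same_auto_pad(8, 3, 1)  # symmetric padding
stride2 = same_auto_pad(8, 3, 2)  # total padding is odd, so the split is asymmetric
```

With these hypothetical sizes, stride 1 gives symmetric padding, while stride 2 gives an odd total that has to be split unevenly. That uneven split only shows up for some stride and size combinations, which would be consistent with stride 1 matching and stride 2 diverging, but that is a guess.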
To reproduce
Here's the absolutely minimal onnxscript-based repro.
To run, just use repro.py to generate two .onnx models and start a static server with repro.html and repro.js.
You'll see a table like this (I've included WebGL just for completeness, I'm aware it's deprecated):
| | wasm | webgl | webgpu | webnn (cpu) | webnn (gpu) |
|---|---|---|---|---|---|
| Stride 1 | (reference) Range: [8.925, 51.957] | MAD: 0.000 Range: [8.925, 51.957] | MAD: 0.000 Range: [8.925, 51.957] | MAD: 0.000 Range: [8.925, 51.957] | MAD: 0.024 Range: [8.919, 51.927] |
| Stride 2 | (reference) Range: [9.601, 54.654] | MAD: 1.658 Range: [10.279, 53.070] | MAD: 1.658 Range: [10.279, 53.070] | MAD: 0.000 Range: [9.601, 54.654] | MAD: 0.024 Range: [9.595, 54.622] |
plus the dumps of raw data in the console.
P.S. This one took a long time to narrow down for silly reasons, especially since the original symptom was "the images processed by a unet model look slightly but visibly wrong, but only when I'm using the dynamic-shape model, despite passing the same dimensions as for the static one".
Initially I suspected the Resize nodes, which proved a waste of time. It turns out that for the static model onnxslim was simply replacing auto_pad with explicit, correct pads.
In the end I wrote throwaway scripts to bisect the ONNX graph and dump all intermediate outputs. This helped immensely and let me narrow down the specific node where values diverge between backends very quickly, not counting the time spent writing those scripts.
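To illustrate the idea, the comparison step of such a script could look roughly like the sketch below. It is not the actual throwaway scripts: it assumes the per-node outputs of both backends have already been dumped as flat lists of floats keyed by output name, it uses mean absolute difference as the metric, and the names `mean_abs_diff`, `first_divergence`, the tolerance, and the sample data are all hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized flat lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def first_divergence(node_order, reference, candidate, tol=1e-3):
    """Return (name, diff) for the first output that differs by more than tol.

    node_order lists output names in graph (topological) order, so the
    first hit is the earliest node where the two backends disagree.
    Returns None if every output is within tolerance.
    """
    for name in node_order:
        diff = mean_abs_diff(reference[name], candidate[name])
        if diff > tol:
            return name, diff
    return None


# Made-up dumps for illustration only.
order = ["input", "conv_out", "relu_out"]
ref = {"input": [1.0, 2.0], "conv_out": [9.0, 52.0], "relu_out": [9.0, 52.0]}
cand = {"input": [1.0, 2.0], "conv_out": [10.0, 53.0], "relu_out": [10.0, 53.0]}
hit = first_divergence(order, ref, cand)  # points at "conv_out" for this data
```

Walking the outputs in graph order matters here: later nodes inherit the error, so only the first mismatch identifies the node worth reporting.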
It would be great if ONNX Runtime had similar built-in bisect and comparison tools, like Git or C-Reduce, as they would save days of debugging and help people report detailed issues a lot quicker.
Urgency
No response
Platform
Web Browser
OS Version
Chromium 142.0.7444.220
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.23.2; tried the latest @dev available on npm too
ONNX Runtime API
JavaScript
Architecture
Other / Unknown
Execution Provider
Other / Unknown
Execution Provider Library Version
No response