
[Bug] [Web] [WebGPU] [WebGL] Conv with auto_pad=SAME_UPPER produces wrong values #26734

@RReverser

Description


Describe the issue

As the title says, when a Conv node with strides > 1 and auto_pad set to e.g. "SAME_UPPER" runs on WebGPU, it produces values that are far off from the expected ones (WebGL shows the same discrepancy). The Wasm and WebNN backends match the Python values well. I'm guessing the padding in WebGPU has gone wrong?
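For reference, here is a minimal pure-Python sketch of how the ONNX spec defines SAME_* padding along one spatial axis. The helper name `same_pads` is hypothetical and is not an ONNX Runtime API. It shows why stride > 1 matters. With stride 1 and an odd kernel, the total padding is always even and is split symmetrically. With stride 2 it can be odd, and then the begin/end split (UPPER vs LOWER) decides where the extra row or column goes. Whether that split is actually what goes wrong on WebGPU is only a guess.

```python
import math


def same_pads(in_size: int, kernel: int, stride: int, dilation: int = 1,
              mode: str = "SAME_UPPER") -> tuple[int, int]:
    """Return (pad_begin, pad_end) for one spatial axis under ONNX auto_pad SAME_*."""
    out_size = math.ceil(in_size / stride)            # ONNX: out = ceil(in / stride)
    effective_kernel = (kernel - 1) * dilation + 1    # kernel extent including dilation
    total = max((out_size - 1) * stride + effective_kernel - in_size, 0)
    small, large = total // 2, total - total // 2
    # An odd total leaves one extra row/column: SAME_UPPER puts it at the end,
    # SAME_LOWER puts it at the beginning.
    return (small, large) if mode == "SAME_UPPER" else (large, small)


# 3x3 kernel on an 8-pixel axis
print(same_pads(8, 3, stride=1))  # (1, 1): symmetric, UPPER vs LOWER is irrelevant
print(same_pads(8, 3, stride=2))  # (0, 1): asymmetric, the extra pad goes at the end
```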

To reproduce

Here's a minimal onnxscript-based repro:

repro.py
repro.html
repro.js

To run it, use repro.py to generate the two .onnx models, then serve repro.html and repro.js from a static server.

You'll see a table like this (I've included WebGL just for completeness; I'm aware it's deprecated):

| | wasm | webgl | webgpu | webnn (cpu) | webnn (gpu) |
|---|---|---|---|---|---|
| Stride 1 | (reference) Range: [8.925, 51.957] | MAD: 0.000, Range: [8.925, 51.957] | MAD: 0.000, Range: [8.925, 51.957] | MAD: 0.000, Range: [8.925, 51.957] | MAD: 0.024, Range: [8.919, 51.927] |
| Stride 2 | (reference) Range: [9.601, 54.654] | MAD: 1.658, Range: [10.279, 53.070] | MAD: 1.658, Range: [10.279, 53.070] | MAD: 0.000, Range: [9.601, 54.654] | MAD: 0.024, Range: [9.595, 54.622] |

The raw output data is also dumped to the console.
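A minimal sketch of how one table cell can be computed from those raw dumps. This assumes that MAD means the mean absolute difference from the wasm (reference) output and that Range is the min/max of the backend's output. The function name `compare_outputs` is hypothetical, and the sample values are made up, not taken from the table.

```python
def compare_outputs(reference, candidate):
    """Return (mad, lo, hi): mean absolute difference from the reference output,
    plus the min/max of the candidate output. Both are flat lists of floats."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("outputs must be non-empty and of equal length")
    mad = sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)
    return mad, min(candidate), max(candidate)


wasm = [1.0, 2.0, 3.0]      # made-up values, not the data from the table
webgpu = [1.0, 2.5, 2.0]
mad, lo, hi = compare_outputs(wasm, webgpu)
print(f"MAD: {mad:.3f}, Range: [{lo:.3f}, {hi:.3f}]")  # MAD: 0.500, Range: [1.000, 2.500]
```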

P.S. This one took a long time to narrow down for silly reasons, especially since the original symptom was only that images processed by a UNet model looked slightly but visibly wrong, and only when using the dynamic-shape model, despite passing the same dimensions as for the static one.

Initially I suspected the Resize nodes, which proved to be a waste of time. It turns out that for the static model, onnxslim was simply optimising auto_pad away, replacing it with the correct explicit pads.
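This is consistent with only the dynamic-shape model being affected. The explicit `pads` equivalent to SAME_UPPER depend on the input's spatial size, so they can only be precomputed when that size is fixed. Below is a minimal sketch that builds the ONNX `pads` list (ordered as `[x1_begin, x2_begin, ..., x1_end, x2_end, ...]`) for a given static shape. The helper `same_upper_pads` is hypothetical and is not onnxslim's actual code.

```python
import math


def same_upper_pads(spatial_shape, kernel_shape, strides, dilations=None):
    """Explicit ONNX `pads` equivalent to auto_pad="SAME_UPPER" for a fixed input shape."""
    dilations = dilations or [1] * len(spatial_shape)
    begins, ends = [], []
    for size, k, s, d in zip(spatial_shape, kernel_shape, strides, dilations):
        out = math.ceil(size / s)                        # ONNX: out = ceil(in / stride)
        total = max((out - 1) * s + (k - 1) * d + 1 - size, 0)
        begins.append(total // 2)                        # SAME_UPPER: smaller half first,
        ends.append(total - total // 2)                  # extra pixel (if any) at the end
    return begins + ends


# Same 3x3 / stride-2 Conv, two input sizes: the explicit pads differ.
print(same_upper_pads([64, 64], [3, 3], [2, 2]))  # [0, 0, 1, 1]
print(same_upper_pads([63, 63], [3, 3], [2, 2]))  # [1, 1, 1, 1]
```

Because the result changes with the parity of the input size, a dynamic-shape model has to keep `auto_pad` and rely on the backend to compute the split at runtime.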

In the end, I wrote throwaway scripts to bisect the ONNX graph and dump all intermediate outputs. This helped immensely and let me narrow things down very quickly to the specific node where values diverge between backends (not counting the time it took to write those scripts).
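For anyone hitting something similar, the comparison half of such a script can be sketched in plain Python. It assumes the intermediate outputs have already been dumped per backend (for instance by exposing them as extra graph outputs) into dicts that map tensor names to flat float lists. The function `first_divergence` and its default tolerance are hypothetical, not an ONNX Runtime feature.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference of two equal-length flat lists (0.0 if empty)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a) if a else 0.0


def first_divergence(names_in_order, reference, candidate, tol=1e-3):
    """Return (name, mad) for the first intermediate tensor whose MAD from the
    reference exceeds `tol`, or None if all of them match.

    names_in_order: tensor names in topological order of the graph.
    reference, candidate: dict of tensor name -> flat list of floats, one per backend.
    """
    for name in names_in_order:
        ref, cand = reference[name], candidate[name]
        if len(ref) != len(cand):
            return name, float("inf")  # a size mismatch is itself a divergence
        mad = mean_abs_diff(ref, cand)
        if mad > tol:
            return name, mad
    return None


# Toy dumps: the backends agree until the second Conv.
order = ["conv1", "relu1", "conv2"]
wasm = {"conv1": [0.0, 1.0], "relu1": [0.0, 1.0], "conv2": [1.0, 2.0]}
webgpu = {"conv1": [0.0, 1.0], "relu1": [0.0, 1.0], "conv2": [1.0, 3.0]}
print(first_divergence(order, wasm, webgpu))  # ('conv2', 0.5)
```

Walking the tensors in topological order matters: every node downstream of the faulty one inherits its error, so the first mismatch points at the culprit rather than at a symptom.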

It would be great if ONNX Runtime had similar built-in bisect and comparison tools, in the spirit of git bisect or C-Reduce. They could save days of debugging and make it much quicker to file detailed reports.

Urgency

No response

Platform

Web Browser

OS Version

Chromium 142.0.7444.220

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2; also tried the latest @dev build available on npm

ONNX Runtime API

JavaScript

Architecture

Other / Unknown

Execution Provider

Other / Unknown

Execution Provider Library Version

No response


Labels

api:Javascript, ep:WebGPU, ep:WebNN, platform:web
