
[Web] fp16 and q4f16 Gemma 3 models produce invalid outputs on WebGPU due to overflow in ONNX runtime #26732

@fosple


Describe the issue

Running Gemma 3 (270M) models exported to ONNX produces unusable outputs on WebGPU with fp16.

The same model works correctly on:

  • ONNX Runtime CPU / WASM
  • WebGPU when using fp32

This strongly suggests an overflow / numerical-stability issue inside ONNX Runtime’s WebGPU fp16 execution path, likely identical or closely related to the issue “[WebGPU] fp16 nanochat produces NaNs (CPU works fine)”.

Could this also be related to the Unsloth finding that Gemma 3 activations become infinite in float16?
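For intuition (not taken from ONNX Runtime's actual kernels), here is a minimal sketch of the suspected failure mode: fp16 tops out at ±65504, so an activation that is perfectly representable in fp32 can overflow to Infinity, and two overflowed values turn into NaN as soon as they are subtracted, e.g. inside a softmax. The `toFp16` helper and the softmax functions below are hypothetical illustrations:

```javascript
// fp16 (IEEE 754 binary16) tops out at ±65504, vs ~3.4e38 for fp32.
// Crude model of fp16 overflow (ignores precision loss below the max):
const FP16_MAX = 65504;
const toFp16 = (x) => (Math.abs(x) > FP16_MAX ? Math.sign(x) * Infinity : x);

console.log(toFp16(70000)); // Infinity — fine in fp32, overflows in fp16
// Two overflowed values become NaN as soon as they are subtracted:
console.log(toFp16(70000) - toFp16(70000)); // NaN

// The same failure mode in a softmax: exp() of a large logit overflows.
function softmaxNaive(logits) {
  const exps = logits.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum); // Infinity / Infinity = NaN
}

// The usual guard: subtract the max logit so exp() never sees a large
// positive argument. (Illustration only, not ONNX Runtime's code.)
function softmaxStable(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

console.log(softmaxNaive([1000, 1001]));  // [NaN, NaN]
console.log(softmaxStable([1000, 1001])); // ≈ [0.269, 0.731]
```

If a WebGPU fp16 kernel skips a guard like this (or lets an intermediate value exceed the fp16 range before the guard runs), NaNs propagate through the logits and decoding degenerates into repeated filler tokens like `<unused56>`.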

To reproduce

The failure can be reproduced with a minimal Transformers.js script:

main.js:

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  { 
    dtype: 'fp16',
    device: 'webgpu',
  },
);

const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

const output = await generator(messages, { max_new_tokens: 128 });
console.log(output[0].generated_text.at(-1).content);

HTML to try it out:

<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>Transformers.js Browser Demo</title>
  <script type="importmap">
  {
    "imports": {
      "@huggingface/transformers": "https://cdn.jsdelivr.net/npm/@huggingface/transformers"
    }
  }
  </script>
</head>
<body>
  <h1>Transformers.js Browser Demo</h1>
  <p>Open the browser console to see the generated output.</p>
  <script type="module" src="./main.js"></script>
</body>
</html>

For model HuggingFaceTB/SmolLM2-360M-Instruct with dtype: 'fp16' and device: 'webgpu' it returns:

The capital of France is Paris.

For model onnx-community/gemma-3-270m-it-ONNX with dtype: 'fp16' and device: 'webgpu' it returns:

<unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56>

For model onnx-community/gemma-3-270m-it-ONNX with dtype: 'fp16' and device: 'wasm' it returns:

The capital of France is Paris.

For model onnx-community/gemma-3-270m-it-ONNX with dtype: 'q4f16' and device: 'webgpu' it returns:

<unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56><unused56>

For model onnx-community/gemma-3-270m-it-ONNX with dtype: 'q4f16' and device: 'wasm' it returns:

The capital of France is Paris.

Urgency

High.

Gemma-3-270M is designed for edge and client-side inference, especially via WebGPU.
Because fp16 and q4f16 are the intended fast modes, this overflow makes the model unusable with ONNX Runtime WebGPU in real deployments.

This affects:

  • ONNX Runtime-based inference pipelines in browsers
  • Transformers.js WebGPU backend
  • Any WebGPU client expecting FP16 performance

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.23.2

Execution Provider

'webgpu' (WebGPU)


Labels

    .NET · ep:WebGPU (ort-web webgpu provider) · model:transformer (issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.) · platform:web (issues related to ONNX Runtime web)
