[cuda backend] reduce memory consumption on gemma4_31b by running embedding in int8 #1407

Triggered via pull request June 17, 2026 20:16
Status Success
Total duration 51m 15s
Artifacts 3

test-backend-webgpu.yml

on: pull_request
Matrix: test-webgpu / test-backend-linux
Matrix: test-webgpu / test-backend-macos
test-webgpu / package-golden-artifacts: 1m 35s

Annotations

3 warnings
test-webgpu / test-backend-linux (webgpu, models) / linux-job
Node.js 20 is deprecated. The following actions target Node.js 20 but are being forced to run on Node.js 24: ./test-infra/.github/actions/setup-ssh, actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683, actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02, pmeier/pytest-results-action@a2c1430e2bddadbad9f49a6f9b879f062c6b19b1. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
test-webgpu / test-backend-linux (webgpu, operators) / linux-job
Node.js 20 is deprecated. The following actions target Node.js 20 but are being forced to run on Node.js 24: ./test-infra/.github/actions/setup-ssh, actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683, actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02, pmeier/pytest-results-action@a2c1430e2bddadbad9f49a6f9b879f062c6b19b1. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
test-webgpu / package-golden-artifacts
Node.js 20 is deprecated. The following actions target Node.js 20 but are being forced to run on Node.js 24: actions/download-artifact@v4, actions/upload-artifact@v4, seemethere/upload-artifact-s3@v5. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

Artifacts

Produced during runtime
Name                           Size      Digest
golden-artifacts-webgpu        414 MB    sha256:d10fd6a3013c63729998f9ff3156d28b2e44ac0c36a8cfa71a28e35476e530a4
test-report-webgpu-models      414 MB    sha256:1f6d131568d4865b9df531c9b51504845675e1e15d3f35548d55a1863344cc31
test-report-webgpu-operators   2.35 MB   sha256:1b082039ccf35158c0b58358e2aefa9c2e7d12da8ac4a71717a19cfddc2cc1c1