fix(zisk): export CUDA_ARCH as CUDA_ARCHS env for cargo build #336

Open

qu0b wants to merge 1 commit into eth-act:master from qu0b:fix/zisk-dockerfile-cuda-arch-dead-plumbing

Conversation


@qu0b qu0b commented Apr 21, 2026

Summary

ARG CUDA_ARCH=sm_120 in docker/zisk/Dockerfile.server (and Dockerfile.cluster) is dead plumbing — the ARG is declared but never exported into the RUN cargo build … environment, so it has no effect.

As a result, the --cuda-archs N flag accepted by .github/scripts/build-image.sh is silently ignored for zisk. The build always produces an image whose embedded CUDA kernels target sm_120 (the committed default in pil2-stark/src/goldilocks/CudaArch.mk), regardless of what was requested.
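For context, the intended plumbing is roughly the following (a sketch; the exact docker invocation inside build-image.sh may differ):

# Requested on the command line:
build-image.sh --zkvm zisk --cuda-archs 89
# ...which is (presumably) forwarded as a Docker build arg:
docker build --build-arg CUDA_ARCH=sm_89 -f docker/zisk/Dockerfile.server .
# Before this fix, the value stopped at the ARG declaration and never
# reached the cargo build, so the kernels were still compiled for sm_120.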

Symptom

The published ghcr.io/eth-act/ere/ere-server-zisk:*-cuda images fail on any GPU that is not compute capability 12.0 (RTX 5090 / consumer Blackwell) with:

[CUDA] cudaMemcpyToSymbol(GPU_C_4, …) failed due to:
  no kernel image is available for execution on the device (209)
  at src/goldilocks/src/poseidon2_goldilocks.cu:66

This was hit locally on 2× RTX 4090 (sm_89) even when --cuda-archs 89 was explicitly passed to build-image.sh.
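A quick way to check which compute capability a given GPU reports (supported by reasonably recent drivers; the output shown is illustrative for a 4090):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader
NVIDIA GeForce RTX 4090, 8.9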

Fix

Strip the sm_ prefix off CUDA_ARCH and export it as CUDA_ARCHS (plural, numeric) inline on the cargo build command. That env var is what proofman-starks-lib-c/build.rs reads to generate the correct nvcc -gencode flags.

-RUN cargo build --release --package ere-server --bin ere-server --features zisk${CUDA:+,cuda} \
+RUN CUDA_ARCHS="${CUDA_ARCH#sm_}" \
+    cargo build --release --package ere-server --bin ere-server --features zisk${CUDA:+,cuda} \

Same one-line change in Dockerfile.cluster.
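The ${CUDA_ARCH#sm_} expansion is standard POSIX prefix stripping, so the existing sm_-prefixed build arg keeps working unchanged:

$ CUDA_ARCH=sm_89;  echo "${CUDA_ARCH#sm_}"
89
$ CUDA_ARCH=sm_120; echo "${CUDA_ARCH#sm_}"
120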

Root cause detail

The committed pil2-stark/src/goldilocks/CudaArch.mk hardcodes CUDA_ARCH = sm_120. The auto-detect path in configure.sh requires deviceQuery, which needs a runtime GPU — not available during a Docker build. So without CUDA_ARCHS being set, the Makefile falls back to the committed default and silently produces an sm_120 image.
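With CUDA_ARCHS set, the build can emit one -gencode pair per requested arch. The resulting nvcc flags would look roughly like this (illustrative, not the verbatim output of proofman-starks-lib-c/build.rs):

# CUDA_ARCHS=89
nvcc -gencode arch=compute_89,code=sm_89 -c src/goldilocks/src/poseidon2_goldilocks.cu
# CUDA_ARCHS=89,120 (fat binary with cubins for both arches)
nvcc -gencode arch=compute_89,code=sm_89 -gencode arch=compute_120,code=sm_120 ...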

Verification

Built locally with build-image.sh --zkvm zisk --tag local-sm89-cuda --base --server --cuda-archs 89 and confirmed via cuobjdump --list-elf that all 15 embedded .cubin ELF sections in /ere/bin/ere-server now report sm_89.
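For reference, the verification step looks like this (output shape illustrative):

$ cuobjdump --list-elf /ere/bin/ere-server
ELF file    1: ere-server.1.sm_89.cubin
ELF file    2: ere-server.2.sm_89.cubin
...
ELF file   15: ere-server.15.sm_89.cubin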

Running the resulting image on 2× RTX 4090 via the ethpandaops/ethereum-package zkboost stack (EIP-8025 testnet) produced real proofs:

zkboost_prove_total{proof_type="reth-zisk", status="success"} 3
zkboost_prove_duration_seconds_sum = 34.47   # ~11.5s per proof

Test plan

  • build-image.sh … --cuda-archs 89 produces an image whose kernels are sm_89 (verified with cuobjdump)
  • Resulting image runs on RTX 4090 without CUDA error 209
  • End-to-end zkboost + lighthouse + reth pipeline generates and verifies proofs
  • Upstream CI builds cleanly for --cuda-archs 120 (unchanged default behavior)

Commit message

The `ARG CUDA_ARCH=sm_120` declared in docker/zisk/Dockerfile.server
and Dockerfile.cluster is never actually passed to the cargo build: an
ARG is a build-time variable and does not become a RUN env var without
explicit export. As a result, the zisk CUDA kernel build silently falls
back to the committed sm_120 default in pil2-stark/src/goldilocks/CudaArch.mk
regardless of what --cuda-archs is passed to build-image.sh.

The symptom is that the published ere-server-zisk:*-cuda images only
contain sm_120 kernels and fail on any other GPU with CUDA error 209
(no kernel image is available for execution on the device), even when
CUDA_ARCH was supposedly overridden at build time.

This plumbs the value through: strip the `sm_` prefix and set CUDA_ARCHS
(plural, numeric) for the RUN, which is the env var read by
proofman-starks-lib-c/build.rs to generate nvcc -gencode flags.

Verified by building with `--cuda-archs 89` and confirming all 15
embedded .cubin ELF sections in the resulting ere-server binary report
sm_89 (via `cuobjdump --list-elf`). Running on an RTX 4090 now produces
proofs instead of failing with CUDA 209.
Collaborator

han0110 commented Apr 21, 2026

Thanks for the fix! I was not aware that ZisK now supports codegen for multiple CUDA archs via CUDA_ARCHS, and that the old CUDA_ARCH env is ignored, which is why you hit the error even when rebuilding with --cuda-archs 89.

I opened another PR in #337 to build and publish the image with CUDA_ARCHS=89,120, and after that we should be able to run the published image on 4090 directly.
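Assuming #337 uses the same inline-env pattern as this PR, the multi-arch build would look something like the line below (a sketch, not necessarily the exact change in #337); the result is a fat binary whose kernels run on both sm_89 and sm_120:

RUN CUDA_ARCHS="89,120" \
    cargo build --release --package ere-server --bin ere-server --features zisk${CUDA:+,cuda} \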

Collaborator

han0110 commented Apr 21, 2026

Could you try again with the image that includes the fix from #337? It should support the 4090 now without building locally. https://github.com/eth-act/ere/pkgs/container/ere%2Fere-server-zisk/811652646?tag=8401f02-cuda
