feat(docker): ROCm/HIP image variant (:rocm) #355
Merged
AMD/HIP sibling of the cuda12 image so we publish both ghcr.io/luce-org/lucebox-hub:cuda12 and :rocm.

- Dockerfile.rocm: rocm/dev-ubuntu base, DFLASH27B_GPU_BACKEND=hip, hipblas/rocblas, ggml-hip rpath + ld.so, gfx1151 default. Carries the same COPY server/share status.html fix as the cuda Dockerfile.
- docker-bake.hcl: DFLASH_HIP_ARCHES + ROCM_VERSION vars; _rocm-base / rocm / rocm-local targets; all group builds cuda + rocm.
- docker.yml: rocm joins the build matrix; Dockerfile.rocm in PR paths.
- README: AMD run command (--device /dev/kfd --device /dev/dri) in the Docker quick-start.

gfx1151 (Strix Halo) by default; widen via DFLASH_HIP_ARCHES. BSA is CUDA-only and disabled for HIP.

Co-Authored-By: WOZCODE <contact@withwoz.com>
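As a rough illustration of the README bullet, the AMD run command can be assembled as below. This is a minimal sketch: the image tag and the two `--device` flags come from the commit message, while the helper name `rocm_run_argv` and the `extra_args` parameter are hypothetical. Port, volume, and model flags are not specified in this PR text, so they are left to the caller.

```python
import shlex

# Image tag and device nodes as named in the commit message.
IMAGE = "ghcr.io/luce-org/lucebox-hub:rocm"
ROCM_DEVICES = ("/dev/kfd", "/dev/dri")


def rocm_run_argv(image: str = IMAGE, extra_args: tuple[str, ...] = ()) -> list[str]:
    """Build a `docker run` argv that passes the AMD device nodes through.

    `extra_args` is a hypothetical hook for flags the source does not spell
    out (ports, volumes, model paths).
    """
    argv = ["docker", "run"]
    for dev in ROCM_DEVICES:
        argv += ["--device", dev]
    argv += list(extra_args)
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable shell command.
    print(shlex.join(rocm_run_argv()))
```

Building the argv as a list rather than a string keeps quoting out of the picture if the result is later handed to `subprocess.run`.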
7 issues found across 15 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name=".github/workflows/ci.yml">
<violation number="1" location=".github/workflows/ci.yml:23">
P2: `ruff check .` is a no-op because `[tool.ruff] include = []` excludes every file. The step passes with zero files checked, giving a false sense of lint coverage.</violation>
</file>
run: bash scripts/check_uv_workspace.sh

- name: Lint Python surfaces touched by lucebox tooling
  run: uv run --frozen --extra dev ruff check .
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/ci.yml, line 23:
<comment>`ruff check .` is a no-op because `[tool.ruff] include = []` excludes every file. The step passes with zero files checked, giving a false sense of lint coverage.</comment>
<file context>
@@ -10,20 +10,23 @@ jobs:
run: bash scripts/check_uv_workspace.sh
+ - name: Lint Python surfaces touched by lucebox tooling
+ run: uv run --frozen --extra dev ruff check .
+
build:
</file context>
- docker.yml: add a paths filter to the push:[main] trigger so the publish build only fires when an image-affecting file changes, not on every main commit (was a ~2h full-arch rebuild per commit, e.g. for a docs typo).
- README: widen the Docker image to 62% with a framed border so it reads at harness-hero size instead of a thin strip; GPU/tag table beside it; add explicit install steps (docker pull + model download + run) for both cuda12 and rocm, not just the run command.

Co-Authored-By: WOZCODE <contact@withwoz.com>
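The effect of the paths filter can be pictured as a predicate over the files a push changed. The following is only a sketch: the pattern list `IMAGE_PATHS` and the function `should_build` are hypothetical, the real filter lives in docker.yml and is not shown here, and Python's `fnmatch` does not follow exactly the same glob rules as GitHub Actions.

```python
from fnmatch import fnmatch

# Hypothetical list of image-affecting paths; the actual filter in
# docker.yml is not reproduced in this PR text.
IMAGE_PATHS = [
    "Dockerfile",
    "Dockerfile.rocm",
    "docker-bake.hcl",
    ".github/workflows/docker.yml",
    "server/*",
]


def should_build(changed_files, patterns=IMAGE_PATHS):
    """Return True if any changed file matches an image-affecting pattern."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


if __name__ == "__main__":
    print(should_build(["docs/typo.md"]))     # docs-only change
    print(should_build(["Dockerfile.rocm"]))  # image-affecting change
```

With a filter of this shape, a docs-only commit to main skips the publish build, while any commit touching a listed path still triggers it.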
- Makefile serve: publish on host :8000 (was :8080) to match the documented OpenAI endpoint. (cubic P2)
- Dockerfile + Dockerfile.rocm: drop the unused docker.io runtime package (it pulls in containerd + iptables); the entrypoint never shells out to docker, only references it in comments. Slims the image and trims privileged tooling. (cubic P3)

Co-Authored-By: WOZCODE <contact@withwoz.com>
- pyproject: drop the conflicting [dependency-groups] dev (pytest-only) so [project.optional-dependencies] dev (pytest+mypy+ruff, used via uv sync --extra dev) is the single source of truth. Re-locked.
- Dockerfile + Dockerfile.rocm: install uv via a pinned COPY --from=ghcr.io/astral-sh/uv:0.11.2 instead of curl | sh (no remote installer runs at build time; version fixed).
- ruff gate is now real instead of a no-op: include scoped to the host-CLI tooling (harness/, scripts/) with F/I/UP/B (line-length/style staged out). Auto-fixed + hand-fixed the 15 violations there; server internals stay staged-excluded until cleaned.
- build_image.sh: document the untagged-tree pinned sha tag.

ruff check . and uv lock --check both pass.

Co-Authored-By: WOZCODE <contact@withwoz.com>
easel pushed a commit to easel/lucebox-hub that referenced this pull request on Jun 9, 2026
…x hosts)

Runtime verification on lucebox2 (gfx1151 Strix Halo, host ROCm 7.2.2): the 6.4.1-userspace image finds the device but SIGSEGVs during model load and reports a bogus 1.28 TB VRAM total. This is a 6.4.x-userspace / 7.x-host-driver mismatch. Rebuilding the same image with ROCM_VERSION=7.2.2 works end-to-end: server up, /health + /props OK, coherent chat completion at 12 tok/s decode on the iGPU.

Default both Dockerfile.rocm and docker-bake.hcl to 7.2.2 so the published :rocm image runs on current ROCm 7.x host stacks; 6.4.x remains available via the build arg for hosts still on a 6.x driver.

Co-Authored-By: WOZCODE <contact@withwoz.com>
…trix Halo)

Revert the 7.2.2 default: the 7.2.x stack has shown intermittent problems on Strix Halo, so the published :rocm image stays on 6.4.1. The 6.4.x-userspace / 7.x-host-driver segfault from the previous commit remains documented in Dockerfile.rocm + docker-bake.hcl: on a ROCm 7.x host, rebuild with ROCM_VERSION=7.2.2 to match the host driver.

Co-Authored-By: WOZCODE <contact@withwoz.com>
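The guidance in these two commits can be summarized as a major-version comparison between the image's ROCm userspace and the host stack. The sketch below rests on an assumption: only the 6.4.1-image / 7.2.2-host pairing was actually reported as failing, so treating any major-version difference as a mismatch is a heuristic, and the helper names are hypothetical.

```python
DEFAULT_ROCM_VERSION = "6.4.1"  # published :rocm default after the revert
ROCM7_REBUILD_VERSION = "7.2.2"  # version the commits suggest for 7.x hosts


def rocm_major(version: str) -> int:
    """Extract the major component from a dotted ROCm version string."""
    return int(version.split(".", 1)[0])


def userspace_matches_host(image_rocm: str, host_rocm: str) -> bool:
    """Heuristic: treat equal major versions as compatible (an assumption)."""
    return rocm_major(image_rocm) == rocm_major(host_rocm)


def suggested_rocm_version(host_rocm: str) -> str:
    """Pick a ROCM_VERSION build arg following the commit messages' advice."""
    if rocm_major(host_rocm) >= 7:
        return ROCM7_REBUILD_VERSION
    return DEFAULT_ROCM_VERSION


if __name__ == "__main__":
    host = "7.2.2"
    if not userspace_matches_host(DEFAULT_ROCM_VERSION, host):
        print(f"rebuild with ROCM_VERSION={suggested_rocm_version(host)}")
```

A check like this could run on the host before pulling the published image, pointing users on 7.x toward a local rebuild instead of a segfault at model load.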
Adds the AMD/ROCm sibling of the cuda12 docker image so we publish both :cuda12 and :rocm to ghcr.io/luce-org/lucebox-hub.

What's in it

- Dockerfile.rocm: rocm/dev-ubuntu base, DFLASH27B_GPU_BACKEND=hip, hipblas/rocblas, ggml-hip rpath + ld.so, gfx1151 default. Carries the same COPY server/share status.html fix as the cuda Dockerfile (the bug fixed in "build(docker): lucebox-hub container image + CI release pipeline" #334).
- docker-bake.hcl: DFLASH_HIP_ARCHES + ROCM_VERSION vars; _rocm-base / rocm / rocm-local targets; an all group that builds cuda + rocm.
- docker.yml: rocm joins the build matrix; Dockerfile.rocm added to the PR paths filter.
- README: AMD run command (--device /dev/kfd --device /dev/dri) in the Docker quick-start.

Notes

- gfx1151 (Strix Halo) by default; widen via DFLASH_HIP_ARCHES (gfx1100/gfx1200/gfx942/gfx90a) for a broader image.
- BSA is CUDA-only and disabled for HIP.

🧙 Built with WOZCODE