[FEA]: agent-go Dockerfile and container CI #220

Description

@rice-riley

Summary

Add a new containers/agent-go.Dockerfile that builds a small static Go binary from the agent/go/ source tree onto a distroless base, plus container build and push jobs in .github/workflows/agent-go-ci.yaml. The image is published under a separate name (agent-go); the production agent image and its Python build pipeline at containers/agent.Dockerfile and .github/workflows/agent-ci.yaml remain untouched. (#213 already added the agent/.dockerignore and paths: exclusion that keep the new subdir invisible to the Python image and Python CI. This PR builds on those guards and does not re-introduce them.)

Depends on #219 (a complete, working agent binary).

Motivation

Until this PR there's no shippable artifact for the Go agent — only go test in CI. #221 needs a container image to run the existing operator-agent chainsaw suite against, and that suite is the parity safety net for the cutover. So this PR is the bridge between "Go code passes unit tests" and "Go agent runs in a real cluster against the real operator".

Publishing under a separate name (agent-go rather than overwriting agent) means the production image keeps moving normally even if this CI lane breaks, and lets #221 explicitly pin both images to compare behavior.

Feature description

Proposed direction

1. Dockerfile

A two-stage build:

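# Global ARGs: an ARG must be declared before the first FROM to be usable in any FROM line.
ARG GO_VERSION=1.26.2
ARG DISTROLESS_VERSION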
FROM golang:${GO_VERSION} AS builder

ARG AGENT_VERSION
ARG GIT_SHA

WORKDIR /code
COPY agent/go/ ./
RUN CGO_ENABLED=0 GOOS=linux go build \
    -mod=vendor \
    -ldflags "-s -w -X main.version=${AGENT_VERSION} -X main.gitSHA=${GIT_SHA}" \
    -o /out/agent ./cmd/agent

FROM nvcr.io/nvidia/distroless/base:v${DISTROLESS_VERSION}

ARG AGENT_VERSION
ARG GIT_SHA
ARG DISTROLESS_VERSION

LABEL org.opencontainers.image.base.name="nvcr.io/nvidia/distroless/base:v${DISTROLESS_VERSION}" \
      org.opencontainers.image.licenses="Apache-2.0" \
      org.opencontainers.image.title="skyhook-agent-go" \
      org.opencontainers.image.version="${AGENT_VERSION}" \
      org.opencontainers.image.revision="${GIT_SHA}"

COPY --from=builder /out/agent /usr/local/bin/agent

USER 0:0

ENTRYPOINT ["/usr/local/bin/agent"]

Notes:

  • Build context is the repo root, not agent/go/ — that's why the COPY is agent/go/ ./ rather than ./ ./. Doing it this way keeps the buildx invocation symmetrical with the Python one (which runs from agent/ with context .) and lets us keep the Dockerfile under containers/. The CI docker buildx build invocation should pass . (repo root) as context with -f containers/agent-go.Dockerfile.
  • Base image: distroless base (not static). The agent calls syscall.Chroot and execs host binaries, so it needs basic libc-equivalent paths; base is the safer choice while still being small. If pure static works in #221's e2e, switch later.
  • USER 0:0 is required (the Chroot call needs CAP_SYS_CHROOT). Match the Python image's choice with the same comment.
  • ARGs mirror the Python Dockerfile so the GitHub Actions matrix can pass the same values.
  • Schemas (internal/config/schemas/v1/*.json) are embedded into the binary via embed.FS (added in #214), so the Dockerfile needs no COPY for them.

2. CI extension

Append to .github/workflows/agent-go-ci.yaml (which today only does test + lint from #213):

  • compute-metadata job equivalent to Python's, deriving AGENT_VERSION and AGENT_IMAGE_TAG from agent/v* git tags.
  • build-agent-go matrix job (linux/amd64 + linux/arm64), buildx, push to ${REGISTRY}/${IMAGE_NAME}/agent-go:${TAG}-${PLATFORM_TAG}. Mirrors the Python build-agent job exactly.
  • create-manifest job that combines the platform-specific tags into a multi-arch manifest at agent-go:${TAG} and generates the supply-chain attestation.

The new jobs gate on the test job from #213 succeeding first.
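
To make the tag and image-reference mapping concrete, here is a small sketch of the derivation the jobs above describe. It assumes AGENT_VERSION is the git tag with the agent/v prefix stripped and that the platform tags are amd64 and arm64. The function names, the registry, and the image name in main are hypothetical, and the real workflow would most likely do this in shell rather than Go.

```go
package main

import (
	"fmt"
	"strings"
)

// tagPrefix follows the agent/v* tag convention named in the issue.
const tagPrefix = "agent/v"

// versionFromTag strips the prefix, e.g. "agent/v6.4.1" becomes "6.4.1".
func versionFromTag(tag string) (string, error) {
	v, ok := strings.CutPrefix(tag, tagPrefix)
	if !ok || v == "" {
		return "", fmt.Errorf("tag %q does not match %s*", tag, tagPrefix)
	}
	return v, nil
}

// platformRef builds the per-platform reference pushed by build-agent-go:
// ${REGISTRY}/${IMAGE_NAME}/agent-go:${TAG}-${PLATFORM_TAG}
func platformRef(registry, imageName, tag, platformTag string) string {
	return fmt.Sprintf("%s/%s/agent-go:%s-%s", registry, imageName, tag, platformTag)
}

// manifestRef builds the multi-arch reference written by create-manifest.
func manifestRef(registry, imageName, tag string) string {
	return fmt.Sprintf("%s/%s/agent-go:%s", registry, imageName, tag)
}

func main() {
	v, err := versionFromTag("agent/v6.4.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Registry and image name are placeholders, not the project's real values.
	for _, p := range []string{"amd64", "arm64"} {
		fmt.Println(platformRef("ghcr.io", "example-org/example-repo", v, p))
	}
	fmt.Println(manifestRef("ghcr.io", "example-org/example-repo", v))
}
```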

3. Tag stream

Two options for image tags during the transition:

  • A (recommended for simplicity): tag agent-go with the same version stream as the Python agent (e.g. agent-go:6.4.1, agent-go:latest). Cutover (#222) just swaps which Dockerfile builds the agent image; existing tags don't need re-mapping.
  • B: maintain separate version streams (agent-go:0.1.0). Lets us iterate the Go binary without colliding with Python release cadence, but creates two CHANGELOG.md streams.

Recommend A. Document the choice in the PR description.

4. Tests

There is nothing to "unit test" about a Dockerfile, but the CI job itself is the test:

  • Build succeeds on both platforms.
  • Multi-arch manifest pushes successfully.
  • docker run --rm $IMAGE --version prints the expected version (smoke-test step).
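
The smoke-test step could be as small as the sketch below, which runs the image with --version and checks that the output mentions the expected version. Reading IMAGE and AGENT_VERSION from the environment, the checkVersionOutput helper, and the substring match are all assumptions; a one-line shell step in the workflow would do the same job.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// checkVersionOutput returns an error unless output mentions wantVersion.
// A substring match is an assumption; the real output format is not specified.
func checkVersionOutput(output, wantVersion string) error {
	if wantVersion == "" {
		return fmt.Errorf("expected version is empty")
	}
	if !strings.Contains(output, wantVersion) {
		return fmt.Errorf("version output %q does not contain %q",
			strings.TrimSpace(output), wantVersion)
	}
	return nil
}

func main() {
	// Hypothetical environment variables set by the CI step.
	image, want := os.Getenv("IMAGE"), os.Getenv("AGENT_VERSION")
	if image == "" {
		fmt.Println("IMAGE not set; skipping smoke test")
		return
	}
	// Equivalent to: docker run --rm $IMAGE --version
	out, err := exec.Command("docker", "run", "--rm", image, "--version").CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "docker run failed: %v\n%s", err, out)
		os.Exit(1)
	}
	if err := checkVersionOutput(string(out), want); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("smoke test passed")
}
```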

Scope boundaries

In scope:

  • The Dockerfile, the new CI jobs, the multi-arch manifest, the smoke-test.

Out of scope:

Acceptance criteria

  • containers/agent-go.Dockerfile builds on linux/amd64 and linux/arm64.
  • A multi-arch image is pushed to GHCR under agent-go.
  • The image's --version output matches the build args.
  • Final image size ≤ Python image size (rough sanity check; static Go binary on distroless base should land well under the Python wheel install).
  • Existing Python agent image at agent/{version} continues to build and push unchanged on the same commit.
  • New jobs are scoped via paths: so they don't run on operator-only changes.

Open questions

  • Distroless base vs static: the Dockerfile above starts on base. Try static in #221's e2e; if Chroot or DNS resolution fails there, stay on base. Document the choice.
  • Tag stream: A or B above. Recommend A.
  • Should we publish with provenance enabled now, or wait for cutover? Match the Python workflow's choice (it currently sets --provenance=false).
  • Multi-arch QEMU vs native runners: Python workflow uses native arm runners (ubuntu-24.04-arm). Use the same to keep build times reasonable.

References (codebase)

Alternatives considered

  • Build the Go binary outside Docker (Goreleaser) and COPY it in. Rejected: building inside the Dockerfile keeps the existing CI shape (buildx + manifest), so cutover (#222) is just a Dockerfile content swap.
  • Skip the Dockerfile and run the Go binary directly in operator-agent tests via a sidecar container. Rejected — the operator's image-pull path is part of what we're testing.

Code of Conduct

  • I agree to follow Skyhook's Code of Conduct.

Metadata

Labels

component/agent: Skyhook agent (package executor)