Summary
The flip. After #213–#221 land, this single small PR repoints the production agent image build at the Go binary, deletes the Python source, flattens agent/go/ up one level into agent/, and updates the changelog and docs. The previous issue (#221) has already proven the Go image passes the full operator-agent chainsaw suite, so this is a content swap, not a behavior change.
It is intentionally a single PR so it's easy to revert with one click if a downstream consumer hits something the e2e suite missed.
Motivation
Until now Python and Go have shipped side-by-side: Python in the production agent image, Go in a separate agent-go image (built from the nested agent/go/ source tree) only used in CI. That parallelism is the safety net while the rewrite is in flight; once parity is proven (#221), maintaining two implementations indefinitely would be pure tech debt.
This PR is small but high-impact. Reviewers should focus on:
Are we sure no consumer is pinned to the Python wheel rather than the image? (Spoiler: there isn't one — the agent is only ever consumed as a container.)
Is the changelog entry honest about the language change?
Is every mention of "Python", "hatch", or "wheel" in docs/ updated or removed?
Feature description
A single PR doing four mechanical things:
Repoint the production agent image build at the Go Dockerfile.
Delete the Python source under agent/skyhook-agent/, agent/vendor/, agent/hatch.toml, and the Python-specific bits of agent/Makefile. Also remove agent/.dockerignore (the go/ exclusion it carries no longer makes sense once go/ is the only thing in agent/).
Flatten agent/go/ up one level so its content lives directly at agent/.
Update agent/CHANGELOG.md, agent/README.md, .claude/CLAUDE.md, and any docs/ references to the Python agent.
Proposed direction
1. Image content swap
Two approaches were considered:
A (recommended): git mv -f containers/agent-go.Dockerfile containers/agent.Dockerfile. The -f is required because git mv refuses to overwrite the existing Python Dockerfile. Reviewers see the diff as "old Python build pipeline → new Go build pipeline". Also update the COPY agent/go/ ./ line inside the Dockerfile to COPY . /code/ (or equivalent), since the source no longer lives at agent/go/ after step 3.
B: Leave both Dockerfiles in place but change containers/agent.Dockerfile to a one-liner that pulls from agent-go.Dockerfile. Rejected — adds indirection for no benefit.
Same for the workflow: git mv -f .github/workflows/agent-go-ci.yaml .github/workflows/agent-ci.yaml (overwriting the Python one, hence -f). Then update the paths: filters: drop the !agent/go/** exclusion that #213 added (the subdirectory no longer exists), and in the Go workflow replace the agent/go/** include with agent/**.
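The paths: change can also be scripted. This is a minimal sketch. It assumes each filter sits on its own line as a YAML list item, such as - 'agent/go/**' or - '!agent/go/**'. The helper name fix_path_filters is hypothetical:

```shell
# Hypothetical helper: rewrite a workflow's paths: filters for the flattened layout.
# Assumes one filter per line, e.g.   - 'agent/go/**'   or   - '!agent/go/**'
fix_path_filters() {
  local wf="$1"
  # 1) delete any line carrying the obsolete !agent/go/** exclusion
  # 2) widen the agent/go/** include to agent/**
  # Writes via a temp file instead of sed -i, whose syntax differs between GNU and BSD sed.
  sed -e '/!agent\/go\/\*\*/d' \
      -e 's#agent/go/\*\*#agent/**#g' \
      "$wf" > "$wf.tmp" && mv "$wf.tmp" "$wf"
}
```

For example, run fix_path_filters .github/workflows/agent-ci.yaml and then review the diff. If the file already listed agent/** elsewhere, the rewrite leaves a duplicate entry.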
2. Delete Python source
agent/Makefile (the Python one) gets deleted in this step. The Go module's Makefile at agent/go/Makefile will end up at agent/Makefile after step 3, so the root Makefile target make -C agent test keeps working. It now hits Go targets instead of hatch ones, which is the intended behavior.
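A sketch of this step, run from the repository root. The helper name delete_python_agent is hypothetical. The path list is the one from the feature description plus agent/Makefile. The --ignore-unmatch flag keeps git rm from failing if one of the paths (agent/vendor/, say) is already absent. The removal is only staged, so the moves from step 3 can join it in the same commit:

```shell
# Hypothetical helper: stage removal of the Python agent (step 2).
# -r recurses into directories, -q suppresses per-file output,
# --ignore-unmatch exits 0 even if a listed path does not exist.
delete_python_agent() {
  git rm -r -q --ignore-unmatch -- \
    agent/skyhook-agent \
    agent/vendor \
    agent/hatch.toml \
    agent/.dockerignore \
    agent/Makefile
}
```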
3. Flatten agent/go/ → agent/
After the deletions, agent/ contains only go/, README.md, and CHANGELOG.md. agent/Makefile is gone, which leaves that path free for the Go module's Makefile. Then:
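```shell
shopt -s dotglob          # bash: let * also match hidden files such as .golangci.yml
git mv agent/go/* agent/  # dotglob never matches . or .., so no errors need hiding
rmdir agent/go            # git does not track directories; there is no "git rmdir"
git status                # each agent/go/X should now appear as a rename to agent/X
```
Enabling dotglob lets the glob pick up hidden files such as .golangci.yml if present, without also matching . and ... git mv refuses to overwrite an existing path. A name clash therefore stops the move and must be resolved by hand; this would happen, for example, if agent/go/ carries its own README.md. If rmdir reports that agent/go is not empty, something untracked is still inside it. Verify with git status that no Go file got dropped.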
Because the module path declared in agent/go/go.mod is github.com/NVIDIA/nodewright/agent (#213 deliberately chose this with cutover in mind), no import statement in any .go file changes — only the on-disk path. The root Makefile fan-out targets that #213 added as agent-go-* should be renamed to plain agent-* (or merged with whatever the Python agent/Makefile exposed).
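The "no import changes" argument rests entirely on the module directive, so it is cheap to assert it after the move. This is a minimal sketch. Both helper names are hypothetical, and module_path assumes the directive is a single unquoted module <path> line:

```shell
# Hypothetical helper: print the module path declared in a go.mod file.
module_path() {
  awk '$1 == "module" { print $2; exit }' "$1"
}

# Hypothetical check: fail unless the flattened module keeps its path.
check_module_path() {
  local want="github.com/NVIDIA/nodewright/agent" got
  got="$(module_path "${1:-agent/go.mod}")"
  [ "$got" = "$want" ] || { echo "module path is '$got', want '$want'" >&2; return 1; }
}
```

With a Go toolchain available, running go list -m inside agent/ prints the same path.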
Important: stage the deletes from step 2 and the moves from step 3 in the same commit so reviewers see "Python deleted, Go landed in same place" as one logical change. Git renders this as a series of file renames from agent/go/X to agent/X plus the deletions of agent/skyhook-agent/**.
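Before committing, the staged diff can be checked for exactly that shape. The sketch below uses a hypothetical helper name. It reads the tab-separated output of git diff --cached -M --name-status, where R<score> marks a rename and D a deletion. It fails if any file under agent/go/ appears as a plain deletion, which would mean Git did not pair it with its new path:

```shell
# Hypothetical helper: summarise staged changes for the cutover commit.
# Reads `git diff --cached -M --name-status` output on stdin.
summarize_staged() {
  awk -F '\t' '
    $1 ~ /^R/ && $2 ~ /^agent\/go\// { renamed++; next }   # agent/go/X -> agent/X
    $1 == "D" && $2 ~ /^agent\/go\// { print "not detected as a rename: " $2; bad++; next }
    $1 == "D"                        { deleted++; next }   # e.g. Python removals
                                     { other++ }
    END {
      printf "renamed=%d deleted=%d other=%d\n", renamed, deleted, other
      if (bad) exit 1
    }'
}
```

Usage: git diff --cached -M --name-status | summarize_staged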
4. CHANGELOG entry
Append to agent/CHANGELOG.md:
```markdown
## [agent/v7.0.0] - YYYY-MM-DD
### BREAKING (build/runtime, not user-visible)
- *(agent)* Rewrote in Go. The agent CLI contract (positional args, env vars,
  log/flag/history/log-file paths, exit codes, log-line format) is unchanged
  per the operator-agent chainsaw suite. The image still runs as root from a
  distroless base.
### Removed
- Python source under `agent/skyhook-agent/`, the hatch toolchain, and the
  vendored Python deps. The agent is now a single static Go binary.
```
The version bump to v7.0.0 reflects the major build-stack change; CRDs and CLI args are unchanged so users upgrading should see no behavior difference.
5. Docs
Update:
agent/README.md — replace Python build instructions with Go ones.
.claude/CLAUDE.md — the "Three components, three toolchains" section becomes "operator (Go), agent (Go), chart (Helm)". Update the "Common commands" agent block.
Search docs/ for any reference to "Python" or "skyhook-agent wheel" or "hatch" and update.
.github/dependabot.yml (if it lists pip ecosystems for the agent) — remove pip entries for agent/.
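The docs search can be a single grep. The sketch below uses a hypothetical helper name, and its term list mirrors the tests section. The -w flag matters because substring matching on pip would also flag words like pipeline:

```shell
# Hypothetical helper: print lines that still mention the Python toolchain.
# Returns 0 only when no reference remains, 1 when matches were printed,
# 2 when grep itself failed (e.g. a path does not exist).
find_python_refs() {
  # -r recurse, -n line numbers, -i case-insensitive, -I skip binary files,
  # -w whole words only (so "pip" does not match "pipeline"), -E for the alternation.
  grep -rniIwE 'python|hatch|wheel|pip' -- "$@"
  case $? in
    0) return 1 ;;
    1) return 0 ;;
    *) return 2 ;;
  esac
}
```

For example, run find_python_refs docs .github/dependabot.yml. When run against agent/, expect hits in agent/CHANGELOG.md, because the v7.0.0 entry itself names the Python removal. Either add --exclude=CHANGELOG.md to the grep or review those lines by hand.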
6. Tests
This PR's "test" is that all existing CI lanes still pass with the swap:
Renamed Agent CI workflow runs go test/build/docker build/operator-agent-tests and they all pass.
The smoke test confirms that the published agent:{tag} image, now built from the Go source, has the right entrypoint.
No e2e regression vs. the last commit on main before cutover.
Operator-agent chainsaw suite still passes (proves nothing slipped in the rename).
No reference to python, hatch, wheel, or pip remains under agent/ or docs/.
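The entrypoint part of the smoke test can be expressed with docker image inspect, whose Go-template --format flag prints the configured entrypoint as JSON. This is a sketch with hypothetical helper names. The expected value is left to the caller because this issue does not state the entrypoint path:

```shell
# Hypothetical helper: print an image's ENTRYPOINT as a JSON array.
entrypoint_of() {
  docker image inspect --format '{{json .Config.Entrypoint}}' "$1"
}

# Hypothetical check: compare against the expected JSON array supplied by the caller.
# Returns 2 if the image could not be inspected, 1 on mismatch.
assert_entrypoint() {
  local image="$1" want="$2" got
  got="$(entrypoint_of "$image")" || return 2
  [ "$got" = "$want" ] || { echo "$image entrypoint is $got, want $want" >&2; return 1; }
}
```

For example, run assert_entrypoint "agent:$TAG" "$EXPECTED_ENTRYPOINT", where both variables come from the smoke-test job.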
Open questions
Should we tag agent/v7.0.0 in the same PR or in a follow-up? Recommend a follow-up: the PR lands on main, then a release PR bumps the version stamp #213 added, then tag. Avoids tag/PR coupling.
Keep the Python image (agent:6.4.x and agent:latest-python) on GHCR for one minor cycle as a fallback? Recommend yes — don't delete, but don't push new ones. Operators in production may want a roll-back image if something obscure surfaces in the field.
Should the operator's Go module pin a minimum agent version that requires the Go image? Probably not — the contract is the same. If the contract were changing, that would be a separate PR.
Alternatives considered
Soft cutover: keep both Dockerfiles building two different images for several months. Rejected: extends the maintenance window and benefits only us; users don't see two images.
Atomic single commit but no PR: just push to main. Rejected: needs review like any other change.
Keep agent/go/ as the nested dir: skip the flatten. Rejected: leaves a vestige of "this used to be Python" in the directory tree forever, and forces every doc/Makefile/CI file to keep saying agent/go/ for the next decade. The flatten is a one-time cost; carrying the nested layout is forever.
Scope boundaries
In scope:
Out of scope:
Acceptance criteria
containers/agent.Dockerfile builds the Go agent.
.github/workflows/agent-ci.yaml runs Go test + build + e2e.
agent/skyhook-agent/, agent/vendor/, agent/hatch.toml, and agent/.dockerignore are gone.
agent/go/ no longer exists; its content lives at agent/.
The Go module path is still github.com/NVIDIA/nodewright/agent.
agent/CHANGELOG.md has a v7.0.0 entry naming the rewrite.
agent/README.md documents the Go build/test workflow.
No reference to python, hatch, wheel, or pip remains under agent/ or docs/.
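The path-level criteria above can be checked mechanically from the repository root. This is a minimal sketch with a hypothetical helper name. It covers only file presence and absence, not the CI, changelog-content, or docs criteria:

```shell
# Hypothetical helper: verify the post-cutover layout; run from the repo root.
# Prints one line per violation and returns 1 if any were found.
check_layout() {
  local ok=0 p
  # Must exist after the cutover.
  for p in containers/agent.Dockerfile .github/workflows/agent-ci.yaml \
           agent/go.mod agent/CHANGELOG.md agent/README.md; do
    [ -e "$p" ] || { echo "missing: $p"; ok=1; }
  done
  # Must be gone after the cutover.
  for p in agent/go agent/skyhook-agent agent/vendor agent/hatch.toml \
           agent/.dockerignore containers/agent-go.Dockerfile \
           .github/workflows/agent-go-ci.yaml; do
    [ ! -e "$p" ] || { echo "still present: $p"; ok=1; }
  done
  return "$ok"
}
```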