[FEA]: chroot-exec subcommand on the agent binary #216

Description

@rice-riley

Summary

Port agent/skyhook-agent/src/skyhook_agent/chroot_exec.py to Go as a subcommand of the agent binary (agent chroot-exec <control-file> <chroot-dir>). Today this Python script is invoked as a child process by the agent's main controller — keeping it as a subcommand of the same Go binary preserves that two-process model (so the chroot only persists for the duration of the step) without shipping two binaries.

Depends on #213 (the cmd/agent skeleton).

Motivation

The chroot is intentionally per-step, not per-agent: the parent process reads config.json and orchestrates retries / interrupts in the container's normal filesystem; the child process chroots into the host root and exec's the step. If the agent itself chrooted, every subsequent step (and the schema files embedded in the binary, and the /etc/resolv.conf already copied over) would have to be re-set-up in the host root. The current split is the right design — we just need the same split in Go.

This issue also extracts the env-merging logic into a pure function that #217 (the runner) can unit-test without actually chrooting.

Feature description

Add a chroot-exec subcommand to cmd/agent that:

  1. Reads a JSON control file from the first positional argument (argv[1] in the Python script) containing {cmd, no_chmod, env, copy_dir}.
  2. Captures the parent process's env (the container env) before any chroot.
  3. If the second positional argument (chroot-dir) is not "local", calls syscall.Chroot(chrootDir) and os.Chdir("/").
  4. Unless no_chmod is true, adds execute permission to cmd[0] (Python ORs in S_IXUSR|S_IXGRP|S_IXOTH).
  5. Reads the chroot's env via subprocess.run(["env"]) (in Go: exec.Command("env").Output()) and merges in the order container_env ← chroot_env ← skyhook_env.
  6. Validates copy_dir is absolute, then exec's cmd with Dir: copy_dir and the merged env.
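Steps 4 and 6 above can be sketched as small helpers. This is a minimal illustration, not the Python script's exact behavior: the names addExecBits, ensureExecutable, validateCopyDir, and buildCommand are hypothetical, and the empty-cmd check is an assumption added for safety.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// addExecBits mirrors Python's S_IXUSR|S_IXGRP|S_IXOTH by OR-ing in 0o111.
func addExecBits(mode os.FileMode) os.FileMode {
	return mode | 0o111
}

// ensureExecutable adds execute permission to path and keeps its other mode bits.
func ensureExecutable(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("stat %q: %w", path, err)
	}
	return os.Chmod(path, addExecBits(info.Mode()))
}

// validateCopyDir rejects relative paths, as the issue requires.
func validateCopyDir(copyDir string) error {
	if !filepath.IsAbs(copyDir) {
		return fmt.Errorf("copy_dir must be an absolute path, got %q", copyDir)
	}
	return nil
}

// buildCommand prepares (but does not start) the step command with the
// working directory and merged env. The empty-cmd check is an assumption.
func buildCommand(cmd []string, copyDir string, env []string) (*exec.Cmd, error) {
	if len(cmd) == 0 {
		return nil, fmt.Errorf("cmd must not be empty")
	}
	if err := validateCopyDir(copyDir); err != nil {
		return nil, err
	}
	c := exec.Command(cmd[0], cmd[1:]...)
	c.Dir = copyDir
	c.Env = env
	c.Stdout, c.Stderr = os.Stdout, os.Stderr
	return c, nil
}

func main() {
	c, err := buildCommand([]string{"/bin/echo", "hello"}, "/", []string{"FOO=bar"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would run", c.Args, "in", c.Dir)
}
```

Keeping the mode arithmetic in addExecBits makes the no_chmod behavior checkable without touching the filesystem.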

Proposed direction

1. Subcommand wiring

agent chroot-exec <control-file> <chroot-dir>

Use the subcommand router #213 picked. Add no new dependency if the standard flag package is enough.
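If #213 ends up on the standard flag package, the argument handling might look like the sketch below. The router choice is not known here, so parseChrootExecArgs, chrootExecArgs, and the usage text are all hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// usage is a hypothetical help text documenting the two positional args.
const usage = `usage: agent chroot-exec <control-file> <chroot-dir>
  control-file  path to the JSON control file
  chroot-dir    directory to chroot into, or "local" to skip the chroot`

// chrootExecArgs holds the two positional arguments of the subcommand.
type chrootExecArgs struct {
	ControlFile string
	ChrootDir   string
}

// parseChrootExecArgs parses the arguments that follow "chroot-exec".
// It returns flag.ErrHelp for -h/--help so the caller can print usage.
func parseChrootExecArgs(args []string) (chrootExecArgs, error) {
	fs := flag.NewFlagSet("chroot-exec", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides where usage goes
	if err := fs.Parse(args); err != nil {
		return chrootExecArgs{}, err
	}
	if fs.NArg() != 2 {
		return chrootExecArgs{}, errors.New("expected exactly two positional arguments")
	}
	return chrootExecArgs{ControlFile: fs.Arg(0), ChrootDir: fs.Arg(1)}, nil
}

func main() {
	// Demo input; a real main would pass os.Args[2:] after matching the subcommand.
	a, err := parseChrootExecArgs([]string{"/tmp/control.json", "local"})
	if err != nil {
		fmt.Println(usage)
		return
	}
	fmt.Printf("control file: %s, chroot dir: %s\n", a.ControlFile, a.ChrootDir)
}
```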

2. Env merge — pure function

In internal/chroot/env.go:

func ProcessEnv(containerEnv, skyhookEnv, chrootEnv map[string]string) map[string]string

Order matters and must match Python's _get_process_env:

  1. Start from containerEnv.
  2. Overwrite with chrootEnv (so things like PATH resolve against the host).
  3. Overwrite with skyhookEnv (so package-author env wins last).

This function is unit-tested without any chroot. The chroot subcommand main is just glue.
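A minimal sketch of the merge, using the signature proposed above. The precedence follows the three steps listed; EnvSlice is a hypothetical helper for turning the result into the KEY=VALUE form that exec.Cmd.Env expects.

```go
package main

import (
	"fmt"
	"sort"
)

// ProcessEnv merges the three environments. Later sources overwrite earlier
// ones: containerEnv, then chrootEnv, then skyhookEnv.
func ProcessEnv(containerEnv, skyhookEnv, chrootEnv map[string]string) map[string]string {
	merged := make(map[string]string, len(containerEnv)+len(chrootEnv)+len(skyhookEnv))
	for k, v := range containerEnv {
		merged[k] = v
	}
	for k, v := range chrootEnv {
		merged[k] = v
	}
	for k, v := range skyhookEnv {
		merged[k] = v
	}
	return merged
}

// EnvSlice (hypothetical helper) renders the map as sorted KEY=VALUE strings.
func EnvSlice(env map[string]string) []string {
	out := make([]string, 0, len(env))
	for k, v := range env {
		out = append(out, k+"="+v)
	}
	sort.Strings(out)
	return out
}

func main() {
	container := map[string]string{"PATH": "/container/bin", "HOME": "/root", "A": "container"}
	chroot := map[string]string{"PATH": "/usr/bin", "A": "chroot"}
	skyhook := map[string]string{"A": "skyhook", "FOO": "bar"}
	for _, kv := range EnvSlice(ProcessEnv(container, skyhook, chroot)) {
		fmt.Println(kv)
	}
}
```

Because the function builds a fresh map and never mutates its inputs, a table-driven test can reuse the same fixtures across cases.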

3. Chroot mechanics

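  • syscall.Chroot requires CAP_SYS_CHROOT. The agent already runs as root in the container (see USER 0:0 in containers/agent.Dockerfile), but running as root is only sufficient as long as that capability has not been dropped from the container. Document the requirement in a // why: comment.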
  • After Chroot, immediately Chdir("/") (matches Python).
  • The "local" sentinel skips the chroot entirely — used by tests and as a debug escape hatch. Preserve it.
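The three bullets above can be sketched as one function with the syscalls injected, so the "local" path and the call order are testable without root. rootOps and enterRoot are hypothetical names; production code would pass syscall.Chroot and os.Chdir as shown in main.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// localSentinel skips the chroot entirely (tests and debugging).
const localSentinel = "local"

// rootOps groups the two calls so tests can substitute fakes.
type rootOps struct {
	chroot func(path string) error
	chdir  func(dir string) error
}

// enterRoot chroots into chrootDir and changes to "/", unless chrootDir
// is the "local" sentinel.
func enterRoot(chrootDir string, ops rootOps) error {
	if chrootDir == localSentinel {
		return nil
	}
	// why: chroot(2) requires CAP_SYS_CHROOT; the agent container runs as root.
	if err := ops.chroot(chrootDir); err != nil {
		return fmt.Errorf("chroot %q: %w", chrootDir, err)
	}
	// why: after chroot the working directory may still point outside the new root.
	if err := ops.chdir("/"); err != nil {
		return fmt.Errorf("chdir / after chroot: %w", err)
	}
	return nil
}

func main() {
	ops := rootOps{chroot: syscall.Chroot, chdir: os.Chdir}
	// "local" performs no syscalls, so this runs without privileges.
	if err := enterRoot(localSentinel, ops); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("local mode: chroot skipped")
}
```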

4. Control file

JSON shape (matches Python):

{
  "cmd": ["/path/to/step.sh", "arg1"],
  "no_chmod": false,
  "env": {"FOO": "bar"},
  "copy_dir": "/etc/skyhook/.../skyhook_dir"
}

In Python the parent writes the control file to a tempfile.NamedTemporaryFile and deletes it when the parent returns; #217 will write the same file from Go. Do not change the file format.
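The JSON shape above maps onto a Go struct roughly as follows. This is a sketch: ControlFile and ParseControl are hypothetical names, the sample copy_dir path is made up for the demo, and rejecting an empty cmd is an assumption rather than confirmed Python behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ControlFile mirrors the JSON keys written by the parent process.
type ControlFile struct {
	Cmd     []string          `json:"cmd"`
	NoChmod bool              `json:"no_chmod"`
	Env     map[string]string `json:"env"`
	CopyDir string            `json:"copy_dir"`
}

// ParseControl decodes the control file contents.
// Assumption: an empty cmd is treated as an error.
func ParseControl(data []byte) (ControlFile, error) {
	var c ControlFile
	if err := json.Unmarshal(data, &c); err != nil {
		return ControlFile{}, fmt.Errorf("parse control file: %w", err)
	}
	if len(c.Cmd) == 0 {
		return ControlFile{}, errors.New("control file: cmd must not be empty")
	}
	return c, nil
}

func main() {
	// Sample input with a made-up copy_dir; a real caller would use os.ReadFile.
	raw := []byte(`{
	  "cmd": ["/path/to/step.sh", "arg1"],
	  "no_chmod": false,
	  "env": {"FOO": "bar"},
	  "copy_dir": "/tmp/skyhook_dir"
	}`)
	c, err := ParseControl(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.Cmd, c.NoChmod, c.Env["FOO"], c.CopyDir)
}
```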

5. Tests

Port agent/skyhook-agent/tests/test_chroot_exec.py:

  • ProcessEnv precedence test (table-driven, no I/O).
  • Control file parse error test.
  • copy_dir not absolute → error.
  • no_chmod=true skips the chmod call (mock the chmod function).
  • "local" chroot mode: no syscall.Chroot call, env merged, command exec'd in copy_dir.

The actual syscall.Chroot path is not unit-testable without root + a real filesystem. Cover it in #221's chainsaw e2e tests against a real kind cluster.

Scope boundaries

In scope:

  • The chroot-exec subcommand and its env-merging helper.

Out of scope:

Acceptance criteria

  • agent chroot-exec --help documents the two positional args.
  • Env merge precedence matches Python on all permutations.
  • Absolute-path validation on copy_dir matches Python.
  • "local" chroot mode runs end-to-end in tests without root.
  • The Python tests' assertions all have a Go equivalent.
  • Binary stays single-artifact — no new executables shipped.

Open questions

  • Should we use unix.Chroot from golang.org/x/sys/unix (already in operator vendor) or stdlib syscall.Chroot? Both work on Linux; pick whichever adds fewer imports to the agent.
  • The Python version captures os.environ before chroot. Go's os.Environ() does the same — confirm via test that env captured pre-chroot still propagates post-chroot.
  • Cross-platform: the Python agent only runs on Linux (Distroless). Should we guard the chroot file with //go:build linux, or add a stub for darwin to keep go test working on developer Macs? Recommend //go:build linux + a darwin stub that returns errors.New("not supported").
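For the second open question, both os.Environ() and the output of env yield KEY=VALUE lines, so one parser can feed both into the env merge. The sketch below uses a hypothetical parseEnv helper; it takes the snapshot before any chroot would happen, and the returned map is a copy that later changes to the process environment do not affect. It assumes values contain no newlines, which env output cannot represent unambiguously.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEnv converts KEY=VALUE lines into a map. It splits on the first "="
// only, so values may contain "=". Lines without "=" or with an empty key
// are skipped.
func parseEnv(lines []string) map[string]string {
	env := make(map[string]string, len(lines))
	for _, line := range lines {
		key, value, ok := strings.Cut(line, "=")
		if !ok || key == "" {
			continue
		}
		env[key] = value
	}
	return env
}

func main() {
	// Snapshot taken up front, as the Python script does with os.environ.
	containerEnv := parseEnv(os.Environ())

	// Mutating the process env afterwards does not change the snapshot.
	os.Setenv("SKYHOOK_DEMO_ONLY", "1")
	_, leaked := containerEnv["SKYHOOK_DEMO_ONLY"]
	fmt.Println("snapshot affected by later Setenv:", leaked)
}
```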

References (codebase)

Alternatives considered

  • Ship chroot-exec as a standalone binary (mirroring Python's two-script layout). Rejected — single Go binary is simpler to ship and version.
  • Re-exec the agent binary with a special env var instead of a subcommand. Rejected — explicit subcommand is more discoverable and easier to test.

Code of Conduct

  • I agree to follow Skyhook's Code of Conduct.

Metadata

Labels

component/agent: Skyhook agent (package executor)