feat(overlays): deliver tools, workflows, and skills via repo-cache overlay sources#530

Open
Zygimantass wants to merge 5 commits into main from feat/overlay-deep

Zygimantass commented Jun 12, 2026

Implements the repo-cache-backed overlay delivery model (OVERLAYS.md). A single ordered overlays.sources list drives tools, workflows, and skills, so the API and sandboxes always see the same set of overlay revisions, later sources shadow earlier ones on name collisions, and shipping overlay content takes only a Git push plus a values update, with no API or sandbox image rebuild.
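
To make the shadowing rule concrete, here is a minimal Python sketch (not the chart's or api-rs's code). It assumes each source contributes a mapping from item name (a tool, workflow, or skill) to the path that provides it; `resolve_shadowing` and the example paths are hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_shadowing(sources: List[Tuple[str, Dict[str, str]]]) -> Dict[str, str]:
    """Return name -> "repo:path", where later sources win on name collisions.

    `sources` is the ordered overlay list; each entry is (repo, {name: path}).
    These data shapes are hypothetical and only illustrate the ordering rule.
    """
    resolved: Dict[str, str] = {}
    for repo, items in sources:  # walk sources in overlay order
        for name, path in items.items():
            # A later source simply overwrites whatever an earlier one provided.
            resolved[name] = f"{repo}:{path}"
    return resolved


sources = [
    ("paradigmxyz/centaur", {"deploy": "workflows/deploy.py", "lint": "tools/lint"}),
    ("your-org/centaur-overlay", {"deploy": "workflows/deploy.py"}),
]
# "deploy" resolves to your-org/centaur-overlay; "lint" stays with paradigmxyz/centaur.
print(resolve_shadowing(sources))
```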

Values shape

overlays:
  sources:
    - repo: paradigmxyz/centaur
      ref: 5da358a2
    - repo: your-org/centaur-overlay
      ref: abc1234

Subdirs default to the conventional layout (toolsSubdir: tools, workflowsSubdir: workflows, skillsSubdir: .agents/skills), so most sources only need repo + ref. Directories a repo doesn't contain are skipped at runtime; set a subdir to "" to explicitly disable that surface, or to another path to relocate it.
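
A minimal Python sketch of the subdir defaulting rule, for illustration only: the chart's actual implementation lives in its Helm templates, and `resolve_subdirs` is a hypothetical name. Unset keys take the conventional default, `""` disables the surface, and any other value relocates it; skipping directories that don't exist happens separately, at runtime.

```python
from typing import Dict, Optional

# Conventional layout stated in this PR.
DEFAULT_SUBDIRS = {
    "toolsSubdir": "tools",
    "workflowsSubdir": "workflows",
    "skillsSubdir": ".agents/skills",
}


def resolve_subdirs(source: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each surface to its subdir, or to None when explicitly disabled."""
    resolved: Dict[str, Optional[str]] = {}
    for key, default in DEFAULT_SUBDIRS.items():
        value = source.get(key, default)  # unset -> conventional default
        resolved[key] = None if value == "" else value  # "" -> surface disabled
    return resolved


# Hypothetical source: tools disabled, skills relocated, workflows defaulted.
example = {"repo": "your-org/centaur-overlay", "ref": "abc1234",
           "toolsSubdir": "", "skillsSubdir": "skills"}
print(resolve_subdirs(example))
```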

When overlays.sources is empty, the centaur.overlaySources helper compat-maps toolServer.repo/ref/subdir + toolServer.extraSources into the same ordered list (exposing workflows/ and .agents/skills/ from each repo), so existing deployments keep working without values changes; the compat-path render is byte-identical to the pre-defaults commit.
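
The compat mapping can be sketched in Python as below. This is an assumption-laden illustration, not the centaur.overlaySources helper itself: it assumes each extraSources entry carries repo, ref, and subdir keys like the base toolServer block, and it assumes a "tools" fallback when subdir is unset (the helper's real fallback isn't stated here). `compat_overlay_sources` is a hypothetical name.

```python
from typing import Any, Dict, List


def compat_overlay_sources(values: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the ordered overlay source list, compat-mapping toolServer if needed."""
    explicit = (values.get("overlays") or {}).get("sources") or []
    if explicit:
        return list(explicit)  # an explicit overlays.sources list wins outright

    tool_server = values.get("toolServer") or {}
    legacy: List[Dict[str, Any]] = []
    if tool_server.get("repo"):
        legacy.append(tool_server)  # base repo first...
    legacy.extend(tool_server.get("extraSources") or [])  # ...then extras, in order

    return [
        {
            "repo": src["repo"],
            "ref": src["ref"],
            # Assumed fallback; the real helper's default is not shown in this PR.
            "toolsSubdir": src.get("subdir", "tools"),
            # Compat mode exposes workflows/ and .agents/skills/ from every repo.
            "workflowsSubdir": "workflows",
            "skillsSubdir": ".agents/skills",
        }
        for src in legacy
    ]
```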

Changes

Chart

  • repo-cache DaemonSet repos/refs, API TOOL_DIRS/WORKFLOW_DIRS, the KUBERNETES_TOOLS_* bootstrap sources, sandbox KUBERNETES_WORKFLOW_DIRS, and CENTAUR_SKILL_DIRS (delivered through SESSION_SANDBOX_EXTRA_ENV) all render from the same ordered source list
  • API discovery and workflow-host execution now read the same WORKFLOW_DIRS list, translated only for the mount prefix (/var/lib/centaur/repos vs /home/agent/github); this fixes the mismatch where the API could discover an overlay workflow that the workflow-host couldn't import
  • the api-rs pod mounts repo-cache whenever any overlay source exists (previously only when tools were configured)
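
Because both WORKFLOW_DIRS values come from one list, they can differ only in the mount prefix. A Python sketch of that rendering, assuming a `<prefix>/<owner>/<repo>/<subdir>` layout under each mount and a colon separator (neither is stated in this PR); `workflow_dirs` and the `your-org/skills-only` source are hypothetical.

```python
from typing import Dict, List

API_PREFIX = "/var/lib/centaur/repos"  # repo-cache mount in the API pod
SANDBOX_PREFIX = "/home/agent/github"  # repo-cache mount in workflow-host sandboxes


def workflow_dirs(sources: List[Dict[str, str]], prefix: str) -> str:
    """Render one WORKFLOW_DIRS-style value from the ordered source list."""
    dirs = []
    for src in sources:
        subdir = src.get("workflowsSubdir", "workflows")  # conventional default
        if subdir == "":
            continue  # surface explicitly disabled for this source
        # Hypothetical layout: <mount prefix>/<owner>/<repo>/<subdir>
        dirs.append(f"{prefix}/{src['repo']}/{subdir}")
    return ":".join(dirs)  # separator is an assumption


sources = [
    {"repo": "paradigmxyz/centaur", "ref": "5da358a2"},
    {"repo": "your-org/centaur-overlay", "ref": "abc1234"},
    {"repo": "your-org/skills-only", "ref": "def5678", "workflowsSubdir": ""},
]
api_value = workflow_dirs(sources, API_PREFIX)
sandbox_value = workflow_dirs(sources, SANDBOX_PREFIX)
```

Rendering both values from one function is what rules out the old drift: there is no second list for the workflow-host to fall out of sync with.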

api-rs

  • new --kubernetes-workflow-dirs / KUBERNETES_WORKFLOW_DIRS: agent-k8s workflow-host sandboxes prefer the explicit overlay-rendered value; tools-repo and baked-in fallbacks preserved
  • missing tools subdirs are skipped rather than treated as fatal, so the defaulted toolsSubdir is safe for workflows- or skills-only overlay repos: tools-bootstrap waits only for the repo-cache checkout and skips a source without a tools tree (an init failure would be terminal for the Sandbox), the git-clone fallback guards its copy the same way, and ToolGitSource warns instead of erroring
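
The skip-and-warn rule shared by tools-bootstrap, the git-clone fallback, and ToolGitSource can be sketched in Python as follows. The real code is shell and Rust; `collect_tool_dirs` is hypothetical and only mirrors the rule that a missing tools tree is logged and skipped, never fatal.

```python
import logging
import os
from typing import List

log = logging.getLogger("overlay-tools")


def collect_tool_dirs(candidates: List[str]) -> List[str]:
    """Keep existing tool dirs in overlay order; warn about and skip missing ones."""
    found = []
    for path in candidates:
        if os.path.isdir(path):
            found.append(path)
        else:
            # Expected for workflows- or skills-only overlays that use the
            # defaulted toolsSubdir: warn instead of failing the whole collection.
            log.warning("skipping missing tools dir %s", path)
    return found
```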

Sandbox entrypoint

  • each CENTAUR_SKILL_DIRS entry (colon-separated, in overlay order) is copied into the workspace .agents/skills; all skill sources now go through copy_skill_dir, which replaces same-named skills wholesale so a shadowing skill never inherits stale files from the skill it replaces (note: this also changes legacy baked-in and mounted sources from file-merge to per-skill replace)
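
A Python analogue of the per-skill replace semantics, assuming each skill is a directory directly under a skill source. The entrypoint's copy_skill_dir is a shell function; `copy_skills` and `install_skills` here are hypothetical stand-ins, not its actual code.

```python
import os
import shutil


def copy_skills(src_root: str, dest_root: str) -> None:
    """Copy every skill directory under src_root into dest_root, replacing same-named skills."""
    if not os.path.isdir(src_root):
        return  # sources without a skills directory are skipped
    os.makedirs(dest_root, exist_ok=True)
    for name in sorted(os.listdir(src_root)):
        src = os.path.join(src_root, name)
        if not os.path.isdir(src):
            continue  # only directories count as skills in this sketch
        dest = os.path.join(dest_root, name)
        # Remove the existing skill first so no stale files survive the replace.
        if os.path.islink(dest) or os.path.isfile(dest):
            os.remove(dest)
        elif os.path.isdir(dest):
            shutil.rmtree(dest)
        shutil.copytree(src, dest)


def install_skills(skill_dirs: str, workspace_skills: str) -> None:
    """Apply a colon-separated list of skill sources in overlay order; later entries win."""
    for entry in skill_dirs.split(":"):
        if entry:
            copy_skills(entry, workspace_skills)
```

With a file-merge copy (the old legacy behavior), a file such as old.txt from a base skill would survive under the overlay's same-named skill; removing the destination first is what prevents that.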

Docs

  • extend/overlay rewritten around repo-cache overlays; the dead overlay.image mechanism removed from skills/tools/workflows/acme/configuration pages; public/md mirrors regenerated

Behavior changes for existing deployments

  • the workflow-host now sees every source's workflows dir (previously only the base repo's)
  • .agents/skills from extra sources now load into agent workspaces
  • legacy skill sources replace same-named skills instead of merging files

Verification

  • helm lint + helm template across default, multi-source overlays.sources (with defaulted subdirs and "" opt-outs), compat extraSources, repoCache.enabled=false, and toolServer.enabled=false values; compat-path render is byte-identical to the pre-defaults commit
  • cargo test -p centaur-api-server -p centaur-sandbox-agent-k8s (51 passed), cargo fmt --check, cargo clippy clean
  • entrypoint bash -n plus functional shell tests of the skill-copy/shadowing logic and the tools-bootstrap skip-missing-subdir path

Known follow-ups

  • promptPath/personasSubdir are accepted in the schema/helper but not yet consumed by any template (prompt delivery still uses overlay.systemPrompt)
  • overlay skills ride SESSION_SANDBOX_EXTRA_ENV; an operator who sets apiRs.extraEnv.SESSION_SANDBOX_EXTRA_ENV directly overrides the rendered list and drops them
  • live-cluster E2E (kind + sandbox pods) not yet run; verification above is render- and unit-level

…verlay sources

Implements the repo-cache-backed overlay model (OVERLAYS.md): one ordered
overlays.sources list drives every tools/workflows/skills surface, so the API
and sandboxes always see the same overlay revision set and later sources shadow
earlier ones on name collisions. Adding or updating overlay content becomes a
Git push plus a values update — no API or sandbox image rebuild.

Chart:
- new overlays.sources value; the centaur.overlaySources helper compat-maps
  toolServer.repo/ref/subdir + extraSources when the list is empty
- repo-cache DaemonSet repos/refs, API TOOL_DIRS/WORKFLOW_DIRS, the
  KUBERNETES_TOOLS_* bootstrap sources, sandbox KUBERNETES_WORKFLOW_DIRS, and
  CENTAUR_SKILL_DIRS (via SESSION_SANDBOX_EXTRA_ENV) all render from the same
  ordered list
- API and workflow-host WORKFLOW_DIRS are the same list, translated only for
  the mount prefix (/var/lib/centaur/repos vs /home/agent/github)

api-rs:
- --kubernetes-workflow-dirs: agent-k8s workflow-host sandboxes prefer the
  explicit overlay-rendered value; tools-repo and baked-in fallbacks preserved

Sandbox entrypoint:
- copy each CENTAUR_SKILL_DIRS entry into the workspace .agents/skills in
  overlay order via copy_skill_dir, which replaces same-named skills wholesale
  so a shadowing skill never inherits stale files; legacy skill sources kept

Docs: rewrite extend/overlay around repo-cache overlays and drop the dead
overlay.image mechanism from skills/tools/workflows/acme/configuration pages.

Amp-Thread-ID: https://ampcode.com/threads/T-019ebbf7-8029-758d-b684-ef3e7ac9712c
Co-authored-by: Amp <amp@ampcode.com>
@github-actions

Cloudflare Workers docs preview

https://pr-530-centaur-docs.porto.workers.dev

Zygimantass and others added 4 commits June 12, 2026 17:08
…yout

overlays.sources entries now default toolsSubdir/workflowsSubdir/skillsSubdir
to tools, workflows, and .agents/skills, so most sources only need repo + ref.
A subdir explicitly set to "" disables that surface for the source.

Defaults mean a source repo may legitimately lack a defaulted directory
(e.g. a skills-only overlay), so missing directories are now skipped
everywhere instead of failing:

- tools-bootstrap waits only for the repo-cache checkout itself and skips a
  source whose tools subdir is absent (an init failure is terminal for the
  Sandbox); the git-clone fallback skips the copy the same way
- api-rs ToolGitSource warns instead of erroring when a synced source lacks
  the tools subdir, and tool-dir collection skips missing dirs in both the
  repo-cache and direct-clone modes

Workflow discovery (workflow_host.py) and the skills entrypoint already
skipped missing directories. The toolServer compatibility mapping renders
byte-identically.

Amp-Thread-ID: https://ampcode.com/threads/T-019ebbf7-8029-758d-b684-ef3e7ac9712c
Co-authored-by: Amp <amp@ampcode.com>
skills/tools/workflows pages now state the conventional defaults
(tools, workflows, .agents/skills) and that sources without a directory
are skipped; configuration reference rows note the defaults.

Amp-Thread-ID: https://ampcode.com/threads/T-019ebbf7-8029-758d-b684-ef3e7ac9712c
Co-authored-by: Amp <amp@ampcode.com>
Three recovery tests read thread-state right after observing
chat.stopStream, but recovery clears the render obligation after the
stream stops — on a slow runner the assertion sees the seeded obligation
still in place (CI failed twice on commits that don't touch slackbotv2).
Use the waitFor-on-renderObligation pattern the rest of the file already
uses.

Amp-Thread-ID: https://ampcode.com/threads/T-019ebbf7-8029-758d-b684-ef3e7ac9712c
Co-authored-by: Amp <amp@ampcode.com>
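
The fix replaces a one-shot read with polling until the expected state appears. The slackbotv2 tests use that file's own waitFor helper, whose language and signature aren't shown here; the Python `wait_for` below is a hypothetical sketch of the same pattern, with a plain dict standing in for thread-state.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(read: Callable[[], T], predicate: Callable[[T], bool],
             timeout: float = 5.0, interval: float = 0.05) -> T:
    """Poll `read` until `predicate` holds, instead of asserting once right after an event."""
    deadline = time.monotonic() + timeout
    while True:
        value = read()
        if predicate(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s; last value: {value!r}")
        time.sleep(interval)
```

The timeout keeps a genuinely stuck obligation failing the test, while a slow runner only makes it poll a little longer.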