feat(shim): setcap cap_bpf+cap_perfmon to enable eBPF tracing on hosted runners #19
Open
colek42 wants to merge 26 commits into
…cing
The cilock binary's eBPF tracing path needs CAP_BPF + CAP_PERFMON to attach kprobes. On GH-hosted runners these caps are NOT inherited from the runner user, so without this hop cilock falls back to its slower ptrace+seccomp path on every invocation.
After downloading + chmod, we try `sudo -n setcap cap_bpf,cap_perfmon+ep` on the binary. Hosted runners have NOPASSWD sudo, so this succeeds on the default config. In containers without sudo (most `container:` jobs), the setcap attempt fails non-fatally and cilock falls back to ptrace+seccomp. The resulting warning surfaces the container-config snippet needed to enable eBPF in that case.
Critically, we do NOT grant CAP_SYS_ADMIN, only the minimum caps needed for unprivileged BPF prog_load + kprobe attach.
Pair with the rookery uname-based kernel version fix; together they let setcap'd-but-not-root cilock invocations use the eBPF tracing path on hosted runners.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a rookery_ref workflow_dispatch input so we can cut a cilock-action dev release that embeds a specific rookery branch / tag / SHA before that ref has merged to rookery's main.
Use case: testing rc48 of rookery (which ships the new fanotify + fs-verity + tracee-priv-drop stack) without first merging nk/ci-trace-mode-probe to rookery's main.
Tag-push behavior is unchanged: pushing v* triggers a release with the rookery main checkout, as before.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cilock ships a pre-built .bpf.o embedded in its binary, but the
CO-RE relocations are baked against whichever vmlinux.h the release
was built on. On GHA hosted runners with the Azure-flavored kernel,
x86_64 BTF differs enough from mainline that every kprobe poisons
("bad CO-RE relocation: invalid func unknown#195896080").
rookery now auto-rebuilds the .bpf.o from its embedded source against
/sys/kernel/btf/vmlinux when CO-RE fails. That path needs
clang + bpftool + libbpf-dev on PATH. Install them here.
bpftool standalone isn't in every Ubuntu image's universe repo;
fall back to linux-tools-generic which ships
/usr/lib/linux-tools/<kernel>/bpftool. rookery's findBpftool()
globs both locations.
Together with the existing setcap step, the user-facing UX is now
just `uses: aflock-ai/cilock-action@v1` — kernel-arch-portable BPF
tracing handled transparently. End users see one INFO line about
the toolchain install, nothing else.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GoReleaser refuses to release unless HEAD has a semver tag pointing at it. On tag-push triggers GITHUB_REF_NAME provides that; on workflow_dispatch (the path we use for dev RCs that pin a non-main rookery_ref) we have nothing, so goreleaser picks the stale v1 major-version alias and bails with "git tag v1 was not made against commit <sha>".
Materialise a LOCAL tag at the dispatch HEAD (never pushed) and pass it via GORELEASER_CURRENT_TAG. Tag name comes from a new release_tag input, or is derived from rookery_ref + short SHA when omitted (v0.0.0-dev-<rookery-ref>-<sha>).
Also: skip the v1 major-tag-update on dispatch runs. Dev RCs should not shift the v1 alias that production consumers point at.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
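The tag derivation above might look like the sketch below. The function name `devTag`, the 7-character short SHA, and the sanitising of non-alphanumeric characters in the ref are assumptions; the commit only states the `v0.0.0-dev-<rookery-ref>-<sha>` shape and that an explicit release_tag takes precedence.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonTagChars matches runs of characters that this sketch replaces with "-"
// so that refs such as "nk/ci-trace-mode-probe" stay usable inside a tag.
// The sanitising rule itself is an assumption, not taken from the workflow.
var nonTagChars = regexp.MustCompile(`[^0-9A-Za-z]+`)

// devTag returns releaseTag when set; otherwise it derives
// v0.0.0-dev-<rookery-ref>-<short-sha> from the dispatch inputs.
func devTag(releaseTag, rookeryRef, sha string) string {
	if releaseTag != "" {
		return releaseTag
	}
	short := sha
	if len(short) > 7 {
		short = short[:7]
	}
	ref := nonTagChars.ReplaceAllString(rookeryRef, "-")
	return fmt.Sprintf("v0.0.0-dev-%s-%s", ref, short)
}

func main() {
	// The SHA below is made up for the example.
	fmt.Println(devTag("", "nk/ci-trace-mode-probe", "0123abc4567"))
	// prints: v0.0.0-dev-nk-ci-trace-mode-probe-0123abc
}
```

The resulting name would then be created as a local tag and exported through GORELEASER_CURRENT_TAG, as the commit message describes.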
rookery rc50 separated the eBPF code into its own Go submodule at plugins/attestors/commandrun/ebpf so the runtime BPF-rebuild path (rebuild_linux.go) has a clean boundary.
cilock-action's go.mod needs both a require entry and a replace pointing at the same local checkout the release workflow uses (./rookery → ../rookery/...). This matches the existing pattern used for every other rookery sibling module in this file.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Dev RCs (workflow_dispatch with rookery_ref) can pin a rookery that's brought in new transitive deps not yet reflected in cilock-action's checked-in go.sum. Most recent case: rc50 split commandrun/ebpf into its own module and pulled cilium/ebpf via that boundary; cilock-action go.sum had no entries for those packages so goreleaser bailed.
Run `go mod tidy` after checking out both repos; this is a CI-local change to go.sum that's not pushed back. Tag-push triggers (real releases) keep using the committed go.sum since rookery is at main and the go.sum should match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
goreleaser aborts with 'git is in a dirty state' if go.mod/go.sum were modified after checkout. The tidy step in the dev-RC dispatch flow legitimately modifies them; commit those changes locally (CI workspace only; never pushed) so the subsequent materialise-tag step tags the new HEAD and goreleaser is happy.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Self-test workflow that uses aflock-ai/cilock-action@v1.0.5-rc1 the way an external repo would. Three workloads (hello-go, hello-rust, hello-shell) × the published action × sigstore keyless on hosted ubuntu-24.04.
Verifies the attestation file lands and parses; logs captureMode, traceModeDetail, totals, and diagnostics so a human review confirms the eBPF path actually engaged.
This is the end-to-end check the matrix workflow couldn't be (matrix builds cilock from source on the runner; smoke uses the published release artifacts).
Triggers on workflow_dispatch and on changes to the smoke yaml itself for iteration.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Without this push, GitHub's release-create API points the tag at the default branch's HEAD (not the post-tidy HEAD inside the runner). Downstream consumers doing `uses: aflock-ai/cilock-action@<tag>` then fetch the wrong action.yml + shim, missing whatever changes the dispatch was meant to ship.
v1.0.5-rc1 hit this: artifacts were correctly built from the nk/ebpf-setcap-shim HEAD + rookery rc50, but the published tag pointed at main's HEAD (d39bb9b, days old). Downstream smoke fetched the old shim that doesn't install BPF deps or run setcap.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Smoke just needs to validate the action runs end-to-end and produces a local attestation file. Archivista upload requires platform API credentials which the test repo doesn't have. The local outfile is the actual signal we want anyway.
Prior smoke (26420816740) confirmed the BPF self-heal works end-to-end:
    ✓ Installed BPF rebuild toolchain
    cilock-ebpf: embedded BPF object failed CO-RE — attempting to rebuild from embedded source
    cilock-ebpf: using bpftool at /usr/lib/linux-tools/6.8.0-117-generic/bpftool
    cilock-ebpf: rebuilt BPF object loaded successfully
This commit re-greens the smoke matrix so we can capture the success case as a published artifact for the record.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Make the zero-drop guarantee the default consumer experience. With CILOCK_FANOTIFY=auto the kernel synchronously blocks the tracee on every open until userspace has hashed the file, turning the BPF capture path's drop-tolerant 'events' into a kernel-enforced 'every file is recorded'. require-zero-drops=true fails the attestation rather than ship one that silently lost content (rookery's WithRequireZeroDrops).
Defaults are ON; consumers wanting the old loose semantics opt out explicitly:
    fanotify: 'off'
    require-zero-drops: 'false'
Smoke of rc2 produced 370 hashFailureSilentDrops on a tiny go build because fanotify was off. The new smoke asserts every drop counter is 0 and fails loudly if not; this is the contract we're now shipping by default.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
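The smoke's "every drop counter is 0" assertion could be sketched as below. The attestation's real schema is not shown in this PR, so the flat JSON object of counters and the `unhashedOpens` key are assumptions made for illustration; only the `hashFailureSilentDrops` counter name and its value of 370 appear in the source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// nonZeroDrops parses a flat JSON object of counter name -> count
// (an assumed shape) and returns every counter that is not zero,
// formatted as "name=value" and sorted for stable output.
func nonZeroDrops(raw []byte) ([]string, error) {
	var counters map[string]uint64
	if err := json.Unmarshal(raw, &counters); err != nil {
		return nil, err
	}
	var bad []string
	for name, v := range counters {
		if v != 0 {
			bad = append(bad, fmt.Sprintf("%s=%d", name, v))
		}
	}
	sort.Strings(bad)
	return bad, nil
}

func main() {
	// Sample input modelled on the rc2 smoke result quoted above.
	sample := []byte(`{"hashFailureSilentDrops": 370, "unhashedOpens": 0}`)
	bad, err := nonZeroDrops(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	if len(bad) > 0 {
		fmt.Println("zero-drop contract violated:", bad)
		return
	}
	fmt.Println("all drop counters are zero")
}
```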
rc3 wires fanotify=auto + require-zero-drops=true into action.yml and the shim. Smoke now asserts every drop counter is zero in the emitted attestation, which is the contract this release ships by default.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rc51 fixes the false-positive zero-drop gate where fanotify rescues weren't reconciled against UnhashedOpens / FallbackHashFailures. This smoke confirms the user-facing default-on path actually delivers what it promises.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three changes that fix the conceptual model surfaced by the gh CLI
smoke (which classified 9281 compiler intermediates as "products"):
1. New `products` input: newline-separated list of paths/globs
   the build is expected to produce. Joined as a {a,b,c} brace
   pattern for the rookery product attestor.
2. Default = workingDir/** when `products` is empty. Idiomatic
   builds that write under the workspace just work. Builds that
   write to /tmp or ~/.local/bin/ must explicitly list those paths.
3. `::warning::no products detected` when the resolved glob
   matched nothing. Surfaces the active glob and tells the user
   exactly where + how to override it in their workflow YAML.
Legacy `product-include-glob` input still honoured (no default,
opt-in). `product-exclude-glob` unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
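A minimal sketch of how a newline-separated `products` value could become a brace glob, with the workingDir/** default when the input is empty. The function name `productGlob` and the whitespace trimming are assumptions; the action's real resolver is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// productGlob turns the newline-separated `products` input into a single
// {a,b,c} brace pattern. When no entries remain after trimming, it falls
// back to everything under the working directory.
func productGlob(productsInput, workingDir string) string {
	var entries []string
	for _, line := range strings.Split(productsInput, "\n") {
		if e := strings.TrimSpace(line); e != "" {
			entries = append(entries, e)
		}
	}
	if len(entries) == 0 {
		return strings.TrimRight(workingDir, "/") + "/**"
	}
	return "{" + strings.Join(entries, ",") + "}"
}

func main() {
	wd := "/home/runner/work/cli/cli"
	fmt.Println(productGlob("bin/gh\n  dist/*.tar.gz\n", wd))
	// prints: {bin/gh,dist/*.tar.gz}
	fmt.Println(productGlob("", wd))
	// prints: /home/runner/work/cli/cli/**
}
```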
resolveProductIncludeGlob compiled relative entries (e.g.,
`products: bin/gh`) as-is, but rookery's trace mode emits absolute
paths in TraceOutputs (e.g., /home/runner/work/cli/cli/bin/gh).
The relative glob matched zero paths → empty products map even when
the summary classifier saw 2.
Now: relative entries get filepath.Join'd against cfg.WorkingDir
(falling back to os.Getwd if WorkingDir is empty) before compiling
into the {a,b,c} brace glob.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
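The fix can be illustrated with a short sketch, assuming Unix-style paths. `absolutizeEntries` is a hypothetical helper name; only the use of filepath.Join against the working directory with an os.Getwd fallback is taken from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// absolutizeEntries joins every relative product entry onto workingDir so
// the patterns can match the absolute paths that trace mode emits.
// Absolute entries pass through unchanged.
func absolutizeEntries(entries []string, workingDir string) ([]string, error) {
	if workingDir == "" {
		wd, err := os.Getwd()
		if err != nil {
			return nil, err
		}
		workingDir = wd
	}
	out := make([]string, 0, len(entries))
	for _, e := range entries {
		if !filepath.IsAbs(e) {
			e = filepath.Join(workingDir, e)
		}
		out = append(out, e)
	}
	return out, nil
}

func main() {
	got, err := absolutizeEntries([]string{"bin/gh", "/tmp/out"}, "/home/runner/work/cli/cli")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
	// prints: [/home/runner/work/cli/cli/bin/gh /tmp/out]
}
```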
End-to-end smoke for the v0.3 multi-step chain pipeline.
Pipeline:
1) cilock-action @step=source attests the gh CLI source tree; emits the v0.3 leaf sidecar alongside the signed envelope.
2) cilock-action @step=build attests 'go build -o bin/gh ./cmd/gh' in trace mode; captures every material the build read under src/.
3) jq extracts the consumed materials from the build attestation, filters to paths under src/ (the source step's coverage), feeds them into 'cilock prove-chain' which generates per-material RFC 6962 inclusion proofs against the source step's signed Merkle root.
4) cilock verify walks a multi-step policy with build.artifactsFrom=[source] and allowedUntracked covering toolchain paths under /opt/hostedtoolcache/**, /usr/lib/**, etc. The chain sidecar source is FilesystemChainSidecarSource pointing at /tmp/chain-sidecars/.
Negative case: flip a bit in the chain sidecar's first audit-path entry, rerun verify, must exit non-zero.
Depends on a cilock-action release that includes prove-chain (rookery PR #176 commit 75c35ed or later). Default ref is v1.1.0-rc1; bump after each rc until the chain pipeline is GA.
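For readers unfamiliar with the proofs mentioned in step 3, here is a sketch of plain RFC 6962 inclusion-proof verification, including the bit-flip negative case. It uses the RFC's own 0x00/0x01 prefixes; rookery's leaf hashing (which uses its own leaf domain labels) may differ, and none of these function names come from cilock.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash is the RFC 6962 leaf hash: SHA-256(0x00 || data).
func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

// nodeHash is the RFC 6962 interior hash: SHA-256(0x01 || left || right).
func nodeHash(l, r []byte) []byte {
	buf := append([]byte{0x01}, l...)
	buf = append(buf, r...)
	h := sha256.Sum256(buf)
	return h[:]
}

// verifyInclusion recomputes the root from a leaf hash and its audit path,
// following the verification procedure in RFC 9162 section 2.1.3.2.
func verifyInclusion(leaf []byte, index, treeSize uint64, path [][]byte, root []byte) bool {
	if index >= treeSize {
		return false
	}
	fn, sn := index, treeSize-1
	r := leaf
	for _, p := range path {
		if sn == 0 {
			return false // path is longer than the tree is deep
		}
		if fn&1 == 1 || fn == sn {
			r = nodeHash(p, r)
			// Skip levels where this node has no right sibling.
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, root)
}

func main() {
	// Three-leaf tree: root = H(H(l0, l1), l2). Leaf contents are made up.
	l0, l1, l2 := leafHash([]byte("a.go")), leafHash([]byte("b.go")), leafHash([]byte("c.go"))
	h01 := nodeHash(l0, l1)
	root := nodeHash(h01, l2)

	fmt.Println(verifyInclusion(l0, 0, 3, [][]byte{l1, l2}, root)) // true

	// Negative case: flip one bit in the first audit-path entry.
	tampered := append([]byte{}, l1...)
	tampered[0] ^= 0x01
	fmt.Println(verifyInclusion(l0, 0, 3, [][]byte{tampered, l2}, root)) // false
}
```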
cilock run --outfile <path> writes the v0.3 product leaf sidecar to '<path>.product.tree.json' adjacent to the signed envelope. No new action input needed — the previous draft invented a 'chain-sidecar-out' flag that doesn't exist. Also add a confirmation step that prints the sidecar's schemaVersion, source label, Merkle root, treeSize, and leaf count so a failure later in the chain pipeline has a precise breadcrumb.
…es:')
GitHub Actions doesn't permit ${{ github.event.inputs.X }} inside
the 'uses:' clause of a step, so the workflow_dispatch input we
added was a syntactic dead end. Hardcode v1.0.5-rc14 for the
action and v1.1.0-rc65 for the install.sh fetch — these are the
RCs that contain the chain primitives today.
Bump these together when cutting a future release; both pins live
in the same file.
Four fixes for problems the rc14 dispatch surfaced:
1) Used 'working-directory' (the GHA generic name); the action's
   input is 'workingdir'. Renamed.
2) Step ran from $GITHUB_WORKSPACE but actions/checkout puts the
   gh CLI .git under src/. cilock's git attestor requires a .git
   at workingdir, so it failed with 'repository does not exist'.
   Both source + build steps now set workingdir=src.
3) Build moves to writing under src/bin/ instead of the workspace
   root, so the product target stays inside the workingdir scope
   ('products: bin/gh' instead of '../bin/gh').
4) The chain-proof extractor stripped $GITHUB_WORKSPACE from the
   traced materials list, but the source step's leaf sidecar now
   records paths RELATIVE to workingdir=src/. Strip both the
   workspace prefix AND the src/ component so 'path=sha256hex'
   entries line up with what BuildChainSidecar will look up in
   the source sidecar.
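The path normalisation in item 4 can be sketched as follows. The workflow itself does this with jq; this Go version is only an illustration, and the `chainEntry` name and the example paths and digests are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// chainEntry rewrites an absolute traced path into the workingdir-relative
// form used by the source sidecar and returns a "path=sha256hex" entry.
// Paths outside <workspace>/<workingdir> (toolchain files, for example)
// are reported as not covered.
func chainEntry(absPath, digest, workspace, workingdir string) (string, bool) {
	root := filepath.Join(workspace, workingdir)
	rel, err := filepath.Rel(root, absPath)
	if err != nil || rel == "." || rel == ".." || strings.HasPrefix(rel, "../") {
		return "", false
	}
	return rel + "=" + digest, true
}

func main() {
	ws := "/home/runner/work/repo/repo" // stands in for $GITHUB_WORKSPACE
	if e, ok := chainEntry(ws+"/src/cmd/gh/main.go", "ab12", ws, "src"); ok {
		fmt.Println(e) // prints: cmd/gh/main.go=ab12
	}
	if _, ok := chainEntry("/opt/hostedtoolcache/go/bin/go", "cd34", ws, "src"); !ok {
		fmt.Println("toolchain path skipped")
	}
}
```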
…' is the read set)
The 'source' step's command was 'git rev-parse HEAD' — a no-op that
writes no files, so the products Merkle tree had treeSize=0 and the
products leaf sidecar was never written (rookery emits the sidecar
conditionally on len(products) > 0).
Semantic fix, not just plumbing: for a source-provenance step where
nothing is *produced*, the relevant Merkle commitment is what the
step OBSERVED. cilock's walk mode classifies every file under
workingdir as a material (1259 leaves in this run). That's the tree
step 2's consumed materials must trace back to.
Point prove-chain at the material sidecar instead of the (empty)
product sidecar, and use the matching leaf domain
('rookery-material/v0.3'). The verifier-side chain proof binds the
same way — only the upstream Merkle root + domain matter; products
vs materials is just which side of the in-toto Statement contract
the leaves come from.
colek42 pushed a commit that referenced this pull request on May 26, 2026
…ero-dep)
The zero-dep rewrite dropped the capability setup that #19 added to the old shim, so eBPF hard-failed on hosted runners (bpf(BPF_MAP_CREATE): operation not permitted, which needs CAP_BPF+CAP_PERFMON).
Re-add it with spawnSync (no deps): best-effort `sudo -n apt-get install` of clang/llvm/libbpf-dev/bpftool (for cilock's CO-RE rebuild) and `sudo -n setcap cap_bpf,cap_perfmon+ep` on the binary. Both are non-fatal: without sudo, cilock falls back to ptrace+seccomp.
Also pin the trace-probe to v1.0.5-rc15 (freshly built against current rookery main) so the attestation reports captureMode/traceModeDetail.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Best-effort `sudo -n setcap cap_bpf,cap_perfmon+ep` on the cilock binary right after download. GH-hosted runners have NOPASSWD sudo, so this enables eBPF tracing for the default-config case. In container jobs without sudo the setcap attempt fails non-fatally, and cilock falls back to ptrace+seccomp with a warning that includes the `container.options: --cap-add=BPF --cap-add=PERFMON` snippet needed to enable eBPF there.
We grant only the minimum caps needed — not CAP_SYS_ADMIN.
Why this is needed
cilock's eBPF tracing path (aflock-ai/rookery#176, V1 of #167) needs CAP_BPF + CAP_PERFMON to attach kprobes. Hosted runners grant the runner user no capabilities by default, so cilock would fall back to ptrace+seccomp on every invocation, which is significantly slower than eBPF for typical builds.
Test plan
🤖 Generated with Claude Code