fix(ptrace): inherit SessionID from parent on minimal-state child creation #312

Closed
es-fabricemarie wants to merge 1 commit into canyonroad:main from es-fabricemarie:fix/ptrace-inherit-session-on-minimal-state

@es-fabricemarie
Contributor

When the kernel delivers a child's PTRACE_EVENT_STOP before the parent's PTRACE_EVENT_FORK, the run() loop and handleEventStop() fall back to creating a minimal TraceeState so the child doesn't get lost.

Both fallbacks previously set SessionID="". If the child execve'd in that window -- the canonical shell fork-then-exec pattern -- HandleExecve saw an unknown session and returned Allow:false, EACCES, rule="unknown_session". ptrace injected EACCES into the tracee mid-execve, which races ld.so startup on the new ELF and crashes the tracee before its entry point runs.

This bug fires under any heavy-fork workload (shell pipelines, make -j, cargo build, npm install) when ptrace is enabled, and is plausibly a contributor to the cascade-failure mode in #292.

Fix in two parts:

  1. tracer.go: both minimal-state fallbacks now read the child's PPid via readPPID and call a new inheritFromParent helper that scans t.tracees for an entry with TGID == parentPID, returning its SessionID/HasPrefilter/TGID. The new child inherits those values. The helper also de-duplicates the inheritance loop across the two sites.

  2. ptrace_handlers.go: as belt-and-suspenders, the !ok branch of HandleExecve now returns Allow:true, rule="unknown_session_allow" instead of denying. The race window should be closed by part 1; denying here only ever caused the tracee crash and never gained security (the outer agentsh wrap still polices file/syscall ops via FUSE).
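
For illustration, the parent lookup described in part 1 could look roughly like the sketch below. This is not the PR's actual code: the `Tracer` and `TraceeState` shapes, the map keyed by TID, and the extra `ok` return value are all assumptions made to keep the example self-contained. Only the name `inheritFromParent`, the `t.tracees` scan on `TGID == parentPID`, and the three returned fields come from the description above.

```go
package main

import "fmt"

// TraceeState is a pared-down, hypothetical stand-in for the tracer's
// per-thread state; the real struct carries many more fields.
type TraceeState struct {
	TID          int
	TGID         int
	SessionID    string
	HasPrefilter bool
}

// Tracer keeps tracees keyed by TID (assumed layout).
type Tracer struct {
	tracees map[int]*TraceeState
}

// inheritFromParent scans the tracee table for an entry whose TGID equals
// parentPID and returns the fields a fallback-created child should inherit.
// ok is false when parentPID is non-positive or no such tracee is known.
func (t *Tracer) inheritFromParent(parentPID int) (sessionID string, hasPrefilter bool, parentTGID int, ok bool) {
	if parentPID <= 0 {
		return "", false, 0, false
	}
	for _, st := range t.tracees {
		if st.TGID == parentPID {
			return st.SessionID, st.HasPrefilter, st.TGID, true
		}
	}
	return "", false, 0, false
}

func main() {
	tr := &Tracer{tracees: map[int]*TraceeState{
		100: {TID: 100, TGID: 100, SessionID: "sess-1", HasPrefilter: true},
	}}
	sid, pf, tgid, ok := tr.inheritFromParent(100)
	fmt.Println(sid, pf, tgid, ok)
}
```

The sketch assumes every thread in a thread group carries the same SessionID and prefilter state, so returning the first match is sufficient.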

Tests:

  • internal/ptrace/tracer_inherit_test.go: unit-tests inheritFromParent against a fabricated tracees map (found-by-TGID, unknown-parent, non-positive-pid cases).
  • internal/api/ptrace_handlers_test.go: regression test that calls HandleExecve with a SessionID the manager doesn't know and asserts Allow=true, Rule="unknown_session_allow".

Collaborator

erans commented May 12, 2026

I think the issue is real, but I would prefer to avoid making every unknown-session execve fail open unconditionally.

Looking through this path, there seem to be two distinct cases that should be handled separately:

  1. Child-stop-before-fork-event race. The minimal TraceeState fallback should inherit parent state immediately. I think the safer fix here is to factor the child-state seeding logic out of handleNewChild and reuse it in the two minimal-state fallback sites. That helper should copy the same enforcement metadata as normal child creation, not only SessionID / HasPrefilter / parent TGID: PendingPrefilter, read/write escalation state, thread escalation flags, etc. Otherwise there is still a short window where a child created through the fallback path can behave differently from a child created through handleNewChild.

  2. attach_mode=pid sessionless roots. Today initPtraceTracer calls tr.AttachPID(pid) without WithSessionID, so the attached root and its descendants can be sessionless by design. That is different from a non-empty session id that is missing from the session manager. I think we should represent this explicitly in tracer state / ExecContext instead of overloading SessionID == "".

Suggested shape:

  • Add a shared helper for parent-derived child metadata and use it from both handleNewChild and the minimal-state fallback paths.
  • Add an explicit marker for sessionless pid-attach tracees, e.g. TraceeState.PolicyScope / SessionlessPIDAttach / similar, and propagate it to children.
  • In HandleExecve, allow the sessionless pid-attach case only when that mode is intentional. A non-empty but unknown SessionID should remain fail-closed or at least be handled separately from the pid-attach fallback.
  • If full ptrace mode is allowed with attach_mode=pid, either give it a configured/global policy context or reject/limit that config; for execve-only hybrid mode, an explicit pass-through for sessionless execve seems reasonable because the wrapper/session layer handles the rest.

That keeps the #292 fix, but avoids turning a real session accounting bug into a silent allow for all unknown-session execve stops.
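
A minimal sketch of the shared child-seeding helper suggested above, under stated assumptions. The follow-up commit on this PR names it `seedChildStateFromParent(parent, childTID, childTGID)`; everything else here is hypothetical, including the reduced field set, the `ReadEscalated` / `WriteEscalated` names, and the exact rule for when `PendingPrefilter` is copied.

```go
package main

import "fmt"

// TraceeState is a hypothetical, reduced version of the tracer's state.
type TraceeState struct {
	TID, TGID, ParentTGID int
	SessionID             string
	SessionlessPIDAttach  bool
	HasPrefilter          bool
	PendingPrefilter      bool
	ReadEscalated         bool // hypothetical name for read escalation state
	WriteEscalated        bool // hypothetical name for write escalation state
}

// seedChildStateFromParent builds a child's state from its parent so that
// every creation path (normal fork event or minimal-state fallback) ends up
// with the same enforcement metadata. A nil parent yields bare defaults.
func seedChildStateFromParent(parent *TraceeState, childTID, childTGID int) *TraceeState {
	child := &TraceeState{TID: childTID, TGID: childTGID}
	if parent == nil {
		return child
	}
	child.ParentTGID = parent.TGID
	child.SessionID = parent.SessionID
	child.SessionlessPIDAttach = parent.SessionlessPIDAttach
	child.HasPrefilter = parent.HasPrefilter
	// Assumption: a pending prefilter only matters while the parent has not
	// installed the filter yet, so it is skipped otherwise.
	if !parent.HasPrefilter {
		child.PendingPrefilter = parent.PendingPrefilter
	}
	child.ReadEscalated = parent.ReadEscalated
	child.WriteEscalated = parent.WriteEscalated
	return child
}

func main() {
	root := &TraceeState{TID: 10, TGID: 10, SessionlessPIDAttach: true, PendingPrefilter: true}
	child := seedChildStateFromParent(root, 11, 11)
	fmt.Printf("%+v\n", *child)
}
```

Routing all creation sites through one function is what removes the window in which a fallback-created child differs from one created by handleNewChild.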

…ess pid-attach

Two related races / accounting bugs in how the tracer handles child
processes whose stop arrives before the parent's PTRACE_EVENT_FORK is
processed by handleNewChild.

1. Minimal-state fallback only inherited a subset of parent fields.

   handleNewChild copies SessionID, HasPrefilter, PendingPrefilter
   (conditionally), TGID-level escalation, thread-level escalation,
   and parent TGID to a freshly-created child. The minimal-state
   fallback sites in run() and handleEventStop() only copied
   SessionID / HasPrefilter / parent TGID, leaving the child with
   different enforcement state from one created via the normal path
   if it execve'd in the race window.

   Fix: factor the create-from-scratch logic out of handleNewChild
   into seedChildStateFromParent(parent, childTID, childTGID) and
   call it from all three sites (handleNewChild's else branch + the
   two minimal-state fallbacks). A child created via either path is
   now byte-identical in enforcement state.

2. Unknown-session execve conflated two distinct cases.

   The previous revision flipped the unknown_session branch in
   HandleExecve from deny -> allow, but it lumped together:

   (a) Sessionless pid-attach. initPtraceTracer calls tr.AttachPID
       (pid) without WithSessionID for the attach_mode=pid path, so
       the attached root and its descendants are sessionless by
       design -- the wrapper / session layer governs enforcement.
       Pass-through is correct here.

   (b) Non-empty SessionID missing from the session manager. This is
       a real session-accounting bug, not a race. Treating it as a
       silent allow turns the bug into silent under-enforcement.

   Fix: mark case (a) explicitly. New TraceeState.SessionlessPIDAttach
   and ExecContext.SessionlessPIDAttach fields. attach.go sets the
   flag on the root tracee when opts.sessionID == "" (the legitimate
   attach_mode=pid path). seedChildStateFromParent propagates it to
   descendants. HandleExecve splits the unknown-session branch:
   SessionlessPIDAttach=true -> allow with rule sessionless_pid_attach
   (slog.Debug); otherwise -> deny with rule unknown_session
   (slog.Warn). Case (b) is now visible and fail-closed again.

Tests (all pass under -race):

- internal/ptrace/tracer_inherit_test.go: covers findParentByTGID
  and seedChildStateFromParent. Verifies all enforcement fields
  propagate (including SessionlessPIDAttach), the conditional skip
  of PendingPrefilter when parent already has the filter installed,
  and the nil-parent / per-thread defaults.
- internal/api/ptrace_handlers_test.go:
  TestHandleExecve_SessionlessPIDAttachAllows verifies (a) allows
  with rule sessionless_pid_attach.
  TestHandleExecve_NonEmptyUnknownSessionDenies verifies (b) denies
  with rule unknown_session and EACCES.

Empirical / functional validation against the original canyonroad#292 cascade:
pure upstream v0.19.3+f531584 still reproduces 30/30 MODE_A failures
on attach_mode=pid + seccomp_prefilter under sustained sb.run()-equivalent load;
the revised binary passes 90/90 calls across 3 iterations with the
SessionlessPIDAttach=true branch taking the allow path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@es-fabricemarie es-fabricemarie force-pushed the fix/ptrace-inherit-session-on-minimal-state branch from a79f9c7 to 392b358 May 15, 2026 05:26
@es-fabricemarie
Contributor Author

@erans this should do it.

@erans
Collaborator

erans commented May 15, 2026

Checked the follow-up on 392b358d. The core #292 execve-only hybrid path is much closer now: SessionlessPIDAttach is explicit, intentional sessionless pid-attach execve is allowed, and non-empty unknown sessions still fail closed.

Two things still look incomplete relative to the earlier review comment:

  1. handleNewChild does not copy SessionlessPIDAttach when repairing an existing fallback-created child state. The new helper copies it, but the existing != nil branch in internal/ptrace/tracer.go updates SessionID/prefilter/escalation fields and leaves existing.SessionlessPIDAttach unchanged. If the early fallback created state before finding the parent, the later fork event can repair most metadata but leave the marker false, so a sessionless pid-attach descendant can still hit unknown_session on execve. Please copy the marker in that branch too, and add a regression test for the existing-state repair path.

  2. Full ptrace attach_mode=pid is still not resolved. The previous review asked to either give full ptrace pid-attach a configured/global policy context or reject/limit that config. Validation still allows ptrace with default full tracing when unix_sockets is false/nil, but file/network/signal handlers still fail closed on empty/unknown sessions. Execve-only hybrid is handled; full ptrace pid-attach still needs an explicit decision.
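
To make item 1 concrete, here is a hedged sketch of what the existing-state repair would need to do. The `repairFromParent` function and the reduced `TraceeState` are hypothetical; in the real tracer this logic sits inline in the existing != nil branch of handleNewChild and touches more fields.

```go
package main

import "fmt"

// TraceeState is a hypothetical, reduced version of the tracer's state.
type TraceeState struct {
	TGID                 int
	SessionID            string
	HasPrefilter         bool
	SessionlessPIDAttach bool
}

// repairFromParent overwrites fallback-created child state with the
// parent's values once the authoritative fork event arrives.
func repairFromParent(existing, parent *TraceeState) {
	if existing == nil || parent == nil {
		return
	}
	existing.SessionID = parent.SessionID
	existing.HasPrefilter = parent.HasPrefilter
	// The marker must be reconciled too; otherwise a child created before
	// its parent was found keeps SessionlessPIDAttach == false.
	existing.SessionlessPIDAttach = parent.SessionlessPIDAttach
}

func main() {
	root := &TraceeState{TGID: 100, SessionlessPIDAttach: true}
	early := &TraceeState{TGID: 101} // created by the fallback, parent unknown
	repairFromParent(early, root)
	fmt.Println(early.SessionlessPIDAttach)
}
```

A regression test for this path would create the child state first, then deliver the fork event, and assert the marker on the repaired state.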

Verification I ran locally: gofmt -l on touched files is clean, go test ./... passes, GOOS=windows go build ./... passes. GitHub checks are passing with fargate e2e skipped. I did not run the external #292 repro zip.

@erans
Collaborator

erans commented May 18, 2026

@es-fabricemarie gentle ping — just making sure my May 15 review didn't get lost. Two items still pending before merge (verified against HEAD 392b358d):

  1. SessionlessPIDAttach not copied in the existing-state repair branch. In internal/ptrace/tracer.go lines 1450-1467, the existing != nil branch of handleNewChild updates SessionID, HasPrefilter, PendingPrefilter, all four escalation fields, and Attached, but skips SessionlessPIDAttach. If the early fallback created child state before the fork event arrives, the later repair leaves the marker false, and a sessionless pid-attach descendant can still hit unknown_session on execve. Please copy the marker in that branch too, and add a regression test for the existing-state repair path.

  2. Full ptrace attach_mode=pid validation. internal/config/ptrace.go still allows attach_mode=pid combined with full trace.file/trace.network/trace.signal, with no configured/global policy context for the sessionless root and descendants. Execve-only hybrid is handled correctly now; full ptrace pid-attach needs an explicit decision — either reject the combo in Validate(), or give it a configured policy context.

Otherwise the refactor looks good: SessionlessPIDAttach is explicit, sessionless pid-attach execve is allowed intentionally, non-empty unknown sessions still fail closed, CI is green across all targets, and the branch still merges cleanly. Happy to merge once these two land.

erans added a commit that referenced this pull request May 20, 2026
Fixes #292. Supersedes #312.

Splits HandleExecve's unknown-session branch into a sessionless_pid_attach allow path (the legitimate attach_mode=pid case) and a fail-closed unknown_session deny (the genuine accounting bug case). Shares child-state construction across handleNewChild and the two minimal-state fallbacks via seedChildStateFromParent, so a child that execve's before the parent's PTRACE_EVENT_FORK is processed inherits SessionID + flags instead of falling into the deny branch and racing ld.so.

Original tracer/handler diff by @es-fabricemarie via #312. Follow-up: parameterized SuppressInitialStop so the fallback callers (where the initial stop has already arrived) don't leave a stale flag that would swallow the next external SIGSTOP; and reconciled SessionlessPIDAttach in handleNewChild's existing-state update path so a wrong procfs-fallback inference cannot survive once the authoritative fork event arrives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erans erans closed this in #359 May 20, 2026
@erans
Collaborator

erans commented May 20, 2026

Superseded by #359, which preserves your original commit (3e19201b, author preserved) and addresses the roborev review feedback on top. Thanks for the great work and the reproducer — saved us a ton of investigation. Merged to main as 2f55605.

@es-fabricemarie es-fabricemarie deleted the fix/ptrace-inherit-session-on-minimal-state branch May 20, 2026 04:29