Skip to content

fix(ptrace): sessionless pid-attach + reviewed seed helper (#292, supersedes #312)#359

Merged
erans merged 3 commits into
mainfrom
fix/292-ptrace-sessionless-pid-attach
May 20, 2026
Merged

fix(ptrace): sessionless pid-attach + reviewed seed helper (#292, supersedes #312)#359
erans merged 3 commits into
mainfrom
fix/292-ptrace-sessionless-pid-attach

Conversation

@erans
Copy link
Copy Markdown
Collaborator

@erans erans commented May 20, 2026

Summary

Fixes #292 (intermittent shim/server hang on long sb.run() sequences in attach_mode=pid). Supersedes #312 — preserves the original contributor's commit + adds two follow-up fixes from roborev review.

Commits

  1. 3e19201b — original PR fix(ptrace): inherit SessionID from parent on minimal-state child creation #312 commit by @es-fabricemarie. Splits the HandleExecve unknown-session branch into sessionless_pid_attach (allow) vs unknown_session (still fail-closed deny), and adds seedChildStateFromParent to inherit SessionID + flags from the parent in the two minimal-state fallback paths in handleStop / handleEventStop. Root-cause analysis and reproducer are in Intermittent shim/server hang on long sb.run() sequences (v0.19.2, ptrace-pid hybrid) #292's thread.
  2. 1bd856ef — addresses roborev Medium finding (job 9372). seedChildStateFromParent hard-coded SuppressInitialStop: true, correct for handleNewChild's normal path but wrong for the two fallback callers where the child's initial SIGSTOP has already been received. The flag is now parameterized; fallback callers pass false so an unrelated future SIGSTOP isn't silently swallowed. Comments referring to run() are corrected to handleStop().
  3. bc609c5e — addresses roborev Low finding (job 9375). handleNewChild's existing-state branch reconciles a fallback-created child with the kernel-authoritative parent; it overwrote every enforcement field except SessionlessPIDAttach. A stale true inferred from a wrong-procfs-parent could turn the new sessionless_pid_attach allow into a silent fail-open. Now overwritten alongside the other fields.

Test Plan

  • go test ./internal/ptrace/... ./internal/api/... — all green (incl. four TestSeedChildStateFromParent_* covering both suppressInitialStop values + the new HandleExecve sessionless/unknown split tests)
  • go test ./... — full suite, 0 failures
  • GOOS=windows go build ./... — cross-compile clean
  • roborev review --branch --wait --reasoning maximum — pre-followup found 1 Medium + 4 Low (job 9372); after the Medium fix one Low remained (job 9375); now both addressed. Final state has only the open Lows about (a) per-thread escalation imprecision in findParentByTGID, (b) implicit SessionlessPIDAttach inference from empty opts.sessionID, (c) no end-to-end test of the fallback call sites. Documented as known limitations; not regressions and not blockers.

Reproducer

Contributor's reproduction-script.zip attached on #292 — 150/150 MODE_A failures on pure v0.19.3+f531584, 150/150 PASS with this branch's commit 1 alone. Commits 2 and 3 are defense-in-depth on top.

Credit

Author of commit 3e19201b is @es-fabricemarie (PR #312). I'll close #312 once this lands.

🤖 Generated with Claude Code

es-fabricemarie and others added 3 commits May 19, 2026 16:58
…ess pid-attach

Two related races / accounting bugs in how the tracer handles child
processes whose stop arrives before the parent's PTRACE_EVENT_FORK is
processed by handleNewChild.

1. Minimal-state fallback only inherited a subset of parent fields.

   handleNewChild copies SessionID, HasPrefilter, PendingPrefilter
   (conditionally), TGID-level escalation, thread-level escalation,
   and parent TGID to a freshly-created child. The minimal-state
   fallback sites in run() and handleEventStop() only copied
   SessionID / HasPrefilter / parent TGID, leaving the child with
   different enforcement state from one created via the normal path
   if it execve'd in the race window.

   Fix: factor the create-from-scratch logic out of handleNewChild
   into seedChildStateFromParent(parent, childTID, childTGID) and
   call it from all three sites (handleNewChild's else branch + the
   two minimal-state fallbacks). A child created via either path is
   now byte-identical in enforcement state.

2. Unknown-session execve conflated two distinct cases.

   The previous revision flipped the unknown_session branch in
   HandleExecve from deny -> allow, but it lumped together:

   (a) Sessionless pid-attach. initPtraceTracer calls tr.AttachPID
       (pid) without WithSessionID for the attach_mode=pid path, so
       the attached root and its descendants are sessionless by
       design -- the wrapper / session layer governs enforcement.
       Pass-through is correct here.

   (b) Non-empty SessionID missing from the session manager. This is
       a real session-accounting bug, not a race. Treating it as a
       silent allow turns the bug into silent under-enforcement.

   Fix: mark case (a) explicitly. New TraceeState.SessionlessPIDAttach
   and ExecContext.SessionlessPIDAttach fields. attach.go sets the
   flag on the root tracee when opts.sessionID == "" (the legitimate
   attach_mode=pid path). seedChildStateFromParent propagates it to
   descendants. HandleExecve splits the unknown-session branch:
   SessionlessPIDAttach=true -> allow with rule sessionless_pid_attach
   (slog.Debug); otherwise -> deny with rule unknown_session
   (slog.Warn). Case (b) is now visible and fail-closed again.

Tests (all pass under -race):

- internal/ptrace/tracer_inherit_test.go: covers findParentByTGID
  and seedChildStateFromParent. Verifies all enforcement fields
  propagate (including SessionlessPIDAttach), the conditional skip
  of PendingPrefilter when parent already has the filter installed,
  and the nil-parent / per-thread defaults.
- internal/api/ptrace_handlers_test.go:
  TestHandleExecve_SessionlessPIDAttachAllows verifies (a) allows
  with rule sessionless_pid_attach.
  TestHandleExecve_NonEmptyUnknownSessionDenies verifies (b) denies
  with rule unknown_session and EACCES.

Empirical / functional validation against the original #292 cascade:
pure upstream v0.19.3+f531584 still reproduces 30/30 MODE_A failures
on attach_mode=pid + seccomp_prefilter under sustained sb.run()-equivalent load;
the revised binary passes 90/90 calls across 3 iterations with the
SessionlessPIDAttach=true branch taking the allow path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rent

The previous helper hard-coded SuppressInitialStop=true. That's correct
for handleNewChild's normal-path caller (state is created on the
parent's PTRACE_EVENT_FORK, BEFORE the child's initial SIGSTOP arrives,
so the upcoming stop should be swallowed) but wrong for the two
minimal-state fallback callers in handleStop() and handleEventStop(),
where the child's initial stop has ALREADY been dispatched -- that's
exactly what triggered the fallback. Leaving the flag set there
silently swallows the NEXT external SIGSTOP delivered to the process
(e.g. kill -STOP, debugger), which is a behavioral regression vs. the
pre-refactor fallback code that deliberately did not set this flag.

Add a suppressInitialStop bool parameter and pass:
  - true from handleNewChild's else branch (normal create path)
  - false from the two fallback paths

Also corrects the helper docstring and the handleNewChild / handleEventStop
inline comments that referred to "run()" -- the actual containing
function is handleStop().

Adds TestSeedChildStateFromParent_SuppressInitialStopFalse covering the
fallback contract; the existing three tests pass true explicitly to
preserve their normal-path coverage.

Found by roborev review of #312 (job 9372, Medium severity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…date path

handleNewChild's existing-state branch reconciles a child created via
the procfs-based minimal-state fallback (handleStop/handleEventStop)
with the kernel-authoritative parent from PTRACE_EVENT_FORK. It already
overwrites SessionID, HasPrefilter, PendingPrefilter, and the four
escalation flags so a wrong inference by findParentByTGID + readPPID
cannot survive once the fork event arrives. SessionlessPIDAttach was
missing from that list.

Worst case without this: the procfs fallback infers SessionlessPIDAttach=true
on a child whose true parent has it false (and whose SessionID's session
has already vanished). HandleExecve then misses the session lookup,
falls into the sessionless_pid_attach branch, and silently allows --
exactly the case the new "unknown_session" deny path is supposed to
make visible.

Found by roborev review of the branch (job 9375, Low severity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@erans erans merged commit 2f55605 into main May 20, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Intermittent shim/server hang on long sb.run() sequences (v0.19.2, ptrace-pid hybrid)

2 participants