feat: eBPF process lifecycle tracking + ProcessMetadataCache library API by zz85 · Pull Request #89 · zz85/profile-bee

zz85 · 2026-03-31T10:34:00Z

Add eBPF-driven process exec/exit detection and a ProcessMetadataCache to profile-bee's library API, enabling external profiling agents to get process-aware profiling without polling /proc.

eBPF changes:

sched_process_exec tracepoint program (detects execve() calls)
lifecycle_tracking_map to broaden exit events for all processes
process_exec_events ring buffer for exec notifications

Userspace:

ProcessEvent unified type replaces ProcessExitEvent in PerfWork
DwarfThreadMsg::ProcessExeced triggers DWARF table reload on exec
ProcessMetadataCache: lazy /proc reader with eBPF-driven eviction
PointerStackFramesCache::invalidate_pid() for exec cache flush
SessionConfig::track_process_lifecycle (auto-on with DWARF)
Graceful fallback for older eBPF binaries without new programs

Benefits for profile-bee core:

Proactive DWARF table loading on exec (no 1s poll delay)
Symbol cache invalidation on exec (no stale resolutions)
Immediate DWARF shard reclamation on exit

97 tests pass (9 new for ProcessMetadataCache), clippy -D warnings clean.

Summary by CodeRabbit

New Features
- Process lifecycle tracking: captures exec and exit events to keep process state, trigger symbol/DWARF reloads, and update monitoring reliably.
- Process metadata cache: on-demand /proc snapshots (cmdline, cwd, env, exe, mount ns) with capacity and automatic eviction; runtime access for enrichment and debugging.
- Unified process event model and optional system-wide lifecycle mode.
Documentation
- Agent guide updated with lifecycle-tracking, process-metadata cache, and enrichment examples.
Tests
- Unit tests for process-metadata cache (load, invalidate, capacity, env lookup).

coderabbitai · 2026-03-31T10:34:15Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3096aef5-0bf4-4364-8f02-6eb633e619da

📥 Commits

Reviewing files that changed from the base of the PR and between 7d3ca02 and 276964c.

📒 Files selected for processing (1)

profile-bee/src/process_metadata.rs

📝 Walkthrough

Walkthrough

eBPF now emits both exec and exit lifecycle events; userspace unifies them into a single ProcessEvent and dispatches EXEC vs EXIT through pipeline, event loop, and DWARF thread. A lazy, capacity-bounded per-PID ProcessMetadataCache and per-PID symbol/frame invalidation were added; session config controls lifecycle tracking and metadata usage.

Changes

Cohort / File(s)	Summary
Docs & Session Config `docs/custom_profiling_agent.md`, `profile-bee/src/session.rs`	Documented ProcessMetadataCache usage; added `SessionConfig.track_process_lifecycle: bool` (default false); lifecycle tracking enables exec tracepoint, sets lifecycle flag in eBPF, and passes `enable_process_metadata` to the event loop.
Common Event Types `profile-bee-common/src/lib.rs`	Added `ProcessExecEvent` and unified `ProcessEvent` (with `event_type`/`pid`/`exit_code`); introduced `PROCESS_EVENT_EXIT`/`PROCESS_EVENT_EXEC`; deprecated `ProcessExitEvent` in docs.
eBPF Programs `profile-bee-ebpf/src/lib.rs`, `profile-bee-ebpf/src/main.rs`	Added `LIFECYCLE_TRACKING_MAP` and `lifecycle_tracking_enabled()`; new `process_exec_events` ring buffer and `RING_BUF_PROCESS_EXEC`; added `handle_process_exec` and `tracepoint_process_exec`; exit emission broadened when lifecycle tracking enabled.
Userspace eBPF API `profile-bee/src/ebpf.rs`	Added `setup_process_exec_ring_buffer`, `attach_process_exec_tracepoint`, and `EbpfProfiler::set_lifecycle_tracking` to detect optional maps/programs and toggle lifecycle-tracking map value.
Pipeline & Ring-buffer Tasks `profile-bee/src/pipeline.rs`	Replaced `PerfWork::ProcessExit` with `PerfWork::ProcessEvent`; exit reader now wraps into `ProcessEvent`; added `setup_process_exec_ring_buffer_task`; added `DwarfThreadMsg::ProcessExeced` and exec-driven DWARF remove+reload flow.
Event Loop & Metadata Cache `profile-bee/src/event_loop.rs`	Added optional `ProcessMetadataCache`, `pending_exit_pids`, and `enable_process_metadata` flag; `drain_events` handles `ProcessEvent` EXIT/EXEC, evicts/invalidate metadata and symbol caches, and exposes `evict_pending_exits()` and `process_metadata()` accessors.
Process Metadata Module `profile-bee/src/process_metadata.rs`	New module: `ProcessMetadata` and `ProcessMetadataCache` with lazy /proc reads, PID-reuse detection via start_time, capacity bounds, environ lookup, `get_or_load`/`get`/`invalidate`/`remove`, stats, and unit tests.
Per-PID Cache Invalidation `profile-bee/src/cache.rs`, `profile-bee/src/trace_handler.rs`	Added `PointerStackFramesCache::invalidate_pid` and `TraceHandler::invalidate_caches_for_pid` to clear per-PID frame/address cache entries on exec.
Binary / Event Handling `profile-bee/bin/profile-bee.rs`	Event loop constructed with `enable_process_metadata` per lifecycle flag; unified handling of `PerfWork::ProcessEvent` for EXEC/EXIT, forwarding `ProcessExeced`/`ProcessExited` messages and updating `known_tgids`.
Exports `profile-bee/src/lib.rs`	Exported `process_metadata` module for external use.

Sequence Diagram

sequenceDiagram
    actor Kernel
    participant eBPF as "eBPF Programs"
    participant RingBuf as "Ring Buffers"
    participant Pipeline as "Event Pipeline"
    participant EventLoop as "Event Loop"
    participant DwarfThread as "DWARF Thread"
    participant Cache as "ProcessMetadataCache"

    Kernel->>eBPF: sched_process_exec tracepoint
    eBPF->>RingBuf: emit ProcessExecEvent
    RingBuf->>Pipeline: read ProcessExecEvent
    Pipeline->>Pipeline: convert to ProcessEvent (EXEC)
    Pipeline->>EventLoop: send ProcessEvent (EXEC)

    Kernel->>eBPF: sched_process_exit tracepoint
    eBPF->>RingBuf: emit ProcessExitEvent
    RingBuf->>Pipeline: read ProcessExitEvent
    Pipeline->>Pipeline: convert to ProcessEvent (EXIT)
    Pipeline->>EventLoop: send ProcessEvent (EXIT)

    EventLoop->>EventLoop: match event_type

    rect rgba(100,200,100,0.5)
    EventLoop->>DwarfThread: ProcessExeced(pid)
    DwarfThread->>DwarfThread: remove old DWARF state
    DwarfThread->>DwarfThread: reload DWARF for pid
    EventLoop->>EventLoop: invalidate symbol caches
    EventLoop->>Cache: invalidate(pid)
    end

    rect rgba(200,100,100,0.5)
    EventLoop->>DwarfThread: ProcessExited(pid)
    DwarfThread->>DwarfThread: cleanup DWARF state
    EventLoop->>EventLoop: remove from known_tgids
    EventLoop->>Cache: remove(pid)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

refactor Library API, add new Output Formats + AWS CodeGuru Integration #83: Prior related changes to session/event_loop/ebpf/pipeline APIs that this PR extends (exec/exit lifecycle and metadata additions).
Dwarf speed up #75: Related modifications to PointerStackFramesCache and per-PID cache behavior; intersects invalidate_pid changes.
feat: replace PROC_INFO HashMap with EXEC_MAPPINGS LPM trie #80: Overlaps on eBPF lifecycle/exit handling and lifecycle-tracking map semantics.

Poem

🐰 I nibble on execs and exits with delight,

Caches hop away at each new process sight,
eBPF drums the tune and ring buffers hum,
DWARF folds and reloads — fresh traces come,
The rabbit applauds: "Lifecycle tracking — yum!"

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main feature addition: eBPF process lifecycle tracking combined with a ProcessMetadataCache library API, which aligns with the primary objectives of the pull request.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch process_lifecycle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Add eBPF-driven process exec/exit detection and a ProcessMetadataCache to profile-bee's library API, enabling external profiling agents to get process-aware profiling without polling /proc. eBPF changes: - sched_process_exec tracepoint program (detects execve() calls) - lifecycle_tracking_map to broaden exit events for all processes - process_exec_events ring buffer for exec notifications Userspace: - ProcessEvent unified type replaces ProcessExitEvent in PerfWork - DwarfThreadMsg::ProcessExeced triggers DWARF table reload on exec - ProcessMetadataCache: lazy /proc reader with eBPF-driven eviction - PointerStackFramesCache::invalidate_pid() for exec cache flush - SessionConfig::track_process_lifecycle (auto-on with DWARF) - Graceful fallback for older eBPF binaries without new programs Benefits for profile-bee core: - Proactive DWARF table loading on exec (no 1s poll delay) - Symbol cache invalidation on exec (no stale resolutions) - Immediate DWARF shard reclamation on exit 97 tests pass (9 new for ProcessMetadataCache), clippy -D warnings clean.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

profile-bee-ebpf/src/lib.rs (1)
1429-1443: Consider rate-limiting under extreme fork workloads.

handle_process_exec fires for every execve() system-wide when lifecycle tracking is enabled. On systems with very high fork rates (e.g., build servers, CI runners), this could generate significant event volume.

Current behavior is acceptable for typical profiling use cases:

Ring buffer will drop events if full (graceful degradation)

Userspace async task drains efficiently

DWARF/cache invalidation is idempotent

For production agents on high-churn systems, consider documenting this trade-off or adding optional PID filtering.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee-ebpf/src/lib.rs` around lines 1429 - 1443, handle_process_exec
currently emits a ProcessExecEvent for every execve system-wide (using
ctx.tgid() and RING_BUF_PROCESS_EXEC), which can overwhelm high-churn hosts; add
an optional PID filter or simple rate-limiting check inside handle_process_exec
to bail out early (e.g., consult a configurable allowlist/denylist or
token-bucket counter) before reserving/writing to RING_BUF_PROCESS_EXEC so
events for uninteresting PIDs or excessive frequency are skipped—update
ProcessExecEvent emission flow and any config hooks to support
enabling/disabling PID filtering at runtime.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 246-253: The code evicts process metadata on PROCESS_EVENT_EXIT
(see PROCESS_EVENT_EXIT branch, DwarfThreadMsg::ProcessExited, self.known_tgids,
self.process_metadata), which loses data for processes that start+exit within
the same collection window; stop removing the PID from ProcessMetadataCache here
— either remove the cache.remove(event.pid) call and let collect()/collect_raw()
consume and evict snapshots, or eagerly populate the cache when a PID is first
seen so eviction on exit is safe; ensure whichever approach you pick updates the
consumer-visible lifecycle (e.g., add a post-collection eviction step or an
"evict after consumed" marker) rather than deleting inside the
PROCESS_EVENT_EXIT handler.
- Around line 261-274: When handling PROCESS_EVENT_EXEC in event_loop
(PROCESS_EVENT_EXEC branch) we invalidate symbol caches
(trace_handler.invalidate_caches_for_pid and process_metadata.invalidate) while
leaving pre-exec samples in trace_count, which causes build_raw_stacks() to
re-symbolize those pre-exec StackInfo against the new image; fix by flushing
this PID's accumulated samples from trace_count before invalidating caches or
else attach an exec-generation version to stored samples so build_raw_stacks()
can skip/resymbolize correctly. Concretely, update the PROCESS_EVENT_EXEC
handler to (a) remove or clear entries for event.pid from trace_count (the data
structure holding per-PID StackInfo counts) before calling
invalidate_caches_for_pid(event.pid) and process_metadata.invalidate(event.pid),
or (b) increment a per-PID exec-generation counter stored alongside known_tgids
and annotate new samples with that generation so build_raw_stacks()
ignores/keeps pre-exec samples tied to the older generation.

In `@profile-bee/src/process_metadata.rs`:
- Around line 118-136: get_or_load currently returns entries purely by pid,
which allows stale metadata to be reused when a PID is recycled; update
get_or_load in ProcessMetadataCache to validate the cached entry against a
stable process identity (e.g. the /proc/[pid]/stat starttime) before returning.
Concretely: when cache.contains_key(&pid) call a validation helper (e.g.
ProcessMetadata::current_start_time(pid) or a new method on ProcessMetadata) and
compare it to the stored entry’s start_time; if they differ, drop the cached
entry, call ProcessMetadata::load(pid) to reload and insert the new metadata,
and then return the fresh entry; otherwise return the cached reference as
before. Ensure this logic is applied in the get_or_load method and that
ProcessMetadata stores the start_time used for validation.

---

Nitpick comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1429-1443: handle_process_exec currently emits a ProcessExecEvent
for every execve system-wide (using ctx.tgid() and RING_BUF_PROCESS_EXEC), which
can overwhelm high-churn hosts; add an optional PID filter or simple
rate-limiting check inside handle_process_exec to bail out early (e.g., consult
a configurable allowlist/denylist or token-bucket counter) before
reserving/writing to RING_BUF_PROCESS_EXEC so events for uninteresting PIDs or
excessive frequency are skipped—update ProcessExecEvent emission flow and any
config hooks to support enabling/disabling PID filtering at runtime.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 793c12eb-dd91-40dc-a8b4-e14b9344b97f

📥 Commits

Reviewing files that changed from the base of the PR and between 75b93ca and 14cab1a.

⛔ Files ignored due to path filters (2)

profile-bee-ebpf/Cargo.lock is excluded by !**/*.lock
profile-bee/ebpf-bin/profile-bee.bpf.o is excluded by !**/*.o

📒 Files selected for processing (13)

docs/custom_profiling_agent.md
profile-bee-common/src/lib.rs
profile-bee-ebpf/src/lib.rs
profile-bee-ebpf/src/main.rs
profile-bee/bin/profile-bee.rs
profile-bee/src/cache.rs
profile-bee/src/ebpf.rs
profile-bee/src/event_loop.rs
profile-bee/src/lib.rs
profile-bee/src/pipeline.rs
profile-bee/src/process_metadata.rs
profile-bee/src/session.rs
profile-bee/src/trace_handler.rs

…ct PID reuse Three correctness fixes for the process lifecycle tracking: 1. Defer ProcessMetadataCache eviction with one-cycle buffer: exit PIDs are collected in pending_exit_pids during drain_events() and evicted at the START of the next drain_events() call. This gives agents one full collection cycle to read metadata for processes that exited, covering async ring buffer delivery where a sample may arrive after its process's exit event. evict_pending_exits() remains public for agents that need immediate eviction under memory pressure. 2. Flush trace_count on exec: pre-exec StackInfo entries were left in trace_count after invalidating symbol caches, causing build_raw_stacks() to symbolize old addresses against the new binary image (producing [unknown] symbols). Now trace_count.retain() removes entries for the exec'd PID before cache invalidation. 3. Detect PID reuse via starttime: get_or_load() now reads /proc/[pid]/stat starttime and compares it to the cached entry. If they differ, the PID was recycled and the stale entry is replaced with fresh metadata.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 78-83: pending_exit_pids currently only stores pid (Vec<u32>),
which can match a recycled PID later; change it to store a process identity
(e.g., a small struct/tuple like ProcessIdentity { pid: u32, start_time_ns: u64
} or (u32, u64)) and update all places that push/read it (places around
evict_pending_exits(), where pending_exit_pids is populated during
drain_events()/build_raw_stacks(), and the other occurrences noted) so eviction
checks equality on both pid and start_time (or other unique start marker) rather
than pid alone; update types/signatures (pending_exit_pids, evict_pending_exits,
any iterators) and conversion points where you previously pushed a u32 to
instead capture the process start timestamp from process metadata and store the
identity.

In `@profile-bee/src/process_metadata.rs`:
- Around line 62-64: The current logic caches process entries even when
process.stat() fails (start_time == 0), allowing stale data to bypass PID-reuse
checks; update the loading/cache path (where process.stat() and start_time are
computed — e.g., the code that calls process.stat().ok() and assigns start_time)
to treat a missing stat as a load failure and do not insert the entry into the
cache when start_time == 0 (or when stat() returned None); ensure get_or_load()
and the alternative path at lines ~141-157 follow the same rule so only entries
with a valid start_time are persisted to the cache and PID-reuse detection
remains effective.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc1f8222-7d4e-47ef-9bcb-8f613dcce67a

📥 Commits

Reviewing files that changed from the base of the PR and between 14cab1a and f77b7ac.

📒 Files selected for processing (2)

profile-bee/src/event_loop.rs
profile-bee/src/process_metadata.rs

coderabbitai · 2026-03-31T16:41:46Z

+    /// PIDs that exited during the current drain_events() call.
+    /// Eviction from `process_metadata` is deferred until after
+    /// `build_raw_stacks()` completes, so agents can still look up
+    /// metadata for processes that exited within the collection window.
+    pending_exit_pids: Vec<u32>,
 }


⚠️ Potential issue | 🟠 Major

Deferred exit eviction should carry process identity, not only PID.

pending_exit_pids stores only pid, so if that PID is recycled before the next drain_events(), evict_pending_exits() can delete metadata for the new process instance.

🩹 Suggested fix (identity-aware deferred eviction)

- pending_exit_pids: Vec<u32>, + pending_exit_pids: Vec<(u32, u64)>, @@ - for pid in self.pending_exit_pids.drain(..) { - cache.remove(pid); + for (pid, exited_start_time) in self.pending_exit_pids.drain(..) { + let should_remove = cache + .get(pid) + .map(|m| m.start_time == exited_start_time) + .unwrap_or(false); + if should_remove { + cache.remove(pid); + } } @@ - if self.process_metadata.is_some() { - self.pending_exit_pids.push(event.pid); + if let Some(cache) = self.process_metadata.as_ref() { + if let Some(meta) = cache.get(event.pid) { + self.pending_exit_pids.push((event.pid, meta.start_time)); + } }

Also applies to: 173-177, 274-286

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 78 - 83, pending_exit_pids currently only stores pid (Vec<u32>), which can match a recycled PID later; change it to store a process identity (e.g., a small struct/tuple like ProcessIdentity { pid: u32, start_time_ns: u64 } or (u32, u64)) and update all places that push/read it (places around evict_pending_exits(), where pending_exit_pids is populated during drain_events()/build_raw_stacks(), and the other occurrences noted) so eviction checks equality on both pid and start_time (or other unique start marker) rather than pid alone; update types/signatures (pending_exit_pids, evict_pending_exits, any iterators) and conversion points where you previously pushed a u32 to instead capture the process start timestamp from process metadata and store the identity.

…urn) sched_process_exit fires for every thread exit, not just process exits. On thread-heavy hosts (Java, Go, Node.js), this generated thousands of duplicate ring buffer events per second for the same tgid, each waking the tokio AsyncFd task and pushing through the mpsc channel. Add tid != tgid early return so only actual process exits (main thread) produce events. sched_process_exec already only fires once per execve() so no filter needed there.

…e entries load() used map_or(0, ...) when stat() failed, caching entries with start_time == 0. The get_or_load() validation skipped PID reuse checks when cached_start == 0, so stale metadata was served indefinitely. Now load() requires stat() to succeed (early return None on failure), guaranteeing all cached entries have a valid start_time. Simplified the get_or_load() validation — the cached_start != 0 guard is no longer needed since load() enforces it.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

profile-bee/src/process_metadata.rs (1)
156-163: Consider opportunistic dead-entry pruning before hard-cap rejection.

When full, get_or_load() immediately returns None. With ProcessMetadataCache::new(4096) in profile-bee/src/event_loop.rs (Line 143), missed/delayed lifecycle events can make metadata lookup degrade for all new PIDs. A cheap prune pass on full-cache path would improve resilience.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/src/process_metadata.rs` around lines 156 - 163, In
get_or_load(), instead of immediately returning None when self.cache.len() >=
self.max_entries, perform a cheap opportunistic prune of dead/stale entries from
self.cache (e.g., remove entries whose metadata indicates the process exited or
whose last_seen timestamp is older than a short threshold), then re-check
capacity and proceed to insert; if the cache remains full after pruning, emit
the existing warn and return None. Update logic around the existing capacity
check in get_or_load() (the branch that uses self.max_entries) so the prune step
runs before the hard-cap rejection; keep pruning cheap and non-blocking to avoid
latency regressions (this complements ProcessMetadataCache::new(4096) usage in
event_loop.rs).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 75-79: The exec tracepoint handler emits events regardless of the
runtime gate; update handle_process_exec (and the other exec-emitting sites
noted) to check lifecycle_tracking_enabled() (which reads
LIFECYCLE_TRACKING_MAP) before allocating or writing exec notifications so
events are only produced when lifecycle tracking is enabled; bail out early from
handle_process_exec and the same exec emission paths when
lifecycle_tracking_enabled() returns false to avoid host-wide ring-buffer
pressure and cache churn.

In `@profile-bee/src/process_metadata.rs`:
- Around line 137-154: When a cached PID is found in get_or_load, current_start
== 0 (ProcessMetadata::current_start_time returning 0) must be treated as
"process gone" instead of returning stale data; change the branch in get_or_load
so that when self.cache.contains_key(&pid) and current_start == 0 you remove the
cached entry from self.cache and return None (rather than falling through to the
existing else that returns the stale entry), keeping the existing behavior for
the PID-reuse case when current_start != cached_start (remove and reload).

---

Nitpick comments:
In `@profile-bee/src/process_metadata.rs`:
- Around line 156-163: In get_or_load(), instead of immediately returning None
when self.cache.len() >= self.max_entries, perform a cheap opportunistic prune
of dead/stale entries from self.cache (e.g., remove entries whose metadata
indicates the process exited or whose last_seen timestamp is older than a short
threshold), then re-check capacity and proceed to insert; if the cache remains
full after pruning, emit the existing warn and return None. Update logic around
the existing capacity check in get_or_load() (the branch that uses
self.max_entries) so the prune step runs before the hard-cap rejection; keep
pruning cheap and non-blocking to avoid latency regressions (this complements
ProcessMetadataCache::new(4096) usage in event_loop.rs).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c6253d12-c345-4eb9-ad2a-8f15f4474075

📥 Commits

Reviewing files that changed from the base of the PR and between f77b7ac and 1cda34f.

⛔ Files ignored due to path filters (1)

profile-bee/ebpf-bin/profile-bee.bpf.o is excluded by !**/*.o

📒 Files selected for processing (2)

profile-bee-ebpf/src/lib.rs
profile-bee/src/process_metadata.rs

coderabbitai · 2026-04-01T04:48:50Z

+/// Whether process lifecycle tracking is enabled (0 = disabled, 1 = enabled).
+/// When enabled, exit events fire for ALL process exits (not just DWARF-tracked
+/// or monitored PIDs), and the exec tracepoint sends exec events.
+#[map(name = "lifecycle_tracking_map")]
+static LIFECYCLE_TRACKING_MAP: Array<u32> = Array::with_max_entries(1, 0);


⚠️ Potential issue | 🟠 Major

Gate exec notifications on lifecycle_tracking_enabled().

LIFECYCLE_TRACKING_MAP is documented as the runtime switch for exec notifications, but handle_process_exec() ignores it and emits on every attached exec. That creates avoidable host-wide ring-buffer pressure and cache churn when lifecycle tracking is off.

Suggested guard

pub unsafe fn handle_process_exec<C: EbpfContext>(ctx: C) { use profile_bee_common::ProcessExecEvent; + if !lifecycle_tracking_enabled() { + return; + } + let tgid = ctx.tgid(); if let Some(mut entry) = RING_BUF_PROCESS_EXEC.reserve::<ProcessExecEvent>(0) { let exec_event = ProcessExecEvent { pid: tgid, _pad: 0 }; let _writable = entry.write(exec_event);

Also applies to: 1448-1457

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 75 - 79, The exec tracepoint handler emits events regardless of the runtime gate; update handle_process_exec (and the other exec-emitting sites noted) to check lifecycle_tracking_enabled() (which reads LIFECYCLE_TRACKING_MAP) before allocating or writing exec notifications so events are only produced when lifecycle tracking is enabled; bail out early from handle_process_exec and the same exec emission paths when lifecycle_tracking_enabled() returns false to avoid host-wide ring-buffer pressure and cache churn.

coderabbitai · 2026-04-01T04:48:50Z

+    let tid = ctx.pid();
    let tgid = ctx.tgid();
+
+    // Skip thread exits — only fire for the main thread (process exit).
+    if tid != tgid {
+        return;
+    }


⚠️ Potential issue | 🔴 Critical

tid == tgid is not a reliable “process is gone” check.

The thread-group leader can exit via pthread_exit() while other threads in the same tgid keep running. This branch would emit a ProcessExitEvent early, which can stop --pid profiling and evict DWARF/process metadata for a still-live process, while the real last-thread exit may be filtered out later. Please key this off actual thread-group teardown, not leader-thread exit.

get_or_load() returned stale cached entries when current_start_time() returned 0 (process exited, /proc/[pid]/stat unreadable). The condition `current_start != 0 && current_start != cached_start` evaluated to false, falling through to the else branch that returned the stale entry. Now current_start == 0 is handled explicitly: remove the stale entry and return None, since the process no longer exists.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

profile-bee/src/process_metadata.rs (1)

161-167: Don't warn on every miss after saturation.

Once the cache is full, every uncached PID will emit a warning. On a workload with lots of short-lived processes, that can turn into log spam from the sampling path. Consider logging once per “full” state and using a counter/metric for the rest.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@profile-bee/src/process_metadata.rs`:
- Around line 137-159: The public getter get() currently skips the PID/exit
revalidation that get_or_load() performs, allowing callers to observe stale
cmdline/exe/environ; either rename get() to get_cached_unchecked() (and update
callers) to make unchecked semantics explicit, or modify get() to perform the
same validation as get_or_load() (reuse ProcessMetadata::current_start_time(pid)
and the cache removal/reload logic) and then forward to get_or_load() or return
the validated cached entry; apply the same change for the analogous method
referenced around lines 177-180 so all public accessors either validate or are
explicitly named as unchecked.
- Around line 34-35: The derived Debug on ProcessMetadata can leak secrets via
the environ and cmdline fields; remove Debug from the derive on the
ProcessMetadata struct and add a manual impl of std::fmt::Debug for
ProcessMetadata that redacts or omits sensitive content (e.g., print
placeholders or counts like "<redacted environ (N entries)>" and "<redacted
cmdline (N args)>" while still showing non-sensitive fields), retain
#[derive(Clone)] for Clone, and implement the Debug impl using fmt::DebugStruct
(or fmt::Formatter) to format the other fields safely.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1d98863-9c8f-4497-9a10-d2bbaf08d092

📥 Commits

Reviewing files that changed from the base of the PR and between 1cda34f and 7d3ca02.

📒 Files selected for processing (1)

profile-bee/src/process_metadata.rs

coderabbitai · 2026-04-01T05:06:34Z

+    pub fn get_or_load(&mut self, pid: u32) -> Option<&ProcessMetadata> {
+        if self.cache.contains_key(&pid) {
+            // Validate against PID reuse and process exit: compare the cached
+            // start_time against the current /proc/[pid]/stat starttime.
+            let cached_start = self.cache[&pid].start_time;
+            let current_start = ProcessMetadata::current_start_time(pid);
+            if current_start == 0 {
+                // Process is gone (/proc/[pid]/stat unreadable) — remove stale entry.
+                self.cache.remove(&pid);
+                return None;
+            }
+            if current_start != cached_start {
+                tracing::debug!(
+                    "PID {} recycled (starttime {} -> {}), reloading metadata",
+                    pid,
+                    cached_start,
+                    current_start,
+                );
+                self.cache.remove(&pid);
+                // Fall through to reload below
+            } else {
+                return self.cache.get(&pid);
+            }


⚠️ Potential issue | 🟠 Major

Make the unchecked get() semantics explicit.

get_or_load() does the PID/exit revalidation, but get() skips it entirely. Because the validated path still reopens /proc/[pid]/stat on cache hits, callers on a hot path are likely to reach for get() and can then read stale cmdline/exe/environ after a missed lifecycle event. I'd either rename this to something like get_cached_unchecked() or make the public getter validated as well.

Also applies to: 177-180

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/process_metadata.rs` around lines 137 - 159, The public getter get() currently skips the PID/exit revalidation that get_or_load() performs, allowing callers to observe stale cmdline/exe/environ; either rename get() to get_cached_unchecked() (and update callers) to make unchecked semantics explicit, or modify get() to perform the same validation as get_or_load() (reuse ProcessMetadata::current_start_time(pid) and the cache removal/reload logic) and then forward to get_or_load() or return the validated cached entry; apply the same change for the analogous method referenced around lines 177-180 so all public accessors either validate or are explicitly named as unchecked.

Replace derived Debug with manual impl that shows entry counts instead of values ('<N args>', '<N vars>') to prevent secrets like AWS_SECRET_ACCESS_KEY from leaking into logs via {:?} formatting.

zz85 force-pushed the process_lifecycle branch from 3884f03 to 14cab1a Compare March 31, 2026 10:35

coderabbitai Bot reviewed Mar 31, 2026

View reviewed changes

Comment thread profile-bee/src/event_loop.rs

Comment thread profile-bee/src/event_loop.rs

Comment thread profile-bee/src/process_metadata.rs

coderabbitai Bot reviewed Mar 31, 2026

View reviewed changes

zz85 added 2 commits April 1, 2026 00:30

coderabbitai Bot reviewed Apr 1, 2026

View reviewed changes

fix: redact environ/cmdline in ProcessMetadata Debug impl

276964c

Replace derived Debug with manual impl that shows entry counts instead of values ('<N args>', '<N vars>') to prevent secrets like AWS_SECRET_ACCESS_KEY from leaking into logs via {:?} formatting.

zz85 merged commit dcc0294 into main Apr 1, 2026
3 of 4 checks passed

zz85 deleted the process_lifecycle branch April 1, 2026 05:33

coderabbitai Bot mentioned this pull request Apr 18, 2026

Deeper V8 integration #95

Merged

Conversation

zz85 commented Mar 31, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

zz85 commented Mar 31, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 31, 2026 •

edited

Loading