feat: eBPF process lifecycle tracking + ProcessMetadataCache library API#89
Conversation
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthrougheBPF now emits both exec and exit lifecycle events; userspace unifies them into a single ProcessEvent and dispatches EXEC vs EXIT through pipeline, event loop, and DWARF thread. A lazy, capacity-bounded per-PID ProcessMetadataCache and per-PID symbol/frame invalidation were added; session config controls lifecycle tracking and metadata usage. Changes
Sequence DiagramsequenceDiagram
actor Kernel
participant eBPF as "eBPF Programs"
participant RingBuf as "Ring Buffers"
participant Pipeline as "Event Pipeline"
participant EventLoop as "Event Loop"
participant DwarfThread as "DWARF Thread"
participant Cache as "ProcessMetadataCache"
Kernel->>eBPF: sched_process_exec tracepoint
eBPF->>RingBuf: emit ProcessExecEvent
RingBuf->>Pipeline: read ProcessExecEvent
Pipeline->>Pipeline: convert to ProcessEvent (EXEC)
Pipeline->>EventLoop: send ProcessEvent (EXEC)
Kernel->>eBPF: sched_process_exit tracepoint
eBPF->>RingBuf: emit ProcessExitEvent
RingBuf->>Pipeline: read ProcessExitEvent
Pipeline->>Pipeline: convert to ProcessEvent (EXIT)
Pipeline->>EventLoop: send ProcessEvent (EXIT)
EventLoop->>EventLoop: match event_type
rect rgba(100,200,100,0.5)
EventLoop->>DwarfThread: ProcessExeced(pid)
DwarfThread->>DwarfThread: remove old DWARF state
DwarfThread->>DwarfThread: reload DWARF for pid
EventLoop->>EventLoop: invalidate symbol caches
EventLoop->>Cache: invalidate(pid)
end
rect rgba(200,100,100,0.5)
EventLoop->>DwarfThread: ProcessExited(pid)
DwarfThread->>DwarfThread: cleanup DWARF state
EventLoop->>EventLoop: remove from known_tgids
EventLoop->>Cache: remove(pid)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Add eBPF-driven process exec/exit detection and a ProcessMetadataCache to profile-bee's library API, enabling external profiling agents to get process-aware profiling without polling /proc. eBPF changes: - sched_process_exec tracepoint program (detects execve() calls) - lifecycle_tracking_map to broaden exit events for all processes - process_exec_events ring buffer for exec notifications Userspace: - ProcessEvent unified type replaces ProcessExitEvent in PerfWork - DwarfThreadMsg::ProcessExeced triggers DWARF table reload on exec - ProcessMetadataCache: lazy /proc reader with eBPF-driven eviction - PointerStackFramesCache::invalidate_pid() for exec cache flush - SessionConfig::track_process_lifecycle (auto-on with DWARF) - Graceful fallback for older eBPF binaries without new programs Benefits for profile-bee core: - Proactive DWARF table loading on exec (no 1s poll delay) - Symbol cache invalidation on exec (no stale resolutions) - Immediate DWARF shard reclamation on exit 97 tests pass (9 new for ProcessMetadataCache), clippy -D warnings clean.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
profile-bee-ebpf/src/lib.rs (1)
1429-1443: Consider rate-limiting under extreme fork workloads.
handle_process_execfires for everyexecve()system-wide when lifecycle tracking is enabled. On systems with very high fork rates (e.g., build servers, CI runners), this could generate significant event volume.Current behavior is acceptable for typical profiling use cases:
- Ring buffer will drop events if full (graceful degradation)
- Userspace async task drains efficiently
- DWARF/cache invalidation is idempotent
For production agents on high-churn systems, consider documenting this trade-off or adding optional PID filtering.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 1429 - 1443, handle_process_exec currently emits a ProcessExecEvent for every execve system-wide (using ctx.tgid() and RING_BUF_PROCESS_EXEC), which can overwhelm high-churn hosts; add an optional PID filter or simple rate-limiting check inside handle_process_exec to bail out early (e.g., consult a configurable allowlist/denylist or token-bucket counter) before reserving/writing to RING_BUF_PROCESS_EXEC so events for uninteresting PIDs or excessive frequency are skipped—update ProcessExecEvent emission flow and any config hooks to support enabling/disabling PID filtering at runtime.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 246-253: The code evicts process metadata on PROCESS_EVENT_EXIT
(see PROCESS_EVENT_EXIT branch, DwarfThreadMsg::ProcessExited, self.known_tgids,
self.process_metadata), which loses data for processes that start+exit within
the same collection window; stop removing the PID from ProcessMetadataCache here
— either remove the cache.remove(event.pid) call and let collect()/collect_raw()
consume and evict snapshots, or eagerly populate the cache when a PID is first
seen so eviction on exit is safe; ensure whichever approach you pick updates the
consumer-visible lifecycle (e.g., add a post-collection eviction step or an
"evict after consumed" marker) rather than deleting inside the
PROCESS_EVENT_EXIT handler.
- Around line 261-274: When handling PROCESS_EVENT_EXEC in event_loop
(PROCESS_EVENT_EXEC branch) we invalidate symbol caches
(trace_handler.invalidate_caches_for_pid and process_metadata.invalidate) while
leaving pre-exec samples in trace_count, which causes build_raw_stacks() to
re-symbolize those pre-exec StackInfo against the new image; fix by flushing
this PID's accumulated samples from trace_count before invalidating caches or
else attach an exec-generation version to stored samples so build_raw_stacks()
can skip/resymbolize correctly. Concretely, update the PROCESS_EVENT_EXEC
handler to (a) remove or clear entries for event.pid from trace_count (the data
structure holding per-PID StackInfo counts) before calling
invalidate_caches_for_pid(event.pid) and process_metadata.invalidate(event.pid),
or (b) increment a per-PID exec-generation counter stored alongside known_tgids
and annotate new samples with that generation so build_raw_stacks()
ignores/keeps pre-exec samples tied to the older generation.
In `@profile-bee/src/process_metadata.rs`:
- Around line 118-136: get_or_load currently returns entries purely by pid,
which allows stale metadata to be reused when a PID is recycled; update
get_or_load in ProcessMetadataCache to validate the cached entry against a
stable process identity (e.g. the /proc/[pid]/stat starttime) before returning.
Concretely: when cache.contains_key(&pid) call a validation helper (e.g.
ProcessMetadata::current_start_time(pid) or a new method on ProcessMetadata) and
compare it to the stored entry’s start_time; if they differ, drop the cached
entry, call ProcessMetadata::load(pid) to reload and insert the new metadata,
and then return the fresh entry; otherwise return the cached reference as
before. Ensure this logic is applied in the get_or_load method and that
ProcessMetadata stores the start_time used for validation.
---
Nitpick comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1429-1443: handle_process_exec currently emits a ProcessExecEvent
for every execve system-wide (using ctx.tgid() and RING_BUF_PROCESS_EXEC), which
can overwhelm high-churn hosts; add an optional PID filter or simple
rate-limiting check inside handle_process_exec to bail out early (e.g., consult
a configurable allowlist/denylist or token-bucket counter) before
reserving/writing to RING_BUF_PROCESS_EXEC so events for uninteresting PIDs or
excessive frequency are skipped—update ProcessExecEvent emission flow and any
config hooks to support enabling/disabling PID filtering at runtime.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 793c12eb-dd91-40dc-a8b4-e14b9344b97f
⛔ Files ignored due to path filters (2)
profile-bee-ebpf/Cargo.lockis excluded by!**/*.lockprofile-bee/ebpf-bin/profile-bee.bpf.ois excluded by!**/*.o
📒 Files selected for processing (13)
docs/custom_profiling_agent.mdprofile-bee-common/src/lib.rsprofile-bee-ebpf/src/lib.rsprofile-bee-ebpf/src/main.rsprofile-bee/bin/profile-bee.rsprofile-bee/src/cache.rsprofile-bee/src/ebpf.rsprofile-bee/src/event_loop.rsprofile-bee/src/lib.rsprofile-bee/src/pipeline.rsprofile-bee/src/process_metadata.rsprofile-bee/src/session.rsprofile-bee/src/trace_handler.rs
…ct PID reuse Three correctness fixes for the process lifecycle tracking: 1. Defer ProcessMetadataCache eviction with one-cycle buffer: exit PIDs are collected in pending_exit_pids during drain_events() and evicted at the START of the next drain_events() call. This gives agents one full collection cycle to read metadata for processes that exited, covering async ring buffer delivery where a sample may arrive after its process's exit event. evict_pending_exits() remains public for agents that need immediate eviction under memory pressure. 2. Flush trace_count on exec: pre-exec StackInfo entries were left in trace_count after invalidating symbol caches, causing build_raw_stacks() to symbolize old addresses against the new binary image (producing [unknown] symbols). Now trace_count.retain() removes entries for the exec'd PID before cache invalidation. 3. Detect PID reuse via starttime: get_or_load() now reads /proc/[pid]/stat starttime and compares it to the cached entry. If they differ, the PID was recycled and the stale entry is replaced with fresh metadata.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 78-83: pending_exit_pids currently only stores pid (Vec<u32>),
which can match a recycled PID later; change it to store a process identity
(e.g., a small struct/tuple like ProcessIdentity { pid: u32, start_time_ns: u64
} or (u32, u64)) and update all places that push/read it (places around
evict_pending_exits(), where pending_exit_pids is populated during
drain_events()/build_raw_stacks(), and the other occurrences noted) so eviction
checks equality on both pid and start_time (or other unique start marker) rather
than pid alone; update types/signatures (pending_exit_pids, evict_pending_exits,
any iterators) and conversion points where you previously pushed a u32 to
instead capture the process start timestamp from process metadata and store the
identity.
In `@profile-bee/src/process_metadata.rs`:
- Around line 62-64: The current logic caches process entries even when
process.stat() fails (start_time == 0), allowing stale data to bypass PID-reuse
checks; update the loading/cache path (where process.stat() and start_time are
computed — e.g., the code that calls process.stat().ok() and assigns start_time)
to treat a missing stat as a load failure and do not insert the entry into the
cache when start_time == 0 (or when stat() returned None); ensure get_or_load()
and the alternative path at lines ~141-157 follow the same rule so only entries
with a valid start_time are persisted to the cache and PID-reuse detection
remains effective.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc1f8222-7d4e-47ef-9bcb-8f613dcce67a
📒 Files selected for processing (2)
profile-bee/src/event_loop.rsprofile-bee/src/process_metadata.rs
| /// PIDs that exited during the current drain_events() call. | ||
| /// Eviction from `process_metadata` is deferred until after | ||
| /// `build_raw_stacks()` completes, so agents can still look up | ||
| /// metadata for processes that exited within the collection window. | ||
| pending_exit_pids: Vec<u32>, | ||
| } |
There was a problem hiding this comment.
Deferred exit eviction should carry process identity, not only PID.
pending_exit_pids stores only pid, so if that PID is recycled before the next drain_events(), evict_pending_exits() can delete metadata for the new process instance.
🩹 Suggested fix (identity-aware deferred eviction)
- pending_exit_pids: Vec<u32>,
+ pending_exit_pids: Vec<(u32, u64)>,
@@
- for pid in self.pending_exit_pids.drain(..) {
- cache.remove(pid);
+ for (pid, exited_start_time) in self.pending_exit_pids.drain(..) {
+ let should_remove = cache
+ .get(pid)
+ .map(|m| m.start_time == exited_start_time)
+ .unwrap_or(false);
+ if should_remove {
+ cache.remove(pid);
+ }
}
@@
- if self.process_metadata.is_some() {
- self.pending_exit_pids.push(event.pid);
+ if let Some(cache) = self.process_metadata.as_ref() {
+ if let Some(meta) = cache.get(event.pid) {
+ self.pending_exit_pids.push((event.pid, meta.start_time));
+ }
}Also applies to: 173-177, 274-286
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@profile-bee/src/event_loop.rs` around lines 78 - 83, pending_exit_pids
currently only stores pid (Vec<u32>), which can match a recycled PID later;
change it to store a process identity (e.g., a small struct/tuple like
ProcessIdentity { pid: u32, start_time_ns: u64 } or (u32, u64)) and update all
places that push/read it (places around evict_pending_exits(), where
pending_exit_pids is populated during drain_events()/build_raw_stacks(), and the
other occurrences noted) so eviction checks equality on both pid and start_time
(or other unique start marker) rather than pid alone; update types/signatures
(pending_exit_pids, evict_pending_exits, any iterators) and conversion points
where you previously pushed a u32 to instead capture the process start timestamp
from process metadata and store the identity.
…urn) sched_process_exit fires for every thread exit, not just process exits. On thread-heavy hosts (Java, Go, Node.js), this generated thousands of duplicate ring buffer events per second for the same tgid, each waking the tokio AsyncFd task and pushing through the mpsc channel. Add tid != tgid early return so only actual process exits (main thread) produce events. sched_process_exec already only fires once per execve() so no filter needed there.
…e entries load() used map_or(0, ...) when stat() failed, caching entries with start_time == 0. The get_or_load() validation skipped PID reuse checks when cached_start == 0, so stale metadata was served indefinitely. Now load() requires stat() to succeed (early return None on failure), guaranteeing all cached entries have a valid start_time. Simplified the get_or_load() validation — the cached_start != 0 guard is no longer needed since load() enforces it.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
profile-bee/src/process_metadata.rs (1)
156-163: Consider opportunistic dead-entry pruning before hard-cap rejection.When full,
get_or_load()immediately returnsNone. WithProcessMetadataCache::new(4096)inprofile-bee/src/event_loop.rs(Line 143), missed/delayed lifecycle events can make metadata lookup degrade for all new PIDs. A cheap prune pass on full-cache path would improve resilience.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/process_metadata.rs` around lines 156 - 163, In get_or_load(), instead of immediately returning None when self.cache.len() >= self.max_entries, perform a cheap opportunistic prune of dead/stale entries from self.cache (e.g., remove entries whose metadata indicates the process exited or whose last_seen timestamp is older than a short threshold), then re-check capacity and proceed to insert; if the cache remains full after pruning, emit the existing warn and return None. Update logic around the existing capacity check in get_or_load() (the branch that uses self.max_entries) so the prune step runs before the hard-cap rejection; keep pruning cheap and non-blocking to avoid latency regressions (this complements ProcessMetadataCache::new(4096) usage in event_loop.rs).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 75-79: The exec tracepoint handler emits events regardless of the
runtime gate; update handle_process_exec (and the other exec-emitting sites
noted) to check lifecycle_tracking_enabled() (which reads
LIFECYCLE_TRACKING_MAP) before allocating or writing exec notifications so
events are only produced when lifecycle tracking is enabled; bail out early from
handle_process_exec and the same exec emission paths when
lifecycle_tracking_enabled() returns false to avoid host-wide ring-buffer
pressure and cache churn.
In `@profile-bee/src/process_metadata.rs`:
- Around line 137-154: When a cached PID is found in get_or_load, current_start
== 0 (ProcessMetadata::current_start_time returning 0) must be treated as
"process gone" instead of returning stale data; change the branch in get_or_load
so that when self.cache.contains_key(&pid) and current_start == 0 you remove the
cached entry from self.cache and return None (rather than falling through to the
existing else that returns the stale entry), keeping the existing behavior for
the PID-reuse case when current_start != cached_start (remove and reload).
---
Nitpick comments:
In `@profile-bee/src/process_metadata.rs`:
- Around line 156-163: In get_or_load(), instead of immediately returning None
when self.cache.len() >= self.max_entries, perform a cheap opportunistic prune
of dead/stale entries from self.cache (e.g., remove entries whose metadata
indicates the process exited or whose last_seen timestamp is older than a short
threshold), then re-check capacity and proceed to insert; if the cache remains
full after pruning, emit the existing warn and return None. Update logic around
the existing capacity check in get_or_load() (the branch that uses
self.max_entries) so the prune step runs before the hard-cap rejection; keep
pruning cheap and non-blocking to avoid latency regressions (this complements
ProcessMetadataCache::new(4096) usage in event_loop.rs).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c6253d12-c345-4eb9-ad2a-8f15f4474075
⛔ Files ignored due to path filters (1)
profile-bee/ebpf-bin/profile-bee.bpf.ois excluded by!**/*.o
📒 Files selected for processing (2)
profile-bee-ebpf/src/lib.rsprofile-bee/src/process_metadata.rs
| /// Whether process lifecycle tracking is enabled (0 = disabled, 1 = enabled). | ||
| /// When enabled, exit events fire for ALL process exits (not just DWARF-tracked | ||
| /// or monitored PIDs), and the exec tracepoint sends exec events. | ||
| #[map(name = "lifecycle_tracking_map")] | ||
| static LIFECYCLE_TRACKING_MAP: Array<u32> = Array::with_max_entries(1, 0); |
There was a problem hiding this comment.
Gate exec notifications on lifecycle_tracking_enabled().
LIFECYCLE_TRACKING_MAP is documented as the runtime switch for exec notifications, but handle_process_exec() ignores it and emits on every attached exec. That creates avoidable host-wide ring-buffer pressure and cache churn when lifecycle tracking is off.
Suggested guard
pub unsafe fn handle_process_exec<C: EbpfContext>(ctx: C) {
use profile_bee_common::ProcessExecEvent;
+ if !lifecycle_tracking_enabled() {
+ return;
+ }
+
let tgid = ctx.tgid();
if let Some(mut entry) = RING_BUF_PROCESS_EXEC.reserve::<ProcessExecEvent>(0) {
let exec_event = ProcessExecEvent { pid: tgid, _pad: 0 };
let _writable = entry.write(exec_event);Also applies to: 1448-1457
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@profile-bee-ebpf/src/lib.rs` around lines 75 - 79, The exec tracepoint
handler emits events regardless of the runtime gate; update handle_process_exec
(and the other exec-emitting sites noted) to check lifecycle_tracking_enabled()
(which reads LIFECYCLE_TRACKING_MAP) before allocating or writing exec
notifications so events are only produced when lifecycle tracking is enabled;
bail out early from handle_process_exec and the same exec emission paths when
lifecycle_tracking_enabled() returns false to avoid host-wide ring-buffer
pressure and cache churn.
| let tid = ctx.pid(); | ||
| let tgid = ctx.tgid(); | ||
|
|
||
| // Skip thread exits — only fire for the main thread (process exit). | ||
| if tid != tgid { | ||
| return; | ||
| } |
There was a problem hiding this comment.
tid == tgid is not a reliable “process is gone” check.
The thread-group leader can exit via pthread_exit() while other threads in the same tgid keep running. This branch would emit a ProcessExitEvent early, which can stop --pid profiling and evict DWARF/process metadata for a still-live process, while the real last-thread exit may be filtered out later. Please key this off actual thread-group teardown, not leader-thread exit.
get_or_load() returned stale cached entries when current_start_time() returned 0 (process exited, /proc/[pid]/stat unreadable). The condition `current_start != 0 && current_start != cached_start` evaluated to false, falling through to the else branch that returned the stale entry. Now current_start == 0 is handled explicitly: remove the stale entry and return None, since the process no longer exists.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
profile-bee/src/process_metadata.rs (1)
161-167: Don't warn on every miss after saturation.Once the cache is full, every uncached PID will emit a warning. On a workload with lots of short-lived processes, that can turn into log spam from the sampling path. Consider logging once per “full” state and using a counter/metric for the rest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee/src/process_metadata.rs`:
- Around line 137-159: The public getter get() currently skips the PID/exit
revalidation that get_or_load() performs, allowing callers to observe stale
cmdline/exe/environ; either rename get() to get_cached_unchecked() (and update
callers) to make unchecked semantics explicit, or modify get() to perform the
same validation as get_or_load() (reuse ProcessMetadata::current_start_time(pid)
and the cache removal/reload logic) and then forward to get_or_load() or return
the validated cached entry; apply the same change for the analogous method
referenced around lines 177-180 so all public accessors either validate or are
explicitly named as unchecked.
- Around line 34-35: The derived Debug on ProcessMetadata can leak secrets via
the environ and cmdline fields; remove Debug from the derive on the
ProcessMetadata struct and add a manual impl of std::fmt::Debug for
ProcessMetadata that redacts or omits sensitive content (e.g., print
placeholders or counts like "<redacted environ (N entries)>" and "<redacted
cmdline (N args)>" while still showing non-sensitive fields), retain
#[derive(Clone)] for Clone, and implement the Debug impl using fmt::DebugStruct
(or fmt::Formatter) to format the other fields safely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b1d98863-9c8f-4497-9a10-d2bbaf08d092
📒 Files selected for processing (1)
profile-bee/src/process_metadata.rs
| pub fn get_or_load(&mut self, pid: u32) -> Option<&ProcessMetadata> { | ||
| if self.cache.contains_key(&pid) { | ||
| // Validate against PID reuse and process exit: compare the cached | ||
| // start_time against the current /proc/[pid]/stat starttime. | ||
| let cached_start = self.cache[&pid].start_time; | ||
| let current_start = ProcessMetadata::current_start_time(pid); | ||
| if current_start == 0 { | ||
| // Process is gone (/proc/[pid]/stat unreadable) — remove stale entry. | ||
| self.cache.remove(&pid); | ||
| return None; | ||
| } | ||
| if current_start != cached_start { | ||
| tracing::debug!( | ||
| "PID {} recycled (starttime {} -> {}), reloading metadata", | ||
| pid, | ||
| cached_start, | ||
| current_start, | ||
| ); | ||
| self.cache.remove(&pid); | ||
| // Fall through to reload below | ||
| } else { | ||
| return self.cache.get(&pid); | ||
| } |
There was a problem hiding this comment.
Make the unchecked get() semantics explicit.
get_or_load() does the PID/exit revalidation, but get() skips it entirely. Because the validated path still reopens /proc/[pid]/stat on cache hits, callers on a hot path are likely to reach for get() and can then read stale cmdline/exe/environ after a missed lifecycle event. I'd either rename this to something like get_cached_unchecked() or make the public getter validated as well.
Also applies to: 177-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@profile-bee/src/process_metadata.rs` around lines 137 - 159, The public
getter get() currently skips the PID/exit revalidation that get_or_load()
performs, allowing callers to observe stale cmdline/exe/environ; either rename
get() to get_cached_unchecked() (and update callers) to make unchecked semantics
explicit, or modify get() to perform the same validation as get_or_load() (reuse
ProcessMetadata::current_start_time(pid) and the cache removal/reload logic) and
then forward to get_or_load() or return the validated cached entry; apply the
same change for the analogous method referenced around lines 177-180 so all
public accessors either validate or are explicitly named as unchecked.
Replace derived Debug with manual impl that shows entry counts instead
of values ('<N args>', '<N vars>') to prevent secrets like
AWS_SECRET_ACCESS_KEY from leaking into logs via {:?} formatting.
Add eBPF-driven process exec/exit detection and a ProcessMetadataCache to profile-bee's library API, enabling external profiling agents to get process-aware profiling without polling /proc.
eBPF changes:
Userspace:
Benefits for profile-bee core:
97 tests pass (9 new for ProcessMetadataCache), clippy -D warnings clean.
Summary by CodeRabbit
New Features
Documentation
Tests