Conversation
…olization Add unsymbolized output mode that saves raw instruction pointer addresses and /proc/<pid>/maps snapshots to a self-contained .raw file. This enables offline/post-hoc symbolization workflows where capture needs to be fast and symbol resolution happens later with better data (debug symbols, V8 introspection, updated perf-map files). - TraceHandler::get_raw_addresses() extracts raw IPs without symbolization - ProfilingEventLoop::collect_unsymbolized() skips blazesym entirely - RawCollapseSink writes raw collapse format with mappings header - OutputFormat::Raw + .raw extension inferred by -o flag - Raw-only mode uses fast unsymbolized path; mixed mode (e.g. -o flame.svg -o profile.raw) falls back to symbolized collection for live sinks
Add a subcommand that reads .raw capture files and re-symbolizes addresses via blazesym, producing standard output formats. This completes the offline symbolization workflow: sudo probee -o profile.raw -- ./myapp # capture (fast, no symbols) probee symbolize profile.raw -o flame.svg # resolve later - New symbolize module with raw collapse parser and re-symbolizer - Handles mappings header, kernel (_k suffix) and user addresses - Routes output through existing OutputSink pipeline (svg/html/json/etc) - Falls back to stdout in collapse format if no -o flags given - 6 unit tests covering parser edge cases
Add end-to-end V8 heap introspection that resolves JavaScript function names without requiring --perf-basic-prof or any Node.js flags. Inspired by the OpenTelemetry eBPF profiler's V8 tracer. eBPF side: - New v8_proc_info HashMap map keyed by tgid - copy_stack_regs re-walks the FP chain to read V8 FP context slots ([fp - 64 + fp_function]) and extract JSFunction → SharedFunctionInfo tagged pointers via try_read_v8_sfi() - SFI pointers stored in FramePointers.v8_sfi[] parallel to IPs Userspace side: - v8/introspection.rs: reads v8dbg_* ELF symbols from Node.js binary to discover heap object layouts, with version-specific fallbacks (V8 8.1+) - v8/heap.rs: reads V8 objects via process_vm_readv — chases JSFunction → SFI → name_or_scope_info → function name and SFI → script → Script.name → source file. Handles Seq/Cons/Thin strings. - v8/types.rs: tagged pointer ops, V8IntrospectionData, FP offset mapping Integration: - event_loop: try_setup_v8_for_pid() auto-detects Node.js on first sample, reads ELF introspection, loads V8ProcInfo into eBPF, registers V8HeapReader - trace_handler: for frames with v8_sfi[i] != 0, overrides blazesym's [unknown] with resolved JS function name + source file from heap reader - ebpf.rs: V8ProcInfoPod wrapper, load/remove methods - Updated prebuilt eBPF binary
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 39 minutes and 45 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (2)
📒 Files selected for processing (14)
📝 WalkthroughWalkthroughAdds V8/Node.js introspection and heap-based symbolization: eBPF collects per-frame V8 SharedFunctionInfo tagged pointers and a per-process Changes
Sequence DiagramsequenceDiagram
participant CLI as CLI/Main
participant Prof as Userspace Profiler
participant eBPF as eBPF (kernel)
participant ELF as ELF Reader
participant Heap as V8HeapReader
participant TH as TraceHandler
CLI->>Prof: try_setup_v8_for_pid(tgid)
Prof->>ELF: read_introspection_data(elf_bytes)
ELF-->>Prof: V8IntrospectionData
Prof->>Prof: to_proc_info() -> V8ProcInfo
Prof->>eBPF: load_v8_proc_info(tgid, V8ProcInfo)
eBPF->>eBPF: collect_stack_frames()
eBPF->>eBPF: try_read_v8_sfi(V8ProcInfo, frame_bp)
eBPF-->>Prof: FramePointers { pointers[], v8_sfi[] }
Prof->>TH: get_exp_stacked_frames(...)
TH->>Heap: resolve_sfi(sfi_tagged)
Heap->>Heap: process_vm_readv -> parse SFI & strings
Heap-->>TH: V8Symbol {function_name, source_file}
TH-->>Prof: symbolized_frames (apply V8 overrides)
Prof->>CLI: emit outputs / write raw file
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
profile-bee-ebpf/src/lib.rs (1)
1199-1247:⚠️ Potential issue | 🔴 CriticalMake the V8 FP walk verifier-safe and clear stale
v8_sfi.This introduces dynamic loop bounds in eBPF (
1..len,1..pointers.len()), which can fail verifier constraints. Also,v8_sfiis reused from per-CPU storage but never cleared, so failed/non-V8 frames can inherit stale SFI pointers. Finally, reading fromregs.rbpcorresponds to frame0, not1, so the current frame’s JS metadata is shifted by one.🔧 Suggested fix
unsafe fn copy_stack_regs(regs: &pt_regs, pointer: &mut FramePointers) -> (u64, u64, usize, u64) { let (ip, bp, len, sp) = copy_stack_regs_fp_only(regs, &mut pointer.pointers); pointer.len = len; + + for i in 0..MAX_V8_FRAMES { + pointer.v8_sfi[i] = 0; + } // Check if this is a V8 process — look up V8ProcInfo by tgid. @@ if let Some(vi) = V8_PROC_INFO.get(&tgid) { // Re-walk the frame chain to read V8 FP context for each frame. // We re-read bp from regs since copy_stack_regs_fp_only consumed it. let mut frame_bp = regs.rbp; - for i in 1..len { - if i >= MAX_V8_FRAMES || frame_bp == 0 { + for i in 0..MAX_V8_FRAMES { + if i >= len || frame_bp == 0 { break; } // Read V8 SFI from this frame's FP context if let Some(sfi) = try_read_v8_sfi(vi, frame_bp) { pointer.v8_sfi[i] = sfi; } // Advance to next frame (just read the saved bp, don't need the IP) - frame_bp = bpf_probe_read(frame_bp as *const u8 as _).unwrap_or(0); + frame_bp = bpf_probe_read_user(frame_bp as *const u64).unwrap_or(0); } } @@ - for i in 1..pointers.len() { + for i in 1..1024 { + if i >= pointers.len() { + break; + } let Some(ret_addr) = get_frame(&mut bp) else { len = i; break;As per coding guidelines, All loops in eBPF code MUST be bounded with compile-time constants (e.g.,
for _ in 0..CONST). Do not usewhileloops or dynamic bounds.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 1199 - 1247, The V8 FP walk uses dynamic loop bounds and never clears per-CPU pointer.v8_sfi, and it reads regs.rbp as frame0 causing an off-by-one; fix copy_stack_regs and copy_stack_regs_fp_only to use compile-time bounded loops (e.g., for i in 1..MAX_STACK_FRAMES or for i in 0..MAX_V8_FRAMES) that break when get_frame/conditions indicate end, explicitly zero-initialize pointer.v8_sfi before the V8 walk, start the V8 extraction at the correct frame by using the saved BP corresponding to frame1 (adjust how frame_bp is initialized relative to regs.rbp), and replace dynamic reads with safe bpf_probe_read checks (use try_read_v8_sfi and get_frame return paths) so the verifier accepts the code and stale SFI entries cannot be inherited.profile-bee/src/event_loop.rs (1)
393-431:⚠️ Potential issue | 🟠 MajorClear and reload V8 state on process lifecycle changes.
On exit,
known_tgidsis cleared but the V8 reader andv8_proc_infomap entry are left behind. On exec,known_tgidsis removed and immediately reinserted, which prevents the nextStackInfofrom callingtry_setup_v8_for_pid()becauseinsert()will returnfalse. This can leave stale V8 offsets/readers after exec or PID reuse.🛡️ Suggested lifecycle fix
PROCESS_EVENT_EXIT => { if let Some(tx) = &self.tgid_request_tx { let _ = tx.send(DwarfThreadMsg::ProcessExited(event.pid)); } self.known_tgids.remove(&event.pid); + self.trace_handler.invalidate_caches_for_pid(event.pid); + if let Err(e) = crate::ebpf::remove_v8_proc_info(&mut self.bpf, event.pid) { + tracing::debug!("failed to remove V8ProcInfo for pid {}: {}", event.pid, e); + } // Defer metadata cache eviction until after build_raw_stacks()// The PID is still alive but with a new binary image. - // Re-add to known_tgids so DWARF tables are reloaded. + // Let the next StackInfo re-run DWARF/V8 setup for the new image. self.known_tgids.remove(&event.pid); - self.known_tgids.insert(event.pid); // Invalidate symbol caches for this PID self.trace_handler.invalidate_caches_for_pid(event.pid); + if let Err(e) = crate::ebpf::remove_v8_proc_info(&mut self.bpf, event.pid) { + tracing::debug!("failed to remove V8ProcInfo for pid {}: {}", event.pid, e); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 393 - 431, On PROCESS_EVENT_EXIT and PROCESS_EVENT_EXEC handlers in the PerfWork::ProcessEvent match, ensure V8 state is explicitly cleared: remove any v8_proc_info entry for event.pid and close/drop its V8 reader before touching known_tgids so no stale reader/offsets remain; for exec, do NOT immediately reinsert into known_tgids (remove and let try_setup_v8_for_pid perform the insert when it reinitializes V8), and keep the existing calls to trace_handler.invalidate_caches_for_pid and process_metadata.invalidate(event.pid) — update the handlers to call the v8 cleanup (v8_proc_info.remove(...) and reader shutdown) and avoid the immediate known_tgids.insert(event.pid) that prevents try_setup_v8_for_pid from running.
🧹 Nitpick comments (1)
profile-bee/src/ebpf.rs (1)
563-569: Consider lowering log level for V8 load events.
tracing::info!fires on every Node.js PID load and then on every subsequent exec/fork in a busy system (e.g., a Node-based service farm), which can easily flood logs. Most other per-process ebpf-map loads in this file already usetracing::info!for one-shot setup, but this one fires per PID. Considertracing::debug!, or at minimum guard against duplicate loads.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/ebpf.rs` around lines 563 - 569, The V8 load log currently uses tracing::info! and fires per PID (the tracing::info! call that logs "loaded V8ProcInfo for pid {}" with tgid, info.version, info.type_jsfunction_first, info.type_jsfunction_last) which can flood logs; change this call to tracing::debug! so it only appears at debug level, or alternatively add a lightweight dedupe guard (e.g., track seen pids in a HashSet or check an existing map keyed by tgid before logging) so the message is emitted only once per PID load.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee-common/src/lib.rs`:
- Around line 39-50: The v8_sfi array is limited to MAX_V8_FRAMES = 64 while
pointers is 1024, causing silent truncation in profile-bee-ebpf::copy_stack_regs
(the loop breaks at i >= MAX_V8_FRAMES) so deep V8 JS frames are never
symbolized; either (preferred) add a PerCpu counter (e.g.,
v8_frames_truncated_percpu) exported from the eBPF side and increment it inside
copy_stack_regs when i >= MAX_V8_FRAMES to make truncation observable, or (at
minimum) update the doc-comment for the v8_sfi field to explicitly warn about
the fixed 64-frame cutoff and refer to copy_stack_regs and MAX_V8_FRAMES so
future maintainers know coverage is limited.
In `@profile-bee/bin/profile-bee.rs`:
- Around line 340-346: The interception of "symbolize" using std::env::args()
before clap parsing causes flags like --help/--version to be treated as unknown
by run_symbolize_subcommand; update the pre-clap handling so that when args[1]
== "symbolize" you check for "-h"/"--help" and "--version" and print the
symbolizer usage/version (or return success) instead of calling
run_symbolize_subcommand, or preferably register "symbolize" as a proper clap
subcommand (derive Subcommand on your Opt enum and move symbolize handling into
the clap-driven dispatch) so the main clap parser supplies normal help/version
behavior; refer to args and run_symbolize_subcommand (or the Opt Subcommand
variant) to locate where to implement the change.
- Around line 790-805: When both a raw sink and symbolized sinks are present the
current path only runs the symbolized pipeline (event_loop.collect and
sink.write_batch) so raw_sink.add_samples is never called and raw.write ends up
empty; either detect this combination early and return an error message
instructing users to use `.raw` alone (check raw_sink.is_some() &&
has_symbolized_sinks) or, after calling event_loop.collect and before raw.write,
pass the collected samples into the raw sink (call
raw_sink.add_samples(&result.stacks) or the appropriate add/append method) and
set its duration/finish so the .raw file contains the same captured samples as
the symbolized output.
In `@profile-bee/src/event_loop.rs`:
- Around line 197-204: collect_unsymbolized currently calls drain_events which
still eagerly calls get_exp_stacked_frames for StackInfo events; change
drain_events to accept a flag (e.g., symbolization: bool or
perform_stack_expansion: bool) and short-circuit any calls to
get_exp_stacked_frames (and related V8 heap-reader work) when that flag is
false, then call drain_events(false/perform_stack_expansion = false) from
collect_unsymbolized so build_raw_address_samples runs on raw addresses only;
update signatures for drain_events (and any callers) and ensure StackInfo
handling checks the new flag before invoking get_exp_stacked_frames.
In `@profile-bee/src/output.rs`:
- Around line 406-414: The mappings header order is unstable because
self.mappings_cache is a HashMap; make iteration deterministic by iterating over
tgid keys in sorted order before emitting the "# mappings:<tgid>" blocks (e.g.,
collect the keys from self.mappings_cache, sort them (numeric order for tgid),
then for each sorted tgid lookup maps and write the block); reference symbols:
self.mappings_cache, tgid, maps, and the mapping-emission loop in output.rs so
the output order becomes stable and reproducible.
- Around line 479-484: snapshot_perf_map is an unused public helper; either wire
it into the raw sink/output so perf-maps are available at symbolize time or
remove it. To fix, call snapshot_perf_map(tgid) from
RawCollapseSink::add_samples (or the code path that writes the raw file) and,
when it returns Some, write the perf-map text into the raw output adjacent to
the existing "# mappings:<tgid>" block using a clear marker (e.g., "#
perfmap:<tgid>") so symbolize_raw_file can detect it; then update
symbolize_raw_file to read that perfmap block and pass it into blazesym's
Process symbol ingestion path so JIT frames resolve, or if you prefer to remove
unused code delete snapshot_perf_map and any related references.
- Around line 416-421: The aggregation key currently built by format_sample_key
omits sample.tgid, causing samples from different PIDs with the same cmd to
collide; update format_sample_key to include sample.tgid (e.g., prefix or
separate field) and adjust the aggregation logic that fills aggregated in
output.rs to use that new key so entries are
per-(tgid,cmd,kernel_addrs,user_addrs); also ensure the emitted output includes
the tgid as a column or metadata so symbolize_raw_file can choose the correct
Process(Pid) (symbolize_raw_file in profile-bee/src/symbolize.rs) instead of
relying on pids.first().
In `@profile-bee/src/symbolize.rs`:
- Around line 60-64: The code currently reuses target_pid =
pids.first().copied() for all samples causing cross-process/mis-symbolization;
update symbolize_sample logic to pick the PID that matches each sample's process
by (1) having the writer emit tgid/pid per sample (adjust
RawCollapseSink/format_sample_key to include tgid) and then selecting
Source::Process(Pid) from that emitted tgid, or if changing the writer is not
possible immediately, implement a best-effort match by comparing sample.cmd to
the first mapping line for each PID (use the mappings header associated with
each pid) and choose the PID whose mapping first-line matches sample.cmd; also
update the doc-comment on the function in symbolize.rs to document the
limitation and chosen behavior.
- Around line 139-150: The code currently drops mapping blocks silently because
current_mapping_pid is taken before validating pid_str parse/equality and parse
uses unwrap_or(0); modify the end_mappings handling around
trimmed.strip_prefix("end_mappings:") so you first attempt to parse pid_str with
proper error handling (e.g., match trimmed.parse::<u32>() and emit
tracing::warn! on parse failure), only call current_mapping_pid.take() after
confirming a successful parse and matching PID, and emit a tracing::warn! if
parsed PID != current_mapping_pid to retain/move current_mapping_lines back into
state instead of losing them; update uses of mappings and ProcMappings
accordingly so malformed or mismatched blocks are preserved or logged rather
than silently discarded.
- Around line 65-115: The symbolized frame order is being reversed incorrectly:
remove the .rev() calls so symbols and fallback addresses are appended in the
same root-to-leaf collapse order already present in sample.kernel_addrs and
sample.user_addrs. Specifically, in symbolize.rs inside the loop over samples,
stop calling .rev() on syms.into_iter().rev() (kernel and user success paths)
and on sample.kernel_addrs.iter().rev() and sample.user_addrs.iter().rev()
(kernel and user error/fallback paths) so format_symbolized(...) and the address
fallbacks push frames in [root, …, leaf] order matching
RawCollapseSink::format_sample_key and parse_sample_line.
In `@profile-bee/src/trace_handler.rs`:
- Around line 396-421: get_raw_addresses currently omits the V8 SFI metadata
(pointers.v8_sfi) unlike get_exp_stacked_frames, which means the produced .raw
files cannot carry V8 JS frame names; either include the v8_sfi column into this
code path or clearly document the limitation. Update get_raw_addresses to read
pointers.v8_sfi from the stacked_pointers lookup (StackInfoPod key / pointers
variable) and emit a parallel v8_sfi vector alongside user_stack (or extend the
return type), or alternatively add a doc-comment on get_raw_addresses explaining
that V8 JS frames will degrade to [unknown]/hex for the .raw pipeline and
recommend using the other path that retains v8_sfi. Ensure references to
get_exp_stacked_frames and pointers.v8_sfi are mentioned so reviewers can find
the analogous logic to copy.
- Around line 516-542: The V8 symbol override can use a stale V8HeapReader after
PID reuse; update format_stack_trace so it verifies the cached reader still
matches the sampled process before calling V8HeapReader::resolve_sfi.
Specifically, instead of blindly calling self.v8_readers.get_mut(&pid) and
resolving SFIs, compare a stable identity (e.g., process start_time /
proc_start_time, or the current /proc/<pid>/exe path) from the sample against a
saved identity field on the cached reader (add a reader.executable_path or
reader.start_time when the reader is created, or store readers keyed by
(pid,start_time)); if they don’t match, do not call resolve_sfi and skip
overriding symbols for that sample (or drop/refresh the reader). Ensure the
check happens before any call to V8HeapReader::resolve_sfi or process_vm_readv.
In `@profile-bee/src/v8/heap.rs`:
- Around line 222-240: The ConsString flatten path (when rep_tag ==
self.data.cons_string_tag) can produce arbitrarily large concatenations; add a
total output cap (e.g. MAX_CONSSTRING_BYTES = 16 * 1024) and enforce it before
returning the concatenated result in that branch: after obtaining `first` and
`second` (from `read_ptr` + `extract_string` using
`off_cons_string_first`/`off_cons_string_second` and `types::untag_ptr`), if
`first.len() + second.len()` exceeds the cap trim the combined output to the cap
(keep as much of `first` as fits, then append a prefix of `second` up to the
remaining allowance) and return that truncated string; ensure the cap constant
is used here so recursive calls to `extract_string` remain bounded by the
overall limit.
In `@profile-bee/src/v8/introspection.rs`:
- Around line 419-435: In read_u32_at_vaddr, the range checks use unchecked
arithmetic (vaddr + 4 and sec_addr + sec_size) which can overflow on malformed
ELF metadata; update the containment logic to use checked arithmetic (e.g.,
checked_add/checked_sub) so any overflow yields None and prevents invalid
comparisons, compute offset via checked_sub on vaddr and sec_addr then convert
to usize only after success, and use checked_add when testing offset + 4 against
data.len() to avoid usize overflow before reading the four bytes.
---
Outside diff comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1199-1247: The V8 FP walk uses dynamic loop bounds and never
clears per-CPU pointer.v8_sfi, and it reads regs.rbp as frame0 causing an
off-by-one; fix copy_stack_regs and copy_stack_regs_fp_only to use compile-time
bounded loops (e.g., for i in 1..MAX_STACK_FRAMES or for i in 0..MAX_V8_FRAMES)
that break when get_frame/conditions indicate end, explicitly zero-initialize
pointer.v8_sfi before the V8 walk, start the V8 extraction at the correct frame
by using the saved BP corresponding to frame1 (adjust how frame_bp is
initialized relative to regs.rbp), and replace dynamic reads with safe
bpf_probe_read checks (use try_read_v8_sfi and get_frame return paths) so the
verifier accepts the code and stale SFI entries cannot be inherited.
In `@profile-bee/src/event_loop.rs`:
- Around line 393-431: On PROCESS_EVENT_EXIT and PROCESS_EVENT_EXEC handlers in
the PerfWork::ProcessEvent match, ensure V8 state is explicitly cleared: remove
any v8_proc_info entry for event.pid and close/drop its V8 reader before
touching known_tgids so no stale reader/offsets remain; for exec, do NOT
immediately reinsert into known_tgids (remove and let try_setup_v8_for_pid
perform the insert when it reinitializes V8), and keep the existing calls to
trace_handler.invalidate_caches_for_pid and
process_metadata.invalidate(event.pid) — update the handlers to call the v8
cleanup (v8_proc_info.remove(...) and reader shutdown) and avoid the immediate
known_tgids.insert(event.pid) that prevents try_setup_v8_for_pid from running.
---
Nitpick comments:
In `@profile-bee/src/ebpf.rs`:
- Around line 563-569: The V8 load log currently uses tracing::info! and fires
per PID (the tracing::info! call that logs "loaded V8ProcInfo for pid {}" with
tgid, info.version, info.type_jsfunction_first, info.type_jsfunction_last) which
can flood logs; change this call to tracing::debug! so it only appears at debug
level, or alternatively add a lightweight dedupe guard (e.g., track seen pids in
a HashSet or check an existing map keyed by tgid before logging) so the message
is emitted only once per PID load.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e3e45ea8-8a86-41d8-bbb0-317b1522cbae
⛔ Files ignored due to path filters (1)
profile-bee/ebpf-bin/profile-bee.bpf.ois excluded by!**/*.o
📒 Files selected for processing (13)
profile-bee-common/src/lib.rsprofile-bee-ebpf/src/lib.rsprofile-bee/bin/profile-bee.rsprofile-bee/src/ebpf.rsprofile-bee/src/event_loop.rsprofile-bee/src/lib.rsprofile-bee/src/output.rsprofile-bee/src/symbolize.rsprofile-bee/src/trace_handler.rsprofile-bee/src/v8/heap.rsprofile-bee/src/v8/introspection.rsprofile-bee/src/v8/mod.rsprofile-bee/src/v8/types.rs
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
profile-bee/src/event_loop.rs (1)
399-441:⚠️ Potential issue | 🟠 MajorRemove stale
v8_proc_infoeBPF map entries on process exit and exec.The
remove_v8_proc_info()function exists inebpf.rsbut is never invoked. Both thePROCESS_EVENT_EXITandPROCESS_EVENT_EXEChandlers callinvalidate_caches_for_pid(event.pid)(which clears only userspacev8_readers), but neither removes the eBPF-sidev8_proc_infoentry. When a process exits or execs and the PID is reused by a non-Node process,try_setup_v8_for_pidearly-returns without populating a new entry. The eBPF FP walker then reads stale V8 offsets and applies them to unrelated memory in the new process, producing incorrect SFI pointers.Add
remove_v8_proc_info(event.pid)to both lifecycle handlers:Example fix
PROCESS_EVENT_EXIT => { if let Some(tx) = &self.tgid_request_tx { let _ = tx.send(DwarfThreadMsg::ProcessExited(event.pid)); } self.known_tgids.remove(&event.pid); - // Clean up V8 state for the exiting process to prevent - // stale readers from being used if the PID is reused. self.trace_handler.invalidate_caches_for_pid(event.pid); + let _ = self.bpf.remove_v8_proc_info(event.pid);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 399 - 441, The eBPF-side v8_proc_info map isn't being cleared on process exit or exec, allowing stale entries to be used by try_setup_v8_for_pid and the FP walker; call remove_v8_proc_info(event.pid) in both the PROCESS_EVENT_EXIT and PROCESS_EVENT_EXEC handlers alongside trace_handler.invalidate_caches_for_pid(event.pid) (and process_metadata.invalidate if present) so the eBPF map is removed when a PID exits or execs, preventing stale V8 offsets from being applied to a reused PID.
♻️ Duplicate comments (1)
profile-bee/bin/profile-bee.rs (1)
815-848:⚠️ Potential issue | 🟡 MinorFail fast instead of writing a known-empty
.rawfile.This branch warns that mixed raw/symbolized output is unsupported, but still writes an empty
.rawat Line 848. Return an error here, or setraw_sink = Noneafter warning so users don’t get a successful but unusable capture.🐛 Proposed fix
} else if raw_sink.is_some() && has_symbolized_sinks { - // Mixed mode: both raw and symbolized outputs requested. - // The eBPF maps are consumed by the symbolized collection path, so - // the raw sink cannot extract addresses from the same collection pass. - // Warn the user and proceed with only the symbolized outputs. - eprintln!( - "Warning: .raw output combined with symbolized formats is not supported — \ - the .raw file will be empty. Use -o profile.raw alone for raw captures, \ - then re-symbolize with: probee symbolize profile.raw -o output.svg" - ); + anyhow::bail!( + ".raw output cannot be combined with symbolized formats yet; \ + use -o profile.raw alone, then re-symbolize with: \ + probee symbolize profile.raw -o output.svg" + ); - let result = event_loop.collect(&perf_rx, None); - sink.write_batch(&result.stacks)?; - sink.set_actual_duration_ms(started.elapsed().as_millis() as u64); - sink.finish(&result.stacks)?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/bin/profile-bee.rs` around lines 815 - 848, The mixed-mode branch currently warns but still leaves raw_sink intact so an empty .raw is written later; modify the branch that checks raw_sink.is_some() && has_symbolized_sinks to fail fast or neutralize the raw sink: either return an Err with a clear message (propagate the error from the surrounding function) or set raw_sink = None (mutate the Option) immediately after printing the warning so later code that calls raw.write() won't produce an empty file; locate and change the code in the branch that prints the warning (the raw_sink/has_symbolized_sinks block that calls event_loop.collect, sink.write_batch, sink.finish) to implement one of these two fixes referencing raw_sink.
🧹 Nitpick comments (1)
profile-bee/src/event_loop.rs (1)
232-293: Blocking full-binary ELF read runs on the event drain path.
try_setup_v8_for_pidis invoked synchronously from theStackInfobranch ofdrain_events(lines 366, 370). For Node.js binaries — typically 60–100 MB statically linked —std::fs::read(&exe_path)at line 251 pulls the whole file into aVec<u8>andread_introspection_datathen walks its ELF symbol table, all while the profiler is not draining the perf ring buffer. Under bursty workloads (e.g. a cluster of Node workers starting simultaneously), this window can cause ring-buffer overruns and lost samples.A few incremental improvements worth considering:
- Cache by canonical
exe_path, not justtgid— two Node workers of the same version don't need two ELF parses. Onlyregister_v8_readerand thev8_proc_infoinsert need to run per-tgid.- Memory-map the binary (
memmap2) instead ofstd::fs::readso ELF parsing touches only the symbol-table pages.- Offload detection/parsing to a background thread (similar in spirit to the existing
tgid_request_tx/ DWARF loader pattern at line 362) and install theV8ProcInfoPod+ reader when results arrive. The eBPF extractor already tolerates a missingv8_proc_infoentry, so a small delay before V8 symbols kick in is acceptable.Also, a negative result (
read_introspection_datareturnsNoneat line 259) is not remembered — the full read is retried only if the tgid is re-inserted intoknown_tgids, so today it's effectively once-per-PID, but a future refactor that re-triggers setup (e.g. on exec) should be careful not to re-read the binary on every retry.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 232 - 293, try_setup_v8_for_pid currently does a blocking full-file read (std::fs::read) and ELF parsing on the event drain path; change it to (1) canonicalize exe_path and maintain a shared cache keyed by canonical path (store successful V8ProcInfo data and negative results to avoid re-parsing), (2) memory-map the binary (memmap2) and pass the mmap bytes into crate::v8::read_introspection_data instead of Vec<u8>, and (3) move heavy work to a background worker using the existing tgid_request_tx / DWARF loader pattern: send a request for parsing, have the background thread open/mmap+parse the exe and, on success, insert into the eBPF map ("v8_proc_info") and call trace_handler.register_v8_reader(tgid, data) for each tgid that shares that canonical exe; ensure per-tgid bookkeeping (known_tgids) still triggers only the per-tgid registration step and that failed parses are cached to avoid repeated work.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1199-1239: copy_stack_regs currently only zeroes pointer.v8_sfi
when V8_PROC_INFO.get(&tgid) hits, which leaves stale SFI values in the per-CPU
STORAGE when the lookup misses; unconditionally clear pointer.v8_sfi entries up
to min(pointer.len, MAX_V8_FRAMES) at the start of copy_stack_regs (before the
V8_PROC_INFO lookup) to prevent leaking previous samples, and likewise ensure
the same unconditional zeroing of v8_sfi is performed in the DWARF finalization
paths right before inserting a FramePointers into storage (references: function
copy_stack_regs, V8_PROC_INFO, pointer.v8_sfi, MAX_V8_FRAMES, and the code paths
that insert FramePointers during DWARF finalization).
In `@profile-bee/src/symbolize.rs`:
- Around line 52-63: The parsed mapping snapshots stored in mappings (from
parse_raw_collapse) are never used for symbol resolution, so Symbolizer
currently constructs symbol sources using live /proc via Source::Process (e.g.,
where Symbolizer::new / Source::Process are invoked), which breaks offline
re-symbolization for exited/reused PIDs; fix this by building symbol sources
from the captured mappings for each pid/frame instead of probing live
processes—use the mappings map and fallback_pid to create a Source variant (or a
new Source::Mappings-like constructor) that points symbolizer to the recorded
file paths and base addresses from mappings when resolving symbols, and only
fall back to Source::Process for truly live-symbolization scenarios or else
reject offline requests explicitly.
In `@profile-bee/src/v8/heap.rs`:
- Around line 249-262: The truncation at result.truncate(MAX_CONSSTRING_BYTES)
can panic if MAX_CONSSTRING_BYTES falls inside a multi-byte UTF-8 character; add
a helper (e.g., find_char_boundary or ensure_char_boundary) that takes a &str
and a byte limit and returns the largest valid char boundary ≤ limit, then use
that helper to compute a safe_end = find_char_boundary(&result,
MAX_CONSSTRING_BYTES) and call result.truncate(safe_end) instead of truncating
directly; reuse the same helper used when slicing `second` to ensure consistency
with how `first` and `second` are truncated.
---
Outside diff comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 399-441: The eBPF-side v8_proc_info map isn't being cleared on
process exit or exec, allowing stale entries to be used by try_setup_v8_for_pid
and the FP walker; call remove_v8_proc_info(event.pid) in both the
PROCESS_EVENT_EXIT and PROCESS_EVENT_EXEC handlers alongside
trace_handler.invalidate_caches_for_pid(event.pid) (and
process_metadata.invalidate if present) so the eBPF map is removed when a PID
exits or execs, preventing stale V8 offsets from being applied to a reused PID.
---
Duplicate comments:
In `@profile-bee/bin/profile-bee.rs`:
- Around line 815-848: The mixed-mode branch currently warns but still leaves
raw_sink intact so an empty .raw is written later; modify the branch that checks
raw_sink.is_some() && has_symbolized_sinks to fail fast or neutralize the raw
sink: either return an Err with a clear message (propagate the error from the
surrounding function) or set raw_sink = None (mutate the Option) immediately
after printing the warning so later code that calls raw.write() won't produce an
empty file; locate and change the code in the branch that prints the warning
(the raw_sink/has_symbolized_sinks block that calls event_loop.collect,
sink.write_batch, sink.finish) to implement one of these two fixes referencing
raw_sink.
---
Nitpick comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 232-293: try_setup_v8_for_pid currently does a blocking full-file
read (std::fs::read) and ELF parsing on the event drain path; change it to (1)
canonicalize exe_path and maintain a shared cache keyed by canonical path (store
successful V8ProcInfo data and negative results to avoid re-parsing), (2)
memory-map the binary (memmap2) and pass the mmap bytes into
crate::v8::read_introspection_data instead of Vec<u8>, and (3) move heavy work
to a background worker using the existing tgid_request_tx / DWARF loader
pattern: send a request for parsing, have the background thread open/mmap+parse
the exe and, on success, insert into the eBPF map ("v8_proc_info") and call
trace_handler.register_v8_reader(tgid, data) for each tgid that shares that
canonical exe; ensure per-tgid bookkeeping (known_tgids) still triggers only the
per-tgid registration step and that failed parses are cached to avoid repeated
work.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8a55fbee-b9e4-4b6d-bfbc-220a6baca3bc
📒 Files selected for processing (10)
profile-bee-common/src/lib.rsprofile-bee-ebpf/src/lib.rsprofile-bee/bin/profile-bee.rsprofile-bee/src/ebpf.rsprofile-bee/src/event_loop.rsprofile-bee/src/output.rsprofile-bee/src/symbolize.rsprofile-bee/src/trace_handler.rsprofile-bee/src/v8/heap.rsprofile-bee/src/v8/introspection.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- profile-bee/src/output.rs
- profile-bee/src/v8/introspection.rs
| let (mappings, samples) = parse_raw_collapse(&contents)?; | ||
|
|
||
| // Determine PIDs from mappings header | ||
| let pids: Vec<u32> = mappings.keys().copied().collect(); | ||
|
|
||
| let symbolizer = Symbolizer::new(); | ||
| let mut output = Vec::new(); | ||
|
|
||
| // Fallback PID for samples that don't embed a tgid in their root frame. | ||
| // Used only when the raw file was produced by an older version without | ||
| // per-sample PID tagging. | ||
| let fallback_pid = pids.first().copied(); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Confirm raw mapping snapshots are parsed but not consumed by symbolization.
rg -n -C3 '\bProcMappings\b|_lines|Source::Process|Source::Kernel|symbolize\(' profile-bee/src/symbolize.rsRepository: zz85/profile-bee
Length of output: 3474
Use captured mappings for offline symbolization.
Mapping snapshots are parsed into _lines but never used; symbolization resolves against the current live PID via Source::Process (lines 99-100). For offline re-symbolization of exited/reused PIDs, this causes symbolization to fail or resolve to wrong binaries. Construct symbol sources from the captured mappings instead of reading live /proc/[pid]/maps, or explicitly document/reject offline symbolization.
Also applies to: 99-100
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@profile-bee/src/symbolize.rs` around lines 52 - 63, The parsed mapping
snapshots stored in mappings (from parse_raw_collapse) are never used for symbol
resolution, so Symbolizer currently constructs symbol sources using live /proc
via Source::Process (e.g., where Symbolizer::new / Source::Process are invoked),
which breaks offline re-symbolization for exited/reused PIDs; fix this by
building symbol sources from the captured mappings for each pid/frame instead of
probing live processes—use the mappings map and fallback_pid to create a Source
variant (or a new Source::Mappings-like constructor) that points symbolizer to
the recorded file paths and base addresses from mappings when resolving symbols,
and only fall back to Source::Process for truly live-symbolization scenarios or
else reject offline requests explicitly.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
profile-bee-ebpf/src/lib.rs (1)
1394-1412:⚠️ Potential issue | 🔴 CriticalUse a compile-time loop bound in the eBPF FP fallback.
for i in 1..pointers.len()is a dynamic-bound loop from the verifier’s perspective becausepointersis a slice. Keep the bound constant and break on the slice length instead.🛡️ Proposed fix
- for i in 1..pointers.len() { + for i in 1..LEGACY_MAX_DWARF_STACK_DEPTH { + if i >= pointers.len() { + break; + } let Some(ret_addr) = get_frame(&mut bp) else { len = i; break; };As per coding guidelines,
profile-bee-ebpf/src/**/*.rs: All loops in eBPF code MUST be bounded with compile-time constants.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 1394 - 1412, The loop in copy_stack_regs_fp_only uses a dynamic bound (for i in 1..pointers.len()) which the eBPF verifier rejects; replace it with a compile-time constant loop bound (e.g. for i in 1..MAX_FRAMES where MAX_FRAMES is a const equal to the maximum pointers buffer size) and inside the loop check if i >= pointers.len() then break; call get_frame(&mut bp) each iteration as before and set pointers[i] and update len when get_frame returns None. Ensure the const name matches the actual buffer size used for pointers and update any related uses of len, bp, ip, sp accordingly (function: copy_stack_regs_fp_only; symbols: pointers, get_frame, bp, len).profile-bee/bin/profile-bee.rs (1)
1262-1266:⚠️ Potential issue | 🟠 MajorMirror V8 setup in the TUI profiling path.
The non-TUI event loop detects new Node.js TGIDs and loads
v8_proc_info, but this TUI thread only requests DWARF loading. TUI/combined mode will invalidate V8 caches on exit/exec, yet never install the eBPF V8 metadata or userspace heap reader in the first place.🔧 Suggested direction
if let Some(tx) = &tgid_request_tx { if stack.tgid != 0 && known_tgids.insert(stack.tgid) { let _ = tx.send(DwarfThreadMsg::LoadProcess(stack.tgid)); + // Also run the shared Node/V8 setup helper here: + // - detect /proc/<pid>/exe + // - load V8ProcInfo into `bpf.map_mut("v8_proc_info")` + // - register the V8 heap reader on `profiler` } }Factor
try_setup_v8_for_pidout ofProfilingEventLoopso both collection paths use the same implementation.Also applies to: 1308-1328
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/bin/profile-bee.rs` around lines 1262 - 1266, When handling PerfWork::StackInfo in the TUI path (the block using tgid_request_tx and sending DwarfThreadMsg::LoadProcess), you need to mirror the non-TUI V8 setup: factor the existing try_setup_v8_for_pid logic out of ProfilingEventLoop into a shared function and call it here whenever a new Node.js TGID is detected (stack.tgid != 0 && known_tgids.insert(stack.tgid)); the shared function should install eBPF V8 metadata and the userspace heap reader and return errors so the caller can log/fall back, and also ensure V8 caches and installed state are invalidated on exec/exit as the non-TUI path does.profile-bee/src/event_loop.rs (1)
399-408:⚠️ Potential issue | 🟠 MajorRemove
v8_proc_infoon process exit and exec.
invalidate_caches_for_pid()removes userspace V8 readers, but the eBPFv8_proc_infoentry remains. If the PID exits/reuses or execs into a non-Node binary, the FP+V8 walker can keep applying stale V8 offsets to that TGID.🐛 Proposed fix direction
+ fn remove_v8_proc_info(&mut self, tgid: u32) { + if let Some(map) = self.bpf.map_mut("v8_proc_info") { + if let Ok(mut v8_map) = + aya::maps::HashMap::<&mut MapData, u32, V8ProcInfoPod>::try_from(map) + { + let _ = v8_map.remove(&tgid); + } + } + } + @@ self.known_tgids.remove(&event.pid); + self.remove_v8_proc_info(event.pid); // Clean up V8 state for the exiting process to prevent @@ self.known_tgids.remove(&event.pid); + self.remove_v8_proc_info(event.pid); // Invalidate symbol caches for this PIDAlso applies to: 422-437
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 399 - 408, On PROCESS_EVENT_EXIT (inside PerfWork::ProcessEvent) also delete the eBPF v8_proc_info entry for the exiting pid in addition to invalidate_caches_for_pid: add a call like self.trace_handler.remove_v8_proc_info_for_pid(event.pid) (implement that method to remove the v8_proc_info map entry for the given pid); apply the same removal call in the exec-handling branch (the block referenced at lines 422-437) so v8_proc_info is cleared on both exit and exec.
♻️ Duplicate comments (4)
profile-bee-ebpf/src/lib.rs (1)
1213-1215:⚠️ Potential issue | 🟠 MajorClear
v8_sfibefore writing reusedSTORAGE.
STORAGEis per-CPU and reused. The inline FP fallback never clearspointer.v8_sfi, andfp_v8_finalize_stackonly overwrites entries up tolen, so a shorter or non-V8 sample can inherit stale SFI pointers and be mis-symbolized.🐛 Proposed fix
+#[inline(always)] +unsafe fn clear_v8_sfi(pointer: &mut FramePointers) { + for i in 0..MAX_V8_FRAMES { + pointer.v8_sfi[i] = 0; + } +} + #[inline(always)] unsafe fn copy_stack_regs(regs: &pt_regs, pointer: &mut FramePointers) -> (u64, u64, usize, u64) { + clear_v8_sfi(pointer); copy_stack_regs_fp_only(regs, &mut pointer.pointers) } @@ let pointer = &mut *pointer; + clear_v8_sfi(pointer); + // Copy frame pointers from state to STORAGE let len = state.frame_count;Also applies to: 1275-1297
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 1213 - 1215, The inline FP fallback path reuses per-CPU STORAGE but never clears pointer.v8_sfi, allowing stale SFI pointers to be retained; before calling copy_stack_regs_fp_only (and in the equivalent FP-only paths), explicitly clear pointer.v8_sfi (or reset the v8 SFI field to a neutral value) so that fp_v8_finalize_stack cannot leave old SFI entries in STORAGE; ensure the same clearing is applied wherever fp_v8_finalize_stack writes only up to len (e.g., the other FP-only helper code paths).profile-bee/bin/profile-bee.rs (1)
776-779:⚠️ Potential issue | 🔴 CriticalFix
RawCollapseSinkconstruction/finalization and avoid empty mixed.rawfiles.
RawCollapseSink::new(...)returns aResult, so this buildsOption<Result<_>>, notOption<RawCollapseSink>. Also, mixed raw+symbolized mode warns but still writes the raw sink at the end, producing the empty file called out in the warning.🐛 Proposed fix
let mut raw_sink: Option<RawCollapseSink> = resolved_outputs .iter() .find(|(fmt, _)| *fmt == OutputFormat::Raw) - .map(|(_, path)| RawCollapseSink::new(path.clone())); + .map(|(_, path)| RawCollapseSink::new(path.clone())) + .transpose()?; @@ } else if raw_sink.is_some() && has_symbolized_sinks { - // Mixed mode: both raw and symbolized outputs requested. - // The eBPF maps are consumed by the symbolized collection path, so - // the raw sink cannot extract addresses from the same collection pass. - // Warn the user and proceed with only the symbolized outputs. - eprintln!( - "Warning: .raw output combined with symbolized formats is not supported — \ - the .raw file will be empty. Use -o profile.raw alone for raw captures, \ - then re-symbolize with: probee symbolize profile.raw -o output.svg" - ); - let result = event_loop.collect(&perf_rx, None); - sink.write_batch(&result.stacks)?; - sink.set_actual_duration_ms(started.elapsed().as_millis() as u64); - sink.finish(&result.stacks)?; + anyhow::bail!( + ".raw output cannot be combined with symbolized formats yet; \ + use -o profile.raw alone, then re-symbolize with: probee symbolize profile.raw -o output.svg" + ); @@ - if let Some(ref raw) = raw_sink { - raw.write()?; + if let Some(ref mut raw) = raw_sink { + raw.finalize()?; }Run this read-only check to confirm the constructor/finalizer names in the current tree:
#!/bin/bash # Verify RawCollapseSink API and raw sink call sites. rg -n -C3 'struct RawCollapseSink|impl RawCollapseSink|pub fn new|pub fn finalize|pub fn write|RawCollapseSink::new|raw_sink'Also applies to: 815-848
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/bin/profile-bee.rs` around lines 776 - 779, RawCollapseSink::new returns a Result, so the current mapping produces Option<Result<_>> and later code may call finalize even when construction failed or when running in mixed raw+symbolized mode (causing empty .raw files). Change the construction to handle the Result (e.g., map and then .transpose() or match to produce Option<RawCollapseSink> by propagating or logging the error) when building raw_sink from resolved_outputs; only call RawCollapseSink::finalize on a Some(sink) that was successfully constructed and only when not in mixed raw+symbolized mode (check the existing mixed-mode flag/condition) so we avoid creating empty raw files. Ensure to reference RawCollapseSink::new and RawCollapseSink::finalize in your fix and keep resolved_outputs / OutputFormat::Raw selection logic intact.profile-bee/src/symbolize.rs (1)
38-42:⚠️ Potential issue | 🟠 MajorUse the captured mappings instead of live PIDs for offline symbolization.
The raw file stores mapping snapshots, but they are parsed into
_linesand never consumed.Source::Process(Pid::from(pid))resolves against the current live/proc/<pid>, soprobee symbolizecan fail or mis-symbolize after the profiled process exits or the PID is reused.#!/bin/bash # Confirm mapping snapshots are parsed but not used for symbol resolution. rg -n -C4 'ProcMappings|_lines|Source::Process|Source::Kernel|symbolize\(' profile-bee/src/symbolize.rs profile-bee/src/output.rsAlso applies to: 52-63, 98-100
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/symbolize.rs` around lines 38 - 42, ProcMappings currently stores raw mapping lines in the private field _lines but the symbolization path uses Source::Process(Pid::from(pid)) which queries the live /proc/<pid>, causing mis-symbolization for offline snapshots; update the code so symbolization uses the captured mappings: expose or pass ProcMappings._lines (e.g., via a method on ProcMappings or by constructing a new Source variant like Source::ProcessSnapshot) into the symbolizer/symbolize(...) call instead of Source::Process, and update all call sites that create Source::Process from the recorded pid (search for Source::Process(...) and the symbolize function) to use the snapshot-aware source so resolution reads the stored mapping lines rather than the live PID.profile-bee/src/v8/heap.rs (1)
249-261:⚠️ Potential issue | 🟠 MajorUse a floor char boundary before truncating ConsStrings.
Line 261 can still panic when
MAX_CONSSTRING_BYTESlands inside a multi-byte character, and lines 253-258 can include a character that extends pastremaining, exceeding the cap. Use one helper that returns the largest valid UTF-8 boundary<= limit.🐛 Proposed boundary-safe truncation
+ fn floor_char_boundary(s: &str, limit: usize) -> usize { + let limit = limit.min(s.len()); + if s.is_char_boundary(limit) { + return limit; + } + s.char_indices() + .map(|(i, _)| i) + .take_while(|&i| i < limit) + .last() + .unwrap_or(0) + } + let mut result = first; if result.len() < MAX_CONSSTRING_BYTES { let remaining = MAX_CONSSTRING_BYTES - result.len(); - // Truncate at a char boundary - let end = second - .char_indices() - .take_while(|(i, _)| *i < remaining) - .last() - .map(|(i, c)| i + c.len_utf8()) - .unwrap_or(0); + let end = floor_char_boundary(&second, remaining); result.push_str(&second[..end]); } else { - result.truncate(MAX_CONSSTRING_BYTES); + let end = floor_char_boundary(&result, MAX_CONSSTRING_BYTES); + result.truncate(end); }Rust String::truncate panics if byte index is not a char boundary official documentation🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/v8/heap.rs` around lines 249 - 261, The current truncation logic can still slice in the middle of a multi-byte character; create and use a helper like largest_char_boundary_le(s: &str, limit: usize) that returns the largest UTF-8 char boundary <= limit, then use it both when appending from second and when truncating result: compute safe_end = largest_char_boundary_le(second, remaining) before slicing &second[..safe_end], and compute safe_trunc = largest_char_boundary_le(&result, MAX_CONSSTRING_BYTES) before calling truncate(MAX_CONSSTRING_BYTES) (or truncate to safe_trunc) so all string slicing/truncation uses valid char boundaries (referencing result, second, MAX_CONSSTRING_BYTES).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1425-1436: The code in try_read_v8_sfi currently only checks that
fp_function < V8_FP_CONTEXT_SIZE which allows a u64 read to overflow the 64-byte
FP context; change the validation to ensure the entire 8-byte slot fits by
checking fp_func_offset + core::mem::size_of::<u64>() as u64 <=
V8_FP_CONTEXT_SIZE as u64 (and return None if not), then proceed to compute
fp_ctx_addr and call bpf_probe_read_user as before (refs: try_read_v8_sfi,
fp_function, fp_func_offset, V8_FP_CONTEXT_SIZE, fp_ctx_addr, jsfunc_tagged).
In `@profile-bee/src/ebpf.rs`:
- Around line 185-208: The FP+V8 tail-call program (fp_v8_unwind_step) is only
loaded in setup_tail_call_unwinding_inner() which runs only when DWARF is
enabled, so PROG_ARRAY[1] is left empty for the non-DWARF (Node.js) path; split
the DWARF and FP+V8 registration so fp_v8_unwind_step is always
loaded/registered before attaching profile_cpu: extract the fp_v8 loading + fd
clone + prog_array.set(1, ...) into a new helper (e.g. register_fp_v8_tail_call
or register_fp_v8_only) and call it unconditionally from setup_ebpf_profiler()
(or the non-DWARF branch) instead of depending on should_enable_fp_v8_tail_call
or setup_tail_call_unwinding_inner; ensure you use the same fp_v8_unwind_step
program name, clone its fd, and write to prog_array index 1 so v8_sfi can be
collected.
In `@profile-bee/src/v8/heap.rs`:
- Around line 290-292: The current code treats SeqOneByteString bytes as UTF-8
via String::from_utf8_lossy, corrupting bytes 0x80–0xFF; change the decoding to
Latin-1 by mapping each byte to the corresponding Unicode code point U+00XX
(e.g., bytes.iter().map(|&b| b as char).collect::<String>() ) instead of
from_utf8_lossy. Update the code around chars_addr / read_bytes / length where
SeqOneByteString is converted to String so the bytes are directly cast to
Latin-1 chars and returned, preserving error handling from read_bytes.
---
Outside diff comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1394-1412: The loop in copy_stack_regs_fp_only uses a dynamic
bound (for i in 1..pointers.len()) which the eBPF verifier rejects; replace it
with a compile-time constant loop bound (e.g. for i in 1..MAX_FRAMES where
MAX_FRAMES is a const equal to the maximum pointers buffer size) and inside the
loop check if i >= pointers.len() then break; call get_frame(&mut bp) each
iteration as before and set pointers[i] and update len when get_frame returns
None. Ensure the const name matches the actual buffer size used for pointers and
update any related uses of len, bp, ip, sp accordingly (function:
copy_stack_regs_fp_only; symbols: pointers, get_frame, bp, len).
In `@profile-bee/bin/profile-bee.rs`:
- Around line 1262-1266: When handling PerfWork::StackInfo in the TUI path (the
block using tgid_request_tx and sending DwarfThreadMsg::LoadProcess), you need
to mirror the non-TUI V8 setup: factor the existing try_setup_v8_for_pid logic
out of ProfilingEventLoop into a shared function and call it here whenever a new
Node.js TGID is detected (stack.tgid != 0 && known_tgids.insert(stack.tgid));
the shared function should install eBPF V8 metadata and the userspace heap
reader and return errors so the caller can log/fall back, and also ensure V8
caches and installed state are invalidated on exec/exit as the non-TUI path
does.
In `@profile-bee/src/event_loop.rs`:
- Around line 399-408: On PROCESS_EVENT_EXIT (inside PerfWork::ProcessEvent)
also delete the eBPF v8_proc_info entry for the exiting pid in addition to
invalidate_caches_for_pid: add a call like
self.trace_handler.remove_v8_proc_info_for_pid(event.pid) (implement that method
to remove the v8_proc_info map entry for the given pid); apply the same removal
call in the exec-handling branch (the block referenced at lines 422-437) so
v8_proc_info is cleared on both exit and exec.
---
Duplicate comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1213-1215: The inline FP fallback path reuses per-CPU STORAGE but
never clears pointer.v8_sfi, allowing stale SFI pointers to be retained; before
calling copy_stack_regs_fp_only (and in the equivalent FP-only paths),
explicitly clear pointer.v8_sfi (or reset the v8 SFI field to a neutral value)
so that fp_v8_finalize_stack cannot leave old SFI entries in STORAGE; ensure the
same clearing is applied wherever fp_v8_finalize_stack writes only up to len
(e.g., the other FP-only helper code paths).
In `@profile-bee/bin/profile-bee.rs`:
- Around line 776-779: RawCollapseSink::new returns a Result, so the current
mapping produces Option<Result<_>> and later code may call finalize even when
construction failed or when running in mixed raw+symbolized mode (causing empty
.raw files). Change the construction to handle the Result (e.g., map and then
.transpose() or match to produce Option<RawCollapseSink> by propagating or
logging the error) when building raw_sink from resolved_outputs; only call
RawCollapseSink::finalize on a Some(sink) that was successfully constructed and
only when not in mixed raw+symbolized mode (check the existing mixed-mode
flag/condition) so we avoid creating empty raw files. Ensure to reference
RawCollapseSink::new and RawCollapseSink::finalize in your fix and keep
resolved_outputs / OutputFormat::Raw selection logic intact.
In `@profile-bee/src/symbolize.rs`:
- Around line 38-42: ProcMappings currently stores raw mapping lines in the
private field _lines but the symbolization path uses
Source::Process(Pid::from(pid)) which queries the live /proc/<pid>, causing
mis-symbolization for offline snapshots; update the code so symbolization uses
the captured mappings: expose or pass ProcMappings._lines (e.g., via a method on
ProcMappings or by constructing a new Source variant like
Source::ProcessSnapshot) into the symbolizer/symbolize(...) call instead of
Source::Process, and update all call sites that create Source::Process from the
recorded pid (search for Source::Process(...) and the symbolize function) to use
the snapshot-aware source so resolution reads the stored mapping lines rather
than the live PID.
In `@profile-bee/src/v8/heap.rs`:
- Around line 249-261: The current truncation logic can still slice in the
middle of a multi-byte character; create and use a helper like
largest_char_boundary_le(s: &str, limit: usize) that returns the largest UTF-8
char boundary <= limit, then use it both when appending from second and when
truncating result: compute safe_end = largest_char_boundary_le(second,
remaining) before slicing &second[..safe_end], and compute safe_trunc =
largest_char_boundary_le(&result, MAX_CONSSTRING_BYTES) before calling
truncate(MAX_CONSSTRING_BYTES) (or truncate to safe_trunc) so all string
slicing/truncation uses valid char boundaries (referencing result, second,
MAX_CONSSTRING_BYTES).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e6b22155-42c5-4896-9472-43e56bd52403
⛔ Files ignored due to path filters (1)
profile-bee/ebpf-bin/profile-bee.bpf.ois excluded by!**/*.o
📒 Files selected for processing (11)
profile-bee-common/src/lib.rsprofile-bee-ebpf/src/lib.rsprofile-bee-ebpf/src/main.rsprofile-bee/bin/profile-bee.rsprofile-bee/src/ebpf.rsprofile-bee/src/event_loop.rsprofile-bee/src/output.rsprofile-bee/src/symbolize.rsprofile-bee/src/trace_handler.rsprofile-bee/src/v8/heap.rsprofile-bee/src/v8/introspection.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- profile-bee/src/trace_handler.rs
- profile-bee/src/output.rs
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
profile-bee-ebpf/src/lib.rs (1)
1399-1417:⚠️ Potential issue | 🟠 MajorUse a compile-time bounded loop in the FP fallback walker.
for i in 1..pointers.len()uses a runtime slice length as the eBPF loop bound. Keep the verifier-friendly constant-bound pattern and break when the slice limit is reached.🐛 Proposed fix
- for i in 1..pointers.len() { + for i in 1..MAX_DWARF_STACK_DEPTH { + if i >= pointers.len() { + break; + } let Some(ret_addr) = get_frame(&mut bp) else { len = i; break; }; pointers[i] = ret_addr;As per coding guidelines,
profile-bee-ebpf/src/**/*.rs: “All loops in eBPF code MUST be bounded with compile-time constants (e.g.,for _ in 0..CONST). Do not usewhileloops or dynamic bounds”.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 1399 - 1417, The FP fallback walker in copy_stack_regs_fp_only uses a runtime-bound loop (for i in 1..pointers.len()), which breaks eBPF verifier rules; change this to a compile-time bounded loop (e.g., for i in 1..MAX_PTRS) and inside the loop check if i >= pointers.len() then set len = i and break, call get_frame(&mut bp) and break if it returns None, and otherwise write pointers[i] = ret_addr and continue; ensure you define a constant MAX_PTRS matching the maximum slice capacity and update the length variable the same way as before so the rest of the function uses the corrected len.
♻️ Duplicate comments (5)
profile-bee/src/event_loop.rs (1)
373-384:⚠️ Potential issue | 🟠 MajorSkip V8 setup when
collect_unsymbolized()requests no symbolization.With
symbolize = false,drain_events()still runstry_setup_v8_for_pid(), which reads the Node binary, loadsv8_proc_info, and registers a V8 heap reader. Gate that work behindsymbolizeso raw-only capture stays truly unsymbolized and avoids extra eBPF V8 extraction overhead.🐛 Proposed fix
if let Some(tx) = &self.tgid_request_tx { if stack.tgid != 0 && self.known_tgids.insert(stack.tgid) { let _ = tx.send(DwarfThreadMsg::LoadProcess(stack.tgid)); - // Check if this is a Node.js/V8 process and set up - // V8 introspection (eBPF FP context extraction + - // userspace heap reader for JS symbol resolution). - self.try_setup_v8_for_pid(stack.tgid); + if symbolize { + self.try_setup_v8_for_pid(stack.tgid); + } } } else if stack.tgid != 0 && self.known_tgids.insert(stack.tgid) { - // Even without DWARF thread, detect V8 processes - self.try_setup_v8_for_pid(stack.tgid); + if symbolize { + self.try_setup_v8_for_pid(stack.tgid); + } }Also applies to: 388-405
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 373 - 384, The drain_events path currently calls try_setup_v8_for_pid(stack.tgid) even when collection was requested without symbolization; change the logic in drain_events (the branches that check self.tgid_request_tx, known_tgids and stack.tgid) to skip calling try_setup_v8_for_pid when the collector was created with symbolize disabled (e.g., check the collect_unsymbolized()/symbolize flag or equivalent field on self before invoking try_setup_v8_for_pid), so V8 extraction (v8_proc_info load and heap reader registration) only runs when symbolization is enabled.profile-bee/bin/profile-bee.rs (1)
773-848:⚠️ Potential issue | 🟠 MajorReject unsupported
.rawcombinations instead of writing empty captures.
--serve -o profile.rawenters the serve branch and later writes an empty raw file without warning; raw + symbolized outputs warn but still exit successfully after writing an empty.raw. Please fail before collection whenraw_sink.is_some() && has_symbolized_sinks, or populate the raw sink from the same collection pass.🐛 Proposed fix
let has_symbolized_sinks = resolved_outputs .iter() .any(|(fmt, _)| *fmt != OutputFormat::Raw) || opt.serve; + + if raw_sink.is_some() && has_symbolized_sinks { + anyhow::bail!( + ".raw output cannot be combined with --serve or symbolized output formats yet; \ + run with `-o profile.raw` alone, then use `probee symbolize profile.raw -o output.svg`" + ); + } if opt.serve { // Serve mode: periodically flush data to the web server🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/bin/profile-bee.rs` around lines 773 - 848, The code currently proceeds with collection when raw_sink.is_some() && has_symbolized_sinks and ends up writing an empty .raw; change this to reject the unsupported combination early by checking the condition before any collection or enter-serve logic (use the raw_sink and has_symbolized_sinks variables and opt.serve flag) and return an error (or exit non-zero) with a clear message if both a raw sink and any symbolized sinks are requested; ensure the same check covers both serve and non-serve flows (so you don't enter the serve loop or call event_loop.collect/collect_unsymbolized when both are requested).profile-bee-ebpf/src/lib.rs (1)
1049-1058:⚠️ Potential issue | 🟠 MajorClear
v8_sfiin finalization paths before storing per-CPUFramePointers.
STORAGEpersists per CPU.copy_stack_regs()now clears fallback samples, but DWARF finalization never clearspointer.v8_sfi, and FP+V8 finalization only overwrites indices belowlen. Clear allMAX_V8_FRAMESslots before copying current SFI values.🐛 Proposed fix
unsafe fn dwarf_finalize_stack(state: &DwarfUnwindState) { let Some(pointer) = STORAGE.get_ptr_mut(0) else { return; }; let pointer = &mut *pointer; + for i in 0..MAX_V8_FRAMES { + pointer.v8_sfi[i] = 0; + } + // Copy unwound frame pointers from state to STORAGE let len = state.frame_count;unsafe fn fp_v8_finalize_stack(state: &DwarfUnwindState) { let Some(pointer) = STORAGE.get_ptr_mut(0) else { return; }; let pointer = &mut *pointer; + for i in 0..MAX_V8_FRAMES { + pointer.v8_sfi[i] = 0; + } + // Copy frame pointers from state to STORAGE let len = state.frame_count;Also applies to: 1286-1302
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-ebpf/src/lib.rs` around lines 1049 - 1058, Finalization paths don't clear the per-CPU pointer.v8_sfi slots before storing FramePointers, so stale V8 SFI data can persist; update the finalization code paths (the DWARF finalizer and the FP+V8 finalizer where pointer.pointers, pointer.len, state.pointers, and state.frame_count are used) to explicitly zero/clear all entries of pointer.v8_sfi for the full MAX_V8_FRAMES range before copying current pointers and setting pointer.len, i.e., iterate 0..MAX_V8_FRAMES and reset pointer.v8_sfi[i] then perform the existing copy of state.pointers and pointer.len so no leftover V8 SFI remains in STORAGE.profile-bee/src/symbolize.rs (1)
38-45:⚠️ Potential issue | 🟠 MajorUse captured mappings instead of live
/proc/<pid>/mapsfor offline symbolization.
parse_raw_collapse()stores the raw mappings, but user frames are still symbolized viaSource::Process(Pid). For exited or PID-reused processes,probee symbolize profile.rawwill resolve to[unknown]or the wrong binary despite the raw file containing mapping snapshots.Run this read-only check to confirm mapping snapshots are parsed but not consumed by symbolization:
#!/bin/bash # Description: Inspect whether captured ProcMappings lines are used to build user-space symbol sources. rg -n -C4 '\bProcMappings\b|_lines|Source::Process|Process::new|symbolize\(' --type rustAlso applies to: 103-119
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/symbolize.rs` around lines 38 - 45, The symbolizer is still using live process lookups (Source::Process / Process::new) even though parse_raw_collapse() parsed ProcMappings/_lines into the raw file; change the symbolization path to detect and consume those captured mappings: when parse_raw_collapse() returned non-empty mapping snapshots (ProcMappings), construct a symbol source from those snapshots (a blazesym source built from the captured mappings and mapped file paths) and pass that source into the existing symbolize flow instead of Source::Process(Pid); update the code paths that call symbolize(...) and any Process::new usages to prefer the new mapping-based source when available so offline/exited PID cases resolve correctly.profile-bee/src/v8/introspection.rs (1)
337-348:⚠️ Potential issue | 🟠 MajorValidation still misses
off_jsfunction_sharedand the JSFunction type range.If
off_jsfunction_shared == 0, theoff_jsfunction_codefallback at lines 143–149 builds a bogus offset from0(0 - 8or0 + 24), and the downstream eBPF SFI read points at the wrong JSFunction field with no diagnostic. Likewise, whenv8dbg_LastJSFunctionTypeis absent butv8dbg_FirstJSFunctionTypeis present,last_jsfunction_typestays0and no fallback applies — the range becomes[first, 0]and every JSFunction is silently rejected by eBPF.Recommend tightening the gate here to also require
off_jsfunction_shared != 0,first_jsfunction_type != 0, andlast_jsfunction_type >= first_jsfunction_type, and logging the extra fields for diagnostics.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/v8/introspection.rs` around lines 337 - 348, The validation block should also ensure off_jsfunction_shared, first_jsfunction_type are non-zero and that last_jsfunction_type >= first_jsfunction_type to avoid bogus offsets and an inverted type range; update the gate in the same function that currently checks off_map_instance_type/type_jsfunction/type_shared_function_info to additionally require off_jsfunction_shared != 0 and first_jsfunction_type != 0 and validate last_jsfunction_type >= first_jsfunction_type, and extend the tracing::warn message to include the values of off_jsfunction_shared, first_jsfunction_type, and last_jsfunction_type so diagnostics show the additional missing/invalid symbols (references: off_jsfunction_shared, off_jsfunction_code fallback, v8dbg_FirstJSFunctionType, v8dbg_LastJSFunctionType, first_jsfunction_type, last_jsfunction_type).
🧹 Nitpick comments (2)
profile-bee/src/v8/heap.rs (1)
135-160: Single failed slot read aborts the whole ScopeInfo scan.
let tagged = self.read_ptr(start + i * ptr_size)?;inside the loop propagatesNoneout ofextract_name_from_scope_infoas soon as any one slot is unreadable (e.g., if the scan reaches past the ScopeInfo allocation into an unmapped page). Since this is already a best-effort heuristic, continuing to the next slot (orbreaking) is more robust and preserves a chance of finding the name in earlier/later valid slots.♻️ Proposed tweak
- for i in 0..16u64 { - let tagged = self.read_ptr(start + i * ptr_size)?; + for i in 0..16u64 { + let Some(tagged) = self.read_ptr(start + i * ptr_size) else { + break; + }; if !types::is_heap_object(tagged) { continue; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/v8/heap.rs` around lines 135 - 160, In extract_name_from_scope_info change the loop so a single failed read_ptr doesn't return None for the whole function: replace the use of the try-operator on read_ptr(start + i * ptr_size)? inside the for-loop with error-aware handling (e.g., match or if-let) that skips that slot (or breaks) on read failure and continues scanning remaining slots; keep the existing logic that checks types::is_heap_object, types::untag_ptr, read_instance_type, and read_v8_string so valid string reads still return Some(name).profile-bee-common/src/lib.rs (1)
242-258: fp_function conversion is properly validated; off_ fields lack runtime bounds checks but pose minimal risk given V8 stability.*The review comment's concern about
fp_functionis incorrect. The code does have explicit remapping viamap_fp_offset()(profile-bee/src/v8/types.rs:179–188), which:
- Converts the signed FP-relative offset to an absolute position within the 64-byte buffer
- Validates that the result is in range [0, 56], returning 64 (sentinel) if out of bounds
- Includes test coverage confirming valid and invalid cases
However,
off_heap_object_map,off_map_instance_type, andoff_jsfunction_sharedare truncated fromu32tou8without validation into_proc_info(). While current V8 mainline offsets are small (0, 12, 24), a future V8 version with larger offsets would silently wrap. Consider adding a validation step (similar tomap_fp_offset) that returnsOption<V8ProcInfo>or logs a warning if any offset exceeds 255.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee-common/src/lib.rs` around lines 242 - 258, The issue: off_heap_object_map, off_map_instance_type, and off_jsfunction_shared are truncated to u8 without bounds checks in the conversion code; update to_proc_info() to validate each u32 offset fits in u8 (<=255) before constructing V8ProcInfo—use the same pattern as map_fp_offset() (or return None/Err) and either return Option<V8ProcInfo> or propagate an error/log a warning if any offset is out of range so truncation/wraparound cannot occur silently; ensure tests cover valid and overflowing cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee/src/event_loop.rs`:
- Around line 267-306: The code currently inserts per-PID V8 layout into the
v8_proc_info eBPF map inside try_setup_v8_for_pid without guaranteeing cleanup
because remove_v8_proc_info_for_pid only runs when ProcessEvent exit/exec
handlers are attached; modify try_setup_v8_for_pid to only insert into
v8_proc_info when lifecycle/event plumbing is enabled (or alternatively ensure
lifecycle tracking is enabled whenever register_v8_reader is called) so entries
cannot be left behind on process exit—specifically, add a check before calling
self.bpf.map_mut("v8_proc_info")/insert (or before calling
self.trace_handler.register_v8_reader) that verifies the exit/exec tracepoints
(lifecycle tracking) are active, and skip insertion/registration if they are
not.
In `@profile-bee/src/v8/introspection.rs`:
- Around line 420-441: In read_u32_at_vaddr, hoist the loop-invariant vaddr_end
= vaddr.checked_add(4) out of the section loop and return None if that fails,
and change per-section checked_add calls (sec_addr.checked_add(sec_size) and
offset.checked_add(4)) to handle overflow by skipping that section (use
continue) instead of using the ? operator so one malformed section cannot cause
a global None; keep the existing bounds checks and data access logic otherwise.
---
Outside diff comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1399-1417: The FP fallback walker in copy_stack_regs_fp_only uses
a runtime-bound loop (for i in 1..pointers.len()), which breaks eBPF verifier
rules; change this to a compile-time bounded loop (e.g., for i in 1..MAX_PTRS)
and inside the loop check if i >= pointers.len() then set len = i and break,
call get_frame(&mut bp) and break if it returns None, and otherwise write
pointers[i] = ret_addr and continue; ensure you define a constant MAX_PTRS
matching the maximum slice capacity and update the length variable the same way
as before so the rest of the function uses the corrected len.
---
Duplicate comments:
In `@profile-bee-ebpf/src/lib.rs`:
- Around line 1049-1058: Finalization paths don't clear the per-CPU
pointer.v8_sfi slots before storing FramePointers, so stale V8 SFI data can
persist; update the finalization code paths (the DWARF finalizer and the FP+V8
finalizer where pointer.pointers, pointer.len, state.pointers, and
state.frame_count are used) to explicitly zero/clear all entries of
pointer.v8_sfi for the full MAX_V8_FRAMES range before copying current pointers
and setting pointer.len, i.e., iterate 0..MAX_V8_FRAMES and reset
pointer.v8_sfi[i] then perform the existing copy of state.pointers and
pointer.len so no leftover V8 SFI remains in STORAGE.
In `@profile-bee/bin/profile-bee.rs`:
- Around line 773-848: The code currently proceeds with collection when
raw_sink.is_some() && has_symbolized_sinks and ends up writing an empty .raw;
change this to reject the unsupported combination early by checking the
condition before any collection or enter-serve logic (use the raw_sink and
has_symbolized_sinks variables and opt.serve flag) and return an error (or exit
non-zero) with a clear message if both a raw sink and any symbolized sinks are
requested; ensure the same check covers both serve and non-serve flows (so you
don't enter the serve loop or call event_loop.collect/collect_unsymbolized when
both are requested).
In `@profile-bee/src/event_loop.rs`:
- Around line 373-384: The drain_events path currently calls
try_setup_v8_for_pid(stack.tgid) even when collection was requested without
symbolization; change the logic in drain_events (the branches that check
self.tgid_request_tx, known_tgids and stack.tgid) to skip calling
try_setup_v8_for_pid when the collector was created with symbolize disabled
(e.g., check the collect_unsymbolized()/symbolize flag or equivalent field on
self before invoking try_setup_v8_for_pid), so V8 extraction (v8_proc_info load
and heap reader registration) only runs when symbolization is enabled.
In `@profile-bee/src/symbolize.rs`:
- Around line 38-45: The symbolizer is still using live process lookups
(Source::Process / Process::new) even though parse_raw_collapse() parsed
ProcMappings/_lines into the raw file; change the symbolization path to detect
and consume those captured mappings: when parse_raw_collapse() returned
non-empty mapping snapshots (ProcMappings), construct a symbol source from those
snapshots (a blazesym source built from the captured mappings and mapped file
paths) and pass that source into the existing symbolize flow instead of
Source::Process(Pid); update the code paths that call symbolize(...) and any
Process::new usages to prefer the new mapping-based source when available so
offline/exited PID cases resolve correctly.
In `@profile-bee/src/v8/introspection.rs`:
- Around line 337-348: The validation block should also ensure
off_jsfunction_shared, first_jsfunction_type are non-zero and that
last_jsfunction_type >= first_jsfunction_type to avoid bogus offsets and an
inverted type range; update the gate in the same function that currently checks
off_map_instance_type/type_jsfunction/type_shared_function_info to additionally
require off_jsfunction_shared != 0 and first_jsfunction_type != 0 and validate
last_jsfunction_type >= first_jsfunction_type, and extend the tracing::warn
message to include the values of off_jsfunction_shared, first_jsfunction_type,
and last_jsfunction_type so diagnostics show the additional missing/invalid
symbols (references: off_jsfunction_shared, off_jsfunction_code fallback,
v8dbg_FirstJSFunctionType, v8dbg_LastJSFunctionType, first_jsfunction_type,
last_jsfunction_type).
---
Nitpick comments:
In `@profile-bee-common/src/lib.rs`:
- Around line 242-258: The issue: off_heap_object_map, off_map_instance_type,
and off_jsfunction_shared are truncated to u8 without bounds checks in the
conversion code; update to_proc_info() to validate each u32 offset fits in u8
(<=255) before constructing V8ProcInfo—use the same pattern as map_fp_offset()
(or return None/Err) and either return Option<V8ProcInfo> or propagate an
error/log a warning if any offset is out of range so truncation/wraparound
cannot occur silently; ensure tests cover valid and overflowing cases.
In `@profile-bee/src/v8/heap.rs`:
- Around line 135-160: In extract_name_from_scope_info change the loop so a
single failed read_ptr doesn't return None for the whole function: replace the
use of the try-operator on read_ptr(start + i * ptr_size)? inside the for-loop
with error-aware handling (e.g., match or if-let) that skips that slot (or
breaks) on read failure and continues scanning remaining slots; keep the
existing logic that checks types::is_heap_object, types::untag_ptr,
read_instance_type, and read_v8_string so valid string reads still return
Some(name).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e25c317c-d4d6-41bb-88b9-70655eb4d831
⛔ Files ignored due to path filters (2)
profile-bee-ebpf/Cargo.lockis excluded by!**/*.lockprofile-bee/ebpf-bin/profile-bee.bpf.ois excluded by!**/*.o
📒 Files selected for processing (11)
profile-bee-common/src/lib.rsprofile-bee-ebpf/src/lib.rsprofile-bee-ebpf/src/main.rsprofile-bee/bin/profile-bee.rsprofile-bee/src/ebpf.rsprofile-bee/src/event_loop.rsprofile-bee/src/output.rsprofile-bee/src/symbolize.rsprofile-bee/src/trace_handler.rsprofile-bee/src/v8/heap.rsprofile-bee/src/v8/introspection.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- profile-bee/src/output.rs
- profile-bee/src/trace_handler.rs
…olization pipelines - Fix reversed frame order in symbolize.rs (.rev() calls produced leaf-to-root instead of root-to-leaf collapse format) - Include tgid in raw collapse aggregation key to prevent cross-PID collision - Use per-sample PID for symbolization instead of single fallback PID - Clean up V8 readers and symbol caches on process EXIT (prevents stale readers after PID reuse); don't re-insert into known_tgids on EXEC so V8/DWARF setup runs for the new binary - Zero v8_sfi array before V8 FP walk and fix off-by-one (frame N's SFI was stored at v8_sfi[N+1]) - Sort mappings header by tgid for deterministic raw output - Add warnings for malformed/mismatched mapping blocks in raw collapse parser - Use checked arithmetic in read_u32_at_vaddr to handle malformed ELF metadata - Cap ConsString concatenation at 16KB to prevent unbounded allocations - Handle --help/--version in symbolize subcommand before dispatch - Warn when raw+symbolized sinks are combined (raw file will be empty) - Skip symbolization in drain_events when collect_unsymbolized is called - Downgrade per-PID V8ProcInfo load log from info to debug - Remove unused snapshot_perf_map helper - Improve doc-comments for v8_sfi truncation and get_raw_addresses V8 limitation
Summary by CodeRabbit
New Features
Bug Fixes / Improvements