Conversation
Add a JITDump binary format parser that resolves JIT-compiled JavaScript function names from runtimes like Bun (JavaScriptCore), Java HotSpot, and LuaJIT. Without this, JIT code in anonymous mmap'd pages shows as [unknown]. Implementation: - New jitdump.rs: zero-dependency parser with BTreeMap for O(log n) address range lookups, incremental reload for streaming modes, tolerates truncated files. Handles JIT_CODE_LOAD, JIT_CODE_MOVE, JIT_CODE_DEBUG_INFO, and JIT_CODE_CLOSE records. Discovers both standard (jit-<pid>.dump) and JSC (jit-<tid>-<pid>-<random>) file naming conventions. - trace_handler.rs: JIT tables per PID, override [unknown] symbols after V8 SFI resolution (V8 takes priority for Node.js, JITDump for non-V8). - event_loop.rs: auto-load JITDump on new PID detection, reload at end of each collection window to catch symbols written during profiling. - spawn.rs: auto-detect bun/bunx and inject BUN_JSC_useJITDump=1. - profile-bee.rs: warn when --pid targets Bun without JITDump file. - symbolize.rs: JITDump override in offline re-symbolization path. - E2E tests: bun_callstack.js fixture + 2 tests verifying sample collection and JS function name resolution via JITDump.
📝 WalkthroughWalkthroughThe changes implement end-to-end JITDump symbol support for Bun profiling by introducing a new jitdump module that parses JIT code records, integrating JIT table loading/reloading into the event loop and symbolizer, extending the trace handler with per-PID JIT symbol resolution, injecting BUN_JSC_useJITDump=1 into Bun child processes, and emitting guidance warnings when profiling running Bun processes without JITDump enabled. Tests validate the functionality. Changes
Sequence DiagramsequenceDiagram
participant User
participant Spawner as Spawner<br/>(spawn.rs)
participant EventLoop as Event Loop<br/>(event_loop.rs)
participant TraceHandler as Trace Handler<br/>(trace_handler.rs)
participant JITDump as JITDump Module<br/>(jitdump.rs)
participant Symbolizer as Symbolizer<br/>(symbolize.rs)
User->>Spawner: Profile Bun process
Note over Spawner: Detect Bun via is_bun_program()
Spawner->>Spawner: Inject BUN_JSC_useJITDump=1
EventLoop->>JITDump: find_jitdump_for_pid(tgid)
JITDump-->>EventLoop: /tmp/jit-*.dump path
EventLoop->>JITDump: load_from_file()
JITDump-->>EventLoop: JitSymbolTable
EventLoop->>TraceHandler: register_jit_table(tgid, table)
EventLoop->>EventLoop: Process events, drain_events()
Note over EventLoop: After each cycle
EventLoop->>EventLoop: reload_jitdump_tables()
Symbolizer->>JITDump: find_jitdump_for_pid(pid)
JITDump-->>Symbolizer: Path to JITDump file
Symbolizer->>JITDump: load_from_file()
JITDump-->>Symbolizer: JitSymbolTable
Symbolizer->>Symbolizer: Enumerate frames
alt Frame is [unknown]
Symbolizer->>JITDump: resolve(addr)
JITDump-->>Symbolizer: JitSymbol (range-based lookup)
Symbolizer->>JITDump: format_jit_symbol(sym)
JITDump-->>Symbolizer: Formatted symbol with file:line
end
Symbolizer-->>User: Resolved call stack with JIT symbols
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
profile-bee/src/trace_handler.rs (1)
367-373:⚠️ Potential issue | 🟡 MinorCaching concern: JITDump overrides won't reflect later reloads.
get_exp_stacked_framescaches the symbolized result by(tgid, ktrace_id, utrace_id)for reuse. In streaming/TUI mode,event_loop::reload_jitdump_tablespicks up new symbols between windows, but any stack that hit the cache before those symbols were loaded will keep serving[unknown]from the cached vector — JITDump-based overrides only apply on the first symbolization for a given stack ID. For long-lived processes that JIT functions progressively, this can leave stale[unknown]frames in the flamegraph despite the reload.Consider invalidating (or at least refreshing the JIT-override-dependent entries of) the cache when
reload_jitdump_tablesadds new symbols for a PID, or performing the JIT override on the cached result at lookup time rather than at insertion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/trace_handler.rs` around lines 367 - 373, The cached symbolized stacks returned by get_exp_stacked_frames are becoming stale when reload_jitdump_tables adds new JIT symbols; update the cache logic so JIT-dump overrides are not permanently baked into cached entries—either invalidate or version cache entries when reload_jitdump_tables updates symbols for a PID or apply the JIT override at lookup time. Concretely: add a JIT-generation/version or last-jit-update marker per tgid that reload_jitdump_tables increments, include that marker in the cache key or store it with entries and check it in get_exp_stacked_frames before returning self.cache.get(tgid, ktrace_id, utrace_id), and if out-of-date re-run the JIT-override step (or refresh/invalidate the entry) so newly loaded JIT symbols are applied. Ensure references to self.cache.get, get_exp_stacked_frames and reload_jitdump_tables are the touch points for this change.
🧹 Nitpick comments (6)
profile-bee/src/event_loop.rs (2)
544-550:reload_jitdump_tables()runs on everydrain_eventscall, including thecollect_unsymbolizedpath.In raw-capture mode (
collect_unsymbolized), symbolization is deliberately skipped for speed, but this unconditional tail call still does filesystem I/O and parsing per PID per window. Consider skipping the reload whensymbolize == false, since the raw output path never reads the JIT tables anyway.♻️ Proposed refactor
- // Reload JITDump symbol tables at the end of the collection window. - // JIT runtimes (Bun, Java, etc.) write symbols incrementally as code - // is compiled — the initial load on first PID sight may have found an - // empty or nonexistent file. This final pass picks up all symbols - // written during the profiling window. - self.reload_jitdump_tables(); + // Reload JITDump symbol tables at the end of the collection window, + // but only when we intend to symbolize — raw-capture mode doesn't + // consult these tables. + if symbolize { + self.reload_jitdump_tables(); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 544 - 550, The call to reload_jitdump_tables() in drain_events causes unnecessary I/O during the collect_unsymbolized/raw-capture path; update drain_events to skip calling reload_jitdump_tables() when symbolization is disabled (check the symbolization flag used elsewhere, e.g. symbolize or the collect_unsymbolized code path) so that reload_jitdump_tables() only runs when symbolization is enabled; locate the drain_events implementation and add a conditional guard around the reload_jitdump_tables() tail call (or early-return for the raw path) so JIT parsing is avoided when symbolize == false or when collect_unsymbolized is active.
342-368:self.known_tgids.clone()per drain cycle is unnecessary.
reload_jitdump_tablesis called at the tail of everydrain_events(so once per collection window / TUI refresh). Cloning the wholeHashSet<u32>each time is avoidable — collect into a smallVec<u32>once via.iter().copied().collect()if you need to release the borrow onself.known_tgids, or restructure to avoid the borrow entirely.Also, on every tick this function does a
stat/openper known PID (viafind_jitdump_for_pid, which doesread_dir("/tmp")for the JSC pattern path). For a system-wide profile with many PIDs this is O(N × dir_size) syscalls per window — probably fine in practice, but worth noting if profiling dense workloads.♻️ Proposed refactor
- pub fn reload_jitdump_tables(&mut self) { - for &tgid in &self.known_tgids.clone() { + pub fn reload_jitdump_tables(&mut self) { + let tgids: Vec<u32> = self.known_tgids.iter().copied().collect(); + for tgid in tgids {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/event_loop.rs` around lines 342 - 368, The current reload_jitdump_tables clones self.known_tgids every call which is unnecessary and expensive; change the iteration to first collect the PIDs into a small Vec<u32> (e.g. self.known_tgids.iter().copied().collect()) so you release the borrow on self before calling jitter lookups, then iterate that Vec and call jitdump::find_jitdump_for_pid, trace_handler.jit_table_mut(tgid) and try_load_jitdump_for_pid(tgid) as before; this removes the per-cycle HashSet clone while preserving behavior and prevents holding a borrow across calls that access the filesystem.profile-bee/src/jitdump.rs (2)
108-117:reload_from_filedoesn't re-validate on file rotation.If a JIT runtime rotates or truncates
/tmp/jit-<pid>.dumpbetween reads (e.g., process exited and a reused PID wrote a fresh file), seeking to the oldlast_read_offsetwill land mid-record of the new file and produce garbage before tripping the truncation fallback. Not a bug today given the eBPF lifecycle hooks invalidate caches on exec/exit, but worth guarding — e.g., checkingfile.metadata().len() < self.last_read_offsetand resetting, or comparing the header magic bytes at offset 0 before seeking.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/jitdump.rs` around lines 108 - 117, In reload_from_file, guard against rotated/truncated files by validating the file before seeking: open the file, check file.metadata().len() and if it's less than self.last_read_offset set self.last_read_offset = 0 (or otherwise reset state), and/or read and verify the header magic bytes at offset 0 match the expected JIT dump signature before doing reader.seek(SeekFrom::Start(self.last_read_offset)); then proceed to call self.read_records(&mut reader) and update symbols as before; make sure to reference and update the fields last_read_offset and symbols and use the existing read_records method when implementing the fallback reset.
342-365: Minor:format!("jit-")is unnecessary and PID substring match can over-accept.Two small points on
find_jitdump_for_pid:
- Line 351:
let prefix = format!("jit-");allocates viaformat!with no interpolation —"jit-"can be used directly (clippyuseless_format).- The JSC pattern match
name.contains(&format!("-{}-", pid_str))is a substring test on the whole filename. For small PIDs this can incidentally match the<tid>or<random>segment rather than the intended<pid>segment (e.g.pid=5matchingjit-5-<otherpid>-abcorjit-123-456-5abcvariants). Splitting on-and checking the PID is at position 2 would be more precise.♻️ Proposed refactor
- // JSC/Bun convention: /tmp/jit-<tid>-<pid>-<random> - // Scan /tmp/ for files matching this pattern. - let prefix = format!("jit-"); - let pid_str = pid.to_string(); - if let Ok(entries) = std::fs::read_dir("/tmp") { - for entry in entries.flatten() { - let name = entry.file_name(); - let name = name.to_string_lossy(); - // Match pattern: jit-<digits>-<pid>-<alphanum> - if name.starts_with(&prefix) && name.contains(&format!("-{}-", pid_str)) { - return Some(entry.path()); - } - } - } + // JSC/Bun convention: /tmp/jit-<tid>-<pid>-<random> + let pid_str = pid.to_string(); + if let Ok(entries) = std::fs::read_dir("/tmp") { + for entry in entries.flatten() { + let name = entry.file_name(); + let name = name.to_string_lossy(); + // Match pattern: jit-<tid>-<pid>-<random> + let parts: Vec<&str> = name.split('-').collect(); + if parts.len() >= 4 && parts[0] == "jit" && parts[2] == pid_str { + return Some(entry.path()); + } + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/jitdump.rs` around lines 342 - 365, In find_jitdump_for_pid, avoid the useless allocation from format!("jit-") by using the literal "jit-" and make the JSC/Bun filename match precise by splitting the filename on '-' and verifying the PID is in the expected segment: check that the split parts length is >= 3 and parts[2] == pid_str (instead of name.contains(&format!("-{}-", pid_str))). Keep the existing starts_with("jit-") check (use the literal) and return entry.path() when the split PID matches.profile-bee/bin/profile-bee.rs (1)
612-617: Three near-duplicate warning blocks — consider extracting.The
spawn.is_none() { if let Some(target_pid) = pid { warn_nodejs_without_perf_map(target_pid); warn_bun_without_jitdump(target_pid); } }pattern is now repeated verbatim across the batch path,run_combined_mode, andrun_tui_mode. A tiny helper (e.g.warn_unsupported_jit_runtimes(target_pid)wrapping both calls) would make future additions (Deno, Java, …) a one-liner.Also applies to: 1664-1670, 1773-1779
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@profile-bee/bin/profile-bee.rs` around lines 612 - 617, Extract the duplicated block that checks spawn.is_none() and, when pid is Some, calls warn_nodejs_without_perf_map and warn_bun_without_jitdump into a small helper function (e.g. warn_unsupported_jit_runtimes(target_pid: i32)) and replace the three verbatim sites (the batch path, run_combined_mode, and run_tui_mode) with a single call to that helper; specifically, create the helper that accepts the target_pid and invokes warn_nodejs_without_perf_map(target_pid) and warn_bun_without_jitdump(target_pid), then in each location replace the if spawn.is_none() { if let Some(target_pid) = pid { ... } } block with the same pattern that calls the new helper to keep the spawn/pid check but centralize the warnings.tests/run_e2e.sh (1)
611-664: Optional:test_bun_samples_collectedlargely duplicates the sample check insidetest_bun_jitdump_callstack.Both tests gate on
bun/fixture presence, runrun_bun_profiler, and assertcount_samples > 0. The only extra behavior intest_bun_jitdump_callstackis theassert_stack_containscheck. This mirrors the Node.js split (test_nodejs_samples_collected+test_nodejs_callstack), so it's consistent — but be aware each test spins up a fresh profile run (~1s + overhead), effectively doubling Bun test runtime for a check already covered by the callstack test. Consider collapsing into a single test if CI time becomes a concern.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/run_e2e.sh` around lines 611 - 664, The two tests test_bun_samples_collected and test_bun_jitdump_callstack duplicate the same setup and sample-count check; to reduce redundant runs collapse them by removing test_bun_samples_collected and moving its basic sample existence check into test_bun_jitdump_callstack before the assert_stack_contains call (use the existing run_bun_profiler and count_samples logic), or alternatively have test_bun_jitdump_callstack call test_bun_samples_collected to reuse its check; update references to run_bun_profiler, count_samples, and assert_stack_contains accordingly so only one profiler run is performed per Bun fixture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@profile-bee/src/jitdump.rs`:
- Around line 689-692: test_find_jitdump_nonexistent currently scans the real
/tmp and can false-fail if a stale jit file exists; update the test to use an
isolated temporary directory instead of the global /tmp by either (a) adding a
helper that searches a provided directory (e.g.,
find_jitdump_for_pid_in_dir(pid: u32, dir: &Path) or make find_jitdump_for_pid
accept a Path argument) and calling that helper from the test with a tempdir
created via tempfile::tempdir(), or (b) change the test to use a randomized high
PID that is extremely unlikely to exist; prefer option (a) so change the
test_find_jitdump_nonexistent to create a tempfile::TempDir, call the
new/modified search function (referencing find_jitdump_for_pid or new
find_jitdump_for_pid_in_dir) against that temp dir, and assert is_none().
- Around line 199-247: The bug is a key mismatch between parse_debug_info and
parse_code_load: change both sides to key pending_debug by code_addr (u64)
instead of using code_index; specifically, in parse_code_load replace the lookup
self.pending_debug.remove(&code_index) with
self.pending_debug.remove(&code_addr) and ensure parse_debug_info inserts into
pending_debug using the debug record's code_addr as the map key; also update the
parse_debug_info doc comment to state that code_addr is the correlation key (not
a "proxy for code_index") so future readers aren't confused.
In `@tests/run_e2e.sh`:
- Around line 661-663: The current regex passed to assert_stack_contains
("hot\|processData\|handleRequest\|serverLoop") is too loose and can match
unrelated native symbols; tighten it by anchoring to exact JS function names
from bun_callstack.js (use word-boundaries like
\bprocessData\b,\bhandleRequest\b,\bserverLoop\b) or replace the check with an
assertion that matches JITDump's collapse-line format (e.g. the resolved
collapse line pattern produced by JITDump for bun_callstack.js) so the test only
passes when symbolization actually resolved JS function names; update the call
to assert_stack_contains to use the new anchored regex or format-specific
pattern.
---
Outside diff comments:
In `@profile-bee/src/trace_handler.rs`:
- Around line 367-373: The cached symbolized stacks returned by
get_exp_stacked_frames are becoming stale when reload_jitdump_tables adds new
JIT symbols; update the cache logic so JIT-dump overrides are not permanently
baked into cached entries—either invalidate or version cache entries when
reload_jitdump_tables updates symbols for a PID or apply the JIT override at
lookup time. Concretely: add a JIT-generation/version or last-jit-update marker
per tgid that reload_jitdump_tables increments, include that marker in the cache
key or store it with entries and check it in get_exp_stacked_frames before
returning self.cache.get(tgid, ktrace_id, utrace_id), and if out-of-date re-run
the JIT-override step (or refresh/invalidate the entry) so newly loaded JIT
symbols are applied. Ensure references to self.cache.get, get_exp_stacked_frames
and reload_jitdump_tables are the touch points for this change.
---
Nitpick comments:
In `@profile-bee/bin/profile-bee.rs`:
- Around line 612-617: Extract the duplicated block that checks spawn.is_none()
and, when pid is Some, calls warn_nodejs_without_perf_map and
warn_bun_without_jitdump into a small helper function (e.g.
warn_unsupported_jit_runtimes(target_pid: i32)) and replace the three verbatim
sites (the batch path, run_combined_mode, and run_tui_mode) with a single call
to that helper; specifically, create the helper that accepts the target_pid and
invokes warn_nodejs_without_perf_map(target_pid) and
warn_bun_without_jitdump(target_pid), then in each location replace the if
spawn.is_none() { if let Some(target_pid) = pid { ... } } block with the same
pattern that calls the new helper to keep the spawn/pid check but centralize the
warnings.
In `@profile-bee/src/event_loop.rs`:
- Around line 544-550: The call to reload_jitdump_tables() in drain_events
causes unnecessary I/O during the collect_unsymbolized/raw-capture path; update
drain_events to skip calling reload_jitdump_tables() when symbolization is
disabled (check the symbolization flag used elsewhere, e.g. symbolize or the
collect_unsymbolized code path) so that reload_jitdump_tables() only runs when
symbolization is enabled; locate the drain_events implementation and add a
conditional guard around the reload_jitdump_tables() tail call (or early-return
for the raw path) so JIT parsing is avoided when symbolize == false or when
collect_unsymbolized is active.
- Around line 342-368: The current reload_jitdump_tables clones self.known_tgids
every call which is unnecessary and expensive; change the iteration to first
collect the PIDs into a small Vec<u32> (e.g.
self.known_tgids.iter().copied().collect()) so you release the borrow on self
before calling jitter lookups, then iterate that Vec and call
jitdump::find_jitdump_for_pid, trace_handler.jit_table_mut(tgid) and
try_load_jitdump_for_pid(tgid) as before; this removes the per-cycle HashSet
clone while preserving behavior and prevents holding a borrow across calls that
access the filesystem.
In `@profile-bee/src/jitdump.rs`:
- Around line 108-117: In reload_from_file, guard against rotated/truncated
files by validating the file before seeking: open the file, check
file.metadata().len() and if it's less than self.last_read_offset set
self.last_read_offset = 0 (or otherwise reset state), and/or read and verify the
header magic bytes at offset 0 match the expected JIT dump signature before
doing reader.seek(SeekFrom::Start(self.last_read_offset)); then proceed to call
self.read_records(&mut reader) and update symbols as before; make sure to
reference and update the fields last_read_offset and symbols and use the
existing read_records method when implementing the fallback reset.
- Around line 342-365: In find_jitdump_for_pid, avoid the useless allocation
from format!("jit-") by using the literal "jit-" and make the JSC/Bun filename
match precise by splitting the filename on '-' and verifying the PID is in the
expected segment: check that the split parts length is >= 3 and parts[2] ==
pid_str (instead of name.contains(&format!("-{}-", pid_str))). Keep the existing
starts_with("jit-") check (use the literal) and return entry.path() when the
split PID matches.
In `@tests/run_e2e.sh`:
- Around line 611-664: The two tests test_bun_samples_collected and
test_bun_jitdump_callstack duplicate the same setup and sample-count check; to
reduce redundant runs collapse them by removing test_bun_samples_collected and
moving its basic sample existence check into test_bun_jitdump_callstack before
the assert_stack_contains call (use the existing run_bun_profiler and
count_samples logic), or alternatively have test_bun_jitdump_callstack call
test_bun_samples_collected to reuse its check; update references to
run_bun_profiler, count_samples, and assert_stack_contains accordingly so only
one profiler run is performed per Bun fixture.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6edf4292-b853-4eae-9fdb-3dc02aead58e
📒 Files selected for processing (9)
profile-bee/bin/profile-bee.rsprofile-bee/src/event_loop.rsprofile-bee/src/jitdump.rsprofile-bee/src/lib.rsprofile-bee/src/spawn.rsprofile-bee/src/symbolize.rsprofile-bee/src/trace_handler.rstests/fixtures/src/bun_callstack.jstests/run_e2e.sh
| /// Parse a JIT_CODE_LOAD record. | ||
| /// | ||
| /// Fixed fields after record header: | ||
| /// pid(u32), tid(u32), vma(u64), code_addr(u64), code_size(u64), code_index(u64) | ||
| /// Followed by: null-terminated function name, then raw code bytes. | ||
| fn parse_code_load<R: Read>(&mut self, reader: &mut R, payload_size: u64) -> io::Result<()> { | ||
| // Fixed fields: 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes | ||
| const FIXED_SIZE: u64 = 40; | ||
| if payload_size < FIXED_SIZE { | ||
| return Err(io::Error::new( | ||
| io::ErrorKind::InvalidData, | ||
| "JIT_CODE_LOAD too small", | ||
| )); | ||
| } | ||
|
|
||
| let _pid = read_u32(reader)?; | ||
| let _tid = read_u32(reader)?; | ||
| let _vma = read_u64(reader)?; | ||
| let code_addr = read_u64(reader)?; | ||
| let code_size = read_u64(reader)?; | ||
| let code_index = read_u64(reader)?; | ||
|
|
||
| // Remaining bytes: null-terminated name + code bytes | ||
| let remaining = payload_size - FIXED_SIZE; | ||
| let mut name_and_code = vec![0u8; remaining as usize]; | ||
| reader.read_exact(&mut name_and_code)?; | ||
|
|
||
| // Extract null-terminated name | ||
| let name_end = name_and_code | ||
| .iter() | ||
| .position(|&b| b == 0) | ||
| .unwrap_or(name_and_code.len()); | ||
| let name = String::from_utf8_lossy(&name_and_code[..name_end]).to_string(); | ||
|
|
||
| // Check for pending debug info | ||
| let source = self.pending_debug.remove(&code_index); | ||
|
|
||
| self.symbols.insert( | ||
| code_addr, | ||
| JitSymbol { | ||
| code_addr, | ||
| code_size, | ||
| name, | ||
| source, | ||
| }, | ||
| ); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In the Linux perf jitdump specification, does the JIT_CODE_DEBUG_INFO record correlate with the following JIT_CODE_LOAD by matching code_addr, or by code_index?
💡 Result:
In the Linux perf jitdump specification, the JIT_CODE_DEBUG_INFO record correlates with the JIT_CODE_LOAD record by matching code_addr. Both records use the uint64_t code_addr field: in JIT_CODE_LOAD, it is the "code start address for the jitted code", and in JIT_CODE_DEBUG_INFO, it is the "address of function for which the debug information is generated". The specification requires that JIT_CODE_DEBUG_INFO must always be generated BEFORE the corresponding JIT_CODE_LOAD for the same function, which facilitates parsing by matching on code_addr (as addresses are known at debug info emission time). The code_index field in JIT_CODE_LOAD is a unique identifier but is not present in JIT_CODE_DEBUG_INFO; it is used in other records like JIT_CODE_MOVE.
Citations:
- 1: https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jitdump-specification.txt
- 2: https://github.com/torvalds/linux/blob/46a51f4f5edade43ba66b3c151f0e25ec8b69cb6/tools/perf/Documentation/jitdump-specification.txt
- 3: https://android.googlesource.com/kernel/common/+/baa23246e93f/tools/perf/Documentation/jitdump-specification.txt?autodive=0%2F%2F%2F%2F%2F%2F%2F%2F
- 4: https://github.com/torvalds/linux/blob/master/tools/perf/util/jitdump.h
🏁 Script executed:
# First, let's examine the jitdump.rs file to verify the code at the specified lines
head -250 profile-bee/src/jitdump.rs | tail -80Repository: zz85/profile-bee
Length of output: 2676
🏁 Script executed:
# Also check the parse_debug_info function around lines 282-332
sed -n '282,332p' profile-bee/src/jitdump.rsRepository: zz85/profile-bee
Length of output: 1994
🏁 Script executed:
# Find where pending_debug is stored/looked up to confirm the mismatch
rg "pending_debug\." profile-bee/src/jitdump.rs -B 2 -A 2Repository: zz85/profile-bee
Length of output: 320
Debug-info lookup key mismatch: code_index vs code_addr.
parse_debug_info stores pending_debug keyed by the record's code_addr (line 315), but parse_code_load looks up pending_debug with code_index (line 234). Per the JITDump spec, a JIT_CODE_DEBUG_INFO record's code_addr field matches its corresponding JIT_CODE_LOAD's code_addr (not code_index) — so as written, source will essentially always be None and format_jit_symbol will never emit (file:line). The doc comment on parse_debug_info even says "used as a proxy for code_index", which appears to be the source of the confusion.
Suggested fix: key both sides by code_addr.
🐛 Proposed fix
- // Check for pending debug info
- let source = self.pending_debug.remove(&code_index);
+ // Check for pending debug info (keyed by code_addr per JITDump spec)
+ let source = self.pending_debug.remove(&code_addr);And update the doc comment on parse_debug_info (lines 288–290) to reflect that code_addr is the actual correlation key, not a "proxy for code_index". A regression test that emits a JIT_CODE_DEBUG_INFO followed by JIT_CODE_LOAD and asserts sym.source.is_some() would also be worthwhile.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Parse a JIT_CODE_LOAD record. | |
| /// | |
| /// Fixed fields after record header: | |
| /// pid(u32), tid(u32), vma(u64), code_addr(u64), code_size(u64), code_index(u64) | |
| /// Followed by: null-terminated function name, then raw code bytes. | |
| fn parse_code_load<R: Read>(&mut self, reader: &mut R, payload_size: u64) -> io::Result<()> { | |
| // Fixed fields: 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes | |
| const FIXED_SIZE: u64 = 40; | |
| if payload_size < FIXED_SIZE { | |
| return Err(io::Error::new( | |
| io::ErrorKind::InvalidData, | |
| "JIT_CODE_LOAD too small", | |
| )); | |
| } | |
| let _pid = read_u32(reader)?; | |
| let _tid = read_u32(reader)?; | |
| let _vma = read_u64(reader)?; | |
| let code_addr = read_u64(reader)?; | |
| let code_size = read_u64(reader)?; | |
| let code_index = read_u64(reader)?; | |
| // Remaining bytes: null-terminated name + code bytes | |
| let remaining = payload_size - FIXED_SIZE; | |
| let mut name_and_code = vec![0u8; remaining as usize]; | |
| reader.read_exact(&mut name_and_code)?; | |
| // Extract null-terminated name | |
| let name_end = name_and_code | |
| .iter() | |
| .position(|&b| b == 0) | |
| .unwrap_or(name_and_code.len()); | |
| let name = String::from_utf8_lossy(&name_and_code[..name_end]).to_string(); | |
| // Check for pending debug info | |
| let source = self.pending_debug.remove(&code_index); | |
| self.symbols.insert( | |
| code_addr, | |
| JitSymbol { | |
| code_addr, | |
| code_size, | |
| name, | |
| source, | |
| }, | |
| ); | |
| Ok(()) | |
| } | |
| /// Parse a JIT_CODE_LOAD record. | |
| /// | |
| /// Fixed fields after record header: | |
| /// pid(u32), tid(u32), vma(u64), code_addr(u64), code_size(u64), code_index(u64) | |
| /// Followed by: null-terminated function name, then raw code bytes. | |
| fn parse_code_load<R: Read>(&mut self, reader: &mut R, payload_size: u64) -> io::Result<()> { | |
| // Fixed fields: 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes | |
| const FIXED_SIZE: u64 = 40; | |
| if payload_size < FIXED_SIZE { | |
| return Err(io::Error::new( | |
| io::ErrorKind::InvalidData, | |
| "JIT_CODE_LOAD too small", | |
| )); | |
| } | |
| let _pid = read_u32(reader)?; | |
| let _tid = read_u32(reader)?; | |
| let _vma = read_u64(reader)?; | |
| let code_addr = read_u64(reader)?; | |
| let code_size = read_u64(reader)?; | |
| let code_index = read_u64(reader)?; | |
| // Remaining bytes: null-terminated name + code bytes | |
| let remaining = payload_size - FIXED_SIZE; | |
| let mut name_and_code = vec![0u8; remaining as usize]; | |
| reader.read_exact(&mut name_and_code)?; | |
| // Extract null-terminated name | |
| let name_end = name_and_code | |
| .iter() | |
| .position(|&b| b == 0) | |
| .unwrap_or(name_and_code.len()); | |
| let name = String::from_utf8_lossy(&name_and_code[..name_end]).to_string(); | |
| // Check for pending debug info (keyed by code_addr per JITDump spec) | |
| let source = self.pending_debug.remove(&code_addr); | |
| self.symbols.insert( | |
| code_addr, | |
| JitSymbol { | |
| code_addr, | |
| code_size, | |
| name, | |
| source, | |
| }, | |
| ); | |
| Ok(()) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@profile-bee/src/jitdump.rs` around lines 199 - 247, The bug is a key mismatch
between parse_debug_info and parse_code_load: change both sides to key
pending_debug by code_addr (u64) instead of using code_index; specifically, in
parse_code_load replace the lookup self.pending_debug.remove(&code_index) with
self.pending_debug.remove(&code_addr) and ensure parse_debug_info inserts into
pending_debug using the debug record's code_addr as the map key; also update the
parse_debug_info doc comment to state that code_addr is the correlation key (not
a "proxy for code_index") so future readers aren't confused.
| fn test_find_jitdump_nonexistent() { | ||
| // PID 0 should never have a JITDump file | ||
| assert!(find_jitdump_for_pid(0).is_none()); | ||
| } |
There was a problem hiding this comment.
test_find_jitdump_nonexistent can false-fail on a dirty /tmp.
Using find_jitdump_for_pid(0) scans the real /tmp and assumes no matching file exists. If a stale jit-0.dump or jit-<tid>-0-<hash> ever lingers (e.g., from a previous test run or a developer's machine), this test would fail nondeterministically. Consider either testing against a temp directory (would require making the /tmp path injectable) or using a PID with a random high-entropy suffix that is extremely unlikely to collide.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@profile-bee/src/jitdump.rs` around lines 689 - 692,
test_find_jitdump_nonexistent currently scans the real /tmp and can false-fail
if a stale jit file exists; update the test to use an isolated temporary
directory instead of the global /tmp by either (a) adding a helper that searches
a provided directory (e.g., find_jitdump_for_pid_in_dir(pid: u32, dir: &Path) or
make find_jitdump_for_pid accept a Path argument) and calling that helper from
the test with a tempdir created via tempfile::tempdir(), or (b) change the test
to use a randomized high PID that is extremely unlikely to exist; prefer option
(a) so change the test_find_jitdump_nonexistent to create a tempfile::TempDir,
call the new/modified search function (referencing find_jitdump_for_pid or new
find_jitdump_for_pid_in_dir) against that temp dir, and assert is_none().
| assert_stack_contains "$file" \ | ||
| "hot\|processData\|handleRequest\|serverLoop" \ | ||
| "Should contain JS function names from JITDump" |
There was a problem hiding this comment.
Weak JITDump assertion — pattern can match non-JIT frames.
The pattern hot\|processData\|handleRequest\|serverLoop is matched against the entire collapse file with grep -q, so any occurrence (including native Bun/JSC C++ symbols that happen to contain hot, e.g. hot, hotPath, etc., or internal runtime frames) will satisfy the assertion without proving JITDump resolution actually worked. Unlike the Node.js test, there's no fallback like LazyCompile|Builtins_JSEntry|v8:: to explicitly signal a JIT-stack-walking success path, meaning a regression where JITDump overrides silently stop applying could still pass this test as long as any symbol containing hot (substring) appears.
Consider anchoring the pattern more tightly to names that only exist in bun_callstack.js (e.g. more unique function names), or asserting on the collapse line format produced by JITDump resolution specifically, so the test actually fails when JITDump symbolization regresses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/run_e2e.sh` around lines 661 - 663, The current regex passed to
assert_stack_contains ("hot\|processData\|handleRequest\|serverLoop") is too
loose and can match unrelated native symbols; tighten it by anchoring to exact
JS function names from bun_callstack.js (use word-boundaries like
\bprocessData\b,\bhandleRequest\b,\bserverLoop\b) or replace the check with an
assertion that matches JITDump's collapse-line format (e.g. the resolved
collapse line pattern produced by JITDump for bun_callstack.js) so the test only
passes when symbolization actually resolved JS function names; update the call
to assert_stack_contains to use the new anchored regex or format-specific
pattern.
Given that I'm using bunjs more often than nodejs makes sense to have this supported...
Add a JITDump binary format parser that resolves JIT-compiled JavaScript function names from runtimes like Bun (JavaScriptCore), Java HotSpot, and LuaJIT. Without this, JIT code in anonymous mmap'd pages shows as [unknown].
Implementation:
Summary by CodeRabbit