feat: add JITDump support for Bun/JSC JIT symbol resolution by zz85 · Pull Request #96 · zz85/profile-bee

zz85 · 2026-04-19T00:09:42Z

Given that I'm using bunjs more often than nodejs makes sense to have this supported...

Add a JITDump binary format parser that resolves JIT-compiled JavaScript function names from runtimes like Bun (JavaScriptCore), Java HotSpot, and LuaJIT. Without this, JIT code in anonymous mmap'd pages shows as [unknown].

Implementation:

New jitdump.rs: zero-dependency parser with BTreeMap for O(log n) address range lookups, incremental reload for streaming modes, tolerates truncated files. Handles JIT_CODE_LOAD, JIT_CODE_MOVE, JIT_CODE_DEBUG_INFO, and JIT_CODE_CLOSE records. Discovers both standard (jit-.dump) and JSC (jit---) file naming conventions.
trace_handler.rs: JIT tables per PID, override [unknown] symbols after V8 SFI resolution (V8 takes priority for Node.js, JITDump for non-V8).
event_loop.rs: auto-load JITDump on new PID detection, reload at end of each collection window to catch symbols written during profiling.
spawn.rs: auto-detect bun/bunx and inject BUN_JSC_useJITDump=1.
profile-bee.rs: warn when --pid targets Bun without JITDump file.
symbolize.rs: JITDump override in offline re-symbolization path.
E2E tests: bun_callstack.js fixture + 2 tests verifying sample collection and JS function name resolution via JITDump.

Summary by CodeRabbit

New Features
- Added full profiling support for Bun runtime with JIT code symbol resolution.
- Automatic detection and warnings when profiling Bun processes without JIT symbol support.
- Enhanced stack trace symbolization to resolve JIT-compiled function names with source location information.
- Improved symbol display for compiled code across multiple runtime environments.

Add a JITDump binary format parser that resolves JIT-compiled JavaScript function names from runtimes like Bun (JavaScriptCore), Java HotSpot, and LuaJIT. Without this, JIT code in anonymous mmap'd pages shows as [unknown]. Implementation: - New jitdump.rs: zero-dependency parser with BTreeMap for O(log n) address range lookups, incremental reload for streaming modes, tolerates truncated files. Handles JIT_CODE_LOAD, JIT_CODE_MOVE, JIT_CODE_DEBUG_INFO, and JIT_CODE_CLOSE records. Discovers both standard (jit-<pid>.dump) and JSC (jit-<tid>-<pid>-<random>) file naming conventions. - trace_handler.rs: JIT tables per PID, override [unknown] symbols after V8 SFI resolution (V8 takes priority for Node.js, JITDump for non-V8). - event_loop.rs: auto-load JITDump on new PID detection, reload at end of each collection window to catch symbols written during profiling. - spawn.rs: auto-detect bun/bunx and inject BUN_JSC_useJITDump=1. - profile-bee.rs: warn when --pid targets Bun without JITDump file. - symbolize.rs: JITDump override in offline re-symbolization path. - E2E tests: bun_callstack.js fixture + 2 tests verifying sample collection and JS function name resolution via JITDump.

coderabbitai · 2026-04-19T00:09:56Z

📝 Walkthrough

Walkthrough

The changes implement end-to-end JITDump symbol support for Bun profiling by introducing a new jitdump module that parses JIT code records, integrating JIT table loading/reloading into the event loop and symbolizer, extending the trace handler with per-PID JIT symbol resolution, injecting BUN_JSC_useJITDump=1 into Bun child processes, and emitting guidance warnings when profiling running Bun processes without JITDump enabled. Tests validate the functionality.

Changes

Cohort / File(s)	Summary
JITDump Parsing Module `profile-bee/src/jitdump.rs`	New core module implementing `JitSymbol` and `JitSymbolTable` with BTreeMap-backed range-based address resolution, tolerant header/record parsing supporting `JIT_CODE_LOAD`, `JIT_CODE_MOVE`, `JIT_CODE_DEBUG_INFO`, and `JIT_CODE_CLOSE` record types, incremental reload capability, and utilities for locating JITDump files and formatting resolved symbols.
Symbol Resolution Integration `profile-bee/src/event_loop.rs`, `profile-bee/src/symbolize.rs`, `profile-bee/src/trace_handler.rs`	Event loop now loads JIT tables per-PID on first observation and reloads them at cycle end; trace handler stores per-PID JIT tables and resolves `[unknown]` frames via registered tables; symbolizer loads JIT tables at start and uses them as fallback for unresolved user frames.
Bun Runtime Detection & Environment `profile-bee/src/spawn.rs`, `profile-bee/src/lib.rs`	Added `is_bun_program()` detection helper; extended environment builder to inject `BUN_JSC_useJITDump=1` for Bun processes; exported new jitdump module.
User-Facing Warnings `profile-bee/bin/profile-bee.rs`	Added `warn_bun_without_jitdump()` helper that detects Bun processes lacking JITDump and emits guidance; integrated into three control paths (main batch, combined TUI+serve, TUI-only) to warn when targeting running processes.
Test Infrastructure `tests/fixtures/src/bun_callstack.js`, `tests/run_e2e.sh`	New Bun fixture with nested calls and JIT warmup loop; new E2E tests validating sample collection and expected JS function names in resolved call stacks.

Sequence Diagram

sequenceDiagram
    participant User
    participant Spawner as Spawner<br/>(spawn.rs)
    participant EventLoop as Event Loop<br/>(event_loop.rs)
    participant TraceHandler as Trace Handler<br/>(trace_handler.rs)
    participant JITDump as JITDump Module<br/>(jitdump.rs)
    participant Symbolizer as Symbolizer<br/>(symbolize.rs)

    User->>Spawner: Profile Bun process
    Note over Spawner: Detect Bun via is_bun_program()
    Spawner->>Spawner: Inject BUN_JSC_useJITDump=1
    
    EventLoop->>JITDump: find_jitdump_for_pid(tgid)
    JITDump-->>EventLoop: /tmp/jit-*.dump path
    EventLoop->>JITDump: load_from_file()
    JITDump-->>EventLoop: JitSymbolTable
    EventLoop->>TraceHandler: register_jit_table(tgid, table)
    
    EventLoop->>EventLoop: Process events, drain_events()
    Note over EventLoop: After each cycle
    EventLoop->>EventLoop: reload_jitdump_tables()
    
    Symbolizer->>JITDump: find_jitdump_for_pid(pid)
    JITDump-->>Symbolizer: Path to JITDump file
    Symbolizer->>JITDump: load_from_file()
    JITDump-->>Symbolizer: JitSymbolTable
    
    Symbolizer->>Symbolizer: Enumerate frames
    alt Frame is [unknown]
        Symbolizer->>JITDump: resolve(addr)
        JITDump-->>Symbolizer: JitSymbol (range-based lookup)
        Symbolizer->>JITDump: format_jit_symbol(sym)
        JITDump-->>Symbolizer: Formatted symbol with file:line
    end
    
    Symbolizer-->>User: Resolved call stack with JIT symbols

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: add Node.js profiling support with JIT symbol resolution #94: Shares the same call site modifications in profile-bee/bin/profile-bee.rs to emit runtime-specific warnings and extends spawn.rs with runtime detection and environment injection logic.
refactor Library API, add new Output Formats + AWS CodeGuru Integration #83: Extends the same library modules (event_loop, trace_handler, spawn, lib) that were refactored to support JITDump integration.
Blazesym support #13: Modifies TraceHandler internals and public APIs, directly related to the per-PID JIT table storage and resolution infrastructure introduced here.

Poem

🐰 Bun hops in, JIT compiles fast,
Symbols dance where JITDumps are cast,
Event loops reload stacks with delight,
Every address glows, no more [unknown] blight!
Profiles now show the Bun's true flight~ 🚀

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title accurately and specifically summarizes the main feature added: JITDump support for resolving JIT-compiled symbol names from Bun/JSC runtimes, which is the primary objective of this changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bun

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

profile-bee/src/trace_handler.rs (1)
367-373: ⚠️ Potential issue | 🟡 Minor

Caching concern: JITDump overrides won't reflect later reloads.

get_exp_stacked_frames caches the symbolized result by (tgid, ktrace_id, utrace_id) for reuse. In streaming/TUI mode, event_loop::reload_jitdump_tables picks up new symbols between windows, but any stack that hit the cache before those symbols were loaded will keep serving [unknown] from the cached vector — JITDump-based overrides only apply on the first symbolization for a given stack ID. For long-lived processes that JIT functions progressively, this can leave stale [unknown] frames in the flamegraph despite the reload.

Consider invalidating (or at least refreshing the JIT-override-dependent entries of) the cache when reload_jitdump_tables adds new symbols for a PID, or performing the JIT override on the cached result at lookup time rather than at insertion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/src/trace_handler.rs` around lines 367 - 373, The cached
symbolized stacks returned by get_exp_stacked_frames are becoming stale when
reload_jitdump_tables adds new JIT symbols; update the cache logic so JIT-dump
overrides are not permanently baked into cached entries—either invalidate or
version cache entries when reload_jitdump_tables updates symbols for a PID or
apply the JIT override at lookup time. Concretely: add a JIT-generation/version
or last-jit-update marker per tgid that reload_jitdump_tables increments,
include that marker in the cache key or store it with entries and check it in
get_exp_stacked_frames before returning self.cache.get(tgid, ktrace_id,
utrace_id), and if out-of-date re-run the JIT-override step (or
refresh/invalidate the entry) so newly loaded JIT symbols are applied. Ensure
references to self.cache.get, get_exp_stacked_frames and reload_jitdump_tables
are the touch points for this change.

🧹 Nitpick comments (6)

profile-bee/src/event_loop.rs (2)
544-550: reload_jitdump_tables() runs on every drain_events call, including the collect_unsymbolized path.

In raw-capture mode (collect_unsymbolized), symbolization is deliberately skipped for speed, but this unconditional tail call still does filesystem I/O and parsing per PID per window. Consider skipping the reload when symbolize == false, since the raw output path never reads the JIT tables anyway.
♻️ Proposed refactor
-        // Reload JITDump symbol tables at the end of the collection window.
-        // JIT runtimes (Bun, Java, etc.) write symbols incrementally as code
-        // is compiled — the initial load on first PID sight may have found an
-        // empty or nonexistent file. This final pass picks up all symbols
-        // written during the profiling window.
-        self.reload_jitdump_tables();
+        // Reload JITDump symbol tables at the end of the collection window,
+        // but only when we intend to symbolize — raw-capture mode doesn't
+        // consult these tables.
+        if symbolize {
+            self.reload_jitdump_tables();
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/src/event_loop.rs` around lines 544 - 550, The call to
reload_jitdump_tables() in drain_events causes unnecessary I/O during the
collect_unsymbolized/raw-capture path; update drain_events to skip calling
reload_jitdump_tables() when symbolization is disabled (check the symbolization
flag used elsewhere, e.g. symbolize or the collect_unsymbolized code path) so
that reload_jitdump_tables() only runs when symbolization is enabled; locate the
drain_events implementation and add a conditional guard around the
reload_jitdump_tables() tail call (or early-return for the raw path) so JIT
parsing is avoided when symbolize == false or when collect_unsymbolized is
active.
342-368: self.known_tgids.clone() per drain cycle is unnecessary.

reload_jitdump_tables is called at the tail of every drain_events (so once per collection window / TUI refresh). Cloning the whole HashSet<u32> each time is avoidable — collect into a small Vec<u32> once via .iter().copied().collect() if you need to release the borrow on self.known_tgids, or restructure to avoid the borrow entirely.

Also, on every tick this function does a stat/open per known PID (via find_jitdump_for_pid, which does read_dir("/tmp") for the JSC pattern path). For a system-wide profile with many PIDs this is O(N × dir_size) syscalls per window — probably fine in practice, but worth noting if profiling dense workloads.
♻️ Proposed refactor
-    pub fn reload_jitdump_tables(&mut self) {
-        for &tgid in &self.known_tgids.clone() {
+    pub fn reload_jitdump_tables(&mut self) {
+        let tgids: Vec<u32> = self.known_tgids.iter().copied().collect();
+        for tgid in tgids {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/src/event_loop.rs` around lines 342 - 368, The current
reload_jitdump_tables clones self.known_tgids every call which is unnecessary
and expensive; change the iteration to first collect the PIDs into a small
Vec<u32> (e.g. self.known_tgids.iter().copied().collect()) so you release the
borrow on self before calling jitter lookups, then iterate that Vec and call
jitdump::find_jitdump_for_pid, trace_handler.jit_table_mut(tgid) and
try_load_jitdump_for_pid(tgid) as before; this removes the per-cycle HashSet
clone while preserving behavior and prevents holding a borrow across calls that
access the filesystem.
profile-bee/src/jitdump.rs (2)
108-117: reload_from_file doesn't re-validate on file rotation.

If a JIT runtime rotates or truncates /tmp/jit-<pid>.dump between reads (e.g., process exited and a reused PID wrote a fresh file), seeking to the old last_read_offset will land mid-record of the new file and produce garbage before tripping the truncation fallback. Not a bug today given the eBPF lifecycle hooks invalidate caches on exec/exit, but worth guarding — e.g., checking file.metadata().len() < self.last_read_offset and resetting, or comparing the header magic bytes at offset 0 before seeking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/src/jitdump.rs` around lines 108 - 117, In reload_from_file,
guard against rotated/truncated files by validating the file before seeking:
open the file, check file.metadata().len() and if it's less than
self.last_read_offset set self.last_read_offset = 0 (or otherwise reset state),
and/or read and verify the header magic bytes at offset 0 match the expected JIT
dump signature before doing reader.seek(SeekFrom::Start(self.last_read_offset));
then proceed to call self.read_records(&mut reader) and update symbols as
before; make sure to reference and update the fields last_read_offset and
symbols and use the existing read_records method when implementing the fallback
reset.
342-365: Minor: format!("jit-") is unnecessary and PID substring match can over-accept.

Two small points on find_jitdump_for_pid:

Line 351: let prefix = format!("jit-"); allocates via format! with no interpolation — "jit-" can be used directly (clippy useless_format).

The JSC pattern match name.contains(&format!("-{}-", pid_str)) is a substring test on the whole filename. For small PIDs this can incidentally match the <tid> or <random> segment rather than the intended <pid> segment (e.g. pid=5 matching jit-5-<otherpid>-abc or jit-123-456-5abc variants). Splitting on - and checking the PID is at position 2 would be more precise.
♻️ Proposed refactor
-    // JSC/Bun convention: /tmp/jit-<tid>-<pid>-<random>
-    // Scan /tmp/ for files matching this pattern.
-    let prefix = format!("jit-");
-    let pid_str = pid.to_string();
-    if let Ok(entries) = std::fs::read_dir("/tmp") {
-        for entry in entries.flatten() {
-            let name = entry.file_name();
-            let name = name.to_string_lossy();
-            // Match pattern: jit-<digits>-<pid>-<alphanum>
-            if name.starts_with(&prefix) && name.contains(&format!("-{}-", pid_str)) {
-                return Some(entry.path());
-            }
-        }
-    }
+    // JSC/Bun convention: /tmp/jit-<tid>-<pid>-<random>
+    let pid_str = pid.to_string();
+    if let Ok(entries) = std::fs::read_dir("/tmp") {
+        for entry in entries.flatten() {
+            let name = entry.file_name();
+            let name = name.to_string_lossy();
+            // Match pattern: jit-<tid>-<pid>-<random>
+            let parts: Vec<&str> = name.split('-').collect();
+            if parts.len() >= 4 && parts[0] == "jit" && parts[2] == pid_str {
+                return Some(entry.path());
+            }
+        }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/src/jitdump.rs` around lines 342 - 365, In find_jitdump_for_pid,
avoid the useless allocation from format!("jit-") by using the literal "jit-"
and make the JSC/Bun filename match precise by splitting the filename on '-' and
verifying the PID is in the expected segment: check that the split parts length
is >= 3 and parts[2] == pid_str (instead of name.contains(&format!("-{}-",
pid_str))). Keep the existing starts_with("jit-") check (use the literal) and
return entry.path() when the split PID matches.
profile-bee/bin/profile-bee.rs (1)
612-617: Three near-duplicate warning blocks — consider extracting.

The spawn.is_none() { if let Some(target_pid) = pid { warn_nodejs_without_perf_map(target_pid); warn_bun_without_jitdump(target_pid); } } pattern is now repeated verbatim across the batch path, run_combined_mode, and run_tui_mode. A tiny helper (e.g. warn_unsupported_jit_runtimes(target_pid) wrapping both calls) would make future additions (Deno, Java, …) a one-liner.

Also applies to: 1664-1670, 1773-1779
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@profile-bee/bin/profile-bee.rs` around lines 612 - 617, Extract the
duplicated block that checks spawn.is_none() and, when pid is Some, calls
warn_nodejs_without_perf_map and warn_bun_without_jitdump into a small helper
function (e.g. warn_unsupported_jit_runtimes(target_pid: i32)) and replace the
three verbatim sites (the batch path, run_combined_mode, and run_tui_mode) with
a single call to that helper; specifically, create the helper that accepts the
target_pid and invokes warn_nodejs_without_perf_map(target_pid) and
warn_bun_without_jitdump(target_pid), then in each location replace the if
spawn.is_none() { if let Some(target_pid) = pid { ... } } block with the same
pattern that calls the new helper to keep the spawn/pid check but centralize the
warnings.
tests/run_e2e.sh (1)
611-664: Optional: test_bun_samples_collected largely duplicates the sample check inside test_bun_jitdump_callstack.

Both tests gate on bun/fixture presence, run run_bun_profiler, and assert count_samples > 0. The only extra behavior in test_bun_jitdump_callstack is the assert_stack_contains check. This mirrors the Node.js split (test_nodejs_samples_collected + test_nodejs_callstack), so it's consistent — but be aware each test spins up a fresh profile run (~1s + overhead), effectively doubling Bun test runtime for a check already covered by the callstack test. Consider collapsing into a single test if CI time becomes a concern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/run_e2e.sh` around lines 611 - 664, The two tests
test_bun_samples_collected and test_bun_jitdump_callstack duplicate the same
setup and sample-count check; to reduce redundant runs collapse them by removing
test_bun_samples_collected and moving its basic sample existence check into
test_bun_jitdump_callstack before the assert_stack_contains call (use the
existing run_bun_profiler and count_samples logic), or alternatively have
test_bun_jitdump_callstack call test_bun_samples_collected to reuse its check;
update references to run_bun_profiler, count_samples, and assert_stack_contains
accordingly so only one profiler run is performed per Bun fixture.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@profile-bee/src/jitdump.rs`:
- Around line 689-692: test_find_jitdump_nonexistent currently scans the real
/tmp and can false-fail if a stale jit file exists; update the test to use an
isolated temporary directory instead of the global /tmp by either (a) adding a
helper that searches a provided directory (e.g.,
find_jitdump_for_pid_in_dir(pid: u32, dir: &Path) or make find_jitdump_for_pid
accept a Path argument) and calling that helper from the test with a tempdir
created via tempfile::tempdir(), or (b) change the test to use a randomized high
PID that is extremely unlikely to exist; prefer option (a) so change the
test_find_jitdump_nonexistent to create a tempfile::TempDir, call the
new/modified search function (referencing find_jitdump_for_pid or new
find_jitdump_for_pid_in_dir) against that temp dir, and assert is_none().
- Around line 199-247: The bug is a key mismatch between parse_debug_info and
parse_code_load: change both sides to key pending_debug by code_addr (u64)
instead of using code_index; specifically, in parse_code_load replace the lookup
self.pending_debug.remove(&code_index) with
self.pending_debug.remove(&code_addr) and ensure parse_debug_info inserts into
pending_debug using the debug record's code_addr as the map key; also update the
parse_debug_info doc comment to state that code_addr is the correlation key (not
a "proxy for code_index") so future readers aren't confused.

In `@tests/run_e2e.sh`:
- Around line 661-663: The current regex passed to assert_stack_contains
("hot\|processData\|handleRequest\|serverLoop") is too loose and can match
unrelated native symbols; tighten it by anchoring to exact JS function names
from bun_callstack.js (use word-boundaries like
\bprocessData\b,\bhandleRequest\b,\bserverLoop\b) or replace the check with an
assertion that matches JITDump's collapse-line format (e.g. the resolved
collapse line pattern produced by JITDump for bun_callstack.js) so the test only
passes when symbolization actually resolved JS function names; update the call
to assert_stack_contains to use the new anchored regex or format-specific
pattern.

---

Outside diff comments:
In `@profile-bee/src/trace_handler.rs`:
- Around line 367-373: The cached symbolized stacks returned by
get_exp_stacked_frames are becoming stale when reload_jitdump_tables adds new
JIT symbols; update the cache logic so JIT-dump overrides are not permanently
baked into cached entries—either invalidate or version cache entries when
reload_jitdump_tables updates symbols for a PID or apply the JIT override at
lookup time. Concretely: add a JIT-generation/version or last-jit-update marker
per tgid that reload_jitdump_tables increments, include that marker in the cache
key or store it with entries and check it in get_exp_stacked_frames before
returning self.cache.get(tgid, ktrace_id, utrace_id), and if out-of-date re-run
the JIT-override step (or refresh/invalidate the entry) so newly loaded JIT
symbols are applied. Ensure references to self.cache.get, get_exp_stacked_frames
and reload_jitdump_tables are the touch points for this change.

---

Nitpick comments:
In `@profile-bee/bin/profile-bee.rs`:
- Around line 612-617: Extract the duplicated block that checks spawn.is_none()
and, when pid is Some, calls warn_nodejs_without_perf_map and
warn_bun_without_jitdump into a small helper function (e.g.
warn_unsupported_jit_runtimes(target_pid: i32)) and replace the three verbatim
sites (the batch path, run_combined_mode, and run_tui_mode) with a single call
to that helper; specifically, create the helper that accepts the target_pid and
invokes warn_nodejs_without_perf_map(target_pid) and
warn_bun_without_jitdump(target_pid), then in each location replace the if
spawn.is_none() { if let Some(target_pid) = pid { ... } } block with the same
pattern that calls the new helper to keep the spawn/pid check but centralize the
warnings.

In `@profile-bee/src/event_loop.rs`:
- Around line 544-550: The call to reload_jitdump_tables() in drain_events
causes unnecessary I/O during the collect_unsymbolized/raw-capture path; update
drain_events to skip calling reload_jitdump_tables() when symbolization is
disabled (check the symbolization flag used elsewhere, e.g. symbolize or the
collect_unsymbolized code path) so that reload_jitdump_tables() only runs when
symbolization is enabled; locate the drain_events implementation and add a
conditional guard around the reload_jitdump_tables() tail call (or early-return
for the raw path) so JIT parsing is avoided when symbolize == false or when
collect_unsymbolized is active.
- Around line 342-368: The current reload_jitdump_tables clones self.known_tgids
every call which is unnecessary and expensive; change the iteration to first
collect the PIDs into a small Vec<u32> (e.g.
self.known_tgids.iter().copied().collect()) so you release the borrow on self
before calling jitter lookups, then iterate that Vec and call
jitdump::find_jitdump_for_pid, trace_handler.jit_table_mut(tgid) and
try_load_jitdump_for_pid(tgid) as before; this removes the per-cycle HashSet
clone while preserving behavior and prevents holding a borrow across calls that
access the filesystem.

In `@profile-bee/src/jitdump.rs`:
- Around line 108-117: In reload_from_file, guard against rotated/truncated
files by validating the file before seeking: open the file, check
file.metadata().len() and if it's less than self.last_read_offset set
self.last_read_offset = 0 (or otherwise reset state), and/or read and verify the
header magic bytes at offset 0 match the expected JIT dump signature before
doing reader.seek(SeekFrom::Start(self.last_read_offset)); then proceed to call
self.read_records(&mut reader) and update symbols as before; make sure to
reference and update the fields last_read_offset and symbols and use the
existing read_records method when implementing the fallback reset.
- Around line 342-365: In find_jitdump_for_pid, avoid the useless allocation
from format!("jit-") by using the literal "jit-" and make the JSC/Bun filename
match precise by splitting the filename on '-' and verifying the PID is in the
expected segment: check that the split parts length is >= 3 and parts[2] ==
pid_str (instead of name.contains(&format!("-{}-", pid_str))). Keep the existing
starts_with("jit-") check (use the literal) and return entry.path() when the
split PID matches.

In `@tests/run_e2e.sh`:
- Around line 611-664: The two tests test_bun_samples_collected and
test_bun_jitdump_callstack duplicate the same setup and sample-count check; to
reduce redundant runs collapse them by removing test_bun_samples_collected and
moving its basic sample existence check into test_bun_jitdump_callstack before
the assert_stack_contains call (use the existing run_bun_profiler and
count_samples logic), or alternatively have test_bun_jitdump_callstack call
test_bun_samples_collected to reuse its check; update references to
run_bun_profiler, count_samples, and assert_stack_contains accordingly so only
one profiler run is performed per Bun fixture.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6edf4292-b853-4eae-9fdb-3dc02aead58e

📥 Commits

Reviewing files that changed from the base of the PR and between 7fcb39e and fb1682b.

📒 Files selected for processing (9)

profile-bee/bin/profile-bee.rs
profile-bee/src/event_loop.rs
profile-bee/src/jitdump.rs
profile-bee/src/lib.rs
profile-bee/src/spawn.rs
profile-bee/src/symbolize.rs
profile-bee/src/trace_handler.rs
tests/fixtures/src/bun_callstack.js
tests/run_e2e.sh

coderabbitai · 2026-04-19T00:14:50Z

+    /// Parse a JIT_CODE_LOAD record.
+    ///
+    /// Fixed fields after record header:
+    ///   pid(u32), tid(u32), vma(u64), code_addr(u64), code_size(u64), code_index(u64)
+    /// Followed by: null-terminated function name, then raw code bytes.
+    fn parse_code_load<R: Read>(&mut self, reader: &mut R, payload_size: u64) -> io::Result<()> {
+        // Fixed fields: 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes
+        const FIXED_SIZE: u64 = 40;
+        if payload_size < FIXED_SIZE {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidData,
+                "JIT_CODE_LOAD too small",
+            ));
+        }
+
+        let _pid = read_u32(reader)?;
+        let _tid = read_u32(reader)?;
+        let _vma = read_u64(reader)?;
+        let code_addr = read_u64(reader)?;
+        let code_size = read_u64(reader)?;
+        let code_index = read_u64(reader)?;
+
+        // Remaining bytes: null-terminated name + code bytes
+        let remaining = payload_size - FIXED_SIZE;
+        let mut name_and_code = vec![0u8; remaining as usize];
+        reader.read_exact(&mut name_and_code)?;
+
+        // Extract null-terminated name
+        let name_end = name_and_code
+            .iter()
+            .position(|&b| b == 0)
+            .unwrap_or(name_and_code.len());
+        let name = String::from_utf8_lossy(&name_and_code[..name_end]).to_string();
+
+        // Check for pending debug info
+        let source = self.pending_debug.remove(&code_index);
+
+        self.symbols.insert(
+            code_addr,
+            JitSymbol {
+                code_addr,
+                code_size,
+                name,
+                source,
+            },
+        );
+
+        Ok(())
+    }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In the Linux perf jitdump specification, does the JIT_CODE_DEBUG_INFO record correlate with the following JIT_CODE_LOAD by matching code_addr, or by code_index?

💡 Result:

In the Linux perf jitdump specification, the JIT_CODE_DEBUG_INFO record correlates with the JIT_CODE_LOAD record by matching code_addr. Both records use the uint64_t code_addr field: in JIT_CODE_LOAD, it is the "code start address for the jitted code", and in JIT_CODE_DEBUG_INFO, it is the "address of function for which the debug information is generated". The specification requires that JIT_CODE_DEBUG_INFO must always be generated BEFORE the corresponding JIT_CODE_LOAD for the same function, which facilitates parsing by matching on code_addr (as addresses are known at debug info emission time). The code_index field in JIT_CODE_LOAD is a unique identifier but is not present in JIT_CODE_DEBUG_INFO; it is used in other records like JIT_CODE_MOVE.

Citations:

1: https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jitdump-specification.txt

2: https://github.com/torvalds/linux/blob/46a51f4f5edade43ba66b3c151f0e25ec8b69cb6/tools/perf/Documentation/jitdump-specification.txt

3: https://android.googlesource.com/kernel/common/+/baa23246e93f/tools/perf/Documentation/jitdump-specification.txt?autodive=0%2F%2F%2F%2F%2F%2F%2F%2F

4: https://github.com/torvalds/linux/blob/master/tools/perf/util/jitdump.h

🏁 Script executed:

# First, let's examine the jitdump.rs file to verify the code at the specified lines head -250 profile-bee/src/jitdump.rs | tail -80

Repository: zz85/profile-bee

Length of output: 2676

🏁 Script executed:

# Also check the parse_debug_info function around lines 282-332 sed -n '282,332p' profile-bee/src/jitdump.rs

Repository: zz85/profile-bee

Length of output: 1994

🏁 Script executed:

# Find where pending_debug is stored/looked up to confirm the mismatch rg "pending_debug\." profile-bee/src/jitdump.rs -B 2 -A 2

Repository: zz85/profile-bee

Length of output: 320

Debug-info lookup key mismatch: code_index vs code_addr.

parse_debug_info stores pending_debug keyed by the record's code_addr (line 315), but parse_code_load looks up pending_debug with code_index (line 234). Per the JITDump spec, a JIT_CODE_DEBUG_INFO record's code_addr field matches its corresponding JIT_CODE_LOAD's code_addr (not code_index) — so as written, source will essentially always be None and format_jit_symbol will never emit (file:line). The doc comment on parse_debug_info even says "used as a proxy for code_index", which appears to be the source of the confusion.

Suggested fix: key both sides by code_addr.

🐛 Proposed fix

- // Check for pending debug info - let source = self.pending_debug.remove(&code_index); + // Check for pending debug info (keyed by code_addr per JITDump spec) + let source = self.pending_debug.remove(&code_addr);

And update the doc comment on parse_debug_info (lines 288–290) to reflect that code_addr is the actual correlation key, not a "proxy for code_index". A regression test that emits a JIT_CODE_DEBUG_INFO followed by JIT_CODE_LOAD and asserts sym.source.is_some() would also be worthwhile.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/// Parse a JIT_CODE_LOAD record.

///

/// Fixed fields after record header:

/// pid(u32), tid(u32), vma(u64), code_addr(u64), code_size(u64), code_index(u64)

/// Followed by: null-terminated function name, then raw code bytes.

fn parse_code_load<R: Read>(&mut self, reader: &mut R, payload_size: u64) -> io::Result<()> {

// Fixed fields: 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes

const FIXED_SIZE: u64 = 40;

if payload_size < FIXED_SIZE {

return Err(io::Error::new(

io::ErrorKind::InvalidData,

"JIT_CODE_LOAD too small",

));

}

let _pid = read_u32(reader)?;

let _tid = read_u32(reader)?;

let _vma = read_u64(reader)?;

let code_addr = read_u64(reader)?;

let code_size = read_u64(reader)?;

let code_index = read_u64(reader)?;

// Remaining bytes: null-terminated name + code bytes

let remaining = payload_size - FIXED_SIZE;

let mut name_and_code = vec![0u8; remaining as usize];

reader.read_exact(&mut name_and_code)?;

// Extract null-terminated name

let name_end = name_and_code

.iter()

.position(|&b| b == 0)

.unwrap_or(name_and_code.len());

let name = String::from_utf8_lossy(&name_and_code[..name_end]).to_string();

// Check for pending debug info

let source = self.pending_debug.remove(&code_index);

self.symbols.insert(

code_addr,

JitSymbol {

code_addr,

code_size,

name,

source,

},

);

Ok(())

}

/// Parse a JIT_CODE_LOAD record.

///

/// Fixed fields after record header:

/// pid(u32), tid(u32), vma(u64), code_addr(u64), code_size(u64), code_index(u64)

/// Followed by: null-terminated function name, then raw code bytes.

fn parse_code_load<R: Read>(&mut self, reader: &mut R, payload_size: u64) -> io::Result<()> {

// Fixed fields: 4 + 4 + 8 + 8 + 8 + 8 = 40 bytes

const FIXED_SIZE: u64 = 40;

if payload_size < FIXED_SIZE {

return Err(io::Error::new(

io::ErrorKind::InvalidData,

"JIT_CODE_LOAD too small",

));

}

let _pid = read_u32(reader)?;

let _tid = read_u32(reader)?;

let _vma = read_u64(reader)?;

let code_addr = read_u64(reader)?;

let code_size = read_u64(reader)?;

let code_index = read_u64(reader)?;

// Remaining bytes: null-terminated name + code bytes

let remaining = payload_size - FIXED_SIZE;

let mut name_and_code = vec![0u8; remaining as usize];

reader.read_exact(&mut name_and_code)?;

// Extract null-terminated name

let name_end = name_and_code

.iter()

.position(|&b| b == 0)

.unwrap_or(name_and_code.len());

let name = String::from_utf8_lossy(&name_and_code[..name_end]).to_string();

// Check for pending debug info (keyed by code_addr per JITDump spec)

let source = self.pending_debug.remove(&code_addr);

self.symbols.insert(

code_addr,

JitSymbol {

code_addr,

code_size,

name,

source,

},

);

Ok(())

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/jitdump.rs` around lines 199 - 247, The bug is a key mismatch between parse_debug_info and parse_code_load: change both sides to key pending_debug by code_addr (u64) instead of using code_index; specifically, in parse_code_load replace the lookup self.pending_debug.remove(&code_index) with self.pending_debug.remove(&code_addr) and ensure parse_debug_info inserts into pending_debug using the debug record's code_addr as the map key; also update the parse_debug_info doc comment to state that code_addr is the correlation key (not a "proxy for code_index") so future readers aren't confused.

coderabbitai · 2026-04-19T00:14:51Z

+    fn test_find_jitdump_nonexistent() {
+        // PID 0 should never have a JITDump file
+        assert!(find_jitdump_for_pid(0).is_none());
+    }


⚠️ Potential issue | 🟡 Minor

test_find_jitdump_nonexistent can false-fail on a dirty /tmp.

Using find_jitdump_for_pid(0) scans the real /tmp and assumes no matching file exists. If a stale jit-0.dump or jit-<tid>-0-<hash> ever lingers (e.g., from a previous test run or a developer's machine), this test would fail nondeterministically. Consider either testing against a temp directory (would require making the /tmp path injectable) or using a PID with a random high-entropy suffix that is extremely unlikely to collide.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@profile-bee/src/jitdump.rs` around lines 689 - 692, test_find_jitdump_nonexistent currently scans the real /tmp and can false-fail if a stale jit file exists; update the test to use an isolated temporary directory instead of the global /tmp by either (a) adding a helper that searches a provided directory (e.g., find_jitdump_for_pid_in_dir(pid: u32, dir: &Path) or make find_jitdump_for_pid accept a Path argument) and calling that helper from the test with a tempdir created via tempfile::tempdir(), or (b) change the test to use a randomized high PID that is extremely unlikely to exist; prefer option (a) so change the test_find_jitdump_nonexistent to create a tempfile::TempDir, call the new/modified search function (referencing find_jitdump_for_pid or new find_jitdump_for_pid_in_dir) against that temp dir, and assert is_none().

coderabbitai · 2026-04-19T00:14:51Z

+    assert_stack_contains "$file" \
+        "hot\|processData\|handleRequest\|serverLoop" \
+        "Should contain JS function names from JITDump"


⚠️ Potential issue | 🟡 Minor

Weak JITDump assertion — pattern can match non-JIT frames.

The pattern hot\|processData\|handleRequest\|serverLoop is matched against the entire collapse file with grep -q, so any occurrence (including native Bun/JSC C++ symbols that happen to contain hot, e.g. hot, hotPath, etc., or internal runtime frames) will satisfy the assertion without proving JITDump resolution actually worked. Unlike the Node.js test, there's no fallback like LazyCompile|Builtins_JSEntry|v8:: to explicitly signal a JIT-stack-walking success path, meaning a regression where JITDump overrides silently stop applying could still pass this test as long as any symbol containing hot (substring) appears.

Consider anchoring the pattern more tightly to names that only exist in bun_callstack.js (e.g. more unique function names), or asserting on the collapse line format produced by JITDump resolution specifically, so the test actually fails when JITDump symbolization regresses.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/run_e2e.sh` around lines 661 - 663, The current regex passed to assert_stack_contains ("hot\|processData\|handleRequest\|serverLoop") is too loose and can match unrelated native symbols; tighten it by anchoring to exact JS function names from bun_callstack.js (use word-boundaries like \bprocessData\b,\bhandleRequest\b,\bserverLoop\b) or replace the check with an assertion that matches JITDump's collapse-line format (e.g. the resolved collapse line pattern produced by JITDump for bun_callstack.js) so the test only passes when symbolization actually resolved JS function names; update the call to assert_stack_contains to use the new anchored regex or format-specific pattern.

coderabbitai Bot reviewed Apr 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add JITDump support for Bun/JSC JIT symbol resolution#96

feat: add JITDump support for Bun/JSC JIT symbol resolution#96
zz85 wants to merge 1 commit into
mainfrom
bun

zz85 commented Apr 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 19, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 19, 2026

Uh oh!

coderabbitai Bot Apr 19, 2026

Uh oh!

coderabbitai Bot Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

zz85 commented Apr 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

zz85 commented Apr 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 19, 2026 •

edited

Loading