hero78119 commented Jan 14, 2026

Design rationale

Refactor the shard-cycle logic into the preflight tracer in a low-overhead way. With this change, the preflight tracer only updates `next_access` for cross-shard memory accesses. This significantly improves preflight executor performance: `emulator.preflight-execute` drops from 39.5 s to 10.4 s in the benchmark below.
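A minimal sketch of the idea, assuming fixed-size shards measured in cycles: the tracer remembers the last cycle at which each address was touched, and writes a `next_access` entry only when the following access to that address falls in a different shard. `PreflightTracer`, `track_access`, `shard_cycles`, and the map layouts are hypothetical illustrations, not the PR's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class PreflightTracer:
    """Hypothetical tracer: keeps next_access only for cross-shard access pairs."""

    # Assumption: every shard spans a fixed number of cycles.
    shard_cycles: int
    # addr -> cycle of the most recent access to that address.
    last_access: dict[int, int] = field(default_factory=dict)
    # (addr, prev_cycle) -> cycle of the next access to addr, cross-shard pairs only.
    next_access: dict[tuple[int, int], int] = field(default_factory=dict)

    def shard_of(self, cycle: int) -> int:
        return cycle // self.shard_cycles

    def track_access(self, addr: int, cycle: int) -> None:
        prev = self.last_access.get(addr)
        self.last_access[addr] = cycle
        # Same-shard successors are skipped, so the common case writes no next_access entry.
        if prev is not None and self.shard_of(prev) != self.shard_of(cycle):
            self.next_access[(addr, prev)] = cycle


# Usage: shard boundary at cycle 100.
tracer = PreflightTracer(shard_cycles=100)
for addr, cycle in [(0x10, 5), (0x10, 40), (0x10, 130), (0x20, 90), (0x20, 95)]:
    tracer.track_access(addr, cycle)
print(tracer.next_access)  # {(16, 40): 130}
```

Only the access pair for address `0x10` at cycles 40 and 130 crosses the boundary at cycle 100, so it is the only `next_access` entry written; the same-shard pairs cost just the `last_access` update.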

Benchmark

Tested on 23817600.

| Stage | Before | After |
|---|---|---|
| e2e | 195 s | 161 s |
| emulator.preflight-execute (total) | 39.5 s · 20.10% | 10.4 s · 6.44% |

Also reported, each with a single timing: `emulator.new-preflight-tracer` 3.01 ms · 0.00%; `emulator.init_mem` 9.94 ms · 0.01%.
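Working the table's numbers through (derived from the figures above, no new measurements): the preflight stage gets roughly 3.8× faster, and its 29.1 s saving accounts for about 86% of the 34 s end-to-end reduction.

```latex
\begin{aligned}
\text{preflight speedup} &= \frac{39.5\,\text{s}}{10.4\,\text{s}} \approx 3.8\times \\
\text{e2e speedup} &= \frac{195\,\text{s}}{161\,\text{s}} \approx 1.21\times \\
\frac{\Delta_{\text{preflight}}}{\Delta_{\text{e2e}}} &= \frac{39.5 - 10.4}{195 - 161} = \frac{29.1}{34} \approx 0.86
\end{aligned}
```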

hero78119 force-pushed the feat/optimize_executor branch from 0cdffd5 to 796c161, January 14, 2026 14:48
hero78119 changed the base branch from feat/dispatch_and_alloc to feat/load_store_w, January 14, 2026 14:49
hero78119 force-pushed the feat/optimize_executor branch from 796c161 to 3a3e7c3, January 14, 2026 14:51
hero78119 changed the base branch from feat/load_store_w to feat/dispatch_and_alloc, January 14, 2026 14:53
hero78119 force-pushed the feat/optimize_executor branch 6 times, most recently from 43e57cb to 21fd725, January 15, 2026 14:40
hero78119 force-pushed the feat/optimize_executor branch from 21fd725 to de244ac, January 15, 2026 23:53
hero78119 force-pushed the feat/optimize_executor branch from de244ac to a17fdbd, January 15, 2026 23:57
hero78119 changed the title from "WIP optimize executor" to "optimize preflight executor", Jan 15, 2026