Add: L3-L2 message queue design#1130
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds ChangesL3↔L2 SPSC Message Queue Design Specification
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive design document for an L3-L2 SPSC message queue wrapper, detailing its architecture, public API, region layout, descriptor ABI, and error handling. The feedback suggests a performance improvement to align the descriptor rings to 64 bytes instead of 8 bytes, preventing individual 32-byte descriptor slots from crossing cache line boundaries and causing split-cache-line accesses.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| - `depth` must be a power of two and `depth <= 2^30`. | ||
| - Queue capacity is `depth` messages, not `depth - 1`. | ||
| - Descriptor slot size is fixed at 32 bytes. | ||
| - Descriptor rings are 8-byte aligned. |
There was a problem hiding this comment.
Since each descriptor slot is 32 bytes, an 8-byte alignment for the descriptor rings allows individual slots to cross 64-byte cache line boundaries (for example, a slot starting at offset 56 would span bytes 56 to 87, crossing the cache line boundary at 64). This can lead to split-cache-line accesses and performance degradation, especially during high-frequency SPSC polling across L3 and L2.
Aligning the descriptor rings to 64 bytes (or at least 32 bytes) ensures that:
- No individual 32-byte descriptor slot ever crosses a 64-byte cache line boundary.
- The descriptor rings themselves are cache-line aligned, preventing potential false sharing or split accesses.
| - Descriptor rings are 8-byte aligned. | |
| - Descriptor rings are 64-byte aligned. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/l3-l2-message-queue-design.md`:
- Around line 609-610: The STOP encoding snippet is missing the explicit
payload_offset=0 field, so update the quick format example to match the earlier
STOP constraint and keep both required zero values visible. Make the change in
the STOP format documentation near the existing seq/opcode/payload_nbytes
example so the snippet consistently shows payload_offset=0 alongside
payload_nbytes=0.
- Around line 355-358: The replay rules text in the message-queue design doc
uses mixed coordinate systems for payload_offset, comparing a region-relative
value against an arena-relative one. Update the wording in the replay/release
section to use a single coordinate system consistently, matching the earlier
payload_offset definition, and revise the related wrap-padding/base-queue
explanation so the comparison and advance logic are described in the same terms.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 86f707a9-261f-4c82-9398-cf3da584a821
📒 Files selected for processing (1)
docs/l3-l2-message-queue-design.md
| `payload_head % arena_bytes` with the descriptor's arena-relative payload | ||
| offset. If they differ, the only valid base-queue case is wrap padding: the | ||
| descriptor offset is the base offset of this direction's arena and the | ||
| releaser first advances `payload_head` to the next arena cycle. It then |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Unify payload-offset coordinate system in replay rules.
payload_offset is defined as region-relative (Lines 226-227), but this section compares against an “arena-relative” value directly. That inconsistency can cause incorrect replay/release math and false poison or data corruption decisions.
Proposed wording fix
-- Padding has no descriptor. On release, the consumer compares
-- `payload_head % arena_bytes` with the descriptor's arena-relative payload
-- offset. If they differ, the only valid base-queue case is wrap padding: the
-- descriptor offset is the base offset of this direction's arena and the
+- Padding has no descriptor. On release, the consumer computes
+- `desc_arena_off = payload_offset - arena_base_offset` and compares
+- `payload_head % arena_bytes` with `desc_arena_off`. If they differ, the only
+- valid base-queue case is wrap padding: `desc_arena_off == 0` (i.e.
+- `payload_offset == arena_base_offset`) and the🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/l3-l2-message-queue-design.md` around lines 355 - 358, The replay rules
text in the message-queue design doc uses mixed coordinate systems for
payload_offset, comparing a region-relative value against an arena-relative one.
Update the wording in the replay/release section to use a single coordinate
system consistently, matching the earlier payload_offset definition, and revise
the related wrap-padding/base-queue explanation so the comparison and advance
logic are described in the same terms.
| seq + opcode=STOP + payload_nbytes=0 | ||
| ``` |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Make STOP encoding snippet include payload_offset=0 for consistency.
The STOP format here omits payload_offset=0, which is required earlier (Line 258). Keep both constraints in the quick format line to avoid ambiguous implementations.
Proposed wording fix
-seq + opcode=STOP + payload_nbytes=0
+seq + opcode=STOP + payload_nbytes=0 + payload_offset=0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| seq + opcode=STOP + payload_nbytes=0 | |
| ``` | |
| seq + opcode=STOP + payload_nbytes=0 + payload_offset=0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/l3-l2-message-queue-design.md` around lines 609 - 610, The STOP encoding
snippet is missing the explicit payload_offset=0 field, so update the quick
format example to match the earlier STOP constraint and keep both required zero
values visible. Make the change in the STOP format documentation near the
existing seq/opcode/payload_nbytes example so the snippet consistently shows
payload_offset=0 alongside payload_nbytes=0.
85595a4 to
a474a9a
Compare
- Define the staged base queue transport design and PR1/PR2 split. - Add the base implementation plan for the queue stack.
a474a9a to
5962af6
Compare
- Implement the PR1 L3 queue wrapper and L2 endpoint ABI on top of the primitive L3-L2 orchestration region transport. - Wire Orchestrator.create_l3_l2_queue and cover descriptor layout, zero-byte messages, abort flags, capacity, and fast-path buffers in Python and C++ unit tests.
- Drop the base implementation guide from tracked PR1 files while keeping it available locally for PR2 planning. - Keep the L3-L2 queue Python tests compatible with the pyright target and ruff formatting used by CI.
e38004a to
04e3a4c
Compare
d1b814d to
1adf3c7
Compare
- Fail closed on queue layout uint64 overflow in C++ and Python mirror calculations - Validate cached L2 input handle metadata before release and use cached descriptor state - Gate C++ spin-loop timer reads and clean up Python regions on partial construction failure
1adf3c7 to
6107c81
Compare
- Add the user-facing L3-L2 message queue documentation. - Link the primitive L3-L2 orchestration communication doc to the queue wrapper doc. - Remove the design-stage document from the branch while leaving the local copy available for follow-up work.
b2b3bba to
eaaccc8
Compare
- Add strict payload and counter size checks to the L3-L2 queue task args. - Validate L2 input payload offsets before exposing payload views. - Document timeout, layout, and queue free semantics, and expand no-hardware tests.
eaaccc8 to
bf505cb
Compare
Summary
This PR adds
docs/l3-l2-message-queue.mdand implements the base L3-L2 SPSCmessage queue transport on top of the L3-L2 orchestration communication
primitives introduced by PR #1015.
The queue layer does not change the underlying primitive transport. Instead, it
defines and implements a higher-level protocol over the existing region
descriptor, payload byte range, and
int32_tsignal counter model. The goal isto allow one L3 orchestrator to exchange a sequence of input and output messages
with one persistent L2 orchestrator run, avoiding the cost of stopping and
restarting L2 between individual tasks.
This PR includes the public queue contract, the L3 Python queue wrapper and
Orchestrator API entry point, the L2 AICPU endpoint implementation, and Python
and C++ unit coverage for the transport ABI and core ownership/error paths.
Design Overview
The implemented base queue transport provides a bidirectional queue abstraction
with:
The base transport lands in this PR for reviewability. A future L2-side input
window helper can be added as a policy on top of the same descriptor ABI, region
layout, counter layout, and L3 queue API, without changing the L3 API.