Add: L3-L2 message queue design #1130

Open
ccyywwen wants to merge 6 commits into hw-native-sys:main from ccyywwen:l3-l2-orch-message-queue

Conversation

@ccyywwen (Contributor) commented Jun 24, 2026

Summary

This PR adds docs/l3-l2-message-queue.md and implements the base L3-L2 SPSC
message queue transport on top of the L3-L2 orchestration communication
primitives introduced by PR #1015.

The queue layer does not change the underlying primitive transport. Instead, it
defines and implements a higher-level protocol over the existing region
descriptor, payload byte range, and int32_t signal counter model. The goal is
to allow one L3 orchestrator to exchange a sequence of input and output messages
with one persistent L2 orchestrator run, avoiding the cost of stopping and
restarting L2 between individual tasks.

This PR includes the public queue contract, the L3 Python queue wrapper and
Orchestrator API entry point, the L2 AICPU endpoint implementation, and Python
and C++ unit coverage for the transport ABI and core ownership/error paths.

Design Overview

The implemented base queue transport provides a bidirectional queue abstraction
with:

  • an input queue for L3-to-L2 task input-data messages;
  • an output queue for L2-to-L3 result messages;
  • descriptor rings for message metadata;
  • payload arenas for message bodies;
  • cache-line-separated signal counters for producer/consumer coordination;
  • STOP, ERROR, release, timeout, lifetime, and poison semantics;
  • ABI validation shared by the L3-created region descriptor and the L2 endpoint;
  • lockstep Python/C++ layout tests for the mirrored layout calculation.

The base transport lands in this PR for reviewability. A future L2-side input
window helper can be added as a policy on top of the same descriptor ABI, region
layout, counter layout, and L3 queue API, without changing the L3 API.
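
The producer/consumer coordination listed above can be illustrated with a small single-process model. This is only a sketch, not the code in this PR: the class name `ToySpscRing` and its methods are hypothetical, and the model ignores shared memory, cache maintenance, and memory ordering. It shows how free-running head and tail counters let a ring of `depth` slots hold `depth` messages.

```python
class ToySpscRing:
    """Toy model of one queue direction (hypothetical, illustration only)."""

    def __init__(self, depth: int) -> None:
        # A power-of-two depth lets the slot index be computed with a mask.
        if depth <= 0 or depth & (depth - 1):
            raise ValueError("depth must be a power of two")
        self.depth = depth
        self.slots = [None] * depth
        self.tail = 0  # total messages published (written only by the producer)
        self.head = 0  # total messages released (written only by the consumer)

    def try_enqueue(self, msg) -> bool:
        # Counters are free-running, so tail - head is the number in flight,
        # and the ring is full only when that number equals depth.
        if self.tail - self.head == self.depth:
            return False
        self.slots[self.tail & (self.depth - 1)] = msg
        self.tail += 1  # publish only after the slot is filled
        return True

    def try_dequeue(self):
        if self.tail == self.head:
            return None  # nothing published yet
        msg = self.slots[self.head & (self.depth - 1)]
        self.head += 1  # release the slot back to the producer
        return msg


ring = ToySpscRing(depth=4)
accepted = [ring.try_enqueue(i) for i in range(5)]
drained = [ring.try_dequeue() for _ in range(5)]
```

Here the fifth enqueue is rejected because four messages are already in flight, and the fifth dequeue returns `None` because the ring is empty again.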

@coderabbitai (Bot) commented Jun 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

Adds docs/l3-l2-message-queue-design.md, an 877-line design specification for a bidirectional SPSC message queue wrapper over existing L3-L2 orchestration primitives. The document covers API shape, region layout, descriptor ABI, counter/cursor semantics, operation sequences, L2 input window extension, STOP/lifetime/error/poison semantics, and a staged implementation plan with test requirements.

Changes

L3↔L2 SPSC Message Queue Design Specification

| Layer | File(s) | Summary |
|---|---|---|
| Purpose, API shape, region layout, and validation rules | docs/l3-l2-message-queue-design.md | Establishes scope, non-goals, target queue API (enqueue/dequeue/stop/free and peek/read/release variants), physical region partitioning into descriptor rings and input/output arenas, counter placement, and queue-creation validation rules. |
| Descriptor ABI, opcodes, and counter/cursor semantics | docs/l3-l2-message-queue-design.md | Specifies the 32-byte descriptor slot with four 64-bit little-endian fields, DATA/STOP/ERROR opcodes with direction constraints, 64-byte-strided shared int32 head/tail sampling, signed delta reconstruction, local payload cursor and replay rules, arena wrap-padding behavior, and poison conditions. |
| Core operation sequences and ownership contracts | docs/l3-l2-message-queue-design.md | Defines L3 reserve→fill→publish and L2 peek/acquire→read→release flows in both directions, single-outstanding-reservation constraints, timeout/try_* APIs, returned message shapes, and the guarantee that input release precedes AICore task completion. |
| L2 input window extension | docs/l3-l2-message-queue-design.md | Specifies the max_l2_inflight policy, ACQUIRED→COMPLETED→RELEASED state machine, explicit completion ownership, FIFO-safe prefix watermarking to prevent release holes, and STOP draining interaction. |
| STOP semantics and queue lifetime/cleanup | docs/l3-l2-message-queue-design.md | Documents STOP descriptor ordering guarantees, graceful shutdown semantics, terminal-for-input behavior, try_request_stop and bounded request_stop timeout APIs, and cleanup sequencing where Worker.run drains before memory release and queue.free() is handle-only. |
| Error/poison framework, implementation plan, and test coverage | docs/l3-l2-message-queue-design.md | Defines the poison guiding rule, enumerates poison vs. non-poison conditions, documents Python region state mirroring and C++ fatal-error reporting, outlines the two-stage implementation plan with file locations and hook points, and lists detailed test coverage requirements and future work items. |
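
As a rough illustration of the descriptor and counter rows above, the following sketch packs a 32-byte slot and rebuilds a wide cursor from a wrapping int32 sample. The field order (`seq`, `opcode`, `payload_offset`, `payload_nbytes`) and the opcode values are assumptions, since the walkthrough names only four 64-bit little-endian fields and the DATA/STOP/ERROR opcodes. `advance_cursor` shows one common way to do signed-delta reconstruction and is not taken from the PR.

```python
import struct

DESC_FORMAT = "<4Q"  # four unsigned 64-bit little-endian fields
DESC_SIZE = struct.calcsize(DESC_FORMAT)  # 32 bytes

# Hypothetical opcode values; the real ABI may use different numbers.
OP_DATA, OP_STOP, OP_ERROR = 1, 2, 3


def pack_descriptor(seq, opcode, payload_offset, payload_nbytes) -> bytes:
    """Serialize one descriptor slot (assumed field order)."""
    return struct.pack(DESC_FORMAT, seq, opcode, payload_offset, payload_nbytes)


def unpack_descriptor(raw: bytes):
    """Return (seq, opcode, payload_offset, payload_nbytes)."""
    return struct.unpack(DESC_FORMAT, raw)


def to_int32(value: int) -> int:
    """Wrap an unbounded Python int to a signed 32-bit value."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def advance_cursor(local_cursor: int, sampled_i32: int) -> int:
    """Rebuild a wide cursor from a wrapping int32 counter sample.

    The signed 32-bit difference between the sample and the low bits of the
    local cursor is the true delta as long as the counters never drift apart
    by 2**31 or more.
    """
    delta = to_int32(sampled_i32 - local_cursor)
    return local_cursor + delta


raw = pack_descriptor(seq=7, opcode=OP_DATA, payload_offset=4096, payload_nbytes=128)
fields = unpack_descriptor(raw)

# The shared counter has wrapped past INT32_MAX; the local cursor keeps counting.
cursor = advance_cursor(2**31 - 1, to_int32(2**31 + 1))
```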

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A spec hops into the docs today,
With queues and descriptors all lined up in a row,
STOP means stop, and poison means nope,
FIFO watermarks keep the releases in hope,
Two stages to build it, with tests all aglow!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: adding an L3-L2 message queue design document. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The description is about the L3-L2 message queue design and transport, which matches the documented queue specification changes. |


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a comprehensive design document for an L3-L2 SPSC message queue wrapper, detailing its architecture, public API, region layout, descriptor ABI, and error handling. The feedback suggests a performance improvement to align the descriptor rings to 64 bytes instead of 8 bytes, preventing individual 32-byte descriptor slots from crossing cache line boundaries and causing split-cache-line accesses.

Comment thread docs/l3-l2-message-queue-design.md Outdated
- `depth` must be a power of two and `depth <= 2^30`.
- Queue capacity is `depth` messages, not `depth - 1`.
- Descriptor slot size is fixed at 32 bytes.
- Descriptor rings are 8-byte aligned.

medium

Since each descriptor slot is 32 bytes, an 8-byte alignment for the descriptor rings allows individual slots to cross 64-byte cache line boundaries (for example, a slot starting at offset 56 would span bytes 56 to 87, crossing the cache line boundary at 64). This can lead to split-cache-line accesses and performance degradation, especially during high-frequency SPSC polling across L3 and L2.

Aligning the descriptor rings to 64 bytes (or at least 32 bytes) ensures that:

  1. No individual 32-byte descriptor slot ever crosses a 64-byte cache line boundary.
  2. The descriptor rings themselves are cache-line aligned, preventing potential false sharing or split accesses.
Suggested change
-- Descriptor rings are 8-byte aligned.
+- Descriptor rings are 64-byte aligned.
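
The boundary arithmetic in this comment can be checked with a few lines of Python. This is a sketch under assumptions: `slots_crossing_cache_line` is a hypothetical helper, the region base is taken to be 64-byte aligned so that `ring_base` offsets map directly onto cache lines, and the cache line size is fixed at 64 bytes.

```python
CACHE_LINE = 64  # assumed cache line size in bytes
SLOT_SIZE = 32   # descriptor slot size from the design doc


def slots_crossing_cache_line(ring_base: int, depth: int) -> list[int]:
    """Return the indices of slots that straddle a 64-byte boundary."""
    crossing = []
    for i in range(depth):
        start = ring_base + i * SLOT_SIZE
        end = start + SLOT_SIZE - 1  # last byte of the slot
        if start // CACHE_LINE != end // CACHE_LINE:
            crossing.append(i)
    return crossing


# An 8-byte-aligned base such as 56 is legal under the original rule.
split_at_56 = slots_crossing_cache_line(ring_base=56, depth=4)
split_at_32 = slots_crossing_cache_line(ring_base=32, depth=4)
split_at_64 = slots_crossing_cache_line(ring_base=64, depth=4)
```

With `ring_base=56`, the slots at indices 0 and 2 straddle a boundary (bytes 56 to 87 and 120 to 151), while a base of 32 or 64 yields no split slots, which is consistent with the "or at least 32 bytes" remark.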

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/l3-l2-message-queue-design.md`:
- Around line 609-610: The STOP encoding snippet is missing the explicit
payload_offset=0 field, so update the quick format example to match the earlier
STOP constraint and keep both required zero values visible. Make the change in
the STOP format documentation near the existing seq/opcode/payload_nbytes
example so the snippet consistently shows payload_offset=0 alongside
payload_nbytes=0.
- Around line 355-358: The replay rules text in the message-queue design doc
uses mixed coordinate systems for payload_offset, comparing a region-relative
value against an arena-relative one. Update the wording in the replay/release
section to use a single coordinate system consistently, matching the earlier
payload_offset definition, and revise the related wrap-padding/base-queue
explanation so the comparison and advance logic are described in the same terms.
📥 Commits

Reviewing files that changed from the base of the PR and between ae59a8e and 85595a4.

📒 Files selected for processing (1)
  • docs/l3-l2-message-queue-design.md

Comment thread docs/l3-l2-message-queue-design.md Outdated
Comment on lines +355 to +358
`payload_head % arena_bytes` with the descriptor's arena-relative payload
offset. If they differ, the only valid base-queue case is wrap padding: the
descriptor offset is the base offset of this direction's arena and the
releaser first advances `payload_head` to the next arena cycle. It then

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Unify payload-offset coordinate system in replay rules.

payload_offset is defined as region-relative (Lines 226-227), but this section compares against an “arena-relative” value directly. That inconsistency can cause incorrect replay/release math and false poison or data corruption decisions.

Proposed wording fix
-- Padding has no descriptor. On release, the consumer compares
--   `payload_head % arena_bytes` with the descriptor's arena-relative payload
--   offset. If they differ, the only valid base-queue case is wrap padding: the
--   descriptor offset is the base offset of this direction's arena and the
+- Padding has no descriptor. On release, the consumer computes
+-   `desc_arena_off = payload_offset - arena_base_offset` and compares
+-   `payload_head % arena_bytes` with `desc_arena_off`. If they differ, the only
+-   valid base-queue case is wrap padding: `desc_arena_off == 0` (i.e.
+-   `payload_offset == arena_base_offset`) and the
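
The proposed comparison can be sketched as follows. This is a hedged illustration rather than the PR's code: `release_payload` is a hypothetical function, `payload_head` is assumed to be a free-running byte cursor, and advancing it by exactly `payload_nbytes` after the check is an assumption, since the quoted text is cut off at that point. A `ValueError` stands in for whatever poison mechanism the real queue uses.

```python
def release_payload(payload_head: int, payload_offset: int, payload_nbytes: int,
                    arena_base_offset: int, arena_bytes: int) -> int:
    """Return the new payload_head, or raise ValueError on a mismatch."""
    # Convert the region-relative descriptor offset to arena-relative terms,
    # so both sides of the comparison use the same coordinate system.
    desc_arena_off = payload_offset - arena_base_offset
    if not 0 <= desc_arena_off <= arena_bytes - payload_nbytes:
        raise ValueError("payload range lies outside this direction's arena")

    if payload_head % arena_bytes != desc_arena_off:
        # The only valid mismatch is wrap padding: the producer skipped the
        # arena tail and placed the payload at the start of the arena.
        if desc_arena_off != 0:
            raise ValueError("payload offset does not match payload_head")
        # Skip the padding by moving to the start of the next arena cycle.
        payload_head += arena_bytes - payload_head % arena_bytes

    # Assumption: release then advances past the payload bytes themselves.
    return payload_head + payload_nbytes


# Arena of 256 bytes located at region offset 1024.
in_order = release_payload(0, 1024, 100, arena_base_offset=1024, arena_bytes=256)
wrapped = release_payload(200, 1024, 100, arena_base_offset=1024, arena_bytes=256)
```

In the second call the cursor sits at arena offset 200, the descriptor points at arena offset 0, and the 56 trailing bytes are treated as padding before the 100 payload bytes are released.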
Comment thread docs/l3-l2-message-queue-design.md Outdated
Comment on lines +609 to +610
seq + opcode=STOP + payload_nbytes=0
```

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make STOP encoding snippet include payload_offset=0 for consistency.

The STOP format here omits payload_offset=0, which is required earlier (Line 258). Keep both constraints in the quick format line to avoid ambiguous implementations.

Proposed wording fix
-seq + opcode=STOP + payload_nbytes=0
+seq + opcode=STOP + payload_nbytes=0 + payload_offset=0
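
A consumer-side check of both zero-value constraints could look like the sketch below. The opcode value and the function name `is_well_formed_stop` are hypothetical; only the two constraints, `payload_nbytes=0` and `payload_offset=0`, come from the review comment.

```python
OP_STOP = 2  # hypothetical opcode value, not taken from the PR


def is_well_formed_stop(opcode: int, payload_offset: int, payload_nbytes: int) -> bool:
    """True when a STOP descriptor carries no payload reference at all."""
    return opcode == OP_STOP and payload_offset == 0 and payload_nbytes == 0


ok = is_well_formed_stop(OP_STOP, payload_offset=0, payload_nbytes=0)
stale_offset = is_well_formed_stop(OP_STOP, payload_offset=4096, payload_nbytes=0)
```

Checking both fields means a STOP that carries a stale offset is rejected rather than silently accepted, which is the ambiguity the comment is trying to rule out.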
ccyywwen force-pushed the l3-l2-orch-message-queue branch from 85595a4 to a474a9a on June 26, 2026 06:41
- Define the staged base queue transport design and PR1/PR2 split.
- Add the base implementation plan for the queue stack.
ccyywwen force-pushed the l3-l2-orch-message-queue branch from a474a9a to 5962af6 on June 26, 2026 07:50
ccyywwen added 2 commits June 26, 2026 18:24
- Implement the PR1 L3 queue wrapper and L2 endpoint ABI on top of
  the primitive L3-L2 orchestration region transport.
- Wire Orchestrator.create_l3_l2_queue and cover descriptor layout,
  zero-byte messages, abort flags, capacity, and fast-path buffers in
  Python and C++ unit tests.
- Drop the base implementation guide from tracked PR1 files while keeping
  it available locally for PR2 planning.
- Keep the L3-L2 queue Python tests compatible with the pyright target and
  ruff formatting used by CI.
ccyywwen force-pushed the l3-l2-orch-message-queue branch from e38004a to 04e3a4c on June 29, 2026 02:43
ccyywwen force-pushed the l3-l2-orch-message-queue branch 2 times, most recently from d1b814d to 1adf3c7 on June 30, 2026 03:23
- Fail closed on queue layout uint64 overflow in C++ and Python mirror calculations

- Validate cached L2 input handle metadata before release and use cached descriptor state

- Gate C++ spin-loop timer reads and clean up Python regions on partial construction failure
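
Because Python integers never overflow, a Python mirror of a C++ `uint64_t` layout calculation has to check the bound explicitly in order to fail closed in the same cases. The sketch below shows one way to do that. The helper names, the layout formula (two descriptor rings plus two arenas), and the omission of counters and alignment padding are all assumptions for illustration; only the 32-byte slot size and the `depth` constraints are quoted from the design doc earlier in this thread.

```python
UINT64_MAX = (1 << 64) - 1
DESC_SLOT_BYTES = 32
MAX_DEPTH = 1 << 30


def checked_add(a: int, b: int) -> int:
    """Add two sizes, failing if the result would not fit in a uint64."""
    total = a + b
    if total > UINT64_MAX:
        raise OverflowError("queue layout exceeds uint64")
    return total


def checked_mul(a: int, b: int) -> int:
    """Multiply two sizes, failing if the result would not fit in a uint64."""
    product = a * b
    if product > UINT64_MAX:
        raise OverflowError("queue layout exceeds uint64")
    return product


def total_layout_bytes(depth: int, input_arena_bytes: int, output_arena_bytes: int) -> int:
    """Simplified total size: input ring + output ring + both arenas."""
    if depth <= 0 or depth & (depth - 1) or depth > MAX_DEPTH:
        raise ValueError("depth must be a power of two and at most 2**30")
    if input_arena_bytes < 0 or output_arena_bytes < 0:
        raise ValueError("arena sizes must be non-negative")
    ring_bytes = checked_mul(depth, DESC_SLOT_BYTES)
    total = checked_add(ring_bytes, ring_bytes)
    total = checked_add(total, input_arena_bytes)
    return checked_add(total, output_arena_bytes)


small = total_layout_bytes(depth=4, input_arena_bytes=100, output_arena_bytes=200)
```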
ccyywwen force-pushed the l3-l2-orch-message-queue branch from 1adf3c7 to 6107c81 on June 30, 2026 07:28
- Add the user-facing L3-L2 message queue documentation.

- Link the primitive L3-L2 orchestration communication doc to the queue wrapper doc.

- Remove the design-stage document from the branch while leaving the local copy available for follow-up work.
ccyywwen force-pushed the l3-l2-orch-message-queue branch from b2b3bba to eaaccc8 on July 1, 2026 02:19
- Add strict payload and counter size checks to the L3-L2 queue task args.

- Validate L2 input payload offsets before exposing payload views.

- Document timeout, layout, and queue free semantics, and expand no-hardware tests.
ccyywwen force-pushed the l3-l2-orch-message-queue branch from eaaccc8 to bf505cb on July 1, 2026 03:03