Add: L3-L2 message queue design by ccyywwen · Pull Request #1130 · hw-native-sys/simpler

ccyywwen · 2026-06-24T07:55:13Z

Summary

This PR adds docs/l3-l2-message-queue.md and implements the base L3-L2 SPSC
message queue transport on top of the L3-L2 orchestration communication
primitives introduced by PR #1015.

The queue layer does not change the underlying primitive transport. Instead, it
defines and implements a higher-level protocol over the existing region
descriptor, payload byte range, and int32_t signal counter model. The goal is
to allow one L3 orchestrator to exchange a sequence of input and output messages
with one persistent L2 orchestrator run, avoiding the cost of stopping and
restarting L2 between individual tasks.

This PR includes the public queue contract, the L3 Python queue wrapper and
Orchestrator API entry point, the L2 AICPU endpoint implementation, and Python
and C++ unit coverage for the transport ABI and core ownership/error paths.

Design Overview

The implemented base queue transport provides a bidirectional queue abstraction
with:

- an input queue for L3-to-L2 task input-data messages;
- an output queue for L2-to-L3 result messages;
- descriptor rings for message metadata;
- payload arenas for message bodies;
- cache-line-separated signal counters for producer/consumer coordination;
- STOP, ERROR, release, timeout, lifetime, and poison semantics;
- ABI validation shared by the L3-created region descriptor and the L2 endpoint;
- lockstep Python/C++ layout tests for the mirrored layout calculation.
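To make the descriptor-ring item concrete, here is a minimal in-process Python model of one SPSC ring. It is a sketch under stated assumptions, not the PR's code: the class name `SpscRing`, the free-running 32-bit counters, and the `try_publish`/`try_peek`/`release` methods are all hypothetical. It shows why a ring can hold `depth` messages rather than `depth - 1` when fullness is computed as `tail - head == depth` on free-running counters instead of by comparing wrapped indices, and it follows the fill-then-publish order on the producer side.

```python
from dataclasses import dataclass, field

U32 = 1 << 32  # counters wrap like unsigned 32-bit values in this model


@dataclass
class SpscRing:
    """Hypothetical single-producer/single-consumer descriptor ring model."""

    depth: int
    head: int = 0  # advanced only by the consumer on release
    tail: int = 0  # advanced only by the producer on publish
    slots: list = field(default_factory=list)

    def __post_init__(self) -> None:
        d = self.depth
        if d <= 0 or d & (d - 1) or d > (1 << 30):
            raise ValueError("depth must be a power of two and <= 2^30")
        self.slots = [None] * d

    def occupancy(self) -> int:
        # Free-running counters: their difference is the number of messages held.
        return (self.tail - self.head) % U32

    def try_publish(self, desc) -> bool:
        if self.occupancy() == self.depth:
            return False  # full: all depth slots are in use
        self.slots[self.tail & (self.depth - 1)] = desc  # fill the slot first...
        self.tail = (self.tail + 1) % U32  # ...then publish by bumping tail
        return True

    def try_peek(self):
        if self.occupancy() == 0:
            return None  # empty
        return self.slots[self.head & (self.depth - 1)]

    def release(self) -> None:
        if self.occupancy() == 0:
            raise RuntimeError("release on an empty ring")
        self.head = (self.head + 1) % U32
```

Because `depth` is a power of two that divides 2^32, `counter & (depth - 1)` remains a valid slot index even after the counters wrap.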

The base transport lands in this PR for reviewability. A future L2-side input
window helper can be added as a policy on top of the same descriptor ABI, region
layout, and counter layout, without changing the L3 queue API.
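The design describes this window as a `max_l2_inflight` policy with an ACQUIRED→COMPLETED→RELEASED state machine and FIFO-safe prefix watermarking. A hypothetical Python sketch of that idea follows; `InputWindow`, `try_acquire`, `complete`, and the `published` argument (standing in for a sample of the producer's tail) are illustrative names, not the PR's API. Completions may arrive in any order, but the release watermark only advances over a contiguous completed prefix, so the shared head never skips a message that is still in use.

```python
from enum import Enum
from typing import Dict, Optional


class SlotState(Enum):
    ACQUIRED = 1
    COMPLETED = 2  # RELEASED entries are simply dropped from the table


class InputWindow:
    """Hypothetical L2 input window: bounded in-flight, FIFO-safe release."""

    def __init__(self, max_inflight: int) -> None:
        self.max_inflight = max_inflight
        self.release_seq = 0   # watermark: every seq below this is released
        self.next_acquire = 0  # next seq the endpoint will hand out
        self.states: Dict[int, SlotState] = {}

    def try_acquire(self, published: int) -> Optional[int]:
        """Acquire the next message if one is published and the window has room."""
        if self.next_acquire >= published:
            return None  # nothing new to acquire
        if self.next_acquire - self.release_seq >= self.max_inflight:
            return None  # window full
        seq = self.next_acquire
        self.states[seq] = SlotState.ACQUIRED
        self.next_acquire += 1
        return seq

    def complete(self, seq: int) -> int:
        """Mark seq COMPLETED and return the (possibly unchanged) watermark."""
        if self.states.get(seq) is not SlotState.ACQUIRED:
            raise RuntimeError(f"seq {seq} is not ACQUIRED")
        self.states[seq] = SlotState.COMPLETED
        # Release only the contiguous completed prefix, so no hole is left.
        while self.states.get(self.release_seq) is SlotState.COMPLETED:
            del self.states[self.release_seq]
            self.release_seq += 1
        return self.release_seq
```

Completing sequence 2 and then 1 leaves the watermark at 0; completing 0 afterwards releases all three at once.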

coderabbitai · 2026-06-24T07:55:48Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.


Walkthrough

Adds docs/l3-l2-message-queue-design.md, an 877-line design specification for a bidirectional SPSC message queue wrapper over existing L3-L2 orchestration primitives. The document covers API shape, region layout, descriptor ABI, counter/cursor semantics, operation sequences, L2 input window extension, STOP/lifetime/error/poison semantics, and a staged implementation plan with test requirements.

Changes

L3↔L2 SPSC Message Queue Design Specification

Layer / File(s)	Summary
Purpose, API shape, region layout, and validation rules `docs/l3-l2-message-queue-design.md`	Establishes scope, non-goals, target queue API (enqueue/dequeue/stop/free and peek/read/release variants), physical region partitioning into descriptor rings and input/output arenas, counter placement, and queue-creation validation rules.
Descriptor ABI, opcodes, and counter/cursor semantics `docs/l3-l2-message-queue-design.md`	Specifies the 32-byte descriptor slot with four 64-bit little-endian fields, DATA/STOP/ERROR opcodes with direction constraints, 64-byte-strided shared int32 head/tail sampling, signed delta reconstruction, local payload cursor and replay rules, arena wrap-padding behavior, and poison conditions.
Core operation sequences and ownership contracts `docs/l3-l2-message-queue-design.md`	Defines L3 reserve→fill→publish and L2 peek/acquire→read→release flows in both directions, single-outstanding-reservation constraints, timeout/try_* APIs, returned message shapes, and the guarantee that input release precedes AICore task completion.
L2 input window extension `docs/l3-l2-message-queue-design.md`	Specifies the `max_l2_inflight` policy, ACQUIRED→COMPLETED→RELEASED state machine, explicit completion ownership, FIFO-safe prefix watermarking to prevent release holes, and STOP draining interaction.
STOP semantics and queue lifetime/cleanup `docs/l3-l2-message-queue-design.md`	Documents STOP descriptor ordering guarantees, graceful shutdown semantics, terminal-for-input behavior, `try_request_stop` and bounded `request_stop` timeout APIs, and cleanup sequencing where `Worker.run` drains before memory release and `queue.free()` is handle-only.
Error/poison framework, implementation plan, and test coverage `docs/l3-l2-message-queue-design.md`	Defines the poison guiding rule, enumerates poison vs. non-poison conditions, documents Python region state mirroring and C++ fatal-error reporting, outlines the two-stage implementation plan with file locations and hook points, and lists detailed test coverage requirements and future work items.
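The counter/cursor row mentions shared int32 head/tail samples and signed delta reconstruction. Below is a minimal Python sketch of that arithmetic, assuming two's-complement wraparound of the shared counters; the function names are hypothetical, and treating an out-of-range delta as poison is an assumption of this sketch. Bounding `depth` at 2^30, as the validation rules require, keeps every legitimate delta far inside the int32 range, which is one plausible reason for that limit.

```python
MAX_DEPTH = 1 << 30


def to_int32(x: int) -> int:
    """Wrap any Python int to the signed 32-bit value it has in memory."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def signed_delta(newer: int, older: int) -> int:
    """Distance from `older` to `newer` for two int32 counter samples."""
    return to_int32(newer - older)


def occupancy(tail_sample: int, head_sample: int, depth: int) -> int:
    """Messages in flight, reconstructed from sampled int32 tail and head."""
    if depth <= 0 or depth & (depth - 1) or depth > MAX_DEPTH:
        raise ValueError("depth must be a power of two and <= 2^30")
    delta = signed_delta(tail_sample, head_sample)
    if not 0 <= delta <= depth:
        # In this sketch an impossible delta is treated as a poison condition.
        raise RuntimeError(f"counter delta {delta} outside [0, {depth}]")
    return delta
```

A tail sample that has wrapped to a negative int32 still yields the correct small positive distance from a head sample near `INT32_MAX`.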

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A spec hops into the docs today,
With queues and descriptors all lined up in a row,
STOP means stop, and poison means nope,
FIFO watermarks keep the releases in hope,
Two stages to build it, with tests all aglow!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding an L3-L2 message queue design document.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The description is about the L3-L2 message queue design and transport, which matches the documented queue specification changes.


gemini-code-assist

Code Review

This pull request introduces a comprehensive design document for an L3-L2 SPSC message queue wrapper, detailing its architecture, public API, region layout, descriptor ABI, and error handling. The feedback suggests a performance improvement to align the descriptor rings to 64 bytes instead of 8 bytes, preventing individual 32-byte descriptor slots from crossing cache line boundaries and causing split-cache-line accesses.


gemini-code-assist · 2026-06-24T07:56:42Z

+- `depth` must be a power of two and `depth <= 2^30`.
+- Queue capacity is `depth` messages, not `depth - 1`.
+- Descriptor slot size is fixed at 32 bytes.
+- Descriptor rings are 8-byte aligned.


Since each descriptor slot is 32 bytes, an 8-byte alignment for the descriptor rings allows individual slots to cross 64-byte cache line boundaries (for example, a slot starting at offset 56 would span bytes 56 to 87, crossing the cache line boundary at 64). This can lead to split-cache-line accesses and performance degradation, especially during high-frequency SPSC polling across L3 and L2.

Aligning the descriptor rings to 64 bytes (or at least 32 bytes) ensures that:

- No individual 32-byte descriptor slot ever crosses a 64-byte cache line boundary; 32-byte alignment already guarantees this, because 64 is a multiple of 32.
- With 64-byte alignment, the descriptor rings themselves are also cache-line aligned, preventing potential false sharing or split accesses.
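The split-access argument can be checked mechanically. This Python sketch (helper names are hypothetical) lists which 32-byte slots straddle a 64-byte line for a given ring start offset. With the ring at offset 56 from the example above, slots 0 and 2 cross a line boundary, while a 32- or 64-byte-aligned start produces no crossings.

```python
CACHE_LINE = 64
SLOT_BYTES = 32


def align_up(x: int, a: int) -> int:
    """Round x up to a multiple of a (a must be a power of two)."""
    return (x + a - 1) & ~(a - 1)


def crossing_slots(ring_offset: int, depth: int) -> list:
    """Indices of 32-byte slots whose first and last byte sit in different lines."""
    crossing = []
    for i in range(depth):
        first = ring_offset + i * SLOT_BYTES
        last = first + SLOT_BYTES - 1
        if first // CACHE_LINE != last // CACHE_LINE:
            crossing.append(i)
    return crossing


print(crossing_slots(56, 4))                # [0, 2]: 8-byte-aligned start
print(crossing_slots(align_up(56, 32), 4))  # []: 32-byte-aligned start
```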

Suggested change

-- Descriptor rings are 8-byte aligned.
+- Descriptor rings are 64-byte aligned.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/l3-l2-message-queue-design.md`:
- Around lines 609-610: The STOP encoding snippet is missing the explicit
payload_offset=0 field, so update the quick format example to match the earlier
STOP constraint and keep both required zero values visible. Make the change in
the STOP format documentation near the existing seq/opcode/payload_nbytes
example so the snippet consistently shows payload_offset=0 alongside
payload_nbytes=0.
- Around lines 355-358: The replay rules text in the message-queue design doc
uses mixed coordinate systems for payload_offset, comparing a region-relative
value against an arena-relative one. Update the wording in the replay/release
section to use a single coordinate system consistently, matching the earlier
payload_offset definition, and revise the related wrap-padding/base-queue
explanation so the comparison and advance logic are described in the same terms.


ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between ae59a8e and 85595a4.

📒 Files selected for processing (1)

docs/l3-l2-message-queue-design.md

coderabbitai · 2026-06-24T07:59:34Z

+  `payload_head % arena_bytes` with the descriptor's arena-relative payload
+  offset. If they differ, the only valid base-queue case is wrap padding: the
+  descriptor offset is the base offset of this direction's arena and the
+  releaser first advances `payload_head` to the next arena cycle. It then


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Unify payload-offset coordinate system in replay rules.

payload_offset is defined as region-relative (Lines 226-227), but this section compares against an “arena-relative” value directly. That inconsistency can cause incorrect replay/release math and false poison or data corruption decisions.

Proposed wording fix

-- Padding has no descriptor. On release, the consumer compares
-- `payload_head % arena_bytes` with the descriptor's arena-relative payload
-- offset. If they differ, the only valid base-queue case is wrap padding: the
-- descriptor offset is the base offset of this direction's arena and the
+- Padding has no descriptor. On release, the consumer computes
+- `desc_arena_off = payload_offset - arena_base_offset` and compares
+- `payload_head % arena_bytes` with `desc_arena_off`. If they differ, the only
+- valid base-queue case is wrap padding: `desc_arena_off == 0` (i.e.
+- `payload_offset == arena_base_offset`) and the
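Under the single region-relative convention proposed above, the release-side replay can be sketched in Python as follows. The function and exception names are hypothetical, and two details are assumptions rather than stated design: the cursor advances by exactly `payload_nbytes` after a message (no per-message alignment), and any offset mismatch other than wrap padding is reported as poison.

```python
class PoisonError(RuntimeError):
    """Hypothetical error for descriptor state the consumer cannot replay."""


def replay_release(payload_head: int, payload_offset: int, payload_nbytes: int,
                   arena_base_offset: int, arena_bytes: int) -> int:
    """Return payload_head after releasing one DATA descriptor.

    payload_offset and arena_base_offset are region-relative; payload_head is
    the consumer's free-running byte cursor for this direction's arena.
    """
    desc_arena_off = payload_offset - arena_base_offset  # one coordinate system
    if desc_arena_off < 0 or desc_arena_off + payload_nbytes > arena_bytes:
        raise PoisonError("payload range lies outside this direction's arena")
    cursor = payload_head % arena_bytes
    if cursor != desc_arena_off:
        if desc_arena_off != 0:
            raise PoisonError("offset mismatch that is not wrap padding")
        payload_head += arena_bytes - cursor  # skip the padding to the next cycle
    return payload_head + payload_nbytes
```

With a 1024-byte arena, a cursor at 1000 and a descriptor at the arena base means 24 bytes of wrap padding were skipped, so releasing a 50-byte message moves the cursor to 1074.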


coderabbitai · 2026-06-24T07:59:34Z

+seq + opcode=STOP + payload_nbytes=0
+```


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make STOP encoding snippet include payload_offset=0 for consistency.

The STOP format here omits payload_offset=0, which is required earlier (Line 258). Keep both constraints in the quick format line to avoid ambiguous implementations.

Proposed wording fix

-seq + opcode=STOP + payload_nbytes=0
+seq + opcode=STOP + payload_nbytes=0 + payload_offset=0
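For reference, here is a hypothetical Python encoding of the 32-byte descriptor slot (four little-endian 64-bit fields, per the design summary) with the STOP check this comment asks for. The field order, the opcode values, and the helper names are assumptions made for illustration.

```python
import struct

# Assumed layout: four little-endian uint64 fields in this order.
DESC_FMT = "<QQQQ"  # seq, opcode, payload_offset, payload_nbytes
OP_DATA, OP_STOP, OP_ERROR = 1, 2, 3  # hypothetical opcode values


def pack_desc(seq: int, opcode: int, payload_offset: int, payload_nbytes: int) -> bytes:
    return struct.pack(DESC_FMT, seq, opcode, payload_offset, payload_nbytes)


def unpack_desc(raw: bytes) -> tuple:
    seq, opcode, offset, nbytes = struct.unpack(DESC_FMT, raw)
    if opcode == OP_STOP and (offset != 0 or nbytes != 0):
        # STOP carries no payload: both payload fields must be zero.
        raise ValueError("STOP requires payload_offset=0 and payload_nbytes=0")
    return seq, opcode, offset, nbytes


assert struct.calcsize(DESC_FMT) == 32  # matches the fixed 32-byte slot size
```

Checking both zero fields on decode means an implementation cannot accept a STOP that only zeroes `payload_nbytes`.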

📝 Committable suggestion

-seq + opcode=STOP + payload_nbytes=0
+seq + opcode=STOP + payload_nbytes=0 + payload_offset=0


- Define the staged base queue transport design and PR1/PR2 split.
- Add the base implementation plan for the queue stack.

- Implement the PR1 L3 queue wrapper and L2 endpoint ABI on top of the primitive L3-L2 orchestration region transport.
- Wire Orchestrator.create_l3_l2_queue and cover descriptor layout, zero-byte messages, abort flags, capacity, and fast-path buffers in Python and C++ unit tests.

- Drop the base implementation guide from tracked PR1 files while keeping it available locally for PR2 planning.
- Keep the L3-L2 queue Python tests compatible with the pyright target and ruff formatting used by CI.

- Fail closed on queue layout uint64 overflow in C++ and Python mirror calculations.
- Validate cached L2 input handle metadata before release and use cached descriptor state.
- Gate C++ spin-loop timer reads and clean up Python regions on partial construction failure.

- Add the user-facing L3-L2 message queue documentation.
- Link the primitive L3-L2 orchestration communication doc to the queue wrapper doc.
- Remove the design-stage document from the branch while leaving the local copy available for follow-up work.

- Add strict payload and counter size checks to the L3-L2 queue task args.
- Validate L2 input payload offsets before exposing payload views.
- Document timeout, layout, and queue free semantics, and expand no-hardware tests.
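The overflow-hardening commit ("Fail closed on queue layout uint64 overflow in C++ and Python mirror calculations") matters because Python integers never overflow: a Python mirror of the C++ layout math must check against 2^64 - 1 explicitly, or it will silently disagree with C++. The sketch below uses a hypothetical layout order (two 64-byte-aligned descriptor rings, then the input and output arenas, then 64-byte-strided counters) and hypothetical helper names; the real region order and counter count may differ.

```python
U64_MAX = (1 << 64) - 1
SLOT_BYTES = 32
COUNTER_STRIDE = 64  # one signal counter per 64-byte line


class LayoutOverflow(ValueError):
    """Raised instead of producing a layout that C++ uint64 could not represent."""


def add_u64(a: int, b: int) -> int:
    r = a + b
    if r > U64_MAX:
        raise LayoutOverflow("layout size exceeds uint64")
    return r


def mul_u64(a: int, b: int) -> int:
    r = a * b
    if r > U64_MAX:
        raise LayoutOverflow("layout size exceeds uint64")
    return r


def align_up_u64(x: int, a: int) -> int:
    return add_u64(x, a - 1) & ~(a - 1)


def region_bytes(depth: int, in_arena: int, out_arena: int, n_counters: int = 4) -> int:
    """Hypothetical region size: two rings, two arenas, then strided counters."""
    off = 0
    for _ in range(2):  # input and output descriptor rings
        off = align_up_u64(off, 64)
        off = add_u64(off, mul_u64(depth, SLOT_BYTES))
    off = add_u64(off, in_arena)
    off = add_u64(off, out_arena)
    off = align_up_u64(off, COUNTER_STRIDE)
    return add_u64(off, mul_u64(n_counters, COUNTER_STRIDE))
```

Every intermediate result goes through a checked helper, so the mirror fails at the same step where the C++ calculation would wrap.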

gemini-code-assist Bot reviewed Jun 24, 2026


coderabbitai Bot reviewed Jun 24, 2026


ccyywwen force-pushed the l3-l2-orch-message-queue branch from 85595a4 to a474a9a on June 26, 2026 06:41

Add: L3-L2 message queue design

5962af6


ccyywwen force-pushed the l3-l2-orch-message-queue branch from a474a9a to 5962af6 on June 26, 2026 07:50

ccyywwen added 2 commits June 26, 2026 18:24

Update: clean up L3 L2 queue PR1

04e3a4c


ccyywwen force-pushed the l3-l2-orch-message-queue branch from e38004a to 04e3a4c on June 29, 2026 02:43

ccyywwen mentioned this pull request Jun 29, 2026

Add: L3-L2 message queue example #1187

Open

ccyywwen force-pushed the l3-l2-orch-message-queue branch 2 times, most recently from d1b814d to 1adf3c7 on June 30, 2026 03:23

ccyywwen force-pushed the l3-l2-orch-message-queue branch from 1adf3c7 to 6107c81 on June 30, 2026 07:28

ccyywwen force-pushed the l3-l2-orch-message-queue branch from b2b3bba to eaaccc8 on July 1, 2026 02:19

Fix: harden L3-L2 queue review feedback

bf505cb


ccyywwen force-pushed the l3-l2-orch-message-queue branch from eaaccc8 to bf505cb on July 1, 2026 03:03

ccyywwen wants to merge 6 commits into hw-native-sys:main from ccyywwen:l3-l2-orch-message-queue
