Add: L2 input window support to L3-L2 message queue by ccyywwen · Pull Request #1236 · hw-native-sys/simpler

ccyywwen · 2026-07-01T07:53:16Z

Summary

This is a stacked PR based on PR Add: L3-L2 message queue example #1187.
Add L2-side input-window support to the L3-L2 message queue through local endpoint config:
L3L2QueueEndpointConfig{.max_l2_input_inflight = N}.
Support multiple active L2 DATA / ERROR input handles, FIFO-safe delayed physical release, STOP drain, and queue.input().drained().
Fold the input-window scenario into the existing tensormap_and_ringbuffer/l3_l2_message_queue example, using PTO-ISA AIV tile compute while L2 keeps a persistent receive/process/publish loop.

Base / CI Notes

The PR base is main because #1187 has not merged into main yet. With this setup, GitHub CI will test the combined stack: #1187 plus this PR.

After #1187 is merged, this PR should be rebased onto the updated main so the review diff only shows the input-window changes.

- Define the staged base queue transport design and PR1/PR2 split. - Add the base implementation plan for the queue stack.

- Implement the PR1 L3 queue wrapper and L2 endpoint ABI on top of the primitive L3-L2 orchestration region transport. - Wire Orchestrator.create_l3_l2_queue and cover descriptor layout, zero-byte messages, abort flags, capacity, and fast-path buffers in Python and C++ unit tests.

- Drop the base implementation guide from tracked PR1 files while keeping it available locally for PR2 planning. - Keep the L3-L2 queue Python tests compatible with the pyright target and ruff formatting used by CI.

- Fail closed on queue layout uint64 overflow in C++ and Python mirror calculations - Validate cached L2 input handle metadata before release and use cached descriptor state - Gate C++ spin-loop timer reads and clean up Python regions on partial construction failure

- Add the user-facing L3-L2 message queue documentation. - Link the primitive L3-L2 orchestration communication doc to the queue wrapper doc. - Remove the design-stage document from the branch while leaving the local copy available for follow-up work.

- Add strict payload and counter size checks to the L3-L2 queue task args. - Validate L2 input payload offsets before exposing payload views. - Document timeout, layout, and queue free semantics, and expand no-hardware tests.

- Allow ordinary host buffers through lazy queue staging while keeping registered tensors on the no-staging fast path. - Add a tensormap_and_ringbuffer example that streams DATA and STOP messages through the L3-L2 queue on sim and onboard platforms.

- Keep dynamic tensor-like host buffer access explicit for pyright. - Build the example child callable across both positional and arg_index binding variants.

- Add the L2 input-window helper behavior to the queue endpoint - Add the new input-window example with PTO-ISA AIV tile compute - Extend C++ coverage for window capacity, STOP drain, and ownership

gemini-code-assist

Code Review

This pull request introduces the L3-L2 Message Queue, enabling ordered message exchange between an L3 Host Orchestrator and a persistent L2 AICPU Orchestrator task, complete with documentation, Python/C++ implementations, examples, and unit tests. The review feedback focuses on enhancing C++ memory safety and resource management within the l3_l2_message_queue.h header. Specifically, the reviewer recommends explicitly deleting copy and move operations for InputQueue and L3L2QueueEndpoint to prevent double-free vulnerabilities, and safeguarding InputQueue::initialize against potential memory leaks from multiple invocations.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-07-01T07:58:02Z

+        explicit InputQueue(L3L2QueueEndpoint *parent) :
+            parent_(parent) {}
+
+        ~InputQueue() { delete[] active_entries_; }


To prevent accidental copying or moving of InputQueue which manages the raw pointer active_entries_ (allocated via new[]), we should explicitly delete its copy and move constructors and assignment operators. This avoids potential double-free vulnerabilities or memory leaks.

explicit InputQueue(L3L2QueueEndpoint *parent) : parent_(parent) {} InputQueue(const InputQueue &) = delete; InputQueue &operator=(const InputQueue &) = delete; InputQueue(InputQueue &&) = delete; InputQueue &operator=(InputQueue &&) = delete; ~InputQueue() { delete[] active_entries_; }

gemini-code-assist · 2026-07-01T07:58:03Z

+    L3L2QueueEndpoint(
+        const L3L2OrchRegionDesc &desc, const L3L2QueueArgs &args, const L3L2QueueEndpointConfig &config
+    ) :
+        endpoint_(desc),
+        config_(config),
+        input_queue_(this),
+        output_queue_(this) {
+        if (endpoint_.error().kind != L3L2EndpointErrorKind::NONE ||
+            !l3_l2_queue_validate_region(desc, args, &layout_)) {
+            set_error(L3L2QueueErrorKind::BAD_DESCRIPTOR, "init", desc.region_id, "invalid queue descriptor");
+            return;
+        }
+        if (config_.max_l2_input_inflight == 0 || config_.max_l2_input_inflight > layout_.depth) {
+            set_error(L3L2QueueErrorKind::BAD_ARGUMENT, "init", desc.region_id, "invalid endpoint config");
+            return;
+        }
+        input_queue_.initialize(config_.max_l2_input_inflight);
+    }


To prevent accidental copying or moving of L3L2QueueEndpoint (which owns input_queue_ and manages its lifetime), we should explicitly delete its copy and move constructors and assignment operators. This ensures proper RAII and resource safety.

L3L2QueueEndpoint( const L3L2OrchRegionDesc &desc, const L3L2QueueArgs &args, const L3L2QueueEndpointConfig &config ) : endpoint_(desc), config_(config), input_queue_(this), output_queue_(this) { if (endpoint_.error().kind != L3L2EndpointErrorKind::NONE || !l3_l2_queue_validate_region(desc, args, &layout_)) { set_error(L3L2QueueErrorKind::BAD_DESCRIPTOR, "init", desc.region_id, "invalid queue descriptor"); return; } if (config_.max_l2_input_inflight == 0 || config_.max_l2_input_inflight > layout_.depth) { set_error(L3L2QueueErrorKind::BAD_ARGUMENT, "init", desc.region_id, "invalid endpoint config"); return; } input_queue_.initialize(config_.max_l2_input_inflight); } L3L2QueueEndpoint(const L3L2QueueEndpoint &) = delete; L3L2QueueEndpoint &operator=(const L3L2QueueEndpoint &) = delete; L3L2QueueEndpoint(L3L2QueueEndpoint &&) = delete; L3L2QueueEndpoint &operator=(L3L2QueueEndpoint &&) = delete;

gemini-code-assist · 2026-07-01T07:58:03Z

+        bool initialize(uint64_t max_l2_input_inflight) {
+            entry_capacity_ = max_l2_input_inflight + 1;
+            active_entries_ = new (std::nothrow) ActiveInputEntry[entry_capacity_];


If initialize is called multiple times on the same InputQueue instance, the previously allocated active_entries_ array will be leaked. To prevent memory leaks, we should explicitly delete any existing allocation before allocating a new one.

bool initialize(uint64_t max_l2_input_inflight) { if (active_entries_ != nullptr) { delete[] active_entries_; } entry_capacity_ = max_l2_input_inflight + 1; active_entries_ = new (std::nothrow) ActiveInputEntry[entry_capacity_];

coderabbitai · 2026-07-01T07:58:05Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 25129cda-823e-413e-8f53-3a0555a05e54

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds an ordered L3-L2 message queue: C++ ABI/endpoint header, Python queue wrapper (L3L2Queue) and orchestrator method, design docs, an end-to-end example (AIV kernel, orchestration kernel, Python script/test), and extensive C++/Python unit tests.

Changes

L3-L2 Message Queue

Layer / File(s)	Summary
Design documentation `docs/l3-l2-message-queue.md`, `docs/l3-l2-orch-comm.md`	New doc describes API, layout, descriptor ABI, counters, ownership/STOP semantics, and error handling; existing orch-comm doc cross-references it.
C++ queue ABI and endpoint `src/common/platform/include/aicpu/l3_l2_message_queue.h`	Defines opcodes, layout/validation, descriptor encoding, counter reconstruction, and `L3L2QueueEndpoint` with input/output queue peek/release/reserve/publish logic.
C++ endpoint tests `tests/ut/cpp/common/test_l3_l2_message_queue.cpp`, `tests/ut/cpp/CMakeLists.txt`	New GoogleTest suite and CTest target cover layout, encoding, counters, ownership, windowing, timeout/abort, and STOP-drain semantics.
Python queue wrapper `python/simpler/l3_l2_message_queue.py`	Implements layout computation, queue creation, host-buffer handling, and `L3L2Queue` enqueue/peek/read/release logic mirroring the C++ ABI.
Orchestrator integration `python/simpler/orchestrator.py`	Adds `create_l3_l2_queue` method delegating to the new module.
Python queue tests `tests/ut/py/test_worker/test_l3_l2_message_queue.py`	Fake orchestrator/client harness tests validate layout, creation, enqueue/dequeue, error handling, and timeouts.
End-to-end example `examples/a2a3/.../l3_l2_message_queue/*`	Adds AIV transform kernel, orchestration kernel processing DATA/STOP with windowed inputs, and Python script/pytest exercising the full flow.

Estimated code review effort: 5 (Critical) | ~120 minutes

Sequence Diagram(s)

sequenceDiagram
  participant L3Script as l3_l2_message_queue.py
  participant L3Queue as L3L2Queue (Python)
  participant Region as L3L2OrchRegion
  participant L2Endpoint as L3L2QueueEndpoint (C++)
  participant AIVKernel as kernel_queue_transform.cpp

  L3Script->>L3Queue: create_l3_l2_queue(depth, arena bytes)
  L3Queue->>Region: allocate descriptor rings + payload arenas
  L3Script->>L3Queue: enqueue(input payload, mode)
  L3Queue->>Region: write descriptor + payload, notify tail counter
  L2Endpoint->>Region: peek(timeout) input queue
  Region-->>L2Endpoint: DATA descriptor + payload
  L2Endpoint->>AIVKernel: rt_submit_aiv_task(tile/scalar op)
  AIVKernel-->>L2Endpoint: computed output tile
  L2Endpoint->>Region: reserve + publish output descriptor
  L3Script->>L3Queue: peek/read_into output
  L3Queue->>Region: read payload, release, advance head
  L3Script->>L3Queue: request_stop()
  L3Queue->>Region: publish STOP descriptor
  L2Endpoint->>Region: peek observes STOP, drains

Poem

A queue hops in, both ends aligned, 🐇
Descriptors ring where tiles reside,
STOP arrives, the drain runs clean,
Counters sync what each side has seen,
From L3 burrow to L2's den —
Message delivered, safe again!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.98% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: L2 input-window support for the L3-L2 message queue.
Description check	✅ Passed	The description matches the changeset and clearly describes the input-window feature, example updates, and stacked PR context.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

src/common/platform/include/aicpu/l3_l2_message_queue.h (1)

38-43: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Pin the descriptor ABI in the header.

The size/offset guarantees only live in the unit test right now. Adding static_asserts beside L3L2QueueDescSlot will make any ABI drift fail in every build, not just the test target.

Suggested hardening

 struct L3L2QueueDescSlot {
     uint64_t seq;
     uint64_t opcode;
     uint64_t payload_offset;
     uint64_t payload_nbytes;
 };
+
+static_assert(sizeof(L3L2QueueDescSlot) == 32, "L3L2QueueDescSlot ABI size changed");
+static_assert(offsetof(L3L2QueueDescSlot, seq) == 0, "L3L2QueueDescSlot::seq offset changed");
+static_assert(offsetof(L3L2QueueDescSlot, opcode) == 8, "L3L2QueueDescSlot::opcode offset changed");
+static_assert(offsetof(L3L2QueueDescSlot, payload_offset) == 16, "L3L2QueueDescSlot::payload_offset changed");
+static_assert(offsetof(L3L2QueueDescSlot, payload_nbytes) == 24, "L3L2QueueDescSlot::payload_nbytes changed");

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/common/platform/include/aicpu/l3_l2_message_queue.h` around lines 38 -
43, The descriptor ABI for L3L2QueueDescSlot is only enforced in tests right
now, so add compile-time static_assert checks next to the struct definition to
pin its size and member offsets. Use the L3L2QueueDescSlot type in the header to
assert the expected layout so any ABI drift fails in every build, not just the
unit test target.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/l3-l2-message-queue.md`:
- Around line 169-172: Update the message-queue docs around
queue.input().try_peek() and queue.output().try_reserve() to remove timeout from
the false-return contract, since these methods are non-blocking and false should
only cover no progress, validation failure, or poison; keep the guidance to
check queue.error().kind for terminal errors and ensure the wording is
consistent with the try_* API semantics.

In
`@examples/a2a3/tensormap_and_ringbuffer/l3_l2_message_queue/kernels/orchestration/l3_l2_message_queue_orch.cpp`:
- Around line 123-145: process_first_pair() leaves consumed ActiveRequest slots
populated after release_input(), which lets the next batch overwrite or reuse
stale handles. Clear the released entries in the active array after each
successful release and make the mode == 1 path verify that active[0] is free
before assigning a new input handle. Apply the same cleanup to the related batch
handling code in the other matching block so both paths reset
active[0]/active[1] consistently before continuing the receive/process/publish
loop.

In `@src/common/platform/include/aicpu/l3_l2_message_queue.h`:
- Around line 360-419: The input peek path in the queue handling logic accepts
STOP descriptors with payload bytes, which should be rejected by the STOP
contract. Update the `try_peek` flow in the `L3L2Queue` input handling to
validate `opcode == L3L2QueueOpcode::STOP` before any payload read, and poison
with `L3L2QueueErrorKind::INVALID_DESCRIPTOR` if `slot.payload_nbytes` is
nonzero. Ensure `payload_matches_head`, `endpoint_.payload_read`, and the
`active_entries_` / `input_payload_head_` bookkeeping are only reached for valid
zero-byte STOP descriptors.

---

Nitpick comments:
In `@src/common/platform/include/aicpu/l3_l2_message_queue.h`:
- Around line 38-43: The descriptor ABI for L3L2QueueDescSlot is only enforced
in tests right now, so add compile-time static_assert checks next to the struct
definition to pin its size and member offsets. Use the L3L2QueueDescSlot type in
the header to assert the expected layout so any ABI drift fails in every build,
not just the unit test target.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e6cfe6c1-0173-4585-8104-6d1cd2e9d978

📥 Commits

Reviewing files that changed from the base of the PR and between 0a8bc94 and b77bc2d.

📒 Files selected for processing (12)

docs/l3-l2-message-queue.md
docs/l3-l2-orch-comm.md
examples/a2a3/tensormap_and_ringbuffer/l3_l2_message_queue/kernels/aiv/kernel_queue_transform.cpp
examples/a2a3/tensormap_and_ringbuffer/l3_l2_message_queue/kernels/orchestration/l3_l2_message_queue_orch.cpp
examples/a2a3/tensormap_and_ringbuffer/l3_l2_message_queue/l3_l2_message_queue.py
examples/a2a3/tensormap_and_ringbuffer/l3_l2_message_queue/test_l3_l2_message_queue.py
python/simpler/l3_l2_message_queue.py
python/simpler/orchestrator.py
src/common/platform/include/aicpu/l3_l2_message_queue.h
tests/ut/cpp/CMakeLists.txt
tests/ut/cpp/common/test_l3_l2_message_queue.cpp
tests/ut/py/test_worker/test_l3_l2_message_queue.py

coderabbitai · 2026-07-01T08:13:54Z

+`queue.input().try_peek(&input)` and
+`queue.output().try_reserve(nbytes, &reservation)` are non-blocking. A `false`
+return can mean no progress, timeout, validation failure, or poison; check
+`queue.error().kind` to distinguish ordinary no-progress from terminal error.


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Remove timeout from the try_* contract.

try_peek() and try_reserve() are non-blocking, so false here cannot mean timeout. Keeping timeout in this list makes the API contract internally inconsistent for callers.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/l3-l2-message-queue.md` around lines 169 - 172, Update the message-queue docs around queue.input().try_peek() and queue.output().try_reserve() to remove timeout from the false-return contract, since these methods are non-blocking and false should only cover no progress, validation failure, or poison; keep the guidance to check queue.error().kind for terminal errors and ensure the wording is consistent with the try_* API semantics.

coderabbitai · 2026-07-01T08:13:54Z

+bool process_first_pair(L3L2QueueEndpoint &queue, ActiveRequest *active, const L3L2QueueInputHandle &input) {
+    active[1].handle = input;
+    if (!parse_input_header(input, &active[1].header) || active[0].header.request_id == 0) {
+        rt_report_fatal(PTO2_ERROR_EXPLICIT_ORCH_FATAL, "invalid L3-L2 queue example request");
+        return false;
+    }
+    if (!publish_aiv_output(
+            queue, active[1].handle, active[1].handle, active[1].header.request_id, 20, 0, ADD_SCALAR, 20.0F
+        )) {
+        return false;
+    }
+    if (!release_input(queue, active[1].handle)) {
+        return false;
+    }
+    if (!publish_aiv_output(
+            queue, active[0].handle, active[0].handle, active[0].header.request_id, 10, 0, ADD_SCALAR, 10.0F
+        ) ||
+        !publish_aiv_output(
+            queue, active[0].handle, active[0].handle, active[0].header.request_id, 11, 0, ADD_SCALAR, 11.0F
+        )) {
+        return false;
+    }
+    return release_input(queue, active[0].handle);


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Clear consumed active slots before the next batch.

process_first_pair() releases active[0]/active[1] but leaves both slots populated, and the mode == 1 path overwrites active[0] without checking whether a prior request is still pending. A second batch can therefore leak the overwritten handle or reuse a handle that was already released, which breaks the example’s persistent receive/process/publish loop.

Proposed fix

bool process_first_pair(L3L2QueueEndpoint &queue, ActiveRequest *active, const L3L2QueueInputHandle &input) { active[1].handle = input; if (!parse_input_header(input, &active[1].header) || active[0].header.request_id == 0) { rt_report_fatal(PTO2_ERROR_EXPLICIT_ORCH_FATAL, "invalid L3-L2 queue example request"); return false; @@ if (!release_input(queue, active[1].handle)) { return false; } + active[1] = {}; if (!publish_aiv_output( queue, active[0].handle, active[0].handle, active[0].header.request_id, 10, 0, ADD_SCALAR, 10.0F ) || !publish_aiv_output( queue, active[0].handle, active[0].handle, active[0].header.request_id, 11, 0, ADD_SCALAR, 11.0F )) { return false; } - return release_input(queue, active[0].handle); + if (!release_input(queue, active[0].handle)) { + return false; + } + active[0] = {}; + return true; } @@ if (header.mode == 1) { + if (active[0].header.request_id != 0) { + rt_report_fatal(PTO2_ERROR_EXPLICIT_ORCH_FATAL, "L3-L2 queue example received mode=1 while a request is still pending"); + return false; + } active[0].handle = input; active[0].header = header; return true; }

Also applies to: 169-178

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/a2a3/tensormap_and_ringbuffer/l3_l2_message_queue/kernels/orchestration/l3_l2_message_queue_orch.cpp` around lines 123 - 145, process_first_pair() leaves consumed ActiveRequest slots populated after release_input(), which lets the next batch overwrite or reuse stale handles. Clear the released entries in the active array after each successful release and make the mode == 1 path verify that active[0] is free before assigning a new input handle. Apply the same cleanup to the related batch handling code in the other matching block so both paths reset active[0]/active[1] consistently before continuing the receive/process/publish loop.

coderabbitai · 2026-07-01T08:13:54Z

+            L3L2QueueOpcode opcode = static_cast<L3L2QueueOpcode>(slot.opcode);
+            if (!l3_l2_queue_valid_opcode(opcode)) {
+                parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "invalid input opcode");
+                return false;
+            }
+            bool counts_against_window = opcode == L3L2QueueOpcode::DATA || opcode == L3L2QueueOpcode::ERROR;
+            if (counts_against_window && active_non_stop_count_ >= parent_->config_.max_l2_input_inflight) {
+                return false;
+            }
+            if (active_count_ >= entry_capacity_) {
+                parent_->poison(L3L2QueueErrorKind::OWNERSHIP, "input.try_peek", "input window state full");
+                return false;
+            }
+
+            L3L2OrchPayloadView view{0, 0};
+            if (slot.payload_nbytes == 0) {
+                if (slot.payload_offset != 0) {
+                    parent_->poison(
+                        L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek",
+                        "zero-byte descriptor uses nonzero payload offset"
+                    );
+                    return false;
+                }
+            } else if (!parent_->payload_in_arena(
+                           slot.payload_offset, slot.payload_nbytes, parent_->layout_.input_arena_offset,
+                           parent_->layout_.input_arena_bytes
+                       )) {
+                parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "input payload out of arena");
+                return false;
+            } else if (!parent_->payload_matches_head(
+                           parent_->input_payload_head_, slot.payload_offset, slot.payload_nbytes,
+                           parent_->layout_.input_arena_offset, parent_->layout_.input_arena_bytes, "input.try_peek"
+                       )) {
+                return false;
+            } else if (!parent_->endpoint_.payload_read(slot.payload_offset, slot.payload_nbytes, &view)) {
+                parent_->poison(
+                    L3L2QueueErrorKind::ENDPOINT_ERROR, "input.try_peek", parent_->endpoint_.error().message
+                );
+                return false;
+            }
+
+            *out = L3L2QueueInputHandle{slot.seq, opcode, slot.payload_offset, slot.payload_nbytes, view};
+            active_entries_[active_count_] =
+                ActiveInputEntry{slot.seq, opcode, slot.payload_offset, slot.payload_nbytes, view, false};
+            active_count_ += 1;
+            if (counts_against_window) {
+                active_non_stop_count_ += 1;
+            }
+            input_acquire_ += 1;
+            if (opcode == L3L2QueueOpcode::STOP) {
+                stop_observed_ = true;
+                if (parent_->input_tail_ != input_acquire_) {
+                    parent_->poison(
+                        L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek",
+                        "input descriptor published after STOP"
+                    );
+                    return false;
+                }
+            }
+            return true;


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Reject STOP descriptors that carry payload bytes.

The STOP contract is zero-byte, but this path currently accepts opcode == STOP with nonzero payload_nbytes, exposes a payload view, and then advances input_payload_head_ on release. That lets malformed shared state bypass INVALID_DESCRIPTOR poison and breaks the documented STOP ABI.

Suggested fix

L3L2QueueOpcode opcode = static_cast<L3L2QueueOpcode>(slot.opcode); if (!l3_l2_queue_valid_opcode(opcode)) { parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "invalid input opcode"); return false; } + if (opcode == L3L2QueueOpcode::STOP && (slot.payload_offset != 0 || slot.payload_nbytes != 0)) { + parent_->poison( + L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "STOP descriptor must be zero-byte" + ); + return false; + } bool counts_against_window = opcode == L3L2QueueOpcode::DATA || opcode == L3L2QueueOpcode::ERROR;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

L3L2QueueOpcode opcode = static_cast<L3L2QueueOpcode>(slot.opcode);

if (!l3_l2_queue_valid_opcode(opcode)) {

parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "invalid input opcode");

return false;

}

bool counts_against_window = opcode == L3L2QueueOpcode::DATA || opcode == L3L2QueueOpcode::ERROR;

if (counts_against_window && active_non_stop_count_ >= parent_->config_.max_l2_input_inflight) {

return false;

}

if (active_count_ >= entry_capacity_) {

parent_->poison(L3L2QueueErrorKind::OWNERSHIP, "input.try_peek", "input window state full");

return false;

}

L3L2OrchPayloadView view{0, 0};

if (slot.payload_nbytes == 0) {

if (slot.payload_offset != 0) {

parent_->poison(

L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek",

"zero-byte descriptor uses nonzero payload offset"

);

return false;

}

} else if (!parent_->payload_in_arena(

slot.payload_offset, slot.payload_nbytes, parent_->layout_.input_arena_offset,

parent_->layout_.input_arena_bytes

)) {

parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "input payload out of arena");

return false;

} else if (!parent_->payload_matches_head(

parent_->input_payload_head_, slot.payload_offset, slot.payload_nbytes,

parent_->layout_.input_arena_offset, parent_->layout_.input_arena_bytes, "input.try_peek"

)) {

return false;

} else if (!parent_->endpoint_.payload_read(slot.payload_offset, slot.payload_nbytes, &view)) {

parent_->poison(

L3L2QueueErrorKind::ENDPOINT_ERROR, "input.try_peek", parent_->endpoint_.error().message

);

return false;

}

*out = L3L2QueueInputHandle{slot.seq, opcode, slot.payload_offset, slot.payload_nbytes, view};

active_entries_[active_count_] =

ActiveInputEntry{slot.seq, opcode, slot.payload_offset, slot.payload_nbytes, view, false};

active_count_ += 1;

if (counts_against_window) {

active_non_stop_count_ += 1;

}

input_acquire_ += 1;

if (opcode == L3L2QueueOpcode::STOP) {

stop_observed_ = true;

if (parent_->input_tail_ != input_acquire_) {

parent_->poison(

L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek",

"input descriptor published after STOP"

);

return false;

}

}

return true;

L3L2QueueOpcode opcode = static_cast<L3L2QueueOpcode>(slot.opcode);

if (!l3_l2_queue_valid_opcode(opcode)) {

parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "invalid input opcode");

return false;

}

if (opcode == L3L2QueueOpcode::STOP && (slot.payload_offset != 0 || slot.payload_nbytes != 0)) {

parent_->poison(

L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "STOP descriptor must be zero-byte"

);

return false;

}

bool counts_against_window = opcode == L3L2QueueOpcode::DATA || opcode == L3L2QueueOpcode::ERROR;

if (counts_against_window && active_non_stop_count_ >= parent_->config_.max_l2_input_inflight) {

return false;

}

if (active_count_ >= entry_capacity_) {

parent_->poison(L3L2QueueErrorKind::OWNERSHIP, "input.try_peek", "input window state full");

return false;

}

L3L2OrchPayloadView view{0, 0};

if (slot.payload_nbytes == 0) {

if (slot.payload_offset != 0) {

parent_->poison(

L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek",

"zero-byte descriptor uses nonzero payload offset"

);

return false;

}

} else if (!parent_->payload_in_arena(

slot.payload_offset, slot.payload_nbytes, parent_->layout_.input_arena_offset,

parent_->layout_.input_arena_bytes

)) {

parent_->poison(L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek", "input payload out of arena");

return false;

} else if (!parent_->payload_matches_head(

parent_->input_payload_head_, slot.payload_offset, slot.payload_nbytes,

parent_->layout_.input_arena_offset, parent_->layout_.input_arena_bytes, "input.try_peek"

)) {

return false;

} else if (!parent_->endpoint_.payload_read(slot.payload_offset, slot.payload_nbytes, &view)) {

parent_->poison(

L3L2QueueErrorKind::ENDPOINT_ERROR, "input.try_peek", parent_->endpoint_.error().message

);

return false;

}

*out = L3L2QueueInputHandle{slot.seq, opcode, slot.payload_offset, slot.payload_nbytes, view};

active_entries_[active_count_] =

ActiveInputEntry{slot.seq, opcode, slot.payload_offset, slot.payload_nbytes, view, false};

active_count_ += 1;

if (counts_against_window) {

active_non_stop_count_ += 1;

}

input_acquire_ += 1;

if (opcode == L3L2QueueOpcode::STOP) {

stop_observed_ = true;

if (parent_->input_tail_ != input_acquire_) {

parent_->poison(

L3L2QueueErrorKind::INVALID_DESCRIPTOR, "input.try_peek",

"input descriptor published after STOP"

);

return false;

}

}

return true;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/platform/include/aicpu/l3_l2_message_queue.h` around lines 360 - 419, The input peek path in the queue handling logic accepts STOP descriptors with payload bytes, which should be rejected by the STOP contract. Update the `try_peek` flow in the `L3L2Queue` input handling to validate `opcode == L3L2QueueOpcode::STOP` before any payload read, and poison with `L3L2QueueErrorKind::INVALID_DESCRIPTOR` if `slot.payload_nbytes` is nonzero. Ensure `payload_matches_head`, `endpoint_.payload_read`, and the `active_entries_` / `input_payload_head_` bookkeeping are only reached for valid zero-byte STOP descriptors.

- Let L2 peek spin on input timeout while L3 reads outputs between bursts - Track input_payload_acquire_head separately from release head so multi- inflight non-zero payload peeks no longer fail offset validation - Add C++ UT for non-zero payload offsets before release

ccyywwen added 14 commits June 26, 2026 15:49

Add: L3-L2 message queue design

5962af6

- Define the staged base queue transport design and PR1/PR2 split. - Add the base implementation plan for the queue stack.

Update: clean up L3 L2 queue PR1

04e3a4c

- Drop the base implementation guide from tracked PR1 files while keeping it available locally for PR2 planning. - Keep the L3-L2 queue Python tests compatible with the pyright target and ruff formatting used by CI.

Fix: harden L3-L2 queue review feedback

bf505cb

- Add strict payload and counter size checks to the L3-L2 queue task args. - Validate L2 input payload offsets before exposing payload views. - Document timeout, layout, and queue free semantics, and expand no-hardware tests.

Fix: satisfy L3-L2 queue CI checks

5bdb5d8

- Keep dynamic tensor-like host buffer access explicit for pyright. - Build the example child callable across both positional and arg_index binding variants.

Fix: defer L3-L2 queue staging on backpressure

10d78d1

feat(example): switch L3-L2 queue to PTO-ISA tiles

08cd240

fix(example): pass full L3-L2 queue args to L2

353ebf6

Add: L2 input window example with AIV tiles

a508169

- Add the L2 input-window helper behavior to the queue endpoint - Add the new input-window example with PTO-ISA AIV tile compute - Extend C++ coverage for window capacity, STOP drain, and ownership

Fix: adapt input window to queue args

cc7f78c

Update: fold input window into queue example

b77bc2d

gemini-code-assist Bot reviewed Jul 1, 2026

View reviewed changes

coderabbitai Bot reviewed Jul 1, 2026

View reviewed changes

ccyywwen force-pushed the l3-l2-message-queue-input-window branch from 5da6cc4 to 008ba6f Compare July 1, 2026 09:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add: L2 input window support to L3-L2 message queue#1236

Add: L2 input window support to L3-L2 message queue#1236
ccyywwen wants to merge 15 commits into
hw-native-sys:mainfrom
ccyywwen:l3-l2-message-queue-input-window

ccyywwen commented Jul 1, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Uh oh!

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Uh oh!

coderabbitai Bot Jul 1, 2026

Uh oh!

coderabbitai Bot Jul 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ccyywwen commented Jul 1, 2026

Summary

Base / CI Notes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Jul 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Jul 1, 2026 •

edited

Loading