feat(deepseek/v4): gather-free split-half RoPE for decode + prefill by lwDavid · Pull Request #570 · hw-native-sys/pypto-lib

lwDavid · 2026-06-22T08:06:39Z

Summary

Converts the DeepSeek-V4 RoPE path from the interleaved (GPT-J, gather-based) layout to split-half (GPT-NeoX, gather-free) for both decode and prefill, so the whole chain is layout-consistent. The rotation partner of lane k becomes lane k+HALF — a contiguous lo=[:HALF]/hi=[HALF:] slice instead of a j^1 swap-gather + j>>1 cos/sin dup-gather. Every RoPE rotation becomes contiguous slices + plain FMAs, with no cross-lane op and no in-kernel rope_cs dup-gather pre-pass.

Decode (commit 1): adopts the split-half conversion across qkv_proj_rope (forward, shared), decode_compressor_ratio{4,128} (forward), and decode_sparse_attn{,_hca,_swa} (inverse), with the callers feeding half-width FP32 rope_cos_half/rope_sin_half. Two precision details vs current main: the compressor gamma_rope is kept cast to FP32 (since norm_w is BF16), and the inverse-rope stage (already FP32) is read directly — no FP32→FP32 identity cast.
Prefill (commit 2): mirrors the same conversion onto prefill_compressor_ratio{4,128} (forward) and prefill_sparse_attn (inverse), with prefill_attention_{csa,hca,swa} building and passing the half-width tables. This removes the latent "half-converted" hazard that would arise from converting the shared qkv_proj_rope without also converting prefill's own compressors and inverse RoPE. Rebased on top of #569 ("Fix dsv4 prefill_sparse_attn per-head NOPE corruption; align to decode"); that PR's per-head NOPE fix and its removal of prefill_sparse_attn_padded_indices are preserved.

Forward: out_lo = x_lo*cos − x_hi*sin, out_hi = x_lo*sin + x_hi*cos. Inverse (conjugate): out_lo = x_lo*cos + x_hi*sin, out_hi = x_hi*cos − x_lo*sin.
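The two formulas above can be checked with a small host-side sketch. This is plain Python rather than the pypto kernel code; the function names, `HALF = 4`, and the angle values are assumptions chosen for illustration only. It shows that the inverse (conjugate) rotation undoes the forward rotation using nothing but contiguous lo/hi slices.

```python
import math


def rope_forward(x, cos_half, sin_half):
    """Split-half forward rotation: lane k is paired with lane k + HALF."""
    half = len(x) // 2
    lo, hi = x[:half], x[half:]  # contiguous slices, no gather
    out_lo = [l * c - h * s for l, h, c, s in zip(lo, hi, cos_half, sin_half)]
    out_hi = [l * s + h * c for l, h, c, s in zip(lo, hi, cos_half, sin_half)]
    return out_lo + out_hi


def rope_inverse(x, cos_half, sin_half):
    """Split-half inverse (conjugate) rotation: same tables, sin sign flipped."""
    half = len(x) // 2
    lo, hi = x[:half], x[half:]
    out_lo = [l * c + h * s for l, h, c, s in zip(lo, hi, cos_half, sin_half)]
    out_hi = [h * c - l * s for l, h, c, s in zip(lo, hi, cos_half, sin_half)]
    return out_lo + out_hi


# Hypothetical sizes and angles, for illustration only.
HALF = 4
angles = [0.1 * (k + 1) for k in range(HALF)]
cos_half = [math.cos(a) for a in angles]
sin_half = [math.sin(a) for a in angles]

x = [float(i) for i in range(1, 2 * HALF + 1)]
rotated = rope_forward(x, cos_half, sin_half)
roundtrip = rope_inverse(rotated, cos_half, sin_half)

# The round trip should reproduce x up to floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(x, roundtrip))
```

Both functions read only half-width tables, which mirrors the half-width rope_cos_half/rope_sin_half inputs described in the summary.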

The lightning indexer ({decode,prefill}_indexer{,_compressor}) is intentionally left interleaved: it is a self-contained RoPE subsystem (own query/KV rope from the freqs tables) that feeds sparse attention only integer top-k indices, so it is decoupled from the main path.

Why

On-device profiling showed that the per-element gather (j^1 swap + j>>1 dup), not the arithmetic, is the dominant RoPE cost. An earlier L2 swimlane comparison against the interleaved baseline on the HCA attention module measured RoPE compute falling from 2970 µs to 877 µs (−70.5%) and module wall-clock falling by 9.6% with this change.
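To make the gather cost concrete, here is a rough sketch of the index patterns the interleaved layout needs. It is plain Python, not the kernel; `ROPE_DIM = 8`, the helper names, and the sign convention (the usual GPT-J pairing of lanes 2i and 2i+1) are assumptions for illustration.

```python
ROPE_DIM = 8  # hypothetical width, for illustration only
HALF = ROPE_DIM // 2

# Interleaved layout: every output lane needs two indexed reads.
swap_idx = [j ^ 1 for j in range(ROPE_DIM)]   # rotation partner of lane j
dup_idx = [j >> 1 for j in range(ROPE_DIM)]   # which cos/sin entry lane j uses
sign = [-1.0 if j % 2 == 0 else 1.0 for j in range(ROPE_DIM)]  # signed sine


def rope_interleaved(x, cos_half, sin_half):
    """Interleaved forward rotation written as gathers over swap_idx/dup_idx."""
    return [
        x[j] * cos_half[dup_idx[j]]
        + sign[j] * x[swap_idx[j]] * sin_half[dup_idx[j]]
        for j in range(ROPE_DIM)
    ]


# A quarter turn (cos = 0, sin = 1) makes the pairing easy to read off.
x = [float(i) for i in range(1, ROPE_DIM + 1)]
quarter_turn = rope_interleaved(x, [0.0] * HALF, [1.0] * HALF)
```

In the split-half layout the partner of lane k is simply lane k + HALF and the table index is k itself, so both index lists disappear and the reads become two contiguous slices.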

Validation (a2a3sim, golden, all PASS)

decode_compressor_ratio4/128, decode_sparse_attn{,_hca,_swa}, qkv_proj_rope, decode_attention_{csa,hca,swa}, decode_layer; prefill_compressor_ratio4/128, prefill_sparse_attn, prefill_attention_{csa,hca,swa}, prefill_layer. The *_sparse_attn_swa standalone test is flaky (about 4 of 6 runs pass), but identically so on main (unseeded fixtures from #563 against a tight tolerance), so the flakiness is not introduced by this change.

Caveat

Real checkpoints need an offline interleaved→split-half permutation of the trained q-proj/k_pe/wo_a rope columns to stay bit-identical to the trained model. Synthetic tests need none. Not yet validated on-device or end-to-end by the serving system.
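The permutation mentioned above can be sketched as follows. This is an illustrative host-side check in plain Python, assuming the usual even-lanes-then-odd-lanes mapping; the function names and `ROPE_DIM = 8` are hypothetical, and the actual offline tool for the q-proj/k_pe/wo_a weights is not shown in this PR.

```python
import math


def interleaved_to_split_half_perm(rope_dim):
    """perm[i] is the interleaved lane that lands at split-half lane i."""
    half = rope_dim // 2
    return [2 * k for k in range(half)] + [2 * k + 1 for k in range(half)]


def rotate_interleaved(x, cos_half, sin_half):
    out = []
    for i, (c, s) in enumerate(zip(cos_half, sin_half)):
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def rotate_split_half(x, cos_half, sin_half):
    half = len(x) // 2
    lo, hi = x[:half], x[half:]
    out_lo = [l * c - h * s for l, h, c, s in zip(lo, hi, cos_half, sin_half)]
    out_hi = [l * s + h * c for l, h, c, s in zip(lo, hi, cos_half, sin_half)]
    return out_lo + out_hi


ROPE_DIM = 8  # hypothetical width, for illustration only
perm = interleaved_to_split_half_perm(ROPE_DIM)
angles = [0.3 * (k + 1) for k in range(ROPE_DIM // 2)]
cos_half = [math.cos(a) for a in angles]
sin_half = [math.sin(a) for a in angles]

x_interleaved = [float(i) for i in range(1, ROPE_DIM + 1)]
x_split = [x_interleaved[p] for p in perm]  # permute the rope columns once

y_interleaved = rotate_interleaved(x_interleaved, cos_half, sin_half)
y_split = rotate_split_half(x_split, cos_half, sin_half)

# Split-half on permuted input equals the permuted interleaved result.
max_diff = max(abs(a - y_interleaved[p]) for a, p in zip(y_split, perm))
```

Applying the same column permutation to the trained rope weights would, under this assumption, make the split-half kernels produce the interleaved result in permuted lane order, which is what keeps attention scores unchanged.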

Related Issues

Builds on / supersedes #564 ("feat(deepseek/v4): gather-free split-half RoPE for the decode path", a decode-only draft) by completing the prefill conversion.
Rebased onto #569 ("Fix dsv4 prefill_sparse_attn per-head NOPE corruption; align to decode"), whose prefill_sparse_attn NOPE fix it preserves.

… main

Cherry-pick of c5aa9c5 (feat: gather-free split-half RoPE for the decode path) resolved onto main (ea299a1). 5 of 10 files auto-merged; 5 conflicted (both sides fully rewrote the same RoPE block).

Resolution:
- All 5 conflicts: adopt the PR split-half (NeoX) side, discard main's interleaved (GPT-J) side.
- decode_compressor_ratio{4,128}: restore `gamma_rope = pl.cast(..., pl.FP32)` on the split-half branch. hw-native-sys#568 flipped norm_w FP32->BF16 and added the cast at every apply site; the PR (branched pre-hw-native-sys#568) dropped it. Without the cast the per-column gamma fold would run in BF16, asymmetric with the NOPE branch and the float golden rmsnorm.
- decode_sparse_attn{,_hca,_swa}: drop the PR's `pl.cast(r_tile, FP32)`. The PR added it when attn_rope_stage was BF16; hw-native-sys#568 made the stage FP32, so the cast is an FP32->FP32 identity that the pypto op registry rejects (trace-time failure). Read the already-FP32 stage directly, matching main's interleaved version.

Validated on a2a3sim (golden): decode_compressor_ratio4/128 PASS, qkv_proj_rope PASS, decode_sparse_attn PASS, decode_sparse_attn_hca PASS. decode_sparse_attn_swa is flaky (4/6) but identically so on clean main (FAIL PASS FAIL PASS PASS PASS on both) -- pre-existing unseeded-input tolerance flakiness from hw-native-sys#563, not a merge regression.

Decode-only, inheriting the PR's caveat: qkv_proj_rope is shared with prefill, whose sparse_attn/compressors stay interleaved, so prefill is latently half-converted. Landing requires the prefill split-half follow-up + offline weight permutation for real checkpoints.

Rebased onto upstream hw-native-sys#569 (which rewrote prefill_sparse_attn + fixed the per-head NOPE corruption and removed prefill_sparse_attn_padded_indices). This re-applies the prefill split-half conversion on top of hw-native-sys#569 so the whole prefill chain is layout-consistent with the now-split-half shared qkv_proj_rope forward.

Converted (kernel + golden + standalone fixtures), mirroring the validated decode analogs on this branch:
- prefill_compressor_ratio4 / ratio128: forward rope P0101/P1010 even/odd gather+scatter -> contiguous lo/hi slices; gamma folded per-half in FP32.
- prefill_sparse_attn: inverse rope -- removed the rope_cs dup-gather pre-pass and the per-head j^1 swap-gather; now a gather-free contiguous lo/hi conjugate rotate (out_lo=x_lo*cos+x_hi*sin, out_hi=x_hi*cos-x_lo*sin) reading half-width FP32 rope_cos_half/rope_sin_half (both prefill_sparse_attn and prefill_sparse_attn_test signatures + the in-file test call); golden + build_tensor_specs fixture updated.
- prefill_attention_{csa,hca,swa}: build half-width FP32 rope_cos_half/sin_half and pass them to the (now directly-called) prefill_sparse_attn; golden dict keys renamed. qkv forward still gets the full BF16 tables and slices [:HALF] internally.

Indexer (prefill_indexer / prefill_indexer_compressor) intentionally left interleaved -- self-contained, feeds sparse_attn only integer indices.

Validated on a2a3sim (golden), all PASS: prefill_compressor_ratio4/128, prefill_sparse_attn, prefill_attention_csa/hca/swa, prefill_layer.

Real checkpoints still need the offline interleaved->split-half permutation of the trained q-proj/k_pe/wo_a rope columns (unchanged PR caveat).

coderabbitai · 2026-06-22T08:06:53Z

Walkthrough

This PR migrates all DeepSeek-V4 decode and prefill RoPE rotation code from an interleaved swap/gather scheme to a gather-free split-half (NeoX) scheme. Sparse attention kernels receive new half-width FP32 rope_cos_half/rope_sin_half tables instead of full-width BF16 freqs_cos/freqs_sin. Compressor and QKV projection kernels replace even/odd lane gather rotation with contiguous lo/hi half rotation. All callers, golden references, and test harnesses are updated throughout.

Changes

Split-half NeoX RoPE migration across DeepSeek-V4

| Layer / File(s) | Summary |
| --- | --- |
| Sparse attention kernel signature + inverse RoPE rewrite `models/deepseek/v4/decode_sparse_attn.py`, `models/deepseek/v4/decode_sparse_attn_hca.py`, `models/deepseek/v4/decode_sparse_attn_swa.py`, `models/deepseek/v4/prefill_sparse_attn.py` | `sparse_attn`, `sparse_attn_hca`, `sparse_attn_swa`, and `prefill_sparse_attn` replace `freqs_cos/freqs_sin` (BF16, `ROPE_DIM`) parameters with `rope_cos_half/rope_sin_half` (FP32, `HALF_ROPE`). The inverse RoPE kernel body in each file is rewritten from precomputed interleaved cosine/signed-sine + per-head `j^1` gather to gather-free split-half lo/hi rotation writing directly into `o_packed` rope columns. Test harnesses, golden references, and TensorSpec initializers are updated to generate and use the BF16-rounded FP32 half-width tables. |
| Attention orchestrators: allocate and wire half-width RoPE tables `models/deepseek/v4/decode_attention_csa.py`, `models/deepseek/v4/decode_attention_hca.py`, `models/deepseek/v4/decode_attention_swa.py`, `models/deepseek/v4/prefill_attention_csa.py`, `models/deepseek/v4/prefill_attention_hca.py`, `models/deepseek/v4/prefill_attention_swa.py` | All six attention orchestrators allocate FP32 `rope_cos_half_t`/`rope_sin_half_t` tensors (first `HALF_ROPE` columns of per-token RoPE snapshots) and pass them to the sparse attention calls instead of the prior full-width tables. Golden references in each file are updated in parallel. |
| Compressor forward RoPE rewrite `models/deepseek/v4/decode_compressor_ratio4.py`, `models/deepseek/v4/decode_compressor_ratio128.py`, `models/deepseek/v4/prefill_compressor_ratio4.py`, `models/deepseek/v4/prefill_compressor_ratio128.py` | All four compressor kernels replace gather-based interleaved even/odd forward RoPE with split-half NeoX rotation: rope segment is split into `lo/hi`, `gamma` is folded per half, and results are computed and written back as two contiguous halves of `normed_kv`. Golden references use the same `x_lo/x_hi` concat pattern. |
| QKV projection split-half RoPE rewrite `models/deepseek/v4/qkv_proj_rope.py` | Both Q and KV fused RoPE paths in `qkv_proj_rope` remove in-kernel swap/sign index construction and gather rotation, and instead fold `inv_rms`/`gamma` into contiguous `ROPE_HALF` lo/hi slices before applying the standard split-half formulas. The `apply_rope` golden reference is updated to the same lo/hi concat scheme. |
| Documentation `models/deepseek/v4/decode_layer.py` | Three comments added to `build_tensor_specs` clarifying that split-half NeoX cos/sin tables are sourced from the first half-columns of `freqs_cos`/`freqs_sin` with no separate interleaved table. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

hw-native-sys/pypto-lib#468: Updates qkv_proj_rope.py to use split-half contiguous lo/hi RoPE rotation, which is the same QKV RoPE layout change applied in this PR.
hw-native-sys/pypto-lib#568: Adjusts inverse-RoPE numerics and rounding (attn_rope_stage FP32, mode="rint") in the same decode_sparse_attn*.py kernels modified here.
hw-native-sys/pypto-lib#533: Modifies the decode RoPE and compressor paths in the same decode_compressor_ratio4.py and decode_compressor_ratio128.py files changed here.

Suggested labels

enhancement

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.74% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main technical change: converting DeepSeek-V4 RoPE from gather-based to gather-free split-half across both decode and prefill paths. |
| Description check | ✅ Passed | The description provides comprehensive context on the RoPE conversion, performance improvements, validation approach, and implementation details across decode and prefill paths. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request refactors the Rotary Position Embedding (RoPE) implementation across DeepSeek v4 attention and compressor modules to use a split-half (NeoX) layout instead of an interleaved layout. This simplifies the kernels by removing in-kernel index building and gather operations, using half-width unsigned cosine and sine tables instead. The code review feedback identifies several opportunities to optimize memory access in decode_attention_csa.py, decode_attention_hca.py, and prefill_attention_csa.py by slicing already-populated local tensors directly rather than performing redundant global memory lookups.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/deepseek/v4/prefill_attention_csa.py`:
- Around line 185-198: The rope_cos_half_t and rope_sin_half_t tensors are
created but only filled for rows where half_t < num_tokens, leaving padding rows
(half_t >= num_tokens) uninitialized, which causes divergence in subsequent RoPE
operations. Initialize rope_cos_half_t and rope_sin_half_t with finite identity
defaults (cos = 1.0, sin = 0.0) immediately after tensor creation and before the
loop starting with "for half_t in pl.range(T)" so that all T rows hold valid
values before the RoPE multiply operations use them.

In `@models/deepseek/v4/prefill_attention_hca.py`:
- Around line 153-161: The rope_cos_half_t and rope_sin_half_t tensors are
created but only populated for rows where half_t < num_tokens, leaving rows >=
num_tokens uninitialized. Since the sparse-attn inverse-RoPE pass reads all T
rows, the uninitialized rows will cause issues. Add initialization code within
the pl.at context block to set rope_cos_half_t and rope_sin_half_t to identity
values (cos values of 1.0 and sin values of 0.0) for the entire T rows before
the loop that conditionally overwrites only the active token rows.

📥 Commits

Reviewing files that changed from the base of the PR and between 86c3b04 and 13f8853.

📒 Files selected for processing (16)

models/deepseek/v4/decode_attention_csa.py
models/deepseek/v4/decode_attention_hca.py
models/deepseek/v4/decode_attention_swa.py
models/deepseek/v4/decode_compressor_ratio128.py
models/deepseek/v4/decode_compressor_ratio4.py
models/deepseek/v4/decode_layer.py
models/deepseek/v4/decode_sparse_attn.py
models/deepseek/v4/decode_sparse_attn_hca.py
models/deepseek/v4/decode_sparse_attn_swa.py
models/deepseek/v4/prefill_attention_csa.py
models/deepseek/v4/prefill_attention_hca.py
models/deepseek/v4/prefill_attention_swa.py
models/deepseek/v4/prefill_compressor_ratio128.py
models/deepseek/v4/prefill_compressor_ratio4.py
models/deepseek/v4/prefill_sparse_attn.py
models/deepseek/v4/qkv_proj_rope.py

lwDavid · 2026-06-22T08:41:33Z

Performance (a2a3, real device, L2 swimlane)

Measured decode_attention_hca on a2a3 (real NPU), baseline = main (interleaved GPT-J) vs this branch (split-half GPT-NeoX), summing per-kernel AICore busy time from the L2 swimlane (--enable-l2-swimlane), one clean run per branch.

| metric | baseline (interleaved) | optimized (split-half) | Δ |
| --- | --- | --- | --- |
| RoPE compute (Σ AICore busy) | 3123.0 us | 920.6 us | −70.5% |
| HCA attention module wall-clock | 1508.6 us | 1450.5 us | −3.8% |
| whole-module AICore busy | 55418.9 us | 53046.8 us | −4.3% |

Per-kernel:

| kernel | baseline (us) | optimized (us) | Δ |
| --- | --- | --- | --- |
| `q_head_rope_fused` (q fwd) | 1443.8 | 439.2 | −69.6% |
| `rope` (inverse) | 1419.0 | 288.9 | −79.6% |
| `kv_rope_fused` (kv fwd) | 46.1 | 13.5 | −70.7% |
| `rmsnorm_rope` | 69.6 | 52.6 | −24.4% |
| `rope_cs` (dup-gather pre-pass) | 53.3 | 0 | eliminated |
| `hca_rope` | 91.3 | 126.5 | +38% (tiny, ×1) |

The RoPE kernels drop ~70% (gather → contiguous lo/hi slice + plain FMAs; every VGATHER and the whole rope_cs dup-gather pre-pass removed), independently reproducing the original −70.5% figure. End-to-end this module is ~−4% wall-clock because RoPE is only ~5.6% of its AICore busy time and the chip runs ~36 cores in parallel — the win grows on RoPE-heavier / more dispatch-bound configs.
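The relationship between the ~70% kernel-level drop and the ~4% module-level change can be checked with a short Amdahl-style calculation. The sketch below only re-uses the figures from the tables above; the variable names are illustrative and the model (total saving = RoPE share × RoPE reduction) is a simplifying assumption that ignores parallel overlap across cores.

```python
# Figures copied from the measurement tables above (microseconds).
rope_base, rope_opt = 3123.0, 920.6
busy_base, busy_opt = 55418.9, 53046.8

# Share of total AICore busy time spent in RoPE on the baseline.
rope_share = rope_base / busy_base

# Fractional reduction inside the RoPE kernels themselves.
rope_cut = 1.0 - rope_opt / rope_base

# Amdahl-style prediction of the whole-module busy-time reduction,
# assuming nothing outside RoPE changes.
predicted_cut = rope_share * rope_cut

# Reduction actually observed in whole-module AICore busy time.
measured_cut = 1.0 - busy_opt / busy_base

print(f"RoPE share of busy time: {rope_share:.1%}")
print(f"RoPE kernel reduction:   {rope_cut:.1%}")
print(f"predicted module cut:    {predicted_cut:.1%}")
print(f"measured module cut:     {measured_cut:.1%}")
```

The predicted and measured reductions land within about half a percentage point of each other; with one run per branch, the small remainder could be run-to-run variation in the non-RoPE kernels.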

Notes: the device shows an occasional transient 507018 fault (retried for a clean run); x_out validates bit-exact on device. The same gather-elimination applies to the prefill kernels converted here.

- prefill_attention_{csa,hca,swa}: identity-init rope_cos_half/rope_sin_half (cos=1, sin=0) over all T rows so padding rows (>= num_tokens) stay finite -- prefill_sparse_attn rotates all T rows (CodeRabbit review). Active rows are overwritten below as before; no effect when num_tokens == T.
- prefill_attention_csa: fill the active rows by slicing the already-materialized rope_cos_t/rope_sin_t instead of re-reading freqs_cos from GM, consistent with hca/swa (gemini review).

The gemini suggestions to slice the local cos_row/step_cos_row in the DECODE callers were NOT applied: in decode_attention_csa it trips a PTO2 runtime assertion (index < output_count_); kept the original freqs_cos slice.

Validated a2a3sim: prefill_attention_csa/hca PASS, prefill_layer PASS. swa standalone stays pre-existing flaky (unseeded hw-native-sys#563 fixtures), unaffected.

lwDavid · 2026-06-22T09:14:24Z

Review comments addressed (`022b525`)

All 5 review threads resolved:

CodeRabbit (Major) — prefill_attention_csa / _hca padding rows: Fixed. rope_cos_half/rope_sin_half are now identity-initialized (cos=1, sin=0) over all T rows before the active-row overwrite, so padding rows (>= num_tokens) stay finite for the all-T-row inverse RoPE in prefill_sparse_attn. Applied to csa/hca/swa.
gemini — prefill_attention_csa redundant GM reads: Applied. csa now fills active rows by slicing the already-materialized rope_cos_t/rope_sin_t (consistent with hca/swa) instead of re-reading freqs_cos.
gemini — decode_attention_csa / _hca: Not applied. Slicing the local cos_row inside pl.assemble trips a PTO2 runtime assertion (index < output_count_) on a2a3sim; kept the original freqs_cos slice (one-time per-token setup-loop read, negligible).

Validated on a2a3sim: prefill_attention_csa/_hca PASS, prefill_layer PASS.
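The padding-row fix can be pictured with a small host-side sketch. This is plain Python with nested lists standing in for tensors, not the pypto code; `build_half_tables`, `inverse_rope_row`, the shapes, and the table values are all hypothetical placeholders. It shows why cos=1, sin=0 is the identity default: a padding row passes through the inverse rotation unchanged.

```python
def build_half_tables(freqs_cos, freqs_sin, num_tokens, total_rows, half):
    """Return (cos_half, sin_half) with identity defaults on padding rows."""
    # Identity init over all rows: cos = 1, sin = 0.
    cos_half = [[1.0] * half for _ in range(total_rows)]
    sin_half = [[0.0] * half for _ in range(total_rows)]
    # Overwrite only the active rows with the first `half` table columns.
    for t in range(min(num_tokens, total_rows)):
        cos_half[t] = list(freqs_cos[t][:half])
        sin_half[t] = list(freqs_sin[t][:half])
    return cos_half, sin_half


def inverse_rope_row(x, cos_row, sin_row):
    """Split-half inverse rotation of one row."""
    half = len(cos_row)
    lo, hi = x[:half], x[half:]
    out_lo = [l * c + h * s for l, h, c, s in zip(lo, hi, cos_row, sin_row)]
    out_hi = [h * c - l * s for l, h, c, s in zip(lo, hi, cos_row, sin_row)]
    return out_lo + out_hi


# Hypothetical shapes: 4 rows, 2 active tokens, rope width 4.
T, NUM_TOKENS, HALF = 4, 2, 2
# Arbitrary placeholder values, not real cos/sin entries.
freqs_cos = [[0.5, 0.25, 0.5, 0.25] for _ in range(NUM_TOKENS)]
freqs_sin = [[0.5, 0.75, 0.5, 0.75] for _ in range(NUM_TOKENS)]

cos_half, sin_half = build_half_tables(freqs_cos, freqs_sin, NUM_TOKENS, T, HALF)

padding_in = [1.0, 2.0, 3.0, 4.0]
padding_out = inverse_rope_row(padding_in, cos_half[3], sin_half[3])
```

With zero-filled or uninitialized tables the padding rows would instead be zeroed or carry arbitrary values, which is the hazard the review thread describes.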

CI failure triage (none are code defects in this PR)

a2a3 (device): halMemCtl failed rc=13 / run_prepared code 13 on a few kernels — an intermittent CI-device register-access fault (csa/swa/decode_layer passed on the same run).
sim (a2a3sim / a5sim): decode_layer → No space left on device (CI-runner shm/disk); decode_sparse_attn{,_hca} attn_out slightly over tolerance (0.99–8.8%, a different one each run, swa passed). This is the pre-existing unseeded-fixture flakiness from #563 ("Refactor: drop decode fixture seeds + EP selector in decode_layer"): these standalone tests draw fresh torch.rand inputs against a tight max_error_ratio=0.005 and flake identically on main.

Every prefill kernel this PR adds passes on each run. A CI re-run is in progress for 022b525.

#578)

## Summary
- Retile the DeepSeek-V4 `qkv_proj_rope` projection matmuls to the 512B L2 cache line and fuse RMSNorm with RoPE. **Decode end-to-end −56%** (a2a3 L2 swimlane, 5-rep median: 936µs → 407µs); golden green on decode and prefill.
- `qr_proj` / `kv_proj`: split-K (zero-seed + atomic-add) with N-tile 32 → 256, so each `wq_a`/`wkv` row-read fills a full 512B cache line instead of a 64B sub-line (was 8× weight over-fetch). Kernel occupancy −84% / −75%.
- `qproj_matmul`: decouple the matmul N-tile from the dequant N-tile and bump matmul `TN` 128 → 256 (256B/row), capped by the L0C `Acc` limit (`TM*TN*4 ≤ 128KB`). `TN=512` needs an M-split (`TM=64`) and measured no faster end-to-end on device.
- Fuse per-head RMSNorm + NOPE + RoPE into `q_head_rms_nope_rope`, and KV RMSNorm + RoPE into `kv_rms_norm_rope`: `inv_rms` stays in registers (no GM round-trip via the old `q_head_inv_rms_all` / `kv_inv_rms_tensor`), collapsing each pair of dispatches into one. RoPE keeps the interleaved (CANN A3) swap-gather layout.

## Related Issues
- The RMSNorm+RoPE fusion re-introduces fused rope on top of the **interleaved** layout restored by #575 (the revert of #570); it does not bring back the split-half layout. The matmul retiling is independent of the rope layout.

lwDavid added 2 commits June 22, 2026 15:40

lwDavid self-assigned this Jun 22, 2026

lwDavid added the enhancement New feature or request label Jun 22, 2026

lwDavid added this to pto project Jun 22, 2026

lwDavid moved this to In Progress in pto project Jun 22, 2026

gemini-code-assist Bot reviewed Jun 22, 2026

Comment thread models/deepseek/v4/decode_attention_csa.py

Comment thread models/deepseek/v4/decode_attention_hca.py

Comment thread models/deepseek/v4/prefill_attention_csa.py Outdated

coderabbitai Bot reviewed Jun 22, 2026

Comment thread models/deepseek/v4/prefill_attention_csa.py Outdated

Comment thread models/deepseek/v4/prefill_attention_hca.py

lwDavid moved this from In Progress to Done in pto project Jun 22, 2026

zhangqi-chen merged commit cdb64e0 into hw-native-sys:main Jun 22, 2026
5 of 7 checks passed

lwDavid requested a review from zhangqi-chen June 22, 2026 09:50

lwDavid mentioned this pull request Jun 22, 2026

Revert "feat(deepseek/v4): gather-free split-half RoPE for decode + prefill" #575

Merged

zhangqi-chen mentioned this pull request Jun 22, 2026

Revert "feat(deepseek/v4): gather-free split-half RoPE for decode + prefill" #576

Closed

zhangqi-chen pushed a commit that referenced this pull request Jun 22, 2026

Revert "feat(deepseek/v4): gather-free split-half RoPE for decode + p…

5fdf7ec

…refill" (#575) Reverts #570

Hzfengsy mentioned this pull request Jun 22, 2026

perf(deepseek/v4): qkv_proj_rope tiling + fused rms+rope (decode -56%) #578

Merged

zhangqi-chen merged 3 commits into hw-native-sys:main from lwDavid:research/pr564-splithalf-rope
