[IR] Verify explicit local tile address alignment#875
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces local address alignment verification for AllocTileOp and PointerCastOp in the PTO dialect. It adds helper functions to retrieve alignment requirements based on the memory space and verify that constant addresses are non-negative and properly aligned. Corresponding test cases are also added. The review feedback points out a logical issue in verifyConstantLocalAddress where negative address checks are performed before verifying if the memory space requires alignment (e.g., for global memory), and suggests reordering the checks to avoid incorrect rejections.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| static LogicalResult verifyConstantLocalAddress(Operation *op, Value addr, | ||
| Attribute memorySpace, | ||
| int addrIndex = -1) { | ||
| std::optional<int64_t> constantAddr = getConstantIntValue(addr); | ||
| if (!constantAddr) | ||
| return success(); | ||
|
|
||
| auto emitAddrError = [&]() { | ||
| InFlightDiagnostic diag = op->emitOpError(); | ||
| if (addrIndex >= 0) | ||
| diag << "addr[" << addrIndex << "]"; | ||
| else | ||
| diag << "addr"; | ||
| return diag; | ||
| }; | ||
|
|
||
| if (*constantAddr < 0) | ||
| return emitAddrError() << " must be non-negative, got " << *constantAddr; | ||
|
|
||
| std::optional<uint64_t> alignment = | ||
| getLocalAddressAlignmentBytes(memorySpace); | ||
| if (!alignment || *alignment == 0) | ||
| return success(); | ||
|
|
||
| uint64_t unsignedAddr = static_cast<uint64_t>(*constantAddr); | ||
| if ((unsignedAddr % *alignment) != 0) | ||
| return emitAddrError() | ||
| << " must be aligned to " << *alignment | ||
| << " bytes for local tile memory space, got " << unsignedAddr; | ||
|
|
||
| return success(); | ||
| } |
There was a problem hiding this comment.
The current implementation of verifyConstantLocalAddress checks if constantAddr < 0 before checking the memory space alignment. This means that even for GM (Global Memory) or default spaces—which are supposed to be ignored by this local tile address verifier—a negative constant address (or a high address that sign-extends to a negative value) will be incorrectly rejected.
To fix this and ensure GM/default spaces are completely ignored (as well as improving efficiency by avoiding constant value resolution for non-local spaces), we should retrieve and check the alignment first. If the space is not a local space (i.e., no alignment is returned), we can return success() immediately.
static LogicalResult verifyConstantLocalAddress(Operation *op, Value addr,
Attribute memorySpace,
int addrIndex = -1) {
std::optional<uint64_t> alignment =
getLocalAddressAlignmentBytes(memorySpace);
if (!alignment || *alignment == 0)
return success();
std::optional<int64_t> constantAddr = getConstantIntValue(addr);
if (!constantAddr)
return success();
auto emitAddrError = [&]() {
InFlightDiagnostic diag = op->emitOpError();
if (addrIndex >= 0)
diag << "addr[" << addrIndex << "]";
else
diag << "addr";
return diag;
};
if (*constantAddr < 0)
return emitAddrError() << " must be non-negative, got " << *constantAddr;
uint64_t unsignedAddr = static_cast<uint64_t>(*constantAddr);
if ((unsignedAddr % *alignment) != 0)
return emitAddrError()
<< " must be aligned to " << *alignment
<< " bytes for local tile memory space, got " << unsignedAddr;
return success();
}There was a problem hiding this comment.
Handled in the latest push. verifyConstantLocalAddress now checks the memory space first and returns early for non-local spaces before resolving the constant value, so GM/default spaces are fully ignored by this local tile verifier. I also added pointer_cast_addr_alignment_gm_ignored.mlir to cover the negative GM address case mentioned here. ninja -C build-pr PTOIR passes on the clean PR worktree.
|
Follow-up suggestion for PTO-ISA This PR adds the PTOAS-side IR verifier guard, but PTO-ISA should still add a second line of defense. Current behavior observed during the issue-1789 analysis:
Suggested PTO-ISA / codegen follow-ups:
This would avoid silent board-side timeout/hang if a future frontend or hand-written kernel still reaches the runtime |
4946cd3 to
a5ac7da
Compare
|
/run A3 |
|
已接收
页面会自动刷新,可以直接看当前阶段、排队情况和最近结果。 |
A3 板测失败
失败用例
|
A3 板测失败详情:PR #875qwen3_decode_incore_1
post_rmsnorm
out_proj_residual
down_proj_residual
|
Codex Review该评论由 review 机器人自动更新。
SummaryPR #875 still leaves the Findings
|
zhangstevenunity
left a comment
There was a problem hiding this comment.
Reviewed by building this PR head (a5ac7da) against the patched LLVM and running the suite. The verifier itself is sound and regression-free: 714/714 existing lit tests pass, the check fires correctly for alloc_tile and pointer_cast (vec=32B, gm ignored, negative rejected), and the helper checks memory space before the constant value, so Gemini's negative-GM concern is already resolved. Direction is good and the fixes below are small.
Two changes needed before merge:
-
(correctness) The contract is bypassable on its primary path:
pto.tassign(the level3/manual rebind that lowers toTASSIGN(tile, addr), the exact form in issue-1789) is NOT covered. See inline in PTO.cpp. This is the Codex bot's P2 and is the single most important gap, since the motivating bug is literally a TASSIGN address. -
(testing) The 4 new regressions live in
test/basic/*.mlir, which the CI lit suite (ninja check-pto, rooted attest/lit/,.pto-only) does not discover, so they never run in CI. See inline on the test file.
Also: P3 the 32/512 alignment values are unsourced (pto-isa TASSIGN has no such static check to mirror) and inconsistent with the tile fractal (ACC fractal=1024, MAT 512-aligned in practice); plus a minor duplication of mlir::getConstantIntValue.
Note on the wiring: the extraClassDeclaration { verify(); } form (no let hasVerifier = 1;) is non-idiomatic vs sibling ops, but it IS invoked here (the base Op<>::verifyInvariants hook always calls cast<Concrete>(op).verify()), so it is correct, not dead code.
Context: the A3 board run on this PR is currently RED (OK 217 / FAIL 4: qwen3_decode_incore_1, post_rmsnorm at runtime aclrtSynchronizeStream 507014). The verifier did not cause these (they compiled and failed at device sync), but it shows the issue-1789 runtime-hang family is not demonstrably fixed by a compile-time-only guard that also misses the tassign path.
| return intAttr.getValue().getSExtValue(); | ||
| } | ||
|
|
||
| static LogicalResult verifyConstantLocalAddress(Operation *op, Value addr, |
There was a problem hiding this comment.
P1 (blocker): the new alignment contract is bypassable on the primary user path. verifyConstantLocalAddress is wired only into AllocTileOp::verify() and PointerCastOp::verify(), but not into TAssignOp::verify() (PTO.cpp:2984), which still only checks result type == tile type.
pto.tassign is the manual/level3 op that rebinds a declared tile to a new explicit address, and its EmitC lowering forwards that address straight into TASSIGN(tile, addr) (PTOToEmitC.cpp:6580) - i.e. the exact TASSIGN(tile, 70660) form named in this PR's motivation and in issue-1789. So pto.tassign %t, %c70660 : !pto.tile_buf<loc=vec, ...> still compiles after this PR and reaches the board unchecked.
The tile operand is a TileBufType carrying a memory space, so the fix is one call in TAssignOp::verify():
verifyConstantLocalAddress(getOperation(), getAddr(), cast<TileBufType>(getTile().getType()).getMemorySpace()).
(This matches the Codex bot's P2.)
| case AddressSpace::LEFT: | ||
| case AddressSpace::RIGHT: | ||
| case AddressSpace::ACC: | ||
| return 512; |
There was a problem hiding this comment.
P3: these 32 / 512 byte values are unsourced and inconsistent with the tile type's own fractal field.
- pto-isa's a2a3/a5
TASSIGN_IMPL(include/pto/npu/a2a3/TAssign.hpp) has NO address-alignment static_assert, so the PR description's rationale ("PTO-ISA's TASSIGN static checks are not triggered") does not match the headers - there is no such check here to mirror. - The values disagree with practice: ACC tiles carry
fractal=1024yet this requires only 512, and MAT (L1) addresses are 512-aligned in every existing test/sample yet this requires only 32 - so an unaligned MAT/ACC base that would hang still passes this verifier (under-checking).
Please cite the HW source for these numbers (and note they are applied arch-agnostically to a2/a3/a5), or derive the requirement from the arch + fractal rather than hardcoding.
| @@ -0,0 +1,11 @@ | |||
| // RUN: not ptoas --pto-level=level3 %s 2>&1 | FileCheck %s | |||
There was a problem hiding this comment.
P2 (blocker for the stated deliverable): these regression tests never run in CI. CI runs ninja check-pto (ci.yml:255), whose lit suite is rooted at test/lit/ with config.suffixes = ['.pto'] (test/lit/lit.cfg.py). test/basic/ is a brand-new sibling directory and these are .mlir files, so check-pto does not discover them - I built this PR and ran the suite: 714 tests discovered, none from test/basic. There is no test/CMakeLists.txt and no workflow references the top-level test/lit.cfg.py, so the only way these pass is by manually pointing llvm-lit at them.
Please move them under test/lit/ (e.g. test/lit/pto/) and rename to .pto (cf. test/lit/vpto/cube/textract_verify_invalid_right_layout.pto) so the regression actually guards against recurrence.
Minor related note: ptoas exits 0 on this verify failure (it prints the error + Error: Failed to parse MLIR but rc=0), so the not ptoas ... | FileCheck line passes via FileCheck on the piped output, not via the exit code - the PR description's "exits 1" claim does not match observed behavior.
| return std::nullopt; | ||
| } | ||
|
|
||
| static std::optional<int64_t> getConstantIntValue(Value value) { |
There was a problem hiding this comment.
Minor: this duplicates mlir::getConstantIntValue (mlir/Dialect/Utils/StaticValueUtils.h), which resolves folded / constant-like values via m_ConstantInt, not just a direct arith::ConstantOp. Reusing it would be more robust and let you drop this helper.
Summary
pto.alloc_tile addrandpto.pointer_cast(addrs=...)operands.Motivation
TASSIGN(tile, 70660)can hang on A3 even without explicit sync/TPipe complexity.TASSIGN(tile, addr), so PTO-ISA'sTASSIGN<Addr>static checks are not triggered.Design
Testing
ninja -C PTOAS/build pto-optptoas --pto-level=level3 test/basic/alloc_tile_addr_alignment_invalid.mlirexits 1 and FileCheck passes.ptoas --pto-level=level3 test/basic/pointer_cast_addr_alignment_invalid.mlirexits 1 and FileCheck passes.ptoas --pto-level=level3 test/basic/alloc_tile_addr_alignment_valid.mlirexits 0.ninja -C build-pr PTOIRpasses.Note
ptoasbuild from currenthw-native-sys/mainon this Mac is blocked by unrelated local LLVM/VPTO API mismatch (LLVMHiFloat8Type/CallingConv::SimtEntrymissing). The changed IR target itself builds in the clean main worktree.