Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions include/PTO/IR/PTOOps.td
Original file line number Diff line number Diff line change
Expand Up @@ -1425,6 +1425,10 @@ def PointerCastOp : PTO_Op<"pointer_cast", [AttrSizedOperandSegments, Pure]> {
$_builder.getDenseI32ArrayAttr({addrsSize, vRowSize, vColSize}));
}]>
];

let extraClassDeclaration = [{
::mlir::LogicalResult verify();
}];
}

// =============================================================================
Expand Down
90 changes: 90 additions & 0 deletions lib/PTO/IR/PTO.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2811,12 +2811,87 @@ void mlir::pto::annotatePTOEntryFunctions(ModuleOp module) {
return success();
}

static std::optional<uint64_t>
getLocalAddressAlignmentBytes(Attribute memorySpace) {
auto addrSpace = dyn_cast_or_null<AddressSpaceAttr>(memorySpace);
if (!addrSpace)
return std::nullopt;

switch (addrSpace.getAddressSpace()) {
case AddressSpace::VEC:
case AddressSpace::MAT:
case AddressSpace::BIAS:
case AddressSpace::SCALING:
return 32;
case AddressSpace::LEFT:
case AddressSpace::RIGHT:
case AddressSpace::ACC:
return 512;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3: these 32 / 512 byte values are unsourced and inconsistent with the tile type's own fractal field.

  • pto-isa's a2a3/a5 TASSIGN_IMPL (include/pto/npu/a2a3/TAssign.hpp) has NO address-alignment static_assert, so the PR description's rationale ("PTO-ISA's TASSIGN static checks are not triggered") does not match the headers - there is no such check here to mirror.
  • The values disagree with practice: ACC tiles carry fractal=1024 yet this requires only 512, and MAT (L1) addresses are 512-aligned in every existing test/sample yet this requires only 32 - so an unaligned MAT/ACC base that would hang still passes this verifier (under-checking).

Please cite the HW source for these numbers (and note they are applied arch-agnostically to a2/a3/a5), or derive the requirement from the arch + fractal rather than hardcoding.

case AddressSpace::GM:
case AddressSpace::Zero:
return std::nullopt;
}
return std::nullopt;
}

static std::optional<int64_t> getConstantIntValue(Value value) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this duplicates mlir::getConstantIntValue (mlir/Dialect/Utils/StaticValueUtils.h), which resolves folded / constant-like values via m_ConstantInt, not just a direct arith::ConstantOp. Reusing it would be more robust and let you drop this helper.

if (!value)
return std::nullopt;

auto constOp = value.getDefiningOp<arith::ConstantOp>();
if (!constOp)
return std::nullopt;

auto intAttr = dyn_cast<IntegerAttr>(constOp.getValue());
if (!intAttr)
return std::nullopt;

return intAttr.getValue().getSExtValue();
}

static LogicalResult verifyConstantLocalAddress(Operation *op, Value addr,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 (blocker): the new alignment contract is bypassable on the primary user path. verifyConstantLocalAddress is wired only into AllocTileOp::verify() and PointerCastOp::verify(), but not into TAssignOp::verify() (PTO.cpp:2984), which still only checks result type == tile type.

pto.tassign is the manual/level3 op that rebinds a declared tile to a new explicit address, and its EmitC lowering forwards that address straight into TASSIGN(tile, addr) (PTOToEmitC.cpp:6580) - i.e. the exact TASSIGN(tile, 70660) form named in this PR's motivation and in issue-1789. So pto.tassign %t, %c70660 : !pto.tile_buf<loc=vec, ...> still compiles after this PR and reaches the board unchecked.

The tile operand is a TileBufType carrying a memory space, so the fix is one call in TAssignOp::verify():
verifyConstantLocalAddress(getOperation(), getAddr(), cast<TileBufType>(getTile().getType()).getMemorySpace()).

(This matches the Codex bot's P2.)

Attribute memorySpace,
int addrIndex = -1) {
std::optional<uint64_t> alignment =
getLocalAddressAlignmentBytes(memorySpace);
if (!alignment || *alignment == 0)
return success();

std::optional<int64_t> constantAddr = getConstantIntValue(addr);
if (!constantAddr)
return success();

auto emitAddrError = [&]() {
InFlightDiagnostic diag = op->emitOpError();
if (addrIndex >= 0)
diag << "addr[" << addrIndex << "]";
else
diag << "addr";
return diag;
};

if (*constantAddr < 0)
return emitAddrError() << " must be non-negative, got " << *constantAddr;

uint64_t unsignedAddr = static_cast<uint64_t>(*constantAddr);
if ((unsignedAddr % *alignment) != 0)
return emitAddrError()
<< " must be aligned to " << *alignment
<< " bytes for local tile memory space, got " << unsignedAddr;

return success();
}
Comment on lines +2852 to +2883

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of verifyConstantLocalAddress checks if constantAddr < 0 before checking the memory space alignment. This means that even for GM (Global Memory) or default spaces—which are supposed to be ignored by this local tile address verifier—a negative constant address (or a high address that sign-extends to a negative value) will be incorrectly rejected.

To fix this and ensure GM/default spaces are completely ignored (as well as improving efficiency by avoiding constant value resolution for non-local spaces), we should retrieve and check the alignment first. If the space is not a local space (i.e., no alignment is returned), we can return success() immediately.

static LogicalResult verifyConstantLocalAddress(Operation *op, Value addr,
                                                Attribute memorySpace,
                                                int addrIndex = -1) {
  std::optional<uint64_t> alignment =
      getLocalAddressAlignmentBytes(memorySpace);
  if (!alignment || *alignment == 0)
    return success();

  std::optional<int64_t> constantAddr = getConstantIntValue(addr);
  if (!constantAddr)
    return success();

  auto emitAddrError = [&]() {
    InFlightDiagnostic diag = op->emitOpError();
    if (addrIndex >= 0)
      diag << "addr[" << addrIndex << "]";
    else
      diag << "addr";
    return diag;
  };

  if (*constantAddr < 0)
    return emitAddrError() << " must be non-negative, got " << *constantAddr;

  uint64_t unsignedAddr = static_cast<uint64_t>(*constantAddr);
  if ((unsignedAddr % *alignment) != 0)
    return emitAddrError()
           << " must be aligned to " << *alignment
           << " bytes for local tile memory space, got " << unsignedAddr;

  return success();
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in the latest push. verifyConstantLocalAddress now checks the memory space first and returns early for non-local spaces before resolving the constant value, so GM/default spaces are fully ignored by this local tile verifier. I also added pointer_cast_addr_alignment_gm_ignored.mlir to cover the negative GM address case mentioned here. ninja -C build-pr PTOIR passes on the clean PR worktree.


LogicalResult AllocTileOp::verify() {
auto ty = getResult().getType(); // TileBufType

if (failed(verifyTileBufLayoutConstraints(*this, ty, "result")))
return failure();

if (failed(verifyConstantLocalAddress(getOperation(), getAddr(),
ty.getMemorySpace())))
return failure();

// op 上有没有传 operands
bool hasVR = getValidRow() != nullptr;
bool hasVC = getValidCol() != nullptr;
Expand Down Expand Up @@ -2848,6 +2923,21 @@ LogicalResult AllocTileOp::verify() {
return success();
}

LogicalResult PointerCastOp::verify() {
auto memRefTy = dyn_cast<BaseMemRefType>(getResult().getType());
if (!memRefTy)
return emitOpError("result must be a memref type");

for (auto [idx, addr] : llvm::enumerate(getAddrs())) {
if (failed(verifyConstantLocalAddress(getOperation(), addr,
memRefTy.getMemorySpace(),
static_cast<int>(idx))))
return failure();
}

return success();
}

LogicalResult MaterializeTileOp::verify() {
auto sourceTy = cast<MemRefType>(getSource().getType());
auto resultTy = cast<TileBufType>(getResult().getType());
Expand Down
11 changes: 11 additions & 0 deletions test/basic/alloc_tile_addr_alignment_invalid.mlir
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
// RUN: not ptoas --pto-level=level3 %s 2>&1 | FileCheck %s

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 (blocker for the stated deliverable): these regression tests never run in CI. CI runs ninja check-pto (ci.yml:255), whose lit suite is rooted at test/lit/ with config.suffixes = ['.pto'] (test/lit/lit.cfg.py). test/basic/ is a brand-new sibling directory and these are .mlir files, so check-pto does not discover them - I built this PR and ran the suite: 714 tests discovered, none from test/basic. There is no test/CMakeLists.txt and no workflow references the top-level test/lit.cfg.py, so the only way these pass is by manually pointing llvm-lit at them.

Please move them under test/lit/ (e.g. test/lit/pto/) and rename to .pto (cf. test/lit/vpto/cube/textract_verify_invalid_right_layout.pto) so the regression actually guards against recurrence.

Minor related note: ptoas exits 0 on this verify failure (it prints the error + Error: Failed to parse MLIR but rc=0), so the not ptoas ... | FileCheck line passes via FileCheck on the piped output, not via the exit code - the PR description's "exits 1" claim does not match observed behavior.


module {
func.func @unaligned_alloc_tile_addr() {
%bad = arith.constant 70660 : i64
// CHECK: pto.alloc_tile
// CHECK: addr must be aligned to 32 bytes for local tile memory space, got 70660
%tile = pto.alloc_tile addr = %bad : !pto.tile_buf<loc=vec, dtype=f16, rows=32, cols=64, v_row=32, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>
return
}
}
9 changes: 9 additions & 0 deletions test/basic/alloc_tile_addr_alignment_valid.mlir
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
// RUN: ptoas --pto-level=level3 %s >/dev/null

module {
func.func @aligned_alloc_tile_addr() {
%ok = arith.constant 71232 : i64
%tile = pto.alloc_tile addr = %ok : !pto.tile_buf<loc=vec, dtype=f16, rows=32, cols=64, v_row=32, v_col=64, blayout=row_major, slayout=none_box, fractal=512, pad=0>
return
}
}
9 changes: 9 additions & 0 deletions test/basic/pointer_cast_addr_alignment_gm_ignored.mlir
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
// RUN: ptoas --pto-level=level3 %s >/dev/null

module {
func.func @gm_pointer_cast_addr_is_ignored() {
%neg = arith.constant -1 : i64
%buf = pto.pointer_cast(%neg) : memref<32x64xf16, #pto.address_space<gm>>
return
}
}
11 changes: 11 additions & 0 deletions test/basic/pointer_cast_addr_alignment_invalid.mlir
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
// RUN: not ptoas --pto-level=level3 %s 2>&1 | FileCheck %s

module {
func.func @unaligned_pointer_cast_addr() {
%bad = arith.constant 70660 : i64
// CHECK: pto.pointer_cast
// CHECK: addr[0] must be aligned to 32 bytes for local tile memory space, got 70660
%buf = pto.pointer_cast(%bad) : memref<32x64xf16, #pto.address_space<vec>>
return
}
}
Loading