Skip to content

Add VPTO vector address memory ops#871

Draft
mouliangyu wants to merge 4 commits into
hw-native-sys:mainfrom
mouliangyu:codex/vpto-vaddr-loop-memory
Draft

Add VPTO vector address memory ops#871
mouliangyu wants to merge 4 commits into
hw-native-sys:mainfrom
mouliangyu:codex/vpto-vaddr-loop-memory

Conversation

@mouliangyu

@mouliangyu mouliangyu commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add VPTO !pto.vaddr and vector-address memory ops, including vag, vald/vast, predicate load/store, x2, and unaligned update forms
  • document and verify vector-address loop semantics, including vag stride ordering and loop-shape constraints
  • add structured pto.vaddr_loop custom assembly with one to four i16 bounds and per-vaddr byte strides ordered outer-to-inner
  • lower vector-address ops through the VPTO LLVM path; vaddr_loop lowers to canonical nested loops, emits vaddr generation at the shallowest statically valid loop depth, and preserves zero-stride dimensions
  • add focused lit coverage plus VPTO SIM coverage for vector-address memory behavior

Validation

  • cmake --build build -j64 --target ptoas
  • llvm-lit -v build/test/lit/vpto/vector_address_vaddr_loop_vpto_llvm.pto build/test/lit/vpto/vector_address_vaddr_verify_invalid.pto build/test/lit/vpto/vector_address_vaddr_loop_verify_invalid.pto
  • ptoas --pto-arch=a5 --pto-backend=vpto test/lit/vpto/vector_address_vaddr_loop_vpto_llvm.pto -o /tmp/vaddr_loop_test.o
  • WORK_SPACE=/tmp/pto-vpto-vaddr-baseline CASE_NAME='micro-op/vector-address/multidim-vald-vast' DEVICE=SIM test/vpto/scripts/run_host_vpto_validation.sh
  • ad-hoc local SIM check: copied micro-op/vector-address/multidim-vald-vast under .work, replaced the 4D scf.for + pto.vag block with equivalent pto.vaddr_loop, then ran WORK_SPACE=/tmp/pto-vpto-vaddr-loop-sim CASES_ROOT="$PWD/.work/vaddr-loop-sim-cases" CASE_NAME='micro-op/vector-address/vaddr-loop-vald-vast' DEVICE=SIM test/vpto/scripts/run_host_vpto_validation.sh

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for VPTO vector-address loop memory operations, defining the !pto.vaddr type and implementing several new operations (such as pto.vag, pto.vald, pto.vast, and unaligned update chains) along with their verifiers and LLVM lowering patterns. The review feedback suggests adding a defensive null check in getVectorAddressElementTypeFragment to prevent potential null pointer dereferences, and removing redundant manual type checks in the verifiers for pto.vag and pto.valdu since they are already enforced by TableGen.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +564 to +574
static std::string getVectorAddressElementTypeFragment(Type type) {
if (type.isF16())
return "f16";
if (type.isBF16())
return "bf16";
if (type.isF32())
return "f32";
if (auto intType = dyn_cast<IntegerType>(type))
return "i" + std::to_string(intType.getWidth());
return {};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In getVectorAddressElementTypeFragment, there is no null check for the input type. If getElementTypeFromVectorLike returns a null Type (e.g., if the input is not a valid vector type), calling type.isF16() will result in a null pointer dereference/crash. Adding a defensive null check at the beginning of the function will make it more robust.

static std::string getVectorAddressElementTypeFragment(Type type) {
  if (!type)
    return {};
  if (type.isF16())
    return "f16";
  if (type.isBF16())
    return "bf16";
  if (type.isF32())
    return "f32";
  if (auto intType = dyn_cast<IntegerType>(type))
    return "i" + std::to_string(intType.getWidth());
  return {};
}

Comment thread lib/PTO/IR/VPTO.cpp
Comment on lines +4721 to +4725
for (Value stride : getStrides()) {
if (!stride.getType().isInteger(32))
return emitOpError("requires all stride operands to be i32");
}
auto forOp = (*this)->getParentOfType<scf::ForOp>();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual type check for stride operands being i32 is redundant because the ODS definition for PTO_VagOp already specifies Variadic<I32>:$strides. TableGen automatically generates type verification for I32 operands, so this manual check can be safely removed.

  auto forOp = (*this)->getParentOfType<scf::ForOp>();

Comment thread lib/PTO/IR/VPTO.cpp
Comment on lines +4828 to +4831
if (!getInc().getType().isInteger(32))
return emitOpError("requires inc to be i32");
if (failed(verifySingleVectorAddressUpdateSeedUse(*this, getAddrIn(),
"addr_in")))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual type check !getInc().getType().isInteger(32) is redundant because the ODS definition for PTO_ValduOp already specifies I32:$inc. TableGen automatically generates type verification for I32 operands, so this manual check can be safely removed.

Suggested change
if (!getInc().getType().isInteger(32))
return emitOpError("requires inc to be i32");
if (failed(verifySingleVectorAddressUpdateSeedUse(*this, getAddrIn(),
"addr_in")))
if (failed(verifySingleVectorAddressUpdateSeedUse(*this, getAddrIn(),
"addr_in")))

@reedhecre

reedhecre commented Jun 29, 2026

Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

  • PR: Add VPTO vector address memory ops #871 Add VPTO vector address memory ops
  • Author: mouliangyu
  • Base/Head: main / codex/vpto-vaddr-loop-memory
  • Head SHA: fb23300d22c7
  • Trigger: PR 有新提交
  • Generated At: 2026-06-30T12:36:42Z
  • Previous Head SHA: 35e0874542ee
  • Status: failed at codex-review (timeout=1800s)

Summary

Review failed at stage codex-review: timeout=1800s

Findings

未生成结构化 findings,因为 review 过程提前失败。

Log Tail

docs/designs/vpto-vector-address-ops.md:394:| `!pto.vaddr<b32>` | `llvm.hivm.vag.32` on A5 32-bit VAG ABI targets |
docs/designs/vpto-vector-address-ops.md:416:  %addr = pto.vag %stride : i32 -> !pto.vaddr<b32>
docs/designs/vpto-vector-address-ops.md:478:- `G` must match the element or distribution granularity required by `DIST`.
docs/designs/vpto-vector-address-ops.md:576:- `%mask` granularity must be legal for `%value` and `DIST`.
docs/designs/vpto-vector-address-ops.md:1032:1. Add `VAddrType` with `b8`/`b16`/`b32` granularity.
docs/designs/vpto-vector-address-ops.md:1045:3. Add verifiers for vaddr granularity, UB pointer spaces, distribution
docs/isa/micro-isa/18-vaddr-loop-memory.md:28:!pto.vaddr<b8>
docs/isa/micro-isa/18-vaddr-loop-memory.md:29:!pto.vaddr<b16>
docs/isa/micro-isa/18-vaddr-loop-memory.md:30:!pto.vaddr<b32>
docs/isa/micro-isa/18-vaddr-loop-memory.md:33:The granularity records the address family used by the vector-address producer.
docs/isa/micro-isa/18-vaddr-loop-memory.md:52:    %addr = pto.vag %stride : i32 -> !pto.vaddr<b32>
exec
/bin/bash -lc "sed -n '460,500p' docs/designs/vpto-vector-address-ops.md" in /tmp/ptoas-pr-review-monitor/runs/20260630_200529_pr871/repo succeeded in 0ms:

### 4.1 `pto.vald`

```mlir
%result = pto.vald %base[%addr] {dist = "DIST"}
  : !pto.ptr<T, ub>, !pto.vaddr<G> -> !pto.vreg<NxT>

Semantics:

result = load_vector(base + addr, dist)

Verifier constraints:

  • %base must be a UB pointer or UB memref lowered to a VPTO pointer.
  • %addr must be !pto.vaddr<G>.
  • G must match the element or distribution granularity required by DIST.
  • DIST follows the existing pto.vlds distribution vocabulary.
    exec
    /bin/bash -lc "nl -ba lib/PTO/IR/VPTO.cpp | sed -n '4924,4948p'" in /tmp/ptoas-pr-review-monitor/runs/20260630_200529_pr871/repo succeeded in 0ms:
    4924 void ValduOp::getEffects(
    4925 SmallVectorImpl<SideEffects::EffectInstanceMemoryEffects::Effect>
    4926 &effects) {
    4927 effects.emplace_back(MemoryEffects::Read::get(), &getSourceMutable());
    4928 }
    4929
    4930 LogicalResult ValduOp::verify() {
    4931 if (!isBufferLike(getSource().getType()))
    4932 return emitOpError("requires a pointer-like source");
    4933 if (classifyMemoryRole(getSource().getType()) == MemoryRole::GM)
    4934 return emitOpError("requires a UB-backed source");
    4935 if (failed(verifyVAddrTypeLike(*this, getAddrIn().getType(), "addr_in type")) ||
    4936 failed(verifyVAddrTypeLike(*this, getAddrOut().getType(), "addr_out type")) ||
    4937 failed(verifyAlignTypeLike(*this, getAlignIn().getType(), "align_in type")) ||
    4938 failed(verifyAlignTypeLike(*this, getAlignOut().getType(), "align_out type")) ||
    4939 failed(verifyVRegTypeLike(*this, getResult().getType(), "result type")))
    4940 return failure();
    4941 if (getAddrIn().getType() != getAddrOut().getType())
    4942 return emitOpError("requires addr_in and addr_out to have identical types");
    4943 if (getAlignIn().getType() != getAlignOut().getType())
    exec
    /bin/bash -lc "nl -ba lib/PTO/IR/VPTO.cpp | sed -n '134,180p'" in /tmp/ptoas-pr-review-monitor/runs/20260630_200529_pr871/repo succeeded in 0ms:
    134 Type valueTy,
    135 StringRef opNameForDiag) {
    136 Type elemTy;
    137 if (auto pty = dyn_cast(ptrTy)) {
    138 elemTy = pty.getElementType();
    139 } else if (auto memTy = dyn_cast(ptrTy)) {
    140 elemTy = memTy.getElementType();
    141 } else {
    142 return op->emitOpError() << "expects " << opNameForDiag
    143 << " pointer operand to be !pto.ptr or memref";
    144 }
    145
    146 if (valueTy != elemTy) {
    147 return op->emitOpError() << "expects " << opNameForDiag
    148 << " value type to match pointer element type";
    149 }
    150 return success();
    151 }
    152
    153 static bool isMaskGranularityAdjacentWidening(StringRef inputGranularity,
    exec
    [monitor] stage codex-review exceeded timeout 1800s, terminating process group
    ===== END STAGE codex-review rc=-15 timeout=1800s @ 2026-06-30 20:36:42 =====

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants