
feat: first stage of vmi #835

Draft
mouliangyu wants to merge 49 commits into hw-native-sys:main from mouliangyu:feature-vmi

Conversation

@mouliangyu (Contributor)

No description provided.

@reedhecre commented Jun 18, 2026
Codex Review

This comment is updated automatically by the review bot.

  • PR: feat: first stage of vmi #835
  • Author: mouliangyu
  • Base/Head: main / feature-vmi
  • Head SHA: 7bf010892f96
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-07-01T09:51:23Z
  • Previous Head SHA: 46dc35ef2cc6
  • Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review process failed early.

Log Tail

1480:                                               switchOp.getCaseOperands())) {
1491:      if (auto callOp = dyn_cast<func::CallOp>(op)) {
1496:      if (op->getName().getStringRef() == "func.call_indirect") {
1505:      if (auto funcOp = dyn_cast<func::FuncOp>(op)) {
1510:                 "external ABI materialization plan";
1526:  LogicalResult addIfConstraints(scf::IfOp ifOp) {
exec
/bin/bash -lc "sed -n '1488,1715p' lib/PTO/Transforms/VMILayoutAssignment.cpp" in /tmp/ptoas-pr-review-monitor/runs/20260701_174532_pr835/repo succeeded in 0ms:
          return WalkResult::interrupt();
        return WalkResult::advance();
      }
      if (auto callOp = dyn_cast<func::CallOp>(op)) {
        if (failed(addCallConstraints(callOp)))
          return WalkResult::interrupt();
        return WalkResult::advance();
      }
      if (op->getName().getStringRef() == "func.call_indirect") {
        if (hasVMIValueTypes(op)) {
          op->emitError()
              << kVMIDiagLayoutContractPrefix
              << "VMI typed call requires a direct internal callee with a body";
          return WalkResult::interrupt();
        }
        return WalkResult::advance();
      }
      if (auto funcOp = dyn_cast<func::FuncOp>(op)) {
        if (funcOp.empty() && hasVMIFunctionType(funcOp)) {
          funcOp.emitError()
exec
/bin/bash -lc "grep -R \"Vgather2BcOp\\|vgather2_bc\\|vgather2.bc\\|Vgather2Op\" -n lib/PTO/Transforms/VMIToVPTO.cpp lib/PTO/IR/VPTO.cpp test/lit/vmi | sed -n '1,260p'" in /tmp/ptoas-pr-review-monitor/runs/20260701_174532_pr835/repo succeeded in 0ms:
lib/PTO/Transforms/VMIToVPTO.cpp:1502:      op.getSource().getType(), "source", "pto.vgather2_bc",
lib/PTO/Transforms/VMIToVPTO.cpp:1503:      "pto.vgather2_bc reads only UB");
lib/PTO/Transforms/VMIToVPTO.cpp:1797:      op.getSource().getType(), "source", "pto.vgather2_bc",
lib/PTO/Transforms/VMIToVPTO.cpp:1798:      "pto.vgather2_bc reads only UB");
lib/PTO/Transforms/VMIToVPTO.cpp:5356:                                 .create<Vgather2Op>(op.getLoc(), resultType,
lib/PTO/Transforms/VMIToVPTO.cpp:5360:                                 .create<Vgather2BcOp>(op.getLoc(), resultType,
lib/PTO/Transforms/VMIToVPTO.cpp:5460:            .create<Vgather2BcOp>(op.getLoc(), resultType, gatherBase, indices,
lib/PTO/Transforms/VMIToVPTO.cpp:9853:          << "pto.vmi.gather lowers through pto.vgather2_bc + pto.vsel only "
lib/PTO/Transforms/VMIToVPTO.cpp:9868:             "+ pto.vgather2_bc + pto.vsel ("
lib/PTO/IR/VPTO.cpp:4364:void Vgather2Op::getEffects(
lib/PTO/IR/VPTO.cpp:4370:LogicalResult Vgather2Op::verify() {
lib/PTO/IR/VPTO.cpp:4489:void Vgather2BcOp::getEffects(
lib/PTO/IR/VPTO.cpp:4495:LogicalResult Vgather2BcOp::verify() {
test/lit/vmi/vmi_to_vpto_gather_scatter_shape_invalid.pto:27:// CHECK: VMI{{-}}UNSUPPORTED{{:}} pto.vmi.gather lowers through pto.vgather2_bc + pto.vsel only
test/lit/vmi/vmi_to_vpto_gather_scatter_shape_invalid.pto:48:// CHECK: VMI{{-}}UNSUPPORTED{{:}} pto.vmi.gather lowers through pto.vgather2_bc + pto.vsel only
test/lit/vmi/vmi_to_vpto_expand_load_runtime_mask.pto:36:// CHECK: %[[LOAD:.*]] = pto.vgather2_bc %[[BASE]], %[[IDX]], %arg2 : !pto.ptr<f32, ub>, !pto.vreg<64xi32>, !pto.mask<b32> -> !pto.vreg<64xf32>
test/lit/vmi/vmi_to_vpto_gather_f16_invalid.pto:27:// CHECK: VMI{{-}}UNSUPPORTED{{:}} pto.vmi.gather lowers through pto.vgather2_bc + pto.vsel only
test/lit/vmi/vmi_to_vpto_gather.pto:32:// CHECK: %[[GATHER:.*]] = pto.vgather2_bc %arg0, %arg1, %arg2 : !pto.ptr<f32, ub>, !pto.vreg<64xi32>, !pto.mask<b32> -> !pto.vreg<64xf32>
exec
/bin/bash -lc "sed -n '4360,4525p' lib/PTO/IR/VPTO.cpp" in /tmp/ptoas-pr-review-monitor/runs/20260701_174532_pr835/repo succeeded in 0ms:
    return emitOpError("index type must match result element type");
  return success();
}

void Vgather2Op::getEffects(
    SmallVectorImpl<SideEffects::EffectInstance<MemoryEffects::Effect>>
        &effects) {
  effects.emplace_back(MemoryEffects::Read::get(), &getSourceMutable());
}

LogicalResult Vgather2Op::verify() {
  if (!isBufferLike(getSource().getType()))
    return emitOpError("requires a pointer-like source");
  MemoryRole sourceRole = classifyMemoryRole(getSource().getType());
  if (sourceRole == MemoryRole::GM)
    return emitOpError("requires a UB-backed source");

  auto offsetsType = dyn_cast<VRegType>(getOffsets().getType());
  auto resultType = dyn_cast<VRegType>(getResult().getType());
  if (!offsetsType || !resultType)
Reconnecting... 1/5 (unexpected status 403 Forbidden: insufficient balance, url: https://codex.0u0o.com/responses, cf-ray: a14482d0dcd4b152-LAX, request id: 50d1c37e-4d2a-4cf9-b51d-d5b31642f137)
Reconnecting... 2/5 (unexpected status 403 Forbidden: insufficient balance, url: https://codex.0u0o.com/responses, cf-ray: a14482d60b51e9e3-LAX, request id: d113ac14-684f-48e0-8195-71eccc57f42c)
Reconnecting... 3/5 (unexpected status 403 Forbidden: insufficient balance, url: https://codex.0u0o.com/responses, cf-ray: a14482dc5d96cb9e-LAX, request id: deb26fad-774d-44d5-8cf6-f61a4488801f)
Reconnecting... 4/5 (unexpected status 403 Forbidden: insufficient balance, url: https://codex.0u0o.com/responses, cf-ray: a14482e6aa6b5730-LAX, request id: 65b7490a-a4eb-43fa-9103-e0e692299d07)
Reconnecting... 5/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a14482f4f81ef7e5-LAX, request id: 339c0d46-a820-41d7-bd38-3affa5328a82)
ERROR: unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a144830cfbe309cf-LAX, request id: ded2318c-ae9a-4129-8e26-b4e68dfb759f
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260701_174532_pr835/codex_last_message.json
tokens used
139,104
===== END STAGE codex-review rc=1 @ 2026-07-01 17:51:23 =====

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces the design and initial implementation of the VMI (Virtual Machine Interface) dialect, including its operations, types, attributes, layout assignment, and validation passes, along with extensive test coverage. The review feedback highlights several critical issues: potential compilation errors in VMILayoutAssignment.cpp due to missing overloads for MutableOperandRange in requestDataUse and requestMaskUse, a logical bug in rewriteFunctionType where stale return types could be used from firstReturnOperandsByFunc instead of dynamically querying the up-to-date ReturnOp, and a potential compilation issue in VMI.cpp when using dyn_cast_or_null on the mlir::Attribute value type.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +261 to +264

  void requestDataUse(OpOperand &operand, VMILayoutAttr layout) {
    if (isa<VMIVRegType>(operand.get().getType()))
      dataUseRequests.push_back(DataUseRequest{&operand, layout});
  }

@gemini-code-assist (Contributor) · priority: high

The requestDataUse function is called with MutableOperandRange arguments (e.g., compress.getSourceMutable()), but it is only defined to accept OpOperand &. This will cause a compilation error because MutableOperandRange does not implicitly convert to OpOperand &. Adding an overload that accepts MutableOperandRange and forwards the first element resolves this issue.

  void requestDataUse(OpOperand &operand, VMILayoutAttr layout) {
    if (isa<VMIVRegType>(operand.get().getType()))
      dataUseRequests.push_back(DataUseRequest{&operand, layout});
  }

  void requestDataUse(MutableOperandRange range, VMILayoutAttr layout) {
    if (!range.empty())
      requestDataUse(range[0], layout);
  }

Comment on lines +294 to +305

  LogicalResult requestMaskUse(OpOperand &operand, VMILayoutAttr layout,
                               StringRef granularity, Operation *op) {
    if (!isa<VMIMaskType>(operand.get().getType()))
      return success();
    if (!layout || granularity.empty())
      return op->emitError()
             << kVMIDiagLayoutContractPrefix
             << "cannot infer concrete mask use layout or granularity";
    maskUseRequests.push_back(
        MaskUseRequest{&operand, layout, granularity.str()});
    return success();
  }

@gemini-code-assist (Contributor) · priority: high

To support passing MutableOperandRange directly to requestMaskUse (such as gather.getMaskMutable()), add an overload that accepts MutableOperandRange and forwards the first element.

  LogicalResult requestMaskUse(OpOperand &operand, VMILayoutAttr layout,
                               StringRef granularity, Operation *op) {
    if (!isa<VMIMaskType>(operand.get().getType()))
      return success();
    if (!layout || granularity.empty())
      return op->emitError()
             << kVMIDiagLayoutContractPrefix
             << "cannot infer concrete mask use layout or granularity";
    maskUseRequests.push_back(
        MaskUseRequest{&operand, layout, granularity.str()});
    return success();
  }

  LogicalResult requestMaskUse(MutableOperandRange range, VMILayoutAttr layout,
                               StringRef granularity, Operation *op) {
    if (range.empty())
      return success();
    return requestMaskUse(range[0], layout, granularity, op);
  }

Comment on lines +1260 to +1263

      auto it = firstReturnOperandsByFunc.find(func);
      if (it != firstReturnOperandsByFunc.end()) {
        for (Value operand : it->second)
          results.push_back(operand.getType());

@gemini-code-assist (Contributor) · priority: high

In rewriteFunctionType, reading the return types from firstReturnOperandsByFunc can lead to stale types. If any return operand was replaced or materialized during insertDataUseMaterializations or insertMaskUseMaterializations (via request.operand->set(...)), the ReturnOp's operands will point to new values, but firstReturnOperandsByFunc will still hold the old values. This causes a mismatch between the rewritten function signature and the actual returned values. Dynamically walking the function to find the first ReturnOp ensures we always use the up-to-date types.

      func::ReturnOp returnOp;
      func.walk([&](func::ReturnOp op) {
        if (!returnOp)
          returnOp = op;
      });
      if (returnOp) {
        for (Value operand : returnOp.getOperands())
          results.push_back(operand.getType());
      } else {
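The stale-type hazard can be reproduced in a toy model that assumes nothing about the real MLIR classes: `Value`, `ReturnOp`, `typesOf`, and `typesFromReturn` below are hypothetical stand-ins, and the type strings are placeholders rather than real VMI type syntax. A copy of the return operands taken up front (the role `firstReturnOperandsByFunc` plays) keeps pointing at the old value after an operand is reset, while reading the live return op sees the replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR: a value only has a type name.
struct Value {
  std::string type;
};

// A return op refers to its current operands; setOperand models the
// effect of OpOperand::set during materialization.
struct ReturnOp {
  std::vector<const Value *> operands;
  void setOperand(std::size_t i, const Value *v) { operands[i] = v; }
};

// Result types from a list of operands captured at some point in time.
std::vector<std::string> typesOf(const std::vector<const Value *> &operands) {
  std::vector<std::string> types;
  for (const Value *v : operands)
    types.push_back(v->type);
  return types;
}

// Result types read from the live return op at rewrite time.
std::vector<std::string> typesFromReturn(const ReturnOp &ret) {
  return typesOf(ret.operands);
}
```

If a snapshot of `ret.operands` is taken and `setOperand(0, ...)` is then called, `typesOf(snapshot)` still reports the pre-materialization type while `typesFromReturn(ret)` reports the new one, which is the signature mismatch the comment describes.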

Comment thread lib/PTO/IR/VMI.cpp
Comment on lines +136 to +138

  auto layoutAttr = dyn_cast_or_null<VMILayoutAttr>(layout);
  if (!layoutAttr)
    return failure();

@gemini-code-assist (Contributor) · priority: medium

Using dyn_cast_or_null on a value type like mlir::Attribute can cause compilation issues in some LLVM/MLIR versions because it expects a pointer type. It is safer and more idiomatic to check if the attribute is non-null first, and then use dyn_cast.

Suggested change

- auto layoutAttr = dyn_cast_or_null<VMILayoutAttr>(layout);
- if (!layoutAttr)
-   return failure();
+ if (!layout)
+   return failure();
+ auto layoutAttr = dyn_cast<VMILayoutAttr>(layout);
+ if (!layoutAttr)
+   return failure();

mouliangyu marked this pull request as draft on June 18, 2026 at 01:40

@zhangstevenunity (Collaborator) left a comment

Reviewed the core of this first VMI stage against the VPTO contracts: the dialect (VMI.cpp, VMITypeDefs/VMIOps.td), VMILayoutAssignment, and VMIToVPTO.

The design holds up well. The closed-set layout model (contiguous / deinterleaved={2,4}) with a precise lane map, the union-find layout assignment that hard-errors on genuine conflicts, and the two-tier "permissive verifier + strict lowering-capability gate" all read as deliberate, and unsupported cases are generally rejected explicitly rather than silently miscompiled.

Three concrete correctness issues follow, all on currently untested paths:

  1. addWhileConstraints mis-aligns the two scf.while carry groups.
  2. truncf hardcodes rnd="R", emitting an illegal pto.vcvt for f32 -> HiFloat8.
  3. Float iota builds a FloatAttr with mismatched (IEEEdouble) semantics.

(Layout-coalescing cost in loops, integer width conversion, and the histogram/half layout appear to be deliberately out of first-stage scope per the design docs, so not raised here.)

    if (index < beforeArgs.size() &&
        failed(uniteEquivalentValues(anchor, beforeArgs[index], whileOp)))
      return failure();
    if (conditionOp && index < conditionOp.getArgs().size() &&

@zhangstevenunity (Collaborator)

addWhileConstraints mis-aligns the two independent scf.while carry groups.

scf.while has two separate carries:

  • before group: inits[i] == beforeArgs[i] == the after-region scf.yield operand i
  • after group: scf.condition operand j == afterArgs[j] == results[j]

These lists can differ in arity and element type (the before region computes the forwarded scf.condition operands). This loop anchors all five lists on the single inits-indexed anchor, so conditionOp.getArgs()[index] (L864), afterBlock.getArgument(index) (L868) and whileOp.getResult(index) (L876) get united against the wrong carry group. For any non-pass-through scf.while this yields a spurious "conflicting layouts" error, or silently forces one carry's layout onto an unrelated value. Only the degenerate 1-element pass-through is tested (vmi_layout_assignment_scf_while.pto).

Suggest two independent unions: the before group over {inits[i], beforeArgs[i], yield.getOperand(i)}, and the after group over {conditionOp.getArgs()[j], afterArgs[j], results[j]}. The after group must not be anchored on inits.
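A minimal sketch of the suggested fix, assuming a toy union-find over integer value ids rather than the pass's real `uniteEquivalentValues`; `LayoutUnionFind`, `WhileValues`, `addWhileConstraints`, and `addWhileConstraintsAnchoredOnInits` are hypothetical names. It contrasts uniting the two carry groups independently with the flagged pattern that indexes every list by the init position and unites it with the init anchor.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <string>
#include <vector>

// Toy union-find over integer value ids; a root may carry a layout name.
struct LayoutUnionFind {
  std::vector<std::size_t> parent;
  std::vector<std::optional<std::string>> layout;

  explicit LayoutUnionFind(std::size_t n) : parent(n), layout(n) {
    std::iota(parent.begin(), parent.end(), std::size_t{0});
  }

  std::size_t find(std::size_t v) {
    while (parent[v] != v) {
      parent[v] = parent[parent[v]]; // path halving
      v = parent[v];
    }
    return v;
  }

  // Merge two classes; returns false on a genuine layout conflict.
  bool unite(std::size_t a, std::size_t b) {
    std::size_t ra = find(a), rb = find(b);
    if (ra == rb)
      return true;
    if (layout[ra] && layout[rb] && *layout[ra] != *layout[rb])
      return false;
    if (!layout[ra])
      layout[ra] = layout[rb];
    parent[rb] = ra;
    return true;
  }
};

// Hypothetical value ids of one scf.while, split into the two carry groups.
struct WhileValues {
  std::vector<std::size_t> inits, beforeArgs, yieldOperands; // before group
  std::vector<std::size_t> condArgs, afterArgs, results;     // after group
};

// Suggested shape: two independent unions; the after group ignores inits.
bool addWhileConstraints(LayoutUnionFind &uf, const WhileValues &w) {
  for (std::size_t i = 0; i < w.inits.size(); ++i)
    if (!uf.unite(w.inits[i], w.beforeArgs[i]) ||
        !uf.unite(w.inits[i], w.yieldOperands[i]))
      return false;
  for (std::size_t j = 0; j < w.condArgs.size(); ++j)
    if (!uf.unite(w.condArgs[j], w.afterArgs[j]) ||
        !uf.unite(w.condArgs[j], w.results[j]))
      return false;
  return true;
}

// Flagged shape: every list is indexed by, and united with, the init anchor.
bool addWhileConstraintsAnchoredOnInits(LayoutUnionFind &uf,
                                        const WhileValues &w) {
  for (std::size_t i = 0; i < w.inits.size(); ++i) {
    std::size_t anchor = w.inits[i];
    if (!uf.unite(anchor, w.beforeArgs[i]) ||
        !uf.unite(anchor, w.yieldOperands[i]))
      return false;
    if (i < w.condArgs.size() &&
        (!uf.unite(anchor, w.condArgs[i]) ||
         !uf.unite(anchor, w.afterArgs[i]) || !uf.unite(anchor, w.results[i])))
      return false;
  }
  return true;
}
```

With one init tagged `contiguous` and an unrelated forwarded condition operand tagged `deint2`, the independent version keeps the two groups apart and succeeds, while the init-anchored version reports a spurious conflict.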

      return rewriter.notifyMatchFailure(op,
                                         "failed to build truncf masks");

    StringAttr rnd = rewriter.getStringAttr("R");

@zhangstevenunity (Collaborator)

truncf hardcodes rnd = "R", but the 8-bit (P0-P3) branch is shared by fp8 and HiFloat8, and the pto.vcvt contract for f32 -> HiFloat8 requires rnd in {A, H} (lookupVcvtContract(F32, HiF8) -> allowedRnd="AH", enforced by VcvtOp::verify).

VMITruncFOp::verify accepts an hif8 result and checkSupportedTruncFShape checks only result bit-width/factor (not element type), so pto.vmi.truncf : ...f32 -> ...hif8 (deinterleaved=4 source) passes every VMI gate and then emits a pto.vcvt {rnd="R"} that fails its own verifier. No test covers f32 -> hif8 (existing truncf tests are f32->f16 / f32->fp8E*).

Suggest selecting rnd from the target type (R for f8E4M3/f8E5M2/f16/bf16; A or H for HiFloat8), or rejecting HiFloat8 in checkSupportedTruncFShape with a clear unsupported error.
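A sketch of the first suggested option (choosing rnd from the target type), assuming a hypothetical `TruncTarget` enum and hypothetical helpers `allowedRnd`, `selectTruncfRnd`, and `rndIsLegal`. The HiFloat8 allowed set `"AH"` is the contract quoted in the review; treating `"R"` as the whole allowed set for the other targets, and picking `'A'` over `'H'` for HiFloat8, are assumptions.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical truncf target element types named in the review.
enum class TruncTarget { F16, BF16, F8E4M3, F8E5M2, HiF8 };

// Allowed rnd letters per target. "AH" for HiF8 is the contract quoted in
// the review; "R" as the full allowed set elsewhere is an assumption.
std::string_view allowedRnd(TruncTarget target) {
  if (target == TruncTarget::HiF8)
    return "AH";
  return "R";
}

// Select rnd from the target type instead of hardcoding "R".
// Picking 'A' rather than 'H' for HiFloat8 is an arbitrary assumption.
char selectTruncfRnd(TruncTarget target) {
  return target == TruncTarget::HiF8 ? 'A' : 'R';
}

// Verifier-style check: the chosen letter must be in the allowed set.
bool rndIsLegal(TruncTarget target, char rnd) {
  return allowedRnd(target).find(rnd) != std::string_view::npos;
}
```

The hardcoded `'R'` fails `rndIsLegal(TruncTarget::HiF8, 'R')`, which mirrors the `VcvtOp::verify` rejection described above; the alternative fix is to reject HiFloat8 before reaching this point.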

Comment thread lib/PTO/Transforms/VMIToVPTO.cpp (outdated)

  if (auto floatType = dyn_cast<FloatType>(type)) {
    return rewriter
        .create<arith::ConstantOp>(
            loc, FloatAttr::get(floatType,

@zhangstevenunity (Collaborator)

llvm::APFloat(static_cast<double>(value)) carries IEEEdouble semantics, but floatType here is f16/bf16/f32. FloatAttr::get(Type, APFloat) requires the value's semantics to match the type (verifyInvariants checks &type.getFloatSemantics() == &value.getSemantics()), so this asserts in an asserts-enabled build (and yields an invalid attribute otherwise). Reachable for any non-f64 float iota with a nonzero lane offset (e.g. vmi_to_vpto_iota_f16_deint2_asc) and via the factor/zero constants in createIotaDeinterleavedChunk.

Use the semantics-aware overload FloatAttr::get(floatType, static_cast<double>(value)) (which converts to the type's semantics), or build APFloat(floatType.getFloatSemantics(), ...) and round.
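A toy model of the semantics invariant, assuming hypothetical `Semantics`, `TaggedFloat`, `fromDouble`, `makeAttr`, and `makeAttrFromDouble` in place of `llvm::APFloat` and `FloatAttr` (only single and double precision are modeled). Building a value from a `double` always tags it IEEEdouble, so attaching it to a single-precision type fails the check, whereas converting to the target semantics first rounds the value and succeeds.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for APFloat semantics; only two are modeled.
enum class Semantics { IEEEsingle, IEEEdouble };

// A value tagged with the semantics it was built in. The payload is kept
// as a double; for IEEEsingle it has already been rounded to float.
struct TaggedFloat {
  Semantics sem;
  double value;
};

// Analogue of llvm::APFloat(double): always IEEEdouble semantics.
TaggedFloat fromDouble(double v) { return {Semantics::IEEEdouble, v}; }

// Analogue of the invariant FloatAttr::get(Type, APFloat) checks:
// the value's semantics must be the type's semantics.
TaggedFloat makeAttr(Semantics typeSem, TaggedFloat v) {
  if (v.sem != typeSem)
    throw std::invalid_argument("float semantics do not match the type");
  return v;
}

// Analogue of the semantics-aware FloatAttr::get(Type, double): convert
// (round) the double into the type's semantics before building the value.
TaggedFloat makeAttrFromDouble(Semantics typeSem, double v) {
  if (typeSem == Semantics::IEEEsingle)
    v = static_cast<double>(static_cast<float>(v));
  return makeAttr(typeSem, {typeSem, v});
}
```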
