Skip to content

clear residual op in vpto emit#855

Open
Likai-19 wants to merge 1 commit into
hw-native-sys:mainfrom
Likai-19:vpto_clear_residual_op
Open

clear residual op in vpto emit#855
Likai-19 wants to merge 1 commit into
hw-native-sys:mainfrom
Likai-19:vpto_clear_residual_op

Conversation

@Likai-19

Copy link
Copy Markdown

问题总结

VPTO lowering pipeline 完成后,module 中残留的 tile 元数据 op 阻塞了 LLVM 导出。这些 op 分为两类:

  1. 语义已完成但未清理的死 oppto.set_valid_shape(始终无 user)、无 user 的 pto.get_valid_shape、无 user 的 pto.treshape、无 user 的 pto.alloc_tile、无 user 的 UnrealizedConversionCastOp

  2. 可被折叠的 cast 链pto.get_valid_shapeUnrealizedConversionCastOp → LLVM i64 的 cast 链,其等价的 LLVM i64 值已存在于原始 pto.alloc_tile 的 valid row/col 属性中,但未被复用。

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces folding for pto.get_validshape intrinsics into materialized tile handles, along with erasing unused pto.set_validshape operations and performing dead code elimination (DCE) on leftover tile-handle view and allocation operations. The review feedback suggests optimizing the iterative DCE implementation to a worklist-based approach to improve complexity from $O(N \times M)$ to $O(N + M)$, and adding a safety check to ensure UnrealizedConversionCastOp has exactly one result before accessing it.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +680 to +699
bool tileDceChanged = true;
while (tileDceChanged) {
tileDceChanged = false;
SmallVector<Operation *, 8> deadTileOps;
func.walk([&](Operation *op) {
if (!op->use_empty())
return;
if (isa<pto::TReshapeOp, pto::MaterializeTileOp, pto::AllocTileOp>(op))
deadTileOps.push_back(op);
else if (auto castOp = dyn_cast<UnrealizedConversionCastOp>(op)) {
if (castOp.getNumOperands() == 1 &&
isa<pto::TileBufType>(castOp.getResult(0).getType()))
deadTileOps.push_back(op);
}
});
for (auto *op : llvm::reverse(deadTileOps)) {
op->erase();
tileDceChanged = true;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current iterative DCE implementation performs a full function walk (func.walk) inside a while loop for every level of dead tile operations. For a chain of $N$ dead operations, this results in $O(N \times M)$ complexity (where $M$ is the number of operations in the function).

We can optimize this to $O(N + M)$ by using a worklist-based approach. When an operation is erased, we check if its operands have become dead and add them to the worklist if they match the target types. Additionally, we should verify that UnrealizedConversionCastOp has exactly 1 result before accessing getResult(0) to prevent potential out-of-bounds assertions.

    SmallVector<Operation *, 8> worklist;
    func.walk([&](Operation *op) {
      if (!op->use_empty())
        return;
      if (isa<pto::TReshapeOp, pto::MaterializeTileOp, pto::AllocTileOp>(op)) {
        worklist.push_back(op);
      } else if (auto castOp = dyn_cast<UnrealizedConversionCastOp>(op)) {
        if (castOp.getNumOperands() == 1 && castOp.getNumResults() == 1 &&
            isa<pto::TileBufType>(castOp.getResult(0).getType()))
          worklist.push_back(op);
      }
    });

    while (!worklist.empty()) {
      Operation *op = worklist.pop_back_val();
      SmallVector<Value, 4> operands(op->getOperands());
      op->erase();
      for (Value operand : operands) {
        if (auto *defOp = operand.getDefiningOp()) {
          if (!defOp->use_empty())
            continue;
          if (isa<pto::TReshapeOp, pto::MaterializeTileOp, pto::AllocTileOp>(defOp)) {
            worklist.push_back(defOp);
          } else if (auto castOp = dyn_cast<UnrealizedConversionCastOp>(defOp)) {
            if (castOp.getNumOperands() == 1 && castOp.getNumResults() == 1 &&
                isa<pto::TileBufType>(castOp.getResult(0).getType()))
              worklist.push_back(defOp);
          }
        }
      }
    }

@Likai-19 Likai-19 force-pushed the vpto_clear_residual_op branch from 6c34f51 to eb6e8d3 Compare June 23, 2026 06:40
@reedhecre

reedhecre commented Jun 23, 2026

Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

  • PR: clear residual op in vpto emit #855 clear residual op in vpto emit
  • Author: Likai-19
  • Base/Head: main / vpto_clear_residual_op
  • Head SHA: b8cc3802fb00
  • Trigger: PR 有新提交
  • Generated At: 2026-06-29T17:11:33Z
  • Previous Head SHA: eb6e8d3849d1
  • Status: completed

Summary

PR #855 在清理 residual valid-shape ops 时引入了一个会返回错误 valid shape 的回归,并且破坏了 addr-only 模式的公开契约。

Findings

  1. P1 `pto.get_validshape` 折叠没有保留 direct tile 上的 `set_validshape` 更新 lib/PTO/Transforms/FoldTileBufIntrinsics.cpp:423

这里把所有 tile_buf 形式的 pto.get_validshape 都改写成 resolveTileHandle(...)->validRow/validCol。但这个 resolver 只会在 pto.treshape 分支里尝试读取 set_validshape 覆盖值;对直接来自 pto.alloc_tile / pto.materialize_tile 的 tile,它仍然返回分配时/物化时的 valid_row/valid_col。于是像 pto.set_validshape %tile, %r, %c 之后再 pto.get_validshape %tile 这样的合法序列,会被错误折叠回旧值;随后 759-766 行又把真正更新 metadata 的 pto.set_validshape 删掉,最终静默产生错误结果。GetValidShapeOp 的契约就是读取当前 runtime valid shape,所以这是直接的 correctness 回归。

  1. P2 `addr-only` 模式错误地删除了 shape metadata writer lib/PTO/Transforms/FoldTileBufIntrinsics.cpp:759

fold-mode=addr-only 现在不再是纯地址折叠。417-453 行在 addr family 下额外处理了 pto.get_validshape,759-766 行还会无条件删除所有 pto.set_validshape;但同一个 pass 在 addr-only 模式下并不会折叠 pto.tile_valid_rows/cols。结果是单独运行 createFoldTileBufIntrinsicsPass("addr-only") 后,IR 里仍可能保留 shape readers,却已经丢掉了提供最新 valid shape 的 writer,后续 shape-only fold 或自定义 pipeline 会看到陈旧 metadata。考虑到 fold-mode 是公开选项,这属于明确的 contract / compatibility regression。

@Likai-19 Likai-19 force-pushed the vpto_clear_residual_op branch from eb6e8d3 to b8cc380 Compare June 29, 2026 06:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants