feat: preserve debug name hints in ptoas emitc#658

Open
HecreReed wants to merge 9 commits into
hw-native-sys:mainfrom
HecreReed:issue337-debug-name-hints

@HecreReed HecreReed commented May 12, 2026

Collaborator

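Implements the debug name-hint pipeline for issue #337:

  • The frontend carries name hints through Location
  • After EmitC lowering, ptoas rewrites local variable names, preserving the original semantic names where possible
  • Adds a design document and regression tests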
    2026-7-1

Compiler CSE has some effect on the result; for now, following the requirements in the issue, the following is implemented:

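  1. Variable name preservation (name hint)

The frontend can write loc("query_tile") in an IR Location, and when PTOAS generates kernel.cpp it renames the corresponding vN to query_tile. Names containing illegal characters are normalized (spaces become underscores, keywords get a _v suffix), and conflicting names get a _1/_2 suffix.
There are two input paths:

  • textual .pto: SSA names (%sum, %call_result) and block argument names (%merged_value) are extracted automatically from the source text and attached to the IR, with no extra annotation needed from the frontend
  • passed by the frontend through the API: Python-side variable names are forwarded as loc("name")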
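
The normalization and conflict rules described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `sanitizeHint` and `makeUnique` are hypothetical names, the keyword table is a tiny sample, and mapping every illegal character (not just spaces) to an underscore is an assumption.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: turn an arbitrary hint into a legal C++ identifier.
// Assumption: every character outside [A-Za-z0-9_] becomes '_'.
std::string sanitizeHint(const std::string &hint) {
  // Tiny sample only; a real table would list all reserved words.
  static const std::set<std::string> kKeywords = {"int", "for", "while",
                                                  "return", "class"};
  std::string out;
  for (unsigned char c : hint)
    out.push_back((std::isalnum(c) || c == '_') ? static_cast<char>(c) : '_');
  // An identifier may not be empty or start with a digit.
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0])))
    out.insert(out.begin(), '_');
  if (kKeywords.count(out))
    out += "_v";  // keyword -> append _v, as described in the PR text
  return out;
}

// Hypothetical helper: append _1, _2, ... until the name is unused,
// then record the chosen name in `used`.
std::string makeUnique(const std::string &base, std::set<std::string> &used) {
  std::string candidate = base;
  for (int n = 1; !used.insert(candidate).second; ++n)
    candidate = base + "_" + std::to_string(n);
  return candidate;
}
```

With these helpers, a hint such as "query tile" would come out as query_tile, and a second value hinted as sum would be emitted as sum_1.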

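  2. Provenance comments (provenance)

A purely numeric SSA name (%0, %24) does not become v0 or v24 (CSE shuffles the numbering, so the numbers would no longer line up). It becomes _0 instead, with a trailing comment // pto: %0. This way, for any line in kernel.cpp you can tell directly where it came from in the .pto file.
Design choice: the whole naming logic lives in a C++ string post-processing stage after translateToCpp, rather than in a modified EmitC emitter. PTOAS inserts /* PTOAS_NAME_HINTS:xxx */ and /* PTOAS_PROVENANCE:xxx */ markers into the IR; the post-processing stage parses these markers to do the renaming and generate the comments, and finally deletes the marker lines.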
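
As a rough sketch of the provenance naming rule, assuming numeric SSA names are assigned fresh sequential identifiers (_0, _1, ...) while named values keep their own name; `ProvenanceNamer` and `NamedValue` are hypothetical names, not taken from the PR:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

struct NamedValue {
  std::string ident;    // identifier used in the generated C++
  std::string comment;  // trailing provenance comment
};

// Hypothetical sketch: numeric SSA names get a fresh sequential identifier,
// named SSA values keep their name; both get a "// pto: %name" comment.
class ProvenanceNamer {
public:
  NamedValue name(const std::string &ssa) {
    // Drop the leading '%' if present.
    std::string bare = (!ssa.empty() && ssa[0] == '%') ? ssa.substr(1) : ssa;
    bool numeric = !bare.empty() &&
                   std::all_of(bare.begin(), bare.end(), [](unsigned char c) {
                     return std::isdigit(c) != 0;
                   });
    std::string ident = numeric ? "_" + std::to_string(next_++) : bare;
    return {ident, "// pto: " + ssa};
  }

private:
  int next_ = 0;  // next free sequential index for numeric SSA names
};
```

Sequential numbering is an assumption here; it keeps the generated identifier independent of whatever number CSE leaves on the SSA value, while the comment still carries the original name.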

A larger example:

func.func @matmul_bias_relu(
    %A_global: memref<256x256xf16>,
    %B_global: memref<256x256xf16>,
    %bias_global: memref<256xf16>,
    %C_global: memref<256x256xf16>
) {
  // Create tiles
  %tileA = pto.tile_alloc { tile_shape = #pto.tile<16x64> } :
    !pto.tile<f16, 16x64> loc("tileA")
  %tileB = pto.tile_alloc { tile_shape = #pto.tile<64x16> } :
    !pto.tile<f16, 64x16> loc("tileB")
  %tileC_accum = pto.tile_alloc { tile_shape = #pto.tile<16x16> } :
    !pto.tile<f16, 16x16> loc("tileC_acc")
  %bias_tile = pto.tile_alloc { tile_shape = #pto.tile<16x1> } :
    !pto.tile<f16, 16x1> loc("bias_tile")

  scf.for %i = %c0 to %c256 step %c16 {
    scf.for %j = %c0 to %c256 step %c64 {
      // Load a block of A
      pto.tload %tileA, %A_global[%i, %j] : ... loc("load_A")
      
      scf.for %k = %c0 to %c256 step %c64 {
        pto.tload %tileB, %B_global[%k, %j] : ... loc("load_B")
        
        %partial = pto.tmatmul %tileA, %tileB -> %tileC_accum : ...
          loc("matmul_step")
      }

      // Load bias
      pto.tload %bias_tile, %bias_global[%i] : ... loc("load_bias")
      
      // bias add
      %result = pto.tadd %tileC_accum, %bias_tile : ... loc("bias_add")
      
      // relu
      %zero = pto.constant 0.0 : f16 loc("zero_const")
      %activated = pto.tmax %result, %zero : ... loc("relu_out")
      
      // Write back
      pto.tstore %activated, %C_global[%i, %j] : ... loc("store_result")
    }
  }
  return
}

becomes:

AICORE void matmul_bias_relu(
    __gm__ half* A_global, __gm__ half* B_global,
    __gm__ half* bias_global, __gm__ half* C_global) {

  int64_t _0 = get_block_idx();  // pto: %4
  __gm__ half* _1 = A_global + _0 * 65536;   // pto: %5
  __gm__ half* _2 = B_global + _0 * 65536;   // pto: %6
  __gm__ half* _3 = bias_global + _0 * 256;  // pto: %7
  __gm__ half* _4 = C_global + _0 * 65536;   // pto: %8

  const int64_t c0 = 0;       // pto: %c0
  const int64_t c256 = 256;   // pto: %c256
  const int64_t c16 = 16;     // pto: %c16
  const int64_t c64 = 64;     // pto: %c64

  for (size_t i = c0; i < c256; i += c16) {            // pto: %i
    for (size_t j = c0; j < c256; j += c64) {            // pto: %j
      Tile<TileType::Vec, half, 16, 64, ...> tileA = ...;
      TLOAD(tileA, _1 + i * 64);                          // pto: %tileA
      
      for (size_t k = c0; k < c256; k += c64) {          // pto: %k
        Tile<TileType::Vec, half, 64, 16, ...> tileB = ...;
        TLOAD(tileB, _2 + k * 256 + j);                   // pto: %tileB
        
        Tile<TileType::Mat, half, 16, 16, ...> tileC_accum;
        TMATMUL(tileC_accum, tileA, tileB);               // pto: %matmul_step
      }

      Tile<TileType::Vec, half, 16, 1, ...> bias_tile = ...;
      TLOAD(bias_tile, _3 + i);                           // pto: %bias_tile
      
      Tile<TileType::Vec, half, 16, 16, ...> bias_add;
      TADD(bias_add, tileC_accum, bias_tile);             // pto: %bias_add
      
      const half zero_const = 0.0f;                       // pto: %zero_const
      Tile<TileType::Vec, half, 16, 16, ...> relu_out;
      TMAX(relu_out, bias_add, zero_const);               // pto: %relu_out
      
      TSTORE(relu_out, _4 + i * 256 + j);                 // pto: %store_result
    }
  }
}

Some of the provenance is traced through comments.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements variable name preservation from PTO IR to the generated C++ code to improve debuggability. It introduces a design document, test cases, and logic in ptoas to extract name hints from Location metadata, annotate the IR, and post-process the C++ output to replace generic variable names with hinted ones. Feedback includes addressing potential name collisions by populating the used names set, optimizing keyword lookups using llvm::StringSwitch, improving the performance of marker stripping from O(N^2) to O(N), and replacing magic numbers with named constants.

Comment thread tools/ptoas/ptoas.cpp Outdated
static void rewriteNameHintMarkers(std::string &cpp) {
  constexpr llvm::StringLiteral kMarkerPrefix = "/* PTOAS_NAME_HINTS:";
  llvm::StringMap<std::string> replacements;
  std::set<std::string> usedNames;

high

The usedNames set is currently empty when renaming starts. This means that if a name hint conflicts with an existing identifier in the C++ code (such as a function parameter, a global variable, or a local variable that didn't have a hint), makeUniqueCppIdentifier will not detect the collision. You should populate usedNames with all existing identifiers in the cpp string before starting the renaming process.
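
One conservative way to seed the set, sketched under the assumption that over-approximating is acceptable: scan the generated C++ once and record every identifier-like token, including keywords and words inside comments or strings. `collectIdentifiers` is a hypothetical name, not a function from the PR.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical sketch: collect every identifier-like token in `cpp`.
// Over-approximates on purpose (keywords, comment words, string contents are
// included), which can only make the uniquifier more cautious.
std::set<std::string> collectIdentifiers(const std::string &cpp) {
  auto isIdentChar = [](unsigned char c) {
    return std::isalnum(c) != 0 || c == '_';
  };
  std::set<std::string> names;
  size_t i = 0;
  while (i < cpp.size()) {
    unsigned char c = static_cast<unsigned char>(cpp[i]);
    if (std::isalpha(c) || c == '_') {
      size_t start = i;
      while (i < cpp.size() && isIdentChar(static_cast<unsigned char>(cpp[i])))
        ++i;
      names.insert(cpp.substr(start, i - start));
    } else if (std::isdigit(c)) {
      // Skip a whole numeric literal (e.g. 0x1F, 1.0f) so that its suffix
      // is not mistaken for an identifier.
      while (i < cpp.size() &&
             (isIdentChar(static_cast<unsigned char>(cpp[i])) || cpp[i] == '.'))
        ++i;
    } else {
      ++i;
    }
  }
  return names;
}
```

The result could be used to initialize usedNames before any rename is chosen, so that a hint equal to, say, a function parameter name is detected as a collision.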

Comment thread tools/ptoas/ptoas.cpp
Comment on lines +324 to +344
static bool isReservedCppIdentifier(llvm::StringRef name) {
  static const std::set<std::string> kReserved = {
      "alignas", "alignof", "asm", "auto", "bool",
      "break", "case", "catch", "char", "char8_t",
      "char16_t", "char32_t", "class", "const", "consteval",
      "constexpr", "constinit", "const_cast", "continue", "co_await",
      "co_return", "co_yield", "decltype", "default", "delete",
      "do", "double", "dynamic_cast", "else", "enum",
      "explicit", "export", "extern", "false", "float",
      "for", "friend", "goto", "if", "inline",
      "int", "long", "mutable", "namespace", "new",
      "noexcept", "nullptr", "operator", "private", "protected",
      "public", "register", "reinterpret_cast", "requires",
      "return", "short", "signed", "sizeof", "static",
      "static_assert", "static_cast", "struct", "switch", "template",
      "this", "thread_local", "throw", "true", "try",
      "typedef", "typeid", "typename", "union", "unsigned",
      "using", "virtual", "void", "volatile", "wchar_t",
      "while"};
  return kReserved.count(name.str()) != 0;
}

medium

Using std::set<std::string> for keyword lookup is functional but less efficient than using llvm::StringSwitch. In LLVM-based tools, StringSwitch is more idiomatic and provides better performance for small sets of strings.

static bool isReservedCppIdentifier(llvm::StringRef name) {
  return llvm::StringSwitch<bool>(name)
      .Cases("alignas", "alignof", "asm", "auto", "bool", true)
      .Cases("break", "case", "catch", "char", "char8_t", true)
      .Cases("char16_t", "char32_t", "class", "const", "consteval", true)
      .Cases("constexpr", "constinit", "const_cast", "continue", "co_await", true)
      .Cases("co_return", "co_yield", "decltype", "default", "delete", true)
      .Cases("do", "double", "dynamic_cast", "else", "enum", true)
      .Cases("explicit", "export", "extern", "false", "float", true)
      .Cases("for", "friend", "goto", "if", "inline", true)
      .Cases("int", "long", "mutable", "namespace", "new", true)
      .Cases("noexcept", "nullptr", "operator", "private", "protected", true)
      .Cases("public", "register", "reinterpret_cast", "requires", true)
      .Cases("return", "short", "signed", "sizeof", "static", true)
      .Cases("static_assert", "static_cast", "struct", "switch", "template", true)
      .Cases("this", "thread_local", "throw", "true", "try", true)
      .Cases("typedef", "typeid", "typename", "union", "unsigned", true)
      .Cases("using", "virtual", "void", "volatile", "wchar_t", true)
      .Case("while", true)
      .Default(false);
}

Comment thread tools/ptoas/ptoas.cpp Outdated
Comment on lines +937 to +956
static void stripNameHintMarkers(std::string &cpp) {
  constexpr llvm::StringLiteral kMarkerPrefix = "/* PTOAS_NAME_HINTS:";
  size_t searchPos = 0;
  while (true) {
    size_t markerPos = cpp.find(kMarkerPrefix.str(), searchPos);
    if (markerPos == std::string::npos)
      break;

    size_t markerEnd = cpp.find("*/", markerPos + kMarkerPrefix.size());
    if (markerEnd == std::string::npos)
      break;
    markerEnd += 2;
    while (markerEnd < cpp.size() &&
           (cpp[markerEnd] == '\r' || cpp[markerEnd] == '\n'))
      ++markerEnd;

    cpp.erase(markerPos, markerEnd - markerPos);
    searchPos = markerPos;
  }
}

medium

The current implementation of stripNameHintMarkers has $O(N^2)$ time complexity in the worst case because std::string::erase shifts all subsequent characters on every call. For large generated C++ files, this can become a performance bottleneck. It is better to perform the stripping in a single $O(N)$ pass, possibly integrated into the main replacement loop.
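
A single-pass variant could look roughly like the sketch below, which copies the text between markers into a fresh buffer instead of erasing in place. It uses only the standard library; `stripMarkers` is a hypothetical name, and the marker payload in the usage note is made up for illustration.

```cpp
#include <string>
#include <string_view>

// Sketch: remove every "/* PTOAS_NAME_HINTS:... */" marker, plus the line
// breaks that directly follow it, while copying each kept character once.
std::string stripMarkers(std::string_view cpp,
                         std::string_view prefix = "/* PTOAS_NAME_HINTS:") {
  std::string out;
  out.reserve(cpp.size());
  size_t pos = 0;
  while (pos < cpp.size()) {
    size_t marker = cpp.find(prefix, pos);
    if (marker == std::string_view::npos)
      break;
    size_t end = cpp.find("*/", marker + prefix.size());
    if (end == std::string_view::npos)
      break;  // unterminated marker: keep the remaining text verbatim
    out.append(cpp.data() + pos, marker - pos);  // text before the marker
    end += 2;
    while (end < cpp.size() && (cpp[end] == '\r' || cpp[end] == '\n'))
      ++end;
    pos = end;  // resume scanning after the marker and its line break
  }
  out.append(cpp.data() + pos, cpp.size() - pos);  // tail after last marker
  return out;
}
```

For an input such as "int a;\n/* PTOAS_NAME_HINTS:v1=x */\nint b;\n" this leaves only the two declarations. The trade-off is one extra buffer of the same size as the input.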

Comment thread tools/ptoas/ptoas.cpp Outdated
if (!hints)
  continue;

size_t windowEnd = std::min(searchPos + static_cast<size_t>(2048),

medium

The search window size 2048 is a magic number. While likely sufficient for most cases, it should be defined as a named constant (e.g., kSearchWindowSize) to improve maintainability and clarify the intent.

Suggested change
size_t windowEnd = std::min(searchPos + static_cast<size_t>(2048),
constexpr size_t kSearchWindowSize = 2048;
size_t windowEnd = std::min(searchPos + kSearchWindowSize, cpp.size());

@reedhecre

reedhecre commented May 12, 2026

Codex Review

This comment is updated automatically by the review bot.

  • PR: feat: preserve debug name hints in ptoas emitc #658
  • Author: HecreReed
  • Base/Head: main / issue337-debug-name-hints
  • Head SHA: 0c42ce3db187
  • Trigger: new commits pushed to the PR
  • Generated At: 2026-07-01T16:00:52Z
  • Previous Head SHA: 7cb513546f85
  • Status: failed at codex-review (exit=1)

Summary

Review failed at stage codex-review: exit=1

Findings

No structured findings were generated because the review failed early.

Log Tail

 test/lit/pto/trecip_precision_emitc.pto            |    2 +-
 test/lit/pto/trem_emitc.pto                        |    2 +-
 test/lit/pto/trem_precision_emitc.pto              |    2 +-
 test/lit/pto/treshape_static_valid_shape_emitc.pto |    6 +-
 test/lit/pto/trowexpandadd_tmp_emitc.pto           |    4 +-
 test/lit/pto/trowexpanddiv_precision_emitc.pto     |    4 +-
 test/lit/pto/trsqrt_emitc.pto                      |    4 +-
 test/lit/pto/trsqrt_precision_emitc.pto            |    4 +-
 test/lit/pto/tsort32_emitc.pto                     |    4 +-
 test/lit/pto/tsqrt_precision_emitc.pto             |    2 +-
 test/lit/pto/tstore_forms_emitc.pto                |    6 +-
 .../deepseek_v4_decode_golden_lib.py               |   27 +-
 .../deepseek_v4_decode_golden_lib.py               |   27 +-
 test/samples/Gemvmx/gemvmx_golden.py               |   17 +-
 .../matmul_mx_low_precision_golden.py              |   21 +-
 .../Qwen3DecodeA3/qwen3_decode_golden_lib.py       |   30 +
 .../Qwen3DecodeA5/qwen3_decode_golden_lib.py       |   30 +
 tools/ptoas/driver.cpp                             |    9 +-
 tools/ptoas/ptoas.cpp                              | 1529 +++++++++++++++++++-
 tools/ptoas/ptoas.h                                |    6 +
 97 files changed, 2574 insertions(+), 369 deletions(-)
===== END STAGE clone rc=0 @ 2026-07-02 00:00:42 =====

===== STAGE codex-review @ 2026-07-02 00:00:42 =====
set -euo pipefail
cd '/tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/repo'
'codex' exec -C '/tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/repo' -s read-only -c 'model_provider="codereview"' -c 'model="gpt-5.4"' -c 'model_reasoning_effort="xhigh"' --output-schema '/tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/review_schema.json' -o '/tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/codex_last_message.json' --color never - < '/tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/review_prompt.txt'
[monitor] stage timeout: 1800s
OpenAI Codex v0.115.0 (research preview)
--------
workdir: /tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/repo
model: gpt-5.4
provider: codereview
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none
session id: 019f1e69-44e1-7b22-8126-4fb6237b4012
--------
user
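You are now reviewing a GitHub PR.

Repository: hw-native-sys/PTOAS
PR: #658 feat: preserve debug name hints in ptoas emitc
Author: HecreReed
base branch: origin/main
head branch: HEAD (already checked out at the PR head)

Requirements:
1. Review only this PR's changes relative to origin/main; look at context files when necessary.
2. Focus on real correctness / regression / contract mismatch / CI / runtime / compatibility problems.
3. Do not raise purely stylistic suggestions or low-value guesses.
4. Report strictly by priority:
   - P1: highly likely to cause wrong results, compile/run failures, severe regressions, or release blockers
   - P2: significant defects, behavior regressions, missing validation/tests, or major compatibility problems
   - P3: minor but clearly fixable problems
5. If there are no problems, write the summary as exactly "No problems found in PR #658" and return findings=[].
6. If there are problems, keep the summary concise, and for each entry in findings give:
   - severity
   - title
   - body (explain why it is a problem, as specifically as possible)
   - file (a relative path where possible)
   - line (an integer if it can be determined, otherwise null)

Suggested first steps:
- git status --short
- git diff --stat origin/main...HEAD
- git diff --unified=80 origin/main...HEAD

The final output must strictly match the JSON schema.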

mcp startup: no servers
Reconnecting... 1/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a146a011be9b8486-SJC, request id: eee8655f-e1af-4c74-9001-dac176bb7cc3)
Reconnecting... 2/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a146a015ae722ee5-LAX, request id: 45f7ccfd-5da5-47f1-917c-b696ed3ae44d)
Reconnecting... 3/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a146a01a99bcd116-SJC, request id: 5b31a31f-d097-4cc4-a4f2-fe3bd1d8c5d3)
Reconnecting... 4/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a146a0221d1da63a-LAX, request id: 35103340-d543-4ee1-8177-37fcfb9dbe6d)
Reconnecting... 5/5 (unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a146a02e3d068775-SJC, request id: a1aaa435-63c5-4541-9e8f-901341f1f64c)
ERROR: unexpected status 403 Forbidden: {"code":"INSUFFICIENT_BALANCE","message":"Insufficient account balance"}, url: https://codex.0u0o.com/responses, cf-ray: a146a0449d72dc9b-SJC, request id: 8e0256dc-8339-4569-9118-792531327b55
Warning: no last agent message; wrote empty content to /tmp/ptoas-pr-review-monitor/runs/20260702_000034_pr658/codex_last_message.json
===== END STAGE codex-review rc=1 @ 2026-07-02 00:00:52 =====

@HecreReed HecreReed marked this pull request as ready for review May 13, 2026 07:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca1265fd6c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/ptoas/ptoas.cpp Outdated
Comment on lines +397 to +398
if (size_t commentPos = line.find("//"); commentPos != llvm::StringRef::npos)
  body = line.take_front(commentPos);

P2: Parse MLIR comments without truncating string literals

stripMlirLineComments removes everything after the first // on each line, even when // appears inside a quoted MLIR string (e.g. location metadata like loc("a//b")). Because extractTextualNameHints depends on this preprocessed text for function/SSA hint alignment, valid textual IR containing // in string literals can produce misaligned or missing hints and incorrect downstream renaming. This is a regression in the new textual-hint pipeline and should use a comment scanner that ignores // within string literals.
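
A string-aware scanner along these lines might be enough. This is a sketch under the assumption that only double-quoted literals with backslash escapes need to be recognized; `stripLineComment` is a hypothetical name, and it is written against std::string_view rather than llvm::StringRef so it stands alone.

```cpp
#include <string_view>

// Sketch: return the part of `line` before a `//` comment, ignoring any `//`
// that appears inside a double-quoted string literal.
std::string_view stripLineComment(std::string_view line) {
  bool inString = false;
  for (size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (inString) {
      if (c == '\\')
        ++i;  // skip the escaped character, e.g. \" or a doubled backslash
      else if (c == '"')
        inString = false;
    } else if (c == '"') {
      inString = true;
    } else if (c == '/' && i + 1 < line.size() && line[i + 1] == '/') {
      return line.substr(0, i);  // comment starts here
    }
  }
  return line;  // no comment on this line
}
```

With this, a line ending in loc("a//b") is returned unchanged, while a trailing // comment after the closing quote is still removed.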

Comment thread tools/ptoas/ptoas.cpp Outdated
Comment on lines +1753 to +1754
if (trimmed.starts_with("using "))
  break;

P2: Continue scanning declarations after top-level using aliases

findTopLevelGeneratedDeclarations stops at the first using ... line, so functions that begin with a type alias never collect later hoisted generated declarations. collectPendingIdentifierRenames relies on this list to map CFG/block-arg hints back onto hoisted vN temporaries, so in any function with both a leading using and block-arg hints, the block-arg rename path is silently disabled. This breaks the stated name-preservation behavior for a common EmitC function prologue pattern.
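
The suggested behavior, skipping an alias and continuing to scan, can be illustrated with the sketch below. Everything about the prologue shape is an assumption made for illustration: hoisted declarations are taken to look like `Type vN;` with no initializer, and the scan is taken to end at the first line that is neither an alias nor such a declaration. `findLeadingDeclarations` and `parseGeneratedDecl` are hypothetical names.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Trim spaces, tabs and carriage returns from both ends.
std::string trim(const std::string &s) {
  size_t b = s.find_first_not_of(" \t\r");
  if (b == std::string::npos)
    return "";
  size_t e = s.find_last_not_of(" \t\r");
  return s.substr(b, e - b + 1);
}

// Assumption: a hoisted temporary looks like "SomeType vN;" (no initializer).
bool parseGeneratedDecl(const std::string &line, std::string &name) {
  if (line.empty() || line.back() != ';')
    return false;
  size_t end = line.size() - 1;
  size_t start = end;
  while (start > 0 && std::isalnum(static_cast<unsigned char>(line[start - 1])))
    --start;
  name = line.substr(start, end - start);
  if (name.size() < 2 || name[0] != 'v')
    return false;
  for (size_t i = 1; i < name.size(); ++i)
    if (!std::isdigit(static_cast<unsigned char>(name[i])))
      return false;
  return start > 0 && line[start - 1] == ' ';  // a type must precede the name
}

// Collect the names of leading generated declarations in a function body.
std::vector<std::string>
findLeadingDeclarations(const std::vector<std::string> &bodyLines) {
  std::vector<std::string> names;
  for (const std::string &raw : bodyLines) {
    std::string line = trim(raw);
    if (line.empty())
      continue;
    if (line.rfind("using ", 0) == 0)
      continue;  // skip the alias and keep scanning, instead of stopping
    std::string name;
    if (!parseGeneratedDecl(line, name))
      break;  // first ordinary statement ends the prologue
    names.push_back(name);
  }
  return names;
}
```

The only point of the sketch is the `continue` on alias lines: declarations that follow a leading `using` are still collected.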

@HecreReed

Collaborator Author

/run a3

@reedhecre

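Received /run a3; the A3 board tester will handle this request.

The page refreshes automatically, so you can check the current stage, queue status, and latest results directly.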

@reedhecre

A3 board test failed

  • Trigger mode: manual
  • Source commit: 93d260e811e4
  • Result summary: OK 196 / FAIL 17 / SKIP 1
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260513_160705_manual_pr658.log
  • Manual command: /run a3
  • Triggered by: HecreReed
  • Trigger comment: feat: preserve debug name hints in ptoas emitc #658 (comment)
  • Failed stage: board-validation / exit=1

Failed cases

  • qwen3_decode_incore_4 (run, exit=1)
  • qwen3_decode_incore_3 (run, exit=1)
  • qwen3_decode_incore_8 (run, exit=1)
  • qwen3_decode_incore_0 (run, exit=1)
  • qwen3_decode_incore_1 (run, exit=1)
  • qwen3_decode_incore_10 (run, exit=1)
  • qwen3_decode_incore_14 (run, exit=1)
  • qwen3_decode_incore_11 (run, exit=1)
  • qwen3_decode_incore_9 (run, exit=1)
  • qwen3_decode_incore_6 (run, exit=1)
  • qwen3_decode_incore_2 (run, exit=1)
  • qwen3_decode_incore_13 (run, exit=1)
  • qwen3_decode_incore_16 (run, exit=1)
  • qwen3_decode_incore_7 (run, exit=1)
  • qwen3_decode_incore_5 (run, exit=1)
  • qwen3_decode_incore_15 (run, exit=1)
  • qwen3_decode_incore_12 (run, exit=1)

@reedhecre

A3 board test failure details: PR #658

qwen3_decode_incore_4

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_4')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_4/qwen3_decode_golden_lib.py", line 188, in build_case_4
    'v1': make_padded_rows_bf16(generator, meta.elem_counts['v1'], cols=HEAD_DIM, rows_per_group=Q_HEAD_PAD, active_rows=Q_HEAD_BATCH, scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:02] ERROR: testcase failed (exit 1): qwen3_decode_incore_4
qwen3_decode_incore_3

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_3/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_3')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_3/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_3/qwen3_decode_golden_lib.py", line 174, in build_case_3
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:04] ERROR: testcase failed (exit 1): qwen3_decode_incore_3
qwen3_decode_incore_8

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_8/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_8')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_8/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_8/qwen3_decode_golden_lib.py", line 310, in build_case_8
    'v1': make_fp32(generator, meta.elem_counts['v1'], scale=0.05, positive=True),
KeyError: 'v1'
[2026-05-13 16:28:07] ERROR: testcase failed (exit 1): qwen3_decode_incore_8
qwen3_decode_incore_0

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_0/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_0')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_0/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_0/qwen3_decode_golden_lib.py", line 88, in build_case_0
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:09] ERROR: testcase failed (exit 1): qwen3_decode_incore_0
qwen3_decode_incore_1

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_1')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_1/qwen3_decode_golden_lib.py", line 115, in build_case_1
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:12] ERROR: testcase failed (exit 1): qwen3_decode_incore_1
qwen3_decode_incore_10

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_10')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_10/qwen3_decode_golden_lib.py", line 355, in build_case_10
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:14] ERROR: testcase failed (exit 1): qwen3_decode_incore_10
qwen3_decode_incore_14

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_14/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_14')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_14/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_14/qwen3_decode_golden_lib.py", line 414, in build_case_14
    'v1': make_fp32(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:17] ERROR: testcase failed (exit 1): qwen3_decode_incore_14
qwen3_decode_incore_11

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_11')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_11/qwen3_decode_golden_lib.py", line 369, in build_case_11
    'v1': np.zeros(meta.elem_counts['v1'], dtype=meta.np_types['v1']),
KeyError: 'v1'
[2026-05-13 16:28:20] ERROR: testcase failed (exit 1): qwen3_decode_incore_11
qwen3_decode_incore_9

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_9/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_9')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_9/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_9/qwen3_decode_golden_lib.py", line 337, in build_case_9
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:22] ERROR: testcase failed (exit 1): qwen3_decode_incore_9
qwen3_decode_incore_6

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_6')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_6/qwen3_decode_golden_lib.py", line 259, in build_case_6
    'v1': np.zeros(meta.elem_counts['v1'], dtype=meta.np_types['v1']),
KeyError: 'v1'
[2026-05-13 16:28:25] ERROR: testcase failed (exit 1): qwen3_decode_incore_6
qwen3_decode_incore_2

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_2')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_2/qwen3_decode_golden_lib.py", line 140, in build_case_2
    'v1': np.zeros(meta.elem_counts['v1'], dtype=meta.np_types['v1']),
KeyError: 'v1'
[2026-05-13 16:28:29] ERROR: testcase failed (exit 1): qwen3_decode_incore_2
qwen3_decode_incore_13

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_13/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_13')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_13/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_13/qwen3_decode_golden_lib.py", line 408, in build_case_13
    return build_case_12(meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_13/qwen3_decode_golden_lib.py", line 392, in build_case_12
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:32] ERROR: testcase failed (exit 1): qwen3_decode_incore_13
qwen3_decode_incore_16

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_16/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_16')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_16/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_16/qwen3_decode_golden_lib.py", line 448, in build_case_16
    'v1': make_fp32(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:35] ERROR: testcase failed (exit 1): qwen3_decode_incore_16
qwen3_decode_incore_7

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_7')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_7/qwen3_decode_golden_lib.py", line 290, in build_case_7
    'v1': make_padded_rows_bf16(generator, meta.elem_counts['v1'], cols=SEQ_TILE, rows_per_group=Q_HEAD_PAD, active_rows=Q_HEAD_BATCH, scale=0.05, positive=True),
KeyError: 'v1'
[2026-05-13 16:28:38] ERROR: testcase failed (exit 1): qwen3_decode_incore_7
qwen3_decode_incore_5

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_5')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_5/qwen3_decode_golden_lib.py", line 239, in build_case_5
    'v1': np.zeros(meta.elem_counts['v1'], dtype=meta.np_types['v1']),
KeyError: 'v1'
[2026-05-13 16:28:41] ERROR: testcase failed (exit 1): qwen3_decode_incore_5
qwen3_decode_incore_15

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_15/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_15')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_15/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_15/qwen3_decode_golden_lib.py", line 430, in build_case_15
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:43] ERROR: testcase failed (exit 1): qwen3_decode_incore_15
qwen3_decode_incore_12

stage=run info=exit=1

Traceback (most recent call last):
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/./golden.py", line 14, in <module>
    run_case('qwen3_decode_incore_12')
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_golden_lib.py", line 484, in run_case
    buffers, golden = BUILDERS[case_name](meta, generator, ints)
  File "/home/zhongxuan/ptoas-board-monitor/runtime/runs/20260513_160705_manual_pr658/npu_validation/Qwen3DecodeA3/qwen3_decode_incore_12/qwen3_decode_golden_lib.py", line 392, in build_case_12
    'v1': make_bf16(generator, meta.elem_counts['v1'], scale=0.05),
KeyError: 'v1'
[2026-05-13 16:28:45] ERROR: testcase failed (exit 1): qwen3_decode_incore_12

@HecreReed

Collaborator Author

/run a3

@reedhecre


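Received /run a3; the A3 board tester will process this request.

The page refreshes automatically, so you can check the current stage, queue status, and recent results directly.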

@reedhecre


A3 board test complete (with skips)

  • Trigger: manual
  • Source commit: 78c8e7b8061e
  • Summary: OK 213 / FAIL 0 / SKIP 1
  • Log: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260513_165606_manual_pr658.log
  • Result TSV: /home/zhongxuan/ptoas-board-monitor/runtime/logs/20260513_165606_manual_pr658.tsv
  • Manual command: /run a3
  • Triggered by: HecreReed
  • Trigger comment: feat: preserve debug name hints in ptoas emitc #658 (comment)

@HecreReed HecreReed force-pushed the issue337-debug-name-hints branch from 1a1738c to a979a26 Compare June 29, 2026 02:31
@HecreReed

Collaborator Author

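This update: rebase + squash + complete point 1 of issue #337

1. Conflict resolution + rebase onto latest main

  • The PR previously conflicted with main (mergeable: false, dirty). It is now rebased onto the latest main (v0.48), with 13 conflicts resolved:
    • tools/ptoas/ptoas.cpp (5 hunks): main moved the parsing logic into driver.cpp and added runVPTOBackendPipeline, rewriteMalformedVerbatimSemicolons, and others. The name-hints code was grafted onto the new structure: extractTextualNameHints/applyParsedTextualNameHints are exposed through ptoas.h as applyTextualNameHintsToModule, which is called right after parsing in driver.cpp's parseTextualModule.
    • 12 .pto tests: main changed function signatures (dropped static from internal functions) and types (int64_t→int32_t, etc.). Took main's structure plus PR658's relaxed CHECK patterns (v[0-9]+[_A-Za-z]...), and verified them against actual ptoas output.
  • About 30 more emitc tests had CHECK mismatches because name-hints renaming turns vN into semantic names. Each was relaxed individually, and the full local lit run (412 tests) passes.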

2. Squash into a single commit

  • The original 6 feature/fix commits + 1 merge commit are squashed into a single commit, leaving a clean history.

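3. Complete point 1 of issue #337 (locatability)

Point 1 of the issue asks that the %N numbers in .pto match the vN numbers in .cpp so that lines can be located. However, level3 involves CSE, merging, and materialization of new values, and EmitC renumbers vN in traversal order, so one-to-one number alignment is impossible in principle (consistent with the maintainer's earlier explanation). This PR instead uses provenance comments + best-effort name preservation:

  • For every emitc value that can be traced back to an SSA name in the input .pto, a // pto: %N comment is emitted at the end of the declaration line, carrying the original, unsanitized SSA name (both pure-digit names such as %0 and semantic names such as %query_tile are preserved).
  • A pure-digit name such as %0 sanitizes to _0 (the digits are kept and only an underscore prefix is added); together with the comment, this gives full locatability.
  • Examples:
    • int32_t _0 = helper(lhs, rhs); // pto: %0
    • uint64_t tile = (uint64_t) v4; // pto: %tile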
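
The naming rules described above and in the PR description (illegal characters become underscores, keywords get a `_v` suffix, collisions get `_1`/`_2`, pure-digit names get an underscore prefix, and the raw SSA name goes into a trailing comment) can be sketched in a few lines of Python. This is only an illustration of the rules, not the C++ post-processing code in ptoas; `sanitize`, `unique`, `declare`, and the `CPP_KEYWORDS` subset are hypothetical names introduced here.

```python
import re

# Hypothetical, deliberately incomplete keyword set for illustration only.
CPP_KEYWORDS = {"int", "float", "for", "while", "return", "class", "template"}


def sanitize(ssa_name: str) -> str:
    """Turn an SSA name (without the leading '%') into a legal C++ identifier."""
    ident = re.sub(r"[^0-9A-Za-z_]", "_", ssa_name)
    if not ident or ident[0].isdigit():
        ident = "_" + ident          # "%0" becomes "_0"
    if ident in CPP_KEYWORDS:
        ident += "_v"                # "%for" becomes "for_v"
    return ident


def unique(ident: str, used: set) -> str:
    """Resolve collisions by appending _1, _2, ... and record the result."""
    candidate, n = ident, 1
    while candidate in used:
        candidate = f"{ident}_{n}"
        n += 1
    used.add(candidate)
    return candidate


def declare(ctype: str, ssa_name: str, rhs: str, used: set) -> str:
    """Emit a declaration whose trailing comment keeps the raw SSA name."""
    name = unique(sanitize(ssa_name), used)
    return f"{ctype} {name} = {rhs}; // pto: %{ssa_name}"
```

Calling `declare("int32_t", "0", "helper(lhs, rhs)", set())` reproduces the first example line above. The comment always carries the unsanitized name, so a reader can search the .pto file for it even when the C++ identifier had to be altered.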

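Local verification

  • ptoas 0.48 builds successfully against patched LLVM 19.1.7.
  • The full lit suite passes, all 412 tests (including 5 name-hints tests + 1 new provenance test + 12 rebase-conflict files + ~30 relaxed emitc tests).
  • Real level3 cases (tassign, etc.) confirm that the // pto: comments are attached correctly.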

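Notes

  • The original kernel.txt attached to the issue hits a tmov shape verification error on the latest main (an IR check newly added on main that is incompatible with the old pto). This is not introduced by this PR; an equivalent level3 case was used to verify that provenance works.
  • The earlier KeyError: 'v1' in board testing (A3) was fixed by the last commit of the original PR, which adapted the goldens. The golden changes survive the rebase, but if board testing still looks values up by vN, it may need re-checking (variable names become semantic names after renaming).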

…-native-sys#337)

Implements issue hw-native-sys#337 (both points) on top of latest main:

Point 2 (SSA name hint): front-end carries name hints via Location
metadata (NameLoc/FusedLoc). ptoas rewrites local variable names after
emitc lowering to preserve semantic names (best-effort; CSE/merge sites
degrade gracefully). Textual .pto inputs additionally recover SSA arg,
block-arg, and op-result names from source text.

Point 1 (locatability): emit `// pto: %N` provenance comments mapping each
generated C++ local back to its input .pto SSA name, even for pure-digit
names (%0 -> _0 + ' // pto: %0'). This gives per-line locatability without
requiring strict %N == vN alignment (impossible after CSE/renumbering),
matching the maintainer's stated constraint.

Architecture notes (main moved parsing into driver.cpp):
- extractTextualNameHints/applyParsedTextualNameHints now exposed via
  ptoas.h as applyTextualNameHintsToModule; called from driver.cpp's
  parseTextualModule right after parsing.
- collectFunctionArgNameHints/collectFunctionBlockArgNameHints moved to
  the start of the refactored compilePTOASModule (before lowering).

Tests: 5 name-hints tests + 1 new provenance test + 12 rebased conflict
files + ~30 emitc tests relaxed for name-preservation. Full lit suite
(412 tests) passes locally with patched LLVM 19.1.7.

Rebased onto main (v0.48) and squashed to a single commit.
@HecreReed HecreReed force-pushed the issue337-debug-name-hints branch from a979a26 to 576e15a Compare June 29, 2026 02:35
@HecreReed

Collaborator Author

/review

@reedhecre


Manual Codex Review

This comment was triggered manually by /review.

Summary

Received /review; running the Codex review.

Findings

Review in progress.

@reedhecre


Manual Codex Review

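This comment was triggered manually by /review.

Summary

The multi-result textual name-hint parsing in this PR misaligns subsequent renames, and several sample golden scripts are still not adapted to the new parameter naming, which will cause regressions in the validation flow.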

Findings

  1. P2 Textual multi-result SSA names are parsed incorrectly and misalign later hints tools/ptoas/ptoas.cpp:668

extractTextualNameHints() only records a result name when = appears immediately after %name, so it does not recognize standard MLIR multi-result syntax like %final:2 = ... / %yield:2 = .... That syntax already exists in-tree (for example test/lit/pto/kernel_kind_vector_scf_while_emitc.pto:19 and :34). Because applyParsedTextualNameHints() later assigns opResultHints by preorder index, skipping one multi-result line does not just lose that hint: it shifts every later textual result hint onto the wrong operation. The new rename/provenance path can therefore emit incorrect C++ names/comments after the first multi-result op in a textual .pto file.
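
To make the failure mode concrete, here is a minimal Python sketch of a line scanner that accepts both `%name = ...` and the multi-result form `%name:N = ...`. It is not the parser in tools/ptoas/ptoas.cpp; `RESULT_RE` and `extract_result_hints` are hypothetical names, and the sketch deliberately ignores comma-separated result lists such as `%a, %b = ...`.

```python
import re

# Matches "%name =" and "%name:N =" at the start of an operation line.
# The character class follows MLIR's identifier punctuation ($ . _ -).
RESULT_RE = re.compile(r"^\s*%([A-Za-z0-9_.$-]+)(?::(\d+))?\s*=")


def extract_result_hints(text: str):
    """Return (name, result_count) pairs in source order."""
    hints = []
    for line in text.splitlines():
        m = RESULT_RE.match(line)
        if m:
            # A missing ":N" suffix means the op defines a single result.
            hints.append((m.group(1), int(m.group(2) or 1)))
    return hints
```

Because hints are later matched to operations by position, the important property is that every result-defining line yields exactly one entry; dropping the `%final:2` line would shift every later name onto the wrong operation.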

  2. P2 Deepseek board-validation goldens are still pinned to legacy `vN` buffer names test/samples/DeepseekV4DecodeA3/deepseek_v4_decode_golden_lib.py:61

This PR makes textual parameter names observable in generated EmitC/C++ output, but the Deepseek custom golden library still hard-codes the legacy v1/v2/v3 contract: it rejects any other meta.outputs / meta.read_order, then reads and writes buffers by those fixed names. The Deepseek sample PTO files use %arg0/%arg1/%arg2, so after this change their generated validation metadata will use arg0/arg1/arg2 instead of v1/v2/v3, and the golden step will raise ValueError before producing any data. The A5 sibling has the same bug.

  3. P2 Custom MX sample goldens still write data to obsolete `vN` keys test/samples/Gemvmx/gemvmx_golden.py:78

gemvmx_golden.py seeds default_buffers(meta) and then populates hard-coded keys v1..v4 instead of using meta.inputs / meta.outputs. After this PR, the generated testcase metadata for gemvmx-pto.pto will expose the textual argument names (arg0..arg4), so write_buffers(meta, buffers) will still write the untouched real input buffers (arg0..arg3) as zeros while the computed data sits under unused v1..v4 keys. The kernel then runs on zeroed inputs and the compare step fails. test/samples/MatmulMxLowPrecision/matmul_mx_low_precision_golden.py has the same regression pattern.
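
One way to decouple a golden script from the generated names is to bind data by position to whatever the metadata exposes. The sketch below assumes `meta.inputs` is an ordered sequence of buffer names, which the review text implies but does not confirm; `bind_inputs` is a hypothetical helper, not part of the repository.

```python
from types import SimpleNamespace


def bind_inputs(meta, generated):
    """Pair positional input data with the names found in meta.inputs."""
    names = list(meta.inputs)
    if len(generated) != len(names):
        raise ValueError(
            f"expected {len(names)} input buffers, got {len(generated)}"
        )
    return dict(zip(names, generated))


# The same call works whether the kernel exposes legacy or textual names.
legacy = SimpleNamespace(inputs=["v1", "v2"])
renamed = SimpleNamespace(inputs=["arg0", "arg1"])
print(bind_inputs(legacy, [1.0, 2.0]))
print(bind_inputs(renamed, [1.0, 2.0]))
```

With this shape, a rename from v1 to arg0 changes only the dictionary keys, and a count mismatch fails loudly instead of leaving real input buffers zeroed.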
