feat: preserve debug name hints in ptoas emitc #658
Code Review
This pull request implements variable name preservation from PTO IR to the generated C++ code to improve debuggability. It introduces a design document, test cases, and logic in ptoas to extract name hints from Location metadata, annotate the IR, and post-process the C++ output to replace generic variable names with hinted ones. Feedback includes addressing potential name collisions by populating the used names set, optimizing keyword lookups using llvm::StringSwitch, improving the performance of marker stripping from O(N^2) to O(N), and replacing magic numbers with named constants.
static void rewriteNameHintMarkers(std::string &cpp) {
  constexpr llvm::StringLiteral kMarkerPrefix = "/* PTOAS_NAME_HINTS:";
  llvm::StringMap<std::string> replacements;
  std::set<std::string> usedNames;
The usedNames set is currently empty when renaming starts. This means that if a name hint conflicts with an existing identifier in the C++ code (such as a function parameter, a global variable, or a local variable that didn't have a hint), makeUniqueCppIdentifier will not detect the collision. You should populate usedNames with all existing identifiers in the cpp string before starting the renaming process.
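One way to seed the set, sketched under assumptions: the helper name `collectExistingIdentifiers` is hypothetical and not part of the PR, and the scan is deliberately over-inclusive (it also picks up keywords and words inside comments or string literals), which only makes collision detection more conservative.

```cpp
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper (not from the PR): gathers every identifier-like token
// found in the generated C++ text so it can pre-populate usedNames.
static std::set<std::string> collectExistingIdentifiers(const std::string &cpp) {
  std::set<std::string> names;
  auto isIdentChar = [](char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) != 0 || ch == '_';
  };
  size_t i = 0;
  while (i < cpp.size()) {
    unsigned char c = static_cast<unsigned char>(cpp[i]);
    if (std::isalpha(c) || c == '_') {
      // Start of an identifier: consume it fully and record it.
      size_t start = i;
      while (i < cpp.size() && isIdentChar(cpp[i]))
        ++i;
      names.insert(cpp.substr(start, i - start));
    } else if (std::isdigit(c)) {
      // Numeric literal such as 0x1F: skip it whole so that "x1F" is not
      // mistaken for an identifier.
      while (i < cpp.size() && isIdentChar(cpp[i]))
        ++i;
    } else {
      ++i;
    }
  }
  return names;
}
```

With this, `usedNames` could be initialized as `std::set<std::string> usedNames = collectExistingIdentifiers(cpp);` before any hint is applied.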
static bool isReservedCppIdentifier(llvm::StringRef name) {
  static const std::set<std::string> kReserved = {
      "alignas", "alignof", "asm", "auto", "bool",
      "break", "case", "catch", "char", "char8_t",
      "char16_t", "char32_t", "class", "const", "consteval",
      "constexpr", "constinit", "const_cast", "continue", "co_await",
      "co_return", "co_yield", "decltype", "default", "delete",
      "do", "double", "dynamic_cast", "else", "enum",
      "explicit", "export", "extern", "false", "float",
      "for", "friend", "goto", "if", "inline",
      "int", "long", "mutable", "namespace", "new",
      "noexcept", "nullptr", "operator", "private", "protected",
      "public", "register", "reinterpret_cast", "requires",
      "return", "short", "signed", "sizeof", "static",
      "static_assert", "static_cast", "struct", "switch", "template",
      "this", "thread_local", "throw", "true", "try",
      "typedef", "typeid", "typename", "union", "unsigned",
      "using", "virtual", "void", "volatile", "wchar_t",
      "while"};
  return kReserved.count(name.str()) != 0;
}
Using std::set<std::string> for keyword lookup is functional but less efficient than using llvm::StringSwitch. In LLVM-based tools, StringSwitch is more idiomatic and provides better performance for small sets of strings.
static bool isReservedCppIdentifier(llvm::StringRef name) {
return llvm::StringSwitch<bool>(name)
.Cases("alignas", "alignof", "asm", "auto", "bool", true)
.Cases("break", "case", "catch", "char", "char8_t", true)
.Cases("char16_t", "char32_t", "class", "const", "consteval", true)
.Cases("constexpr", "constinit", "const_cast", "continue", "co_await", true)
.Cases("co_return", "co_yield", "decltype", "default", "delete", true)
.Cases("do", "double", "dynamic_cast", "else", "enum", true)
.Cases("explicit", "export", "extern", "false", "float", true)
.Cases("for", "friend", "goto", "if", "inline", true)
.Cases("int", "long", "mutable", "namespace", "new", true)
.Cases("noexcept", "nullptr", "operator", "private", "protected", true)
.Cases("public", "register", "reinterpret_cast", "requires", true)
.Cases("return", "short", "signed", "sizeof", "static", true)
.Cases("static_assert", "static_cast", "struct", "switch", "template", true)
.Cases("this", "thread_local", "throw", "true", "try", true)
.Cases("typedef", "typeid", "typename", "union", "unsigned", true)
.Cases("using", "virtual", "void", "volatile", "wchar_t", true)
.Case("while", true)
.Default(false);
}
static void stripNameHintMarkers(std::string &cpp) {
  constexpr llvm::StringLiteral kMarkerPrefix = "/* PTOAS_NAME_HINTS:";
  size_t searchPos = 0;
  while (true) {
    size_t markerPos = cpp.find(kMarkerPrefix.str(), searchPos);
    if (markerPos == std::string::npos)
      break;

    size_t markerEnd = cpp.find("*/", markerPos + kMarkerPrefix.size());
    if (markerEnd == std::string::npos)
      break;
    markerEnd += 2;
    while (markerEnd < cpp.size() &&
           (cpp[markerEnd] == '\r' || cpp[markerEnd] == '\n'))
      ++markerEnd;

    cpp.erase(markerPos, markerEnd - markerPos);
    searchPos = markerPos;
  }
}
In the current implementation of stripNameHintMarkers, every std::string::erase call shifts all subsequent characters, so stripping many markers costs O(N^2) overall. For large generated C++ files, this can become a performance bottleneck. It is better to perform the stripping in a single pass, which is O(N).
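A minimal sketch of such a single-pass variant, assuming markers never nest or overlap. The function name `stripNameHintMarkersSinglePass` is hypothetical, and `std::string_view` stands in for `llvm::StringLiteral` only to keep the example free of LLVM headers. Kept text is copied once into a fresh buffer instead of being shifted on every erase.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Hypothetical single-pass rewrite of the stripping loop: copy the text
// between markers into `out` rather than erasing in place.
static void stripNameHintMarkersSinglePass(std::string &cpp) {
  constexpr std::string_view kMarkerPrefix = "/* PTOAS_NAME_HINTS:";
  std::string out;
  out.reserve(cpp.size());
  size_t pos = 0; // start of the text not yet copied
  while (true) {
    size_t markerPos = cpp.find(kMarkerPrefix, pos);
    if (markerPos == std::string::npos)
      break;
    size_t markerEnd = cpp.find("*/", markerPos + kMarkerPrefix.size());
    if (markerEnd == std::string::npos)
      break; // unterminated marker: leave the remaining text untouched
    markerEnd += 2;
    // Swallow the line break(s) that follow the marker, as the original does.
    while (markerEnd < cpp.size() &&
           (cpp[markerEnd] == '\r' || cpp[markerEnd] == '\n'))
      ++markerEnd;
    out.append(cpp, pos, markerPos - pos); // keep the text before the marker
    pos = markerEnd;
  }
  out.append(cpp, pos, std::string::npos); // keep the tail
  cpp = std::move(out);
}
```

The trade-off is one extra buffer of roughly the input size in exchange for each character being copied at most once.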
  if (!hints)
    continue;

  size_t windowEnd = std::min(searchPos + static_cast<size_t>(2048),
The search window size 2048 is a magic number. While likely sufficient for most cases, it should be defined as a named constant (e.g., kSearchWindowSize) to improve maintainability and clarify the intent.
- size_t windowEnd = std::min(searchPos + static_cast<size_t>(2048),
+ constexpr size_t kSearchWindowSize = 2048;
+ size_t windowEnd = std::min(searchPos + kSearchWindowSize, cpp.size());
Codex Review: this comment is updated automatically by the review bot.
Summary: Review failed at stage
Findings: no structured findings were generated because the review process failed early.
Log Tail
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca1265fd6c
if (size_t commentPos = line.find("//"); commentPos != llvm::StringRef::npos)
  body = line.take_front(commentPos);
Parse MLIR comments without truncating string literals
stripMlirLineComments removes everything after the first // on each line, even when // appears inside a quoted MLIR string (e.g. location metadata like loc("a//b")). Because extractTextualNameHints depends on this preprocessed text for function/SSA hint alignment, valid textual IR containing // in string literals can produce misaligned or missing hints and incorrect downstream renaming. This is a regression in the new textual-hint pipeline and should use a comment scanner that ignores // within string literals.
if (trimmed.starts_with("using "))
  break;
Continue scanning declarations after top-level using aliases
findTopLevelGeneratedDeclarations stops at the first using ... line, so functions that begin with a type alias never collect later hoisted generated declarations. collectPendingIdentifierRenames relies on this list to map CFG/block-arg hints back onto hoisted vN temporaries, so in any function with both a leading using and block-arg hints, the block-arg rename path is silently disabled. This breaks the stated name-preservation behavior for a common EmitC function prologue pattern.
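The suggested behavior can be sketched as follows. Everything here is an assumption made for illustration: the real function's inputs and its notion of a generated declaration are not shown in the review, so this version takes the function body as a list of lines and treats any `T vN;` line as a hoisted declaration. The only point it demonstrates is skipping a `using` line with `continue` instead of stopping with `break`.

```cpp
#include <cctype>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: strips leading and trailing whitespace.
static std::string_view trimSketch(std::string_view s) {
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front())))
    s.remove_prefix(1);
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.back())))
    s.remove_suffix(1);
  return s;
}

// Hypothetical pattern: returns "vN" for a line shaped like `T vN;`,
// or an empty string when the line is not such a declaration.
static std::string generatedNameOf(std::string_view line) {
  if (line.empty() || line.back() != ';')
    return {};
  line.remove_suffix(1);
  size_t sep = line.find_last_of(" \t");
  if (sep == std::string_view::npos)
    return {};
  std::string_view name = line.substr(sep + 1);
  if (name.size() < 2 || name[0] != 'v')
    return {};
  for (char c : name.substr(1))
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return {};
  return std::string(name);
}

// Collects the hoisted declarations at the top of a function body, skipping
// `using` aliases instead of ending the scan at the first one.
static std::vector<std::string>
collectPrologueDeclarations(const std::vector<std::string> &bodyLines) {
  std::vector<std::string> names;
  for (const std::string &raw : bodyLines) {
    std::string_view line = trimSketch(raw);
    if (line.empty() || line.substr(0, 6) == "using ")
      continue; // alias or blank line: keep scanning
    std::string name = generatedNameOf(line);
    if (name.empty())
      break; // first real statement ends the prologue
    names.push_back(std::move(name));
  }
  return names;
}
```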
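/run a3
Received
The page refreshes automatically; you can check the current stage, queue status, and most recent results there.
A3 board test failed
Failed cases
A3 board test failure details: PR #658
qwen3_decode_incore_4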
qwen3_decode_incore_3
qwen3_decode_incore_8
qwen3_decode_incore_0
qwen3_decode_incore_1
qwen3_decode_incore_10
qwen3_decode_incore_14
qwen3_decode_incore_11
qwen3_decode_incore_9
qwen3_decode_incore_6
qwen3_decode_incore_2
qwen3_decode_incore_13
qwen3_decode_incore_16
qwen3_decode_incore_7
qwen3_decode_incore_5
qwen3_decode_incore_15
qwen3_decode_incore_12
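/run a3
Received
The page refreshes automatically; you can check the current stage, queue status, and most recent results there.
A3 board test completed (with skips)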
1a1738c to a979a26
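This round of updates: rebase + squash + complete point 1 of issue #337
1. Resolve conflicts + rebase onto the latest main
2. Squash into a single commit
3. Complete point 1 of issue #337 (locatability)
Point 1 of the issue requires “
Local verification
Notes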
…-native-sys#337)

Implements issue hw-native-sys#337 (both points) on top of latest main:

Point 2 (SSA name hint): front-end carries name hints via Location metadata (NameLoc/FusedLoc). ptoas rewrites local variable names after emitc lowering to preserve semantic names (best-effort; CSE/merge sites degrade gracefully). Textual .pto inputs additionally recover SSA arg, block-arg, and op-result names from source text.

Point 1 (locatability): emit `// pto: %N` provenance comments mapping each generated C++ local back to its input .pto SSA name, even for pure-digit names (%0 -> _0 + ' // pto: %0'). This gives per-line locatability without requiring strict %N == vN alignment (impossible after CSE/renumbering), matching the maintainer's stated constraint.

Architecture notes (main moved parsing into driver.cpp):
- extractTextualNameHints/applyParsedTextualNameHints now exposed via ptoas.h as applyTextualNameHintsToModule; called from driver.cpp's parseTextualModule right after parsing.
- collectFunctionArgNameHints/collectFunctionBlockArgNameHints moved to the start of the refactored compilePTOASModule (before lowering).

Tests: 5 name-hints tests + 1 new provenance test + 12 rebased conflict files + ~30 emitc tests relaxed for name-preservation. Full lit suite (412 tests) passes locally with patched LLVM 19.1.7.

Rebased onto main (v0.48) and squashed to a single commit.
a979a26 to 576e15a
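/review
Manual Codex Review: this comment is by …
Summary: Received.
Findings: Review in progress.
Manual Codex Review: this comment is by …
Summary: The multi-result textual name-hint parsing in this PR misaligns subsequent renames, and several sample golden scripts have still not been adapted to the new parameter naming, which will cause regressions in the validation flow.
Findings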
This PR makes textual parameter names observable in generated EmitC/C++ output, but the Deepseek custom golden library still hard-codes the legacy
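Implements the debug name-hint pipeline for issue #337:
2026-7-1
Because of compiler CSE, only some names can be shown; for now, the following is implemented as the issue requires:
The front end can write loc("query_tile") in the IR's Location, and when PTOAS generates kernel.cpp it renames the corresponding vN to query_tile. If the name contains illegal characters it is normalized (space → underscore, keyword → append _v), and on a conflict a suffix _1/_2 is added.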
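The normalization rules above can be sketched roughly as below. The helper names are hypothetical and the keyword list is a small sample rather than the full set used by ptoas. Mapping every non-identifier character (not only spaces) to an underscore, and prefixing an underscore when the result would start with a digit, are assumptions.

```cpp
#include <cctype>
#include <set>
#include <string>

// Sample keyword check for illustration only; the real list is much longer.
static bool isKeywordSketch(const std::string &name) {
  static const std::set<std::string> kSample = {"int", "for", "while",
                                                "class", "return"};
  return kSample.count(name) != 0;
}

// Turns a raw hint into a legal C++ identifier.
static std::string sanitizeHintSketch(const std::string &hint) {
  std::string out;
  for (unsigned char c : hint)
    out.push_back(std::isalnum(c) || c == '_' ? static_cast<char>(c) : '_');
  // Assumption: identifiers may not start with a digit, so prefix '_'.
  if (out.empty() || std::isdigit(static_cast<unsigned char>(out[0])))
    out.insert(out.begin(), '_');
  if (isKeywordSketch(out))
    out += "_v"; // keyword -> append _v
  return out;
}

// Returns `base` if it is free, otherwise base_1, base_2, ... and records
// the chosen name in `used`.
static std::string makeUniqueSketch(const std::string &base,
                                    std::set<std::string> &used) {
  std::string candidate = base;
  for (unsigned n = 1; !used.insert(candidate).second; ++n)
    candidate = base + "_" + std::to_string(n);
  return candidate;
}
```

For example, the hint `query tile` would become `query_tile`, the hint `for` would become `for_v`, and a second variable hinted `sum` would become `sum_1`.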
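There are two input paths:
- Textual .pto: SSA names (%sum, %call_result) and block argument names (%merged_value) are extracted automatically from the source text and attached to the IR, with no extra front-end annotation required.
- Front end via API: Python-side variable names are passed through with loc("name").
A pure-digit SSA name (%0, %24) does not become v0 or v24, because CSE shuffles the numbering and the two would no longer line up. Instead it becomes _0, with a trailing comment // pto: %0 on the line. This way, for any line in kernel.cpp you can tell directly where it came from in the .pto.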
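The mapping for numeric names and the provenance comment can be illustrated with a small sketch. Both helper names are hypothetical, and the exact spacing of the emitted comment is an assumption based on the `// pto: %0` form quoted above.

```cpp
#include <cctype>
#include <string>

// Maps a .pto SSA name such as "%0" or "%sum" to a C++ identifier:
// the '%' sigil is dropped and a leading digit gets a '_' prefix.
static std::string cppNameForSsaSketch(const std::string &ssaName) {
  std::string name = (!ssaName.empty() && ssaName[0] == '%')
                         ? ssaName.substr(1)
                         : ssaName;
  if (name.empty() || std::isdigit(static_cast<unsigned char>(name[0])))
    name.insert(name.begin(), '_');
  return name;
}

// Appends the provenance comment that ties a generated statement back to
// its source SSA name.
static std::string appendProvenanceSketch(const std::string &statement,
                                          const std::string &ssaName) {
  return statement + " // pto: " + ssaName;
}
```

Under these assumptions, a statement defining `%0` would be emitted as something like `int32_t _0 = v1 + v2; // pto: %0`.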
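Design choice: the whole naming logic lives in a C++ string post-processing stage that runs after translateToCpp, rather than in a modified EmitC emitter. PTOAS inserts /* PTOAS_NAME_HINTS:xxx */ and /* PTOAS_PROVENANCE:xxx */ markers into the IR; the post-processing stage parses these markers to perform the renaming and generate the comments, and finally removes the marker lines.
A larger example:
becomes:
Some names are traced back through comments.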