[Bug] rope_kv_cache precision issue #866

Description

@Likai-19

Component

PTO Dialect / ODS (include/PTO/IR)

Description

Test kernel: `rope_kv_cache.pto` (with `pto.kernel` added)
Test method: full `run.sh` pipeline (ptoas → bisheng → NPU → compare)

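| Backend | Sync mode | Result (ptoas compile → NPU run → compare precision) |
|---|---|---|
| EmitC | `--enable-insert-sync` | ✅ Pass |
| EmitC | `--enable-inject-barrier-all-sync` | ❌ ULP 30K+ |
| VPTO | `--enable-insert-sync` | ❌ ULP 30K+ |
| VPTO | `--enable-inject-barrier-all-sync` | Full pipeline not tested |
| VPTO (original `run.sh`) | Both flags at once | Error: the flags are mutually exclusive |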

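Key finding: EmitC + insert-sync is the only combination that passes end to end. The other combinations run on the NPU without crashing, but the results are wrong.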

What needs to be fixed is the precision problem on the VPTO path: currently, VPTO results are incorrect with either `--enable-insert-sync` or `--enable-inject-barrier-all-sync`.
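The table reports precision as a ULP (units in the last place) distance. Below is a minimal sketch of how such a distance can be measured element by element against the golden data. It assumes float32 outputs and a max-over-elements metric; the function names are hypothetical and are not taken from the compare step in `run.sh`, whose exact metric, dtype, and tolerance are not shown in this issue. NaN and infinity handling is deliberately left out.

```python
import struct


def f32_to_ordered_int(x: float) -> int:
    """Map a float32 value to an int whose ordering matches float ordering.

    Assumes float32 data; values are first rounded to float32 by struct.pack.
    """
    (bits,) = struct.unpack("<i", struct.pack("<f", x))
    # Negative floats have the sign bit set. Replace them with the negated
    # magnitude bits so integer order is monotonic across zero
    # (+0.0 and -0.0 both map to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)


def ulp_distance(a: float, b: float) -> int:
    """Number of representable float32 values between a and b."""
    return abs(f32_to_ordered_int(a) - f32_to_ordered_int(b))


def max_ulp(output, golden) -> int:
    """Hypothetical compare metric: worst-case ULP error over all elements."""
    if len(output) != len(golden):
        raise ValueError("output and golden must have the same length")
    return max((ulp_distance(o, g) for o, g in zip(output, golden)), default=0)
```

With a metric like this, a result of "ULP 30K+" would mean at least one output element is more than 30,000 representable float32 steps away from its golden value, which points to a real correctness problem (for example, a missing synchronization) rather than ordinary rounding noise.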

issue0625.zip

Reproduction (minimal)

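- Reproduction setup: `test_for_ptoas`
  - `source env_remote.sh` to configure the environment (edit the paths inside to match your own environment)
  - `bash setup_all.sh` to initialize
  - `run_rope_kv_cache/`
- Target kernel: `rope_kv_cache.pto` (A5 vec kernel, 92 tile ops)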

Expected behavior

The compare against golden succeeds.

Actual behavior / error logs

The compare against golden fails.

Git commit

0

Host platform

None

Target Ascend arch (if relevant)

None

PTOAS build level (if relevant)

None
