Add count validation to operator library profiling by ghangz · Pull Request #44 · MetaX-MACA/mcoplib

ghangz · 2026-06-10T15:30:26Z

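This change adds count validation to operator library profiling. It mainly addresses the problem that the relevant information in the operator library's build and diagnostics workflow was scattered and costly to collate by hand, and it makes day-to-day troubleshooting, verification, and archiving of results more direct.

The implementation adds the corresponding tool or script logic and the matching tests, while keeping existing usage unchanged as far as possible so that existing workflows are not affected.

This branch has been verified in practice in a MetaX (沐曦) compute environment, all related checks have passed, and it is now submitted for merging.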

gemini-code-assist

Code Review

This pull request introduces validation and normalization for the "warmup" and "repeat" parameters in the profiler decorator, along with corresponding unit tests. However, a logic issue was identified where setting "warmup=warmup_count" in the PyTorch Profiler schedule conflicts with the manual warm-up loop executed beforehand. This can prevent the profiler from reaching the active state or lead to incorrect recording counts. It is recommended to set "warmup=0" in the profiler schedule.
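As a rough illustration of what validating and normalizing the two counts might look like, here is a minimal sketch. The helper name `normalize_count`, the lower bounds, and the decision to reject booleans and negative values are assumptions made for illustration; the rules actually implemented in the PR are not shown on this page.

```python
import operator


def normalize_count(value, name, minimum):
    """Return `value` as a plain int, or raise if it is not a usable count.

    Hypothetical helper: the real validation rules in mcoplib may differ.
    """
    # bool is a subclass of int, so reject it explicitly to catch warmup=True.
    if isinstance(value, bool):
        raise TypeError(f"{name} must be an integer, got bool")
    try:
        # operator.index accepts int-like objects but rejects floats and strings.
        count = operator.index(value)
    except TypeError:
        raise TypeError(
            f"{name} must be an integer, got {type(value).__name__}"
        ) from None
    if count < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {count}")
    return count


# Assumed bounds: warm-up may be skipped (0), but at least one repeat is needed.
warmup_count = normalize_count(3, "warmup", minimum=0)
repeat_count = normalize_count(10, "repeat", minimum=1)
```

Normalizing once at the top of the decorator means the manual warm-up loop and the profiler schedule both see the same validated integers.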


gemini-code-assist · 2026-06-10T15:32:57Z

                        schedule=torch.profiler.schedule(
                            wait=0,
-                            warmup=warmup,
+                            warmup=warmup_count,
                            active=1,
-                            repeat=repeat
+                            repeat=repeat_count
                        ),


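Problem analysis

There is a serious logic issue here: the PyTorch Profiler schedule does not match the number of iterations the loop actually runs.

Earlier in the code (lines 113-115), a manual loop has already performed warmup_count warm-up runs: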

for _ in range(warmup_count): func(*args, **kwargs)

However, torch.profiler.profile is configured with warmup=warmup_count. This means the PyTorch Profiler itself also expects warmup_count calls to prof.step() before it starts recording (the ACTIVE phase).

Yet the actual profiling loop (lines 144-146) only runs repeat_count times:

for _ in range(repeat_count):
    result = func(*args, **kwargs)
    prof.step()

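This causes the following problems:

No data recorded: if warmup_count >= repeat_count, prof.step() is only called repeat_count times, so the profiler never reaches the ACTIVE state and the exported trace contains no performance data.

Recording count mismatch: if repeat_count > warmup_count, the first warmup_count iterations are still treated as the WARMUP phase inside the profiler, so at most repeat_count - warmup_count iterations are actually recorded, which does not match the expected repeat_count recordings.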

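Solution

Since warm-up is already performed manually outside the profiler, the schedule passed to the profiler should set warmup to 0. Every step after the profiler starts is then ACTIVE, which matches the repeat_count loop iterations exactly.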
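The analysis and the fix can be sanity-checked without a GPU by using a small pure-Python model of the schedule. This is a simplified re-implementation of the documented wait/warmup/active/repeat cycle, not a call into torch.profiler, and the names `phase_at` and `recorded_iterations` are made up for this sketch.

```python
def phase_at(step, wait, warmup, active, repeat):
    """Simplified model of the phase a profiler schedule assigns to `step`.

    Assumption: mirrors the documented cycle (wait, then warmup, then active,
    repeated `repeat` times; repeat=0 means no limit). skip_first is ignored.
    """
    cycle = wait + warmup + active
    if repeat > 0 and step >= repeat * cycle:
        return "NONE"
    pos = step % cycle
    if pos < wait:
        return "NONE"
    if pos < wait + warmup:
        return "WARMUP"
    return "ACTIVE"


def recorded_iterations(warmup, repeat_count):
    """Count loop iterations that run in the ACTIVE phase.

    Models `for _ in range(repeat_count): func(); prof.step()`, where
    iteration i runs under the phase assigned to step i.
    """
    return sum(
        phase_at(i, wait=0, warmup=warmup, active=1, repeat=repeat_count) == "ACTIVE"
        for i in range(repeat_count)
    )


# Schedule from the PR: warmup=warmup_count on top of the manual warm-up loop.
print(recorded_iterations(warmup=3, repeat_count=3))  # 0: never reaches ACTIVE
print(recorded_iterations(warmup=2, repeat_count=5))  # 1: fewer than repeat_count
# Suggested schedule: warmup=0, so every iteration is recorded.
print(recorded_iterations(warmup=0, repeat_count=5))  # 5
```

In this model the second case records only one of five iterations, because the warm-up phase is re-entered at the start of every cycle. With warmup=0 each cycle is a single ACTIVE step, so the recorded count equals repeat_count.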

Suggested change

                        schedule=torch.profiler.schedule(
                            wait=0,
-                            warmup=warmup_count,
+                            warmup=0,
                            active=1,
                            repeat=repeat_count
                        ),

Validate profiler warmup and repeat counts

c11f0fd

gemini-code-assist Bot reviewed Jun 10, 2026


Align profiler schedule with manual warmup

0e2e01c

ghangz wants to merge 2 commits into MetaX-MACA:main from ghangz:mengz/mcoplib-profiler-validate-counts
