Add count validation to the operator library profiler #44
Open
ghangz wants to merge 2 commits into MetaX-MACA:main from ghangz:mengz/mcoplib-profiler-validate-counts
New file (`@@ -0,0 +1,87 @@`):

```python
import types
import sys
import unittest
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from mcoplib.profiler import _normalize_profile_count, profiler


class ProfilerCountValidationTest(unittest.TestCase):
    def test_normalize_profile_count_accepts_numeric_strings(self):
        self.assertEqual(_normalize_profile_count("repeat", "3", 1), 3)

    def test_normalize_profile_count_rejects_invalid_values(self):
        with self.assertRaisesRegex(ValueError, "warmup must be an integer"):
            _normalize_profile_count("warmup", "bad", 0)

    def test_normalize_profile_count_rejects_values_below_minimum(self):
        with self.assertRaisesRegex(ValueError, "repeat must be >= 1"):
            _normalize_profile_count("repeat", 0, 1)

    def test_profiler_validates_counts_when_decorator_is_created(self):
        with self.assertRaisesRegex(ValueError, "repeat must be >= 1"):
            profiler(repeat=0)

    def test_profiler_schedule_uses_zero_internal_warmup(self):
        schedule_kwargs = {}
        fake_torch = types.ModuleType("torch")
        fake_torch.cuda = types.SimpleNamespace(is_available=lambda: False)

        profiler_module = types.ModuleType("torch.profiler")

        def fake_schedule(**kwargs):
            schedule_kwargs.update(kwargs)
            return kwargs

        class FakeProfile:
            def __init__(self, **kwargs):
                self.kwargs = kwargs

            def __enter__(self):
                return self

            def __exit__(self, exc_type, exc, tb):
                return False

            def step(self):
                return None

        profiler_module.profile = lambda **kwargs: FakeProfile(**kwargs)
        profiler_module.schedule = fake_schedule
        profiler_module.ProfilerActivity = types.SimpleNamespace(CPU="cpu", CUDA="cuda")
        fake_torch.profiler = profiler_module

        previous_torch = sys.modules.get("torch")
        previous_profiler = sys.modules.get("torch.profiler")
        sys.modules["torch"] = fake_torch
        sys.modules["torch.profiler"] = profiler_module
        try:
            calls = []

            @profiler(warmup=2, repeat=3)
            def sample():
                calls.append("run")
                return "ok"

            self.assertEqual(sample(), "ok")
        finally:
            if previous_torch is None:
                sys.modules.pop("torch", None)
            else:
                sys.modules["torch"] = previous_torch
            if previous_profiler is None:
                sys.modules.pop("torch.profiler", None)
            else:
                sys.modules["torch.profiler"] = previous_profiler

        self.assertEqual(schedule_kwargs["wait"], 0)
        self.assertEqual(schedule_kwargs["warmup"], 0)
        self.assertEqual(schedule_kwargs["active"], 1)
        self.assertEqual(schedule_kwargs["repeat"], 3)
        self.assertEqual(len(calls), 5)


if __name__ == "__main__":
    unittest.main()
```
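The implementation of `_normalize_profile_count` is not part of this diff. As a rough illustration of the behaviour the tests above expect, here is a minimal sketch. The name `normalize_count` and its body are hypothetical stand-ins, not the code in `mcoplib.profiler`; only the `(name, value, minimum)` argument order and the two error messages are taken from the tests.

```python
def normalize_count(name, value, minimum):
    """Hypothetical stand-in for mcoplib.profiler._normalize_profile_count.

    Converts `value` to an int and checks it against `minimum`.
    The error messages mirror the ones the tests match on.
    """
    try:
        count = int(value)
    except (TypeError, ValueError):
        # Covers non-numeric strings such as "bad" and values like None.
        raise ValueError(f"{name} must be an integer") from None
    if count < minimum:
        raise ValueError(f"{name} must be >= {minimum}")
    return count


print(normalize_count("repeat", "3", 1))  # numeric strings are accepted
```

Whether the real helper also accepts floats or booleans cannot be determined from the tests, so this sketch leaves `int()` to decide.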
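Problem analysis

There is a serious logic problem here: the PyTorch Profiler scheduler (`schedule`) does not match the actual number of loop iterations.

Earlier in the code (lines 113-115), a manual loop has already run `warmup_count` warm-up iterations. However, `torch.profiler.profile` is also configured with `warmup=warmup_count`, which means the Profiler itself expects `warmup_count` calls to `prof.step()` before it starts recording (ACTIVE). The actual profiling loop (lines 144-146) runs only `repeat_count` times.

This causes the following problems:

- If `warmup_count >= repeat_count`, `prof.step()` is called only `repeat_count` times, so the Profiler never reaches the ACTIVE state and the exported trace contains no performance data.
- If `repeat_count > warmup_count`, the Profiler still treats the first `warmup_count` iterations as WARMUP and records only the remaining `repeat_count - warmup_count` iterations, which falls short of the expected `repeat_count` recorded iterations.

Solution

Because warm-up is already performed manually outside the Profiler, the internal `schedule` should set `warmup` to `0`. Every step after the Profiler starts is then ACTIVE, which matches the `repeat_count` loop iterations exactly.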
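The mismatch can be checked without PyTorch by using a simplified model of the scheduler. The sketch below is an approximation of the documented wait/warmup/active/repeat cycle of `torch.profiler.schedule`, not the library's own code. The helper names `phase_at` and `recorded_iterations` are hypothetical, and `active=1` for the pre-fix schedule is an assumption, since only the post-fix values appear in the test.

```python
def phase_at(step, *, wait, warmup, active, repeat):
    """Return 'NONE', 'WARMUP' or 'ACTIVE' for a zero-based profiler step.

    Simplified model: each cycle is wait + warmup + active steps long,
    and the schedule goes idle after `repeat` cycles (when repeat > 0).
    """
    cycle = wait + warmup + active
    if repeat > 0 and step >= cycle * repeat:
        return "NONE"
    pos = step % cycle
    if pos < wait:
        return "NONE"
    if pos < wait + warmup:
        return "WARMUP"
    return "ACTIVE"


def recorded_iterations(loop_iterations, **schedule):
    """Count the loop iterations that fall into the ACTIVE phase."""
    return sum(
        phase_at(step, **schedule) == "ACTIVE" for step in range(loop_iterations)
    )


repeat_count = 3

# Before the fix: the schedule repeats the warm-up that was already done manually.
before = recorded_iterations(repeat_count, wait=0, warmup=2, active=1, repeat=repeat_count)

# After the fix: warmup=0, so every iteration of the profiling loop is recorded.
after = recorded_iterations(repeat_count, wait=0, warmup=0, active=1, repeat=repeat_count)

print(before, after)  # 1 3
```

With `warmup=3` and the same three loop iterations, the model records nothing, which corresponds to the empty-trace case described in the first bullet.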