Add deduplicated traces for operator library profiling #43

Open
ghangz wants to merge 2 commits into MetaX-MACA:main from ghangz:mengz/mcoplib-profiler-unique-traces

Conversation

@ghangz ghangz commented Jun 10, 2026

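This change adds deduplicated traces for operator library profiling. It addresses a problem in the operator library's build and diagnostics workflow: the relevant information was scattered and costly to collate by hand. With this change, day-to-day troubleshooting, verification, and archiving of results become more direct.

The implementation adds the corresponding tool or script logic and the matching tests, while keeping existing usage unchanged as far as possible so that existing workflows are not affected.

This branch has been verified in practice in a MetaX (沐曦) compute environment. All relevant checks passed, and it is now submitted for merging.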

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the profiler to generate unique trace file paths by appending microseconds, process ID, and thread ID to the filename, preventing benchmark runs from overwriting existing traces. It also adds unit tests to verify filename uniqueness and sanitization. Feedback suggests truncating the sanitized function name to avoid exceeding filesystem filename length limits, and mocking the timestamp in the uniqueness test to prevent test flakiness.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread mcoplib/profiler.py
Comment on lines +22 to +28
def _trace_file_path(output_dir, func_name, rank):
    safe_name = "".join(ch if ch.isalnum() or ch in "._-" else "_" for ch in func_name)
    filename = (
        f"{safe_name}_trace_rank_{rank}_"
        f"{_timestamp()}_pid_{os.getpid()}_tid_{threading.get_ident()}.json"
    )
    return os.path.join(output_dir, filename)

medium

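On some filesystems (such as ext4 and NTFS), file names are limited to 255 characters. If func_name is very long (for example, for functions that are wrapped in several layers of decorators or are auto-generated), the generated trace file name may exceed this limit and raise OSError: [Errno 36] File name too long. Consider truncating safe_name (for example, to at most 128 characters) to make the code more robust.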

Suggested change
 def _trace_file_path(output_dir, func_name, rank):
-    safe_name = "".join(ch if ch.isalnum() or ch in "._-" else "_" for ch in func_name)
+    safe_name = "".join(ch if ch.isalnum() or ch in "._-" else "_" for ch in func_name)[:128]
     filename = (
         f"{safe_name}_trace_rank_{rank}_"
         f"{_timestamp()}_pid_{os.getpid()}_tid_{threading.get_ident()}.json"
     )
     return os.path.join(output_dir, filename)
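
One caveat about the truncation suggested above: ext4 counts its 255 limit in bytes, while `str.isalnum()` also accepts non-ASCII letters and digits, which take more than one byte in UTF-8. The following is a minimal sketch, not the PR's code, of a variant that keeps only ASCII characters so that a 128-character cap is also a 128-byte cap. The names `trace_file_path`, `MAX_NAME_BYTES`, and `MAX_STEM_CHARS` are hypothetical, and `_timestamp` is a stand-in whose format is an assumption, since the real helper is not shown in this thread.

```python
import os
import threading
from datetime import datetime

MAX_NAME_BYTES = 255  # per-component limit on ext4, counted in bytes
MAX_STEM_CHARS = 128  # cap proposed in the review comment


def _timestamp():
    # Hypothetical stand-in: the real helper is not shown in this thread.
    # Assumed format: date, "T", time, then six microsecond digits.
    return datetime.now().strftime("%Y%m%dT%H%M%S%f")


def trace_file_path(output_dir, func_name, rank):
    # Keep only ASCII alphanumerics plus "._-", so one character is one byte,
    # then truncate the sanitized name before building the file name.
    safe_name = "".join(
        ch if (ch.isascii() and ch.isalnum()) or ch in "._-" else "_"
        for ch in func_name
    )[:MAX_STEM_CHARS]
    filename = (
        f"{safe_name}_trace_rank_{rank}_"
        f"{_timestamp()}_pid_{os.getpid()}_tid_{threading.get_ident()}.json"
    )
    return os.path.join(output_dir, filename)


# A very long function name still yields a file name within the byte limit.
path = trace_file_path("out", "x" * 1000, 0)
print(len(os.path.basename(path).encode("utf-8")) <= MAX_NAME_BYTES)
```

Truncation can make two different long names share a prefix. The timestamp, pid, and tid suffix still keeps the paths distinct in that case, but the file name alone no longer identifies the function.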

Comment on lines +1 to +5
import os
import sys
import tempfile
import unittest
from pathlib import Path

medium

To use unittest.mock.patch to mock the timestamp in the test, we need to import patch.

Suggested change
 import os
 import sys
 import tempfile
 import unittest
 from pathlib import Path
+from unittest.mock import patch

Comment on lines +12 to +20
class ProfilerTracePathTest(unittest.TestCase):
    def test_trace_file_path_is_unique(self):
        with tempfile.TemporaryDirectory() as tmp_path:
            first = _trace_file_path(tmp_path, "fused_mla", 0)
            second = _trace_file_path(tmp_path, "fused_mla", 0)

            self.assertNotEqual(first, second)
            self.assertTrue(first.endswith(".json"))
            self.assertEqual(os.path.dirname(first), tmp_path)

medium

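In a very fast test environment, two consecutive calls to _trace_file_path may complete within the same microsecond. Because _timestamp() has microsecond precision and the pid and tid are identical, first and second would then have exactly the same file name, so the self.assertNotEqual(first, second) assertion fails and the test becomes flaky. Consider using unittest.mock.patch to make _timestamp return different values so that the test is deterministic and stable.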

Suggested change
 class ProfilerTracePathTest(unittest.TestCase):
-    def test_trace_file_path_is_unique(self):
+    @patch("mcoplib.profiler._timestamp")
+    def test_trace_file_path_is_unique(self, mock_timestamp):
+        mock_timestamp.side_effect = ["20231011T120000000001", "20231011T120000000002"]
         with tempfile.TemporaryDirectory() as tmp_path:
             first = _trace_file_path(tmp_path, "fused_mla", 0)
             second = _trace_file_path(tmp_path, "fused_mla", 0)
             self.assertNotEqual(first, second)
             self.assertTrue(first.endswith(".json"))
             self.assertEqual(os.path.dirname(first), tmp_path)
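
The reasoning in this comment can be reproduced without the library. The sketch below is a self-contained illustration, not the PR's test: it copies `_trace_file_path` from the thread, supplies a hypothetical `_timestamp` whose format is assumed from the mocked values above, and uses a hypothetical helper `paths_with_timestamps` to control what the timestamp returns. Because there is no `mcoplib.profiler` module to target here, it patches the function's globals with `patch.dict` instead of using `patch("mcoplib.profiler._timestamp")`.

```python
import os
import threading
from datetime import datetime
from unittest.mock import Mock, patch


def _timestamp():
    # Hypothetical stand-in: the real helper is not shown in this thread.
    return datetime.now().strftime("%Y%m%dT%H%M%S%f")


def _trace_file_path(output_dir, func_name, rank):
    # Copied from the code under review.
    safe_name = "".join(ch if ch.isalnum() or ch in "._-" else "_" for ch in func_name)
    filename = (
        f"{safe_name}_trace_rank_{rank}_"
        f"{_timestamp()}_pid_{os.getpid()}_tid_{threading.get_ident()}.json"
    )
    return os.path.join(output_dir, filename)


def paths_with_timestamps(stamps):
    """Call _trace_file_path once per stamp, with _timestamp mocked."""
    stamps = list(stamps)
    fake = Mock(side_effect=stamps)  # returns the stamps one call at a time
    # Temporarily replace the name that _trace_file_path looks up at call time.
    with patch.dict(_trace_file_path.__globals__, {"_timestamp": fake}):
        return [_trace_file_path("out", "fused_mla", 0) for _ in stamps]


# Same microsecond, same pid, same tid: the two paths collide.
same = paths_with_timestamps(["20231011T120000000001"] * 2)
print(same[0] == same[1])

# Distinct mocked timestamps: the two paths differ deterministically.
distinct = paths_with_timestamps(["20231011T120000000001", "20231011T120000000002"])
print(distinct[0] != distinct[1])
```

The first comparison shows the collision that makes the unmocked test timing-dependent. The second shows why feeding two different values through `side_effect` makes the inequality assertion deterministic.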
