Add operator source inventory for the operator library #41
Code Review
This pull request introduces a script to build a JSON inventory of operator source groups and adds corresponding unit tests. The reviewer identified a critical issue where the "native" group recursively includes files from other sub-groups (like "vllm" or "sglang") due to overlapping directory paths, leading to duplicate counting. The reviewer provided code suggestions to filter out these overlapping files from the "native" group and to update the unit tests to verify this deduplication logic.
def build_inventory(root: Path) -> dict[str, object]:
    groups: dict[str, object] = {}
    for name, relative_dir in GROUPS.items():
        files = _sources(root, relative_dir)
        groups[name] = {"root": relative_dir, "count": len(files), "files": files}
    return {"root": str(root), "groups": groups}
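In the current implementation, the "native" entry in GROUPS maps to the path "op". Because _sources uses rglob("*") to search recursively, every source file under subdirectories such as "op/vllm" and "op/sglang" is also counted in the "native" group. This duplicates data and makes the count and file list of the "native" group inaccurate.
When building the inventory, consider excluding from the "native" group any file already covered by a more specific group (such as vllm, sglang, lmdeploy, or cv).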
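The overlap can be reproduced with a small self-contained sketch. The PR's real GROUPS mapping and _sources helper are not shown in this thread, so the two-entry GROUPS table, the SOURCE_SUFFIXES filter, and the body of _sources below are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path

# Assumed for illustration; the PR's actual GROUPS and _sources may differ.
GROUPS = {"native": "op", "vllm": "op/vllm"}
SOURCE_SUFFIXES = {".cu", ".cpp", ".h", ".py"}  # hypothetical suffix filter


def _sources(root: Path, relative_dir: str) -> list[str]:
    """Recursively list source files under root/relative_dir as POSIX paths."""
    base = root / relative_dir
    if not base.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p.suffix in SOURCE_SUFFIXES
    )


def demo_overlap() -> dict[str, list[str]]:
    """Build a tiny tree and return the raw, non-deduplicated file lists."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "op" / "vllm").mkdir(parents=True)
        (root / "op" / "vllm" / "kernel.cu").write_text("", encoding="utf-8")
        (root / "op" / "native_kernel.cu").write_text("", encoding="utf-8")
        return {name: _sources(root, rel) for name, rel in GROUPS.items()}


if __name__ == "__main__":
    # "op/vllm/kernel.cu" shows up under both "native" and "vllm".
    print(demo_overlap())
```

Under these assumptions, the "native" list contains both files while "vllm" contains only its own, which is the double counting described above.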
def build_inventory(root: Path) -> dict[str, object]:
    raw_files: dict[str, list[str]] = {}
    for name, relative_dir in GROUPS.items():
        raw_files[name] = _sources(root, relative_dir)

    # Exclude files already covered by the other, more specific groups
    # so that "native" does not count them a second time.
    other_files: set[str] = set()
    for name in GROUPS:
        if name != "native":
            other_files.update(raw_files[name])
    if "native" in raw_files:
        raw_files["native"] = [f for f in raw_files["native"] if f not in other_files]

    groups: dict[str, object] = {}
    for name, relative_dir in GROUPS.items():
        files = raw_files[name]
        groups[name] = {"root": relative_dir, "count": len(files), "files": files}
    return {"root": str(root), "groups": groups}

    def test_inventory_counts_group_sources(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "op" / "vllm").mkdir(parents=True)
            (root / "op" / "vllm" / "kernel.cu").write_text("", encoding="utf-8")
            (root / "op" / "vllm" / "README.md").write_text("", encoding="utf-8")

            inventory = build_inventory(root)

            self.assertEqual(inventory["groups"]["vllm"]["count"], 1)
            self.assertEqual(inventory["groups"]["vllm"]["files"], ["op/vllm/kernel.cu"])
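To go with the deduplication change for the native group, consider adding assertions on the native group to the unit test, so that it verifies the group does not wrongly include files from other sub-groups (such as vllm) and still counts the source files that belong to native itself.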
    def test_inventory_counts_group_sources(self):
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)
            (root / "op" / "vllm").mkdir(parents=True)
            (root / "op" / "vllm" / "kernel.cu").write_text("", encoding="utf-8")
            (root / "op" / "vllm" / "README.md").write_text("", encoding="utf-8")
            (root / "op" / "native_kernel.cu").write_text("", encoding="utf-8")
            inventory = build_inventory(root)
            self.assertEqual(inventory["groups"]["vllm"]["count"], 1)
            self.assertEqual(inventory["groups"]["vllm"]["files"], ["op/vllm/kernel.cu"])
            self.assertEqual(inventory["groups"]["native"]["count"], 1)
            self.assertEqual(inventory["groups"]["native"]["files"], ["op/native_kernel.cu"])
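The suggestion above special-cases the "native" group by name. As an alternative that is not part of this PR or of the review suggestion, a file could be assigned to whichever group root is its deepest ancestor, so that any nested pair of group directories is handled the same way. The sketch below assumes a hypothetical GROUPS table and two hypothetical helpers, owning_group and partition.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical group table; the PR's actual GROUPS mapping is not shown here.
GROUPS = {"native": "op", "vllm": "op/vllm", "sglang": "op/sglang"}


def owning_group(path: str, groups: dict[str, str]) -> Optional[str]:
    """Return the group whose root directory is the deepest ancestor of path."""
    ancestors = PurePosixPath(path).parents
    best_name: Optional[str] = None
    best_depth = -1
    for name, relative_dir in groups.items():
        group_root = PurePosixPath(relative_dir)
        # A deeper matching root is more specific and takes precedence.
        if group_root in ancestors and len(group_root.parts) > best_depth:
            best_name, best_depth = name, len(group_root.parts)
    return best_name


def partition(files: list[str], groups: dict[str, str]) -> dict[str, list[str]]:
    """Assign each file to exactly one group; files outside all roots are dropped."""
    result: dict[str, list[str]] = {name: [] for name in groups}
    for file_path in sorted(files):
        name = owning_group(file_path, groups)
        if name is not None:
            result[name].append(file_path)
    return result


if __name__ == "__main__":
    sample = ["op/vllm/kernel.cu", "op/native_kernel.cu", "op/sglang/attn.cu"]
    print(partition(sample, GROUPS))
```

The trade-off is that each file is classified by path rather than by set difference, which avoids hardcoding "native" but assumes every group is defined by a single root directory.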
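This change adds a source inventory for the operators in the operator library. It addresses the problem that the relevant information was scattered across the operator library build and diagnostic workflows and costly to compile by hand, and it makes routine troubleshooting, verification, and archiving of results more direct.
The implementation adds the corresponding tool or script logic and matching tests, while keeping existing usage unchanged as far as possible to avoid affecting established workflows.
This branch has been validated in practice in the MetaX (沐曦) compute environment, all related checks have passed, and it is now submitted for merging.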