Add option audit for the monitoring exporter start script #12
Code Review
This pull request introduces a new audit tool, start_script_option_audit.py, along with a test suite to compare documented and implemented options in start_mxexporter.sh. The review feedback points out a potential issue where hyphenated words in option descriptions could be incorrectly parsed as options, and suggests restricting parsing to the first token of each line. Additionally, the feedback recommends organizing standard library imports according to PEP 8 and expanding test coverage to verify the parsing behavior with hyphenated descriptions.
    def parse_documented(text: str) -> set[str]:
        items = set()
        for raw_line in text.splitlines():
            line = raw_line.strip()
            if not line.startswith("--") or line.endswith(")") or ":)" in line:
                continue
            items.update(TOKEN_RE.findall(line))
        return items
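In parse_documented, running TOKEN_RE.findall(line) against the entire line is fragile. If an option's description contains hyphenated words (for example cmd-tool, k8s-domain, or non-root), the regular expression will treat fragments such as -tool, -domain, and -root as command-line options, which produces false positives in the audit report.
Consider matching only the first whitespace-delimited token of each line (the option definition itself), so that the description text cannot interfere at all.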
Suggested change:

    def parse_documented(text: str) -> set[str]:
        items = set()
        for raw_line in text.splitlines():
            line = raw_line.strip()
            if not line.startswith("--") or line.endswith(")") or ":)" in line:
                continue
            first_token = line.split()[0]
            items.update(TOKEN_RE.findall(first_token))
        return items
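The false positive described in this comment can be reproduced with a small sketch. The actual TOKEN_RE is not shown in this review, so the pattern below is only an assumption chosen for illustration, and the helper names parse_whole_line and parse_first_token are hypothetical; the line filter is copied from parse_documented.

```python
import re

# Assumption: the real TOKEN_RE is not shown in the review; this pattern is
# a hypothetical stand-in that matches short and long options.
TOKEN_RE = re.compile(r"--?[A-Za-z][\w-]*")

DOC = (
    "  --port|-p=<port>  Specific port, default: non-standard\n"
    "  --help|-h         Display help-message\n"
    "  --port=*|-p=*)\n"  # case-branch line, skipped by the filter below
)


def option_lines(text):
    """Yield stripped lines that pass the same filter as parse_documented."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("--") or line.endswith(")") or ":)" in line:
            continue
        yield line


def parse_whole_line(text):
    """Match against the full line, description included."""
    items = set()
    for line in option_lines(text):
        items.update(TOKEN_RE.findall(line))
    return items


def parse_first_token(text):
    """Match only against the first whitespace-delimited token."""
    items = set()
    for line in option_lines(text):
        items.update(TOKEN_RE.findall(line.split()[0]))
    return items


# Tokens that appear only because the description text was scanned.
false_positives = sorted(parse_whole_line(DOC) - parse_first_token(DOC))
print(false_positives)  # ['-message', '-standard']
```

Under this assumed pattern, restricting the match to the first token leaves only --port, -p, --help, and -h. Whether the real pattern behaves the same way depends on how TOKEN_RE is actually defined.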
    import argparse
    import json
    from pathlib import Path


    import re
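According to PEP 8, standard library imports should form a single group with no blank lines inside the group. Consider merging import re into the same group as the other standard library imports and sorting them alphabetically.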
Suggested change:

    import argparse
    import json
    from pathlib import Path
    import re
References
- PEP 8: Imports should be grouped and standard library imports should be in a single group without extra blank lines.
    def test_detects_undocumented_option(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            root = Path(tmpdir)
            (root / "start_mxexporter.sh").write_text(
                "  --port|-p=<port>\n"
                "  --help|-h\n"
                "  --port=*|-p=*)\n"
                "  --pid=*)\n",
                encoding="utf-8",
            )

            report = build(root)

            self.assertEqual(report["undocumented_options"], ["--pid"])
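Consider adding description text that contains hyphenated words (for example non-standard and help-message) to the test case, to verify that the audit tool does not mistake hyphenated words in descriptions for command-line options.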
Suggested change:

    def test_detects_undocumented_option(self):
        with tempfile.TemporaryDirectory() as tmpdir:
            root = Path(tmpdir)
            (root / "start_mxexporter.sh").write_text(
                "  --port|-p=<port>  Specific port, default: non-standard\n"
                "  --help|-h  Display help-message\n"
                "  --port=*|-p=*)\n"
                "  --pid=*)\n",
                encoding="utf-8",
            )
            report = build(root)
            self.assertEqual(report["undocumented_options"], ["--pid"])
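This change adds an option audit for the monitoring exporter start script. In the monitoring-export and deployment troubleshooting workflow, the relevant information was scattered and costly to collect by hand; the audit makes day-to-day troubleshooting, verification, and archiving of results more direct.
The implementation adds the corresponding tool and script logic together with tests, and keeps existing usage unchanged as far as possible to avoid affecting existing workflows.
This branch has been verified in a real MetaX (沐曦) compute environment, all related checks have passed, and it is now submitted for merging.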