增加集体通信规模扫描计划#20
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new command-line tool size_sweep_plan.py along with its unit tests to expand MCCL benchmark size arguments into a concrete sweep plan. The tool supports size parsing with units (K, M, G) and expanding sizes using either a step factor or step bytes. The code review feedback suggests three key improvements: handling empty string inputs in parse_size to prevent an IndexError, moving the step_bytes validation outside of the loop in expand_sizes for better efficiency, and catching ValueError in the CLI's entry point to display user-friendly error messages instead of raw tracebacks.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| def parse_size(text: str) -> int: | ||
| text = text.strip().upper() | ||
| if text[-1] in UNITS: | ||
| return int(float(text[:-1]) * UNITS[text[-1]]) | ||
| return int(text) |
There was a problem hiding this comment.
如果输入的 text 为空字符串(例如仅包含空格或为空),text.strip().upper() 将得到空字符串,此时执行 text[-1] 会抛出 IndexError: string index out of range。建议在访问索引前先检查字符串是否为空,并抛出更具描述性的 ValueError。
| def parse_size(text: str) -> int: | |
| text = text.strip().upper() | |
| if text[-1] in UNITS: | |
| return int(float(text[:-1]) * UNITS[text[-1]]) | |
| return int(text) | |
| def parse_size(text: str) -> int: | |
| text = text.strip().upper() | |
| if not text: | |
| raise ValueError("size string cannot be empty") | |
| if text[-1] in UNITS: | |
| return int(float(text[:-1]) * UNITS[text[-1]]) | |
| return int(text) |
| def expand_sizes(min_bytes: int, max_bytes: int, step_factor: float | None = None, step_bytes: int | None = None) -> list[int]: | ||
| if min_bytes <= 0 or max_bytes < min_bytes: | ||
| raise ValueError("invalid size range") | ||
| if (step_factor is None) == (step_bytes is None): | ||
| raise ValueError("exactly one of step_factor or step_bytes must be set") | ||
|
|
||
| sizes = [] | ||
| current = min_bytes | ||
| while current <= max_bytes: | ||
| sizes.append(current) | ||
| if step_factor is not None: | ||
| next_size = int(current * step_factor) | ||
| if next_size <= current: | ||
| raise ValueError("step_factor must increase the size") | ||
| current = next_size | ||
| else: | ||
| if step_bytes <= 0: | ||
| raise ValueError("step_bytes must be positive") | ||
| current += step_bytes | ||
| return sizes |
There was a problem hiding this comment.
在当前的实现中,step_bytes <= 0 的校验是在 while 循环内部进行的。这意味着每次循环迭代都会重复进行这一校验,这不仅影响效率,而且在参数无效时,函数已经向 sizes 列表中添加了第一个元素。建议将 step_bytes 的有效性校验移到循环外部(函数开头),这样可以提高执行效率并使逻辑更清晰。
| def expand_sizes(min_bytes: int, max_bytes: int, step_factor: float | None = None, step_bytes: int | None = None) -> list[int]: | |
| if min_bytes <= 0 or max_bytes < min_bytes: | |
| raise ValueError("invalid size range") | |
| if (step_factor is None) == (step_bytes is None): | |
| raise ValueError("exactly one of step_factor or step_bytes must be set") | |
| sizes = [] | |
| current = min_bytes | |
| while current <= max_bytes: | |
| sizes.append(current) | |
| if step_factor is not None: | |
| next_size = int(current * step_factor) | |
| if next_size <= current: | |
| raise ValueError("step_factor must increase the size") | |
| current = next_size | |
| else: | |
| if step_bytes <= 0: | |
| raise ValueError("step_bytes must be positive") | |
| current += step_bytes | |
| return sizes | |
| def expand_sizes(min_bytes: int, max_bytes: int, step_factor: float | None = None, step_bytes: int | None = None) -> list[int]: | |
| if min_bytes <= 0 or max_bytes < min_bytes: | |
| raise ValueError("invalid size range") | |
| if (step_factor is None) == (step_bytes is None): | |
| raise ValueError("exactly one of step_factor or step_bytes must be set") | |
| if step_bytes is not None and step_bytes <= 0: | |
| raise ValueError("step_bytes must be positive") | |
| sizes = [] | |
| current = min_bytes | |
| while current <= max_bytes: | |
| sizes.append(current) | |
| if step_factor is not None: | |
| next_size = int(current * step_factor) | |
| if next_size <= current: | |
| raise ValueError("step_factor must increase the size") | |
| current = next_size | |
| else: | |
| current += step_bytes | |
| return sizes |
| text = json.dumps( | ||
| summarize(args.min_bytes, args.max_bytes, args.step_factor, args.step_bytes), | ||
| indent=2, | ||
| ) |
There was a problem hiding this comment.
当用户输入无效的参数(例如不合法的尺寸格式、无效的范围或步长)时,summarize 会抛出 ValueError。目前程序没有捕获该异常,会导致直接向用户输出一长串 Python 堆栈信息(traceback)。对于命令行工具,建议捕获 ValueError 并向标准错误(stderr)输出友好的错误提示,然后以非零状态码退出。
try:
summary = summarize(args.min_bytes, args.max_bytes, args.step_factor, args.step_bytes)
except ValueError as e:
import sys
print(f"Error: {e}", file=sys.stderr)
return 1
text = json.dumps(summary, indent=2)
这次改动补上了集体通信规模扫描计划,主要是为了解决集体通信测试与结果整理流程里相关信息不够集中、人工整理成本较高的问题,让日常排查、验证和结果归档更直接。
实现上补充了对应工具或脚本逻辑,补上了对应测试,同时尽量保持现有用法不变,避免影响已有流程。
这一分支已经在沐曦算力环境完成实际验证,相关检查均已通过,现提交合入。