Skip to content

增加集体通信规模扫描计划#20

Open
ghangz wants to merge 1 commit into
MetaX-MACA:mainfrom
ghangz:mengz/mccl-size-sweep-plan
Open

增加集体通信规模扫描计划#20
ghangz wants to merge 1 commit into
MetaX-MACA:mainfrom
ghangz:mengz/mccl-size-sweep-plan

Conversation

@ghangz

@ghangz ghangz commented Jun 10, 2026

Copy link
Copy Markdown

这次改动补上了集体通信规模扫描计划,主要是为了解决集体通信测试与结果整理流程里相关信息不够集中、人工整理成本较高的问题,让日常排查、验证和结果归档更直接。

实现上补充了对应工具或脚本逻辑,补上了对应测试,同时尽量保持现有用法不变,避免影响已有流程。

这一分支已经在沐曦算力环境完成实际验证,相关检查均已通过,现提交合入。

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new command-line tool size_sweep_plan.py along with its unit tests to expand MCCL benchmark size arguments into a concrete sweep plan. The tool supports size parsing with units (K, M, G) and expanding sizes using either a step factor or step bytes. The code review feedback suggests three key improvements: handling empty string inputs in parse_size to prevent an IndexError, moving the step_bytes validation outside of the loop in expand_sizes for better efficiency, and catching ValueError in the CLI's entry point to display user-friendly error messages instead of raw tracebacks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread tools/size_sweep_plan.py
Comment on lines +18 to +22
def parse_size(text: str) -> int:
text = text.strip().upper()
if text[-1] in UNITS:
return int(float(text[:-1]) * UNITS[text[-1]])
return int(text)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

如果输入的 text 为空字符串(例如仅包含空格或为空),text.strip().upper() 将得到空字符串,此时执行 text[-1] 会抛出 IndexError: string index out of range。建议在访问索引前先检查字符串是否为空,并抛出更具描述性的 ValueError

Suggested change
def parse_size(text: str) -> int:
text = text.strip().upper()
if text[-1] in UNITS:
return int(float(text[:-1]) * UNITS[text[-1]])
return int(text)
def parse_size(text: str) -> int:
text = text.strip().upper()
if not text:
raise ValueError("size string cannot be empty")
if text[-1] in UNITS:
return int(float(text[:-1]) * UNITS[text[-1]])
return int(text)

Comment thread tools/size_sweep_plan.py
Comment on lines +25 to +44
def expand_sizes(min_bytes: int, max_bytes: int, step_factor: float | None = None, step_bytes: int | None = None) -> list[int]:
if min_bytes <= 0 or max_bytes < min_bytes:
raise ValueError("invalid size range")
if (step_factor is None) == (step_bytes is None):
raise ValueError("exactly one of step_factor or step_bytes must be set")

sizes = []
current = min_bytes
while current <= max_bytes:
sizes.append(current)
if step_factor is not None:
next_size = int(current * step_factor)
if next_size <= current:
raise ValueError("step_factor must increase the size")
current = next_size
else:
if step_bytes <= 0:
raise ValueError("step_bytes must be positive")
current += step_bytes
return sizes

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

在当前的实现中,step_bytes <= 0 的校验是在 while 循环内部进行的。这意味着每次循环迭代都会重复进行这一校验,这不仅影响效率,而且在参数无效时,函数已经向 sizes 列表中添加了第一个元素。建议将 step_bytes 的有效性校验移到循环外部(函数开头),这样可以提高执行效率并使逻辑更清晰。

Suggested change
def expand_sizes(min_bytes: int, max_bytes: int, step_factor: float | None = None, step_bytes: int | None = None) -> list[int]:
if min_bytes <= 0 or max_bytes < min_bytes:
raise ValueError("invalid size range")
if (step_factor is None) == (step_bytes is None):
raise ValueError("exactly one of step_factor or step_bytes must be set")
sizes = []
current = min_bytes
while current <= max_bytes:
sizes.append(current)
if step_factor is not None:
next_size = int(current * step_factor)
if next_size <= current:
raise ValueError("step_factor must increase the size")
current = next_size
else:
if step_bytes <= 0:
raise ValueError("step_bytes must be positive")
current += step_bytes
return sizes
def expand_sizes(min_bytes: int, max_bytes: int, step_factor: float | None = None, step_bytes: int | None = None) -> list[int]:
if min_bytes <= 0 or max_bytes < min_bytes:
raise ValueError("invalid size range")
if (step_factor is None) == (step_bytes is None):
raise ValueError("exactly one of step_factor or step_bytes must be set")
if step_bytes is not None and step_bytes <= 0:
raise ValueError("step_bytes must be positive")
sizes = []
current = min_bytes
while current <= max_bytes:
sizes.append(current)
if step_factor is not None:
next_size = int(current * step_factor)
if next_size <= current:
raise ValueError("step_factor must increase the size")
current = next_size
else:
current += step_bytes
return sizes

Comment thread tools/size_sweep_plan.py
Comment on lines +64 to +67
text = json.dumps(
summarize(args.min_bytes, args.max_bytes, args.step_factor, args.step_bytes),
indent=2,
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

当用户输入无效的参数(例如不合法的尺寸格式、无效的范围或步长)时,summarize 会抛出 ValueError。目前程序没有捕获该异常,会导致直接向用户输出一长串 Python 堆栈信息(traceback)。对于命令行工具,建议捕获 ValueError 并向标准错误(stderr)输出友好的错误提示,然后以非零状态码退出。

    try:
        summary = summarize(args.min_bytes, args.max_bytes, args.step_factor, args.step_bytes)
    except ValueError as e:
        import sys
        print(f"Error: {e}", file=sys.stderr)
        return 1

    text = json.dumps(summary, indent=2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant