Paddle2Torch test infrastructure: add paddle._C_ops internal operator mappings and MOE operator support #635

Merged
wanghuancoder merged 11 commits into PFCCLab:main from cangtianhuang:dev
Jun 8, 2026

@cangtianhuang

Collaborator

📌 Main changes

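1. _C_ops operator parameter mapping (base.py)

  • Add the _COPS_API_PUBLIC_ALIAS dictionary: 13 _C_ops operators reuse the signatures of their corresponding public APIs
    • add_, subtract_, multiply_, concat, flatten_
  • Add manual argument-extraction rules for 8 operators that have no signature
    • adamw_, full_, fused_linear_param_grad_add, gaussian, matmul_grad, squared_l2_norm, _run_custom_op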

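2. Signature caching and argument-binding optimization (base.py)

  • Cache the result of inspect.signature() to avoid repeated calls
  • Unify the argument-handling flow and simplify the code
  • Support _C_ops alias resolution: when an operator has no signature, automatically look up its public API to obtain one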
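
The caching and alias fallback described in sections 1 and 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in base.py: `ALIAS_SKETCH`, `OpaqueOp`, `public_add_`, `cached_signature`, and `bind_arguments` are hypothetical names, and the opaque op only simulates an operator whose signature cannot be introspected.

```python
import inspect
from functools import lru_cache


def public_add_(x, y, name=None):
    """Hypothetical stand-in for a public API with a normal Python signature."""
    return x + y


class OpaqueOp:
    """Simulates a C-implemented op whose signature cannot be introspected."""

    @property
    def __signature__(self):
        raise ValueError("no signature found")

    def __call__(self, *args):
        return args[0] + args[1]


raw_add_ = OpaqueOp()

# Hypothetical alias table: internal op name -> public callable with a usable signature.
ALIAS_SKETCH = {"_C_ops.add_": public_add_}


@lru_cache(maxsize=None)
def cached_signature(func):
    """Return inspect.signature(func), or None when it is unavailable (cached per callable)."""
    try:
        return inspect.signature(func)
    except (TypeError, ValueError):
        return None


def bind_arguments(api_name, func, args, kwargs):
    """Map positional/keyword arguments to parameter names, falling back to the alias."""
    sig = cached_signature(func)
    if sig is None and api_name in ALIAS_SKETCH:
        sig = cached_signature(ALIAS_SKETCH[api_name])
    if sig is None:
        raise LookupError(f"no signature available for {api_name}")
    bound = sig.bind(*args, **kwargs)
    bound.apply_defaults()
    return dict(bound.arguments)
```

Calling `bind_arguments("_C_ops.add_", raw_add_, (1, 2), {})` resolves the names `x`, `y`, and `name` through the public stand-in. Operators with neither a signature nor an alias would still need the manual extraction rules listed in section 1.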

3. Torch equivalence rules (rules.py)

Add 17 conversion rules:

  • Basic operations: CopsAdd_Rule, CopsSubtract_Rule, CopsMultiply_Rule, CopsConcatRule, CopsTransposeRule, CopsClipRule
  • In-place operations: CopsFlatten_Rule, CopsScale_Rule, CopsReshape_Rule, CopsPutAlongAxis_Rule
  • Optimizer-related: CopsAdamwRule
  • Fused operators: CopsFusedLinearParamGradAddRule, CopsMatmulGradRule
  • Other: CopsGaussianRule, CopsUniformRule, CopsNumelRule, CopsSquaredL2NormRule, CopsRunCustomOpRule

4. MOE operator support (rules.py)

  • MoePermuteRule: implements expert-routing permutation
  • MoeUnpermuteRule: implements aggregation of expert outputs
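
To make the permute/unpermute idea concrete, here is a minimal pure-Python sketch that assumes top-1 routing. It shows only the general concept (group token rows by expert, then scatter the expert outputs back in the original order); it is not the semantics of Paddle's MOE operators or of the rules in this PR, and `permute_by_expert` and `unpermute` are hypothetical helpers.

```python
def permute_by_expert(tokens, expert_ids):
    """Group token rows by expert id; return the permuted rows and the row map."""
    # sorted() is stable, so tokens routed to the same expert keep their relative order.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    permuted = [tokens[i] for i in order]
    return permuted, order


def unpermute(permuted, order, weights):
    """Scatter expert outputs back to the original token order, scaled by routing weight."""
    restored = [None] * len(permuted)
    for pos, src in enumerate(order):
        restored[src] = [weights[src] * value for value in permuted[pos]]
    return restored


tokens = [[1.0], [2.0], [3.0]]
expert_ids = [1, 0, 1]  # token 1 goes to expert 0; tokens 0 and 2 go to expert 1
permuted, row_map = permute_by_expert(tokens, expert_ids)
restored = unpermute(permuted, row_map, weights=[1.0, 1.0, 1.0])
```

With unit weights and identity experts, `restored` equals the original `tokens`, which is a convenient round-trip check for a rule pair of this kind.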

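5. Accuracy fixes

  • Output handling (accuracy.py): for fused_linear_param_grad_add, compare only the first output when has_bias=False
  • ROW MAP initialization (config_analyzer.py): improve the expert row-mapping logic
  • Multi-dimensional gather support (rules.py): add handling for multi-dimensional indices
  • In-place operations (base.py): remove the gradient-check guard and copy inputs unconditionally to avoid Paddle errors
  • Paddle version compatibility (config_analyzer.py): get_dtype supports both old and new versions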

6. Mapping configuration update (mapping.json)

  • Add rule configurations for 20+ _C_ops and MOE operators

Comment thread tester/accuracy.py
Comment on lines +582 to +588
elif api_config.api_name == "paddle._C_ops.fused_linear_param_grad_add":
    # When has_bias=False, Paddle returns an uninitialized tensor for dbias (2nd output).
    # Only compare the first output (dweight).
    if isinstance(paddle_output, (list, tuple)) and len(paddle_output) > 1:
        paddle_output = paddle_output[:1]
    if isinstance(torch_output, (list, tuple)) and len(torch_output) > 1:
        torch_output = torch_output[:1]

Collaborator

I'm afraid this isn't appropriate. This operator is very important; please use special-case logic to compare both outputs.

Collaborator Author

This is because paddle._C_ops.fused_linear_param_grad_add has two outputs, [dweight_out, dbias_out]. When has_bias=False, the second output tensor is uninitialized, and accessing it raises an error 😢:

[T1P1] [accuracy error] paddle._C_ops.fused_linear_param_grad_add(Tensor(paddle.Size([1536, 4096]),"bfloat16"), Tensor(paddle.Size([1536, 4096]),"bfloat16"), Tensor(paddle.Size([4096, 4096]),"float32"), None, True, False, )
(PreconditionNotMet) Tensor not initialized yet when DenseTensor::place() is called.
  [Hint: holder_ should not be null.] (at /paddle/paddle/phi/core/dense_tensor_impl.cc:57)

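Personally, I think comparing only the first output is sufficient for testing purposes in this case, but this is poor behavior on the framework's part.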
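
One way to reconcile the two positions in this thread is to compare both outputs whenever a bias is present and drop dbias only when has_bias=False. The sketch below is a hypothetical helper, not the logic merged into accuracy.py; `outputs_to_compare` and its `has_bias` parameter are assumed names, and plain strings stand in for tensors.

```python
def outputs_to_compare(paddle_output, torch_output, has_bias):
    """Pair up the outputs that are safe to compare for a (dweight, dbias) style op."""

    def as_list(output):
        return list(output) if isinstance(output, (list, tuple)) else [output]

    paddle_items, torch_items = as_list(paddle_output), as_list(torch_output)
    if not has_bias:
        # Without a bias the second output (dbias) is uninitialized on the Paddle side,
        # so only the first output (dweight) is kept.
        paddle_items, torch_items = paddle_items[:1], torch_items[:1]
    if len(paddle_items) != len(torch_items):
        raise ValueError("output count mismatch between Paddle and Torch")
    return list(zip(paddle_items, torch_items))
```

With has_bias=True the helper returns two pairs, so dweight and dbias are both checked; with has_bias=False it returns only the dweight pair and never touches the uninitialized tensor.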

Comment thread tester/paddle_to_torch/rules.py Outdated
Comment on lines +6994 to +6996
with torch.no_grad():
    # Weight decay on work_param
    if with_decay and wd > 0:

Collaborator

Compare this against torch.optim.adam._fused_adam.

Collaborator Author

OK 👌
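
For reference, the textbook AdamW update with decoupled weight decay can be written for a single scalar parameter as below. This is a sketch of the standard algorithm only. The rule in rules.py and Paddle's adamw_ kernel may differ in detail (for example in master-weight handling or where epsilon is applied), and `adamw_step` and its parameter names are hypothetical.

```python
import math


def adamw_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01, with_decay=True):
    """One textbook AdamW update on a scalar; returns the new (param, m, v)."""
    if with_decay and weight_decay > 0:
        # Decoupled weight decay: shrink the parameter directly, independent of the gradient.
        param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for the zero-initialized moments (step counts from 1).
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

A scalar reference like this can serve as a sanity check before comparing tensor outputs against a fused implementation.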

@wanghuancoder wanghuancoder merged commit d30d6f3 into PFCCLab:main Jun 8, 2026
1 check passed
3 participants