add rule for _C_ops.matmul#640

Open
greenhandF wants to merge 5 commits into PFCCLab:main from greenhandF:fanbohao

Conversation

@greenhandF

Add mapping and rule for paddle._C_ops.matmul

cangtianhuang previously approved these changes Jun 9, 2026

@cangtianhuang cangtianhuang left a comment

Collaborator

LGTM

@cangtianhuang cangtianhuang left a comment

Collaborator

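After these changes, please re-run the existing configs as a whole and paste the test results into the PR description; we need to confirm there are no major accuracy problems.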

Comment thread tester/accuracy.py Outdated
  or (
      isinstance(paddle_item, paddle.Tensor)
-     and not (paddle_item._is_initialized() or paddle_item.numel() == 0)
+     and (not paddle_item._is_initialized() or paddle_item.numel() == 0)

Collaborator

This would break the check for legitimate 0-size tensors (a 0-size tensor has numel 0 but still carries a dtype and a shape). Could it be changed to:

Suggested change
- and (not paddle_item._is_initialized() or paddle_item.numel() == 0)
+ and not paddle_item._is_initialized()
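
To make the difference between the three variants concrete, here is a minimal pure-Python sketch. `FakeTensor` is a hypothetical stand-in that only mimics the two calls used in the condition (`_is_initialized()` and `numel()`); it is not Paddle's tensor class, and the assumption that an uninitialized tensor reports `numel() == 0` is mine, not something stated in the thread.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for paddle.Tensor (illustration only)."""
    initialized: bool
    n: int

    def _is_initialized(self) -> bool:
        return self.initialized

    def numel(self) -> int:
        return self.n


def original_check(t) -> bool:
    # Condition on the line removed by the PR.
    return not (t._is_initialized() or t.numel() == 0)


def pr_check(t) -> bool:
    # Condition on the line added by the PR.
    return not t._is_initialized() or t.numel() == 0


def suggested_check(t) -> bool:
    # Condition proposed in the review suggestion.
    return not t._is_initialized()


zero_size = FakeTensor(initialized=True, n=0)       # valid 0-size tensor
uninitialized = FakeTensor(initialized=False, n=0)  # assumed: numel() is 0 here
regular = FakeTensor(initialized=True, n=6)

for name, t in [("zero_size", zero_size),
                ("uninitialized", uninitialized),
                ("regular", regular)]:
    print(name, original_check(t), pr_check(t), suggested_check(t))
```

Under these assumptions, the PR's condition also fires for the valid 0-size tensor, while the suggested condition fires only for the uninitialized one.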

Comment thread tester/accuracy.py Outdated
  or (
      isinstance(paddle_item, paddle.Tensor)
-     and not (paddle_item._is_initialized() or paddle_item.numel() == 0)
+     and (not paddle_item._is_initialized() or paddle_item.numel() == 0)

Collaborator

Same as above.

Comment on lines +7483 to +7484
do1_e = torch.zeros_like(o1)
o2_s_e = torch.zeros_like(do2_s)

Collaborator

empty_like might be a better fit here, but either works.
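
For reference, a small sketch of the trade-off between the two calls. The tensor name `o1` is borrowed from the snippet, but its shape here is made up, and whether the PR's buffers are fully overwritten before being read is not visible in this excerpt; `empty_like` is only a safe swap if they are.

```python
import torch

o1 = torch.randn(4, 6)  # illustrative shape, not taken from the PR

zeros = torch.zeros_like(o1)    # allocated and filled with 0
scratch = torch.empty_like(o1)  # allocated only; contents are arbitrary until written

# Both results match the source tensor's shape and dtype.
assert zeros.shape == scratch.shape == o1.shape
assert zeros.dtype == scratch.dtype == o1.dtype

# empty_like is safe only if every element is written before it is read.
scratch[:, :3] = 1.0
scratch[:, 3:] = 2.0
```

If any region might be left unwritten, `zeros_like` is the conservative choice, since it guarantees defined contents.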

Comment thread tester/paddle_to_torch/rules.py Outdated
Comment on lines +7520 to +7525
pg_out[s:e] = (do2_c * o2_val).sum(dim=-1)
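# If the o2_s write had to finish before do1, a do2_2d slice would be used; here o2_s and do1
# come from different buffers (even when inplace, it is do2_s vs o1), so they do not affect each other.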
o2_s_out[s:e] = (o2_val * prob_c).to(do2_dtype)
do1_out[s:e, :H] = x0g.to(o1_dtype)
do1_out[s:e, H:] = x1g.to(o1_dtype)

Collaborator

This slice assignment writes in place into an fp32 leaf tensor, which makes torch raise an error; consider wrapping it in torch.no_grad().

It can be reproduced with:
paddle._C_ops._run_custom_op("fused_swiglu_probs_bwd", Tensor([2, 4],"float32"), Tensor([2, 2],"float32"), Tensor([2, 1],"float32"), True, )
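
The underlying torch behavior can be shown in isolation. This is a standalone sketch, not the PR's code: the tensor names and shapes are made up, and how the fp32 leaf ends up as the write target inside the rule is not shown in this excerpt.

```python
import torch

# A leaf tensor that requires grad, standing in for the fp32 input.
leaf = torch.zeros(4, 2, dtype=torch.float32, requires_grad=True)
update = torch.ones(2, 2)

# In-place slice assignment on such a leaf is rejected by autograd.
try:
    leaf[0:2] = update
    raised = False
except RuntimeError as exc:
    raised = True
    print("without no_grad:", exc)

# Inside no_grad the same write is allowed and is not recorded by autograd.
with torch.no_grad():
    leaf[0:2] = update

print(leaf)
```

Wrapping only the write keeps the tensor a leaf with `requires_grad=True`, so the rest of the rule can still treat it as before.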

@cangtianhuang cangtianhuang self-assigned this Jun 12, 2026
@greenhandF

Author
With apitest, every test that passes in paddle_only mode shows no problems in the accuracy comparison: 3 operators, 23 cases in total.

@cangtianhuang cangtianhuang left a comment

Collaborator

LGTM

3 participants