[New features][Bug fixes] Add TileLang CSA compressed indexer for DSv4 and fix ColumnLinear stop_gradient backward#1052
risemeup1111
left a comment
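Initial review complete. I found two issues that need to be fixed first; the details are in the line-level comments. CI currently passes, but one of these issues changes the forward semantics of sparse attention on the TileLang-compatible backend, and the other causes packed/multi-document boundary masks to be silently ignored. I recommend fixing both before merging.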
@@ -225,9 +225,6 @@ def forward(
    attention_mask,
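This removes the forwarding of attn_mask_startend_row_indices, but DSv4HybridSelfAttention.forward() still silently accepts the parameter through **kwargs, and the upstream TransformerLayer keeps passing that boundary mask to self-attention in packed-sequence / multi-document scenarios. As a result, callers believe document boundaries are still enforced, while CSA actually generates window/compressed indices over the whole sequence and may attend across documents. If this PR confirms that DSv4 packed/multi-document boundaries are no longer supported, fail fast explicitly here; otherwise, restore the boundary mask in the index computation of CompressedSparseAttention.
Suggested shape of the fix: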

    if kwargs.get("attn_mask_startend_row_indices") is not None:
        raise NotImplementedError(
            "DSv4 Hybrid CSA no longer supports attn_mask_startend_row_indices; "
            "please disable packed/multi-document boundary masks or restore CSA boundary handling."
        )

Add TileLang CSA indexer forward/backward and attention-target kernels, wire them into CSA loss, and preserve detached-input behavior in tensor-parallel linear backward.
Remove the test-local sys.path override and paddlefleet module cleanup so the tests use the installed package environment, including editable installs in CI.
Rename DSv4-scoped TileLang switches to CSA-scoped controls, make attention_paddle_compat drive both CSA indexer and sparse attention by default, and rename the sparse attention export to csa_sparse_attn. Replace public TileLang CSA indexer asserts with explicit TypeError/ValueError checks and cover the new config and validation behavior in tests.
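The switch from asserts to explicit exceptions matters because Python strips `assert` statements when run with `-O`, so an assert-based check can silently vanish. A minimal sketch of the pattern follows; the function name `validate_indexer_inputs`, its parameters, and the specific shape rules are hypothetical illustrations, not the checks the PR actually implements.

```python
def validate_indexer_inputs(index_q_shape, index_k_shape, topk):
    """Hypothetical argument checks for a public CSA indexer entry point.

    The rules below are assumptions made for illustration only.
    """
    # Wrong type -> TypeError (bool is rejected although it subclasses int).
    if not isinstance(topk, int) or isinstance(topk, bool):
        raise TypeError(f"topk must be an int, got {type(topk).__name__}")
    # Right type but invalid value -> ValueError.
    if topk <= 0:
        raise ValueError(f"topk must be positive, got {topk}")
    if len(index_q_shape) != len(index_k_shape):
        raise ValueError(
            f"rank mismatch: index_q has {len(index_q_shape)} dims, "
            f"index_k has {len(index_k_shape)} dims"
        )
    if index_q_shape[-1] != index_k_shape[-1]:
        raise ValueError(
            f"head-dim mismatch: {index_q_shape[-1]} vs {index_k_shape[-1]}"
        )
    return True
```

Unlike an assert, these checks stay active under `python -O`, and callers can distinguish a wrong type from a wrong value.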
risemeup1111
left a comment
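I have re-reviewed the current head. The two earlier P1 line-level issues are still covered by their original threads in the current diff, so I am not opening new threads for them; CI currently passes. One non-line-level issue in the PR description needs to be updated to match:
Priority: P3. Non-line-level: the PR description does not match the current config fields. Item 3 of the description still lists dsv4_tilelang_backend / dsv4_tilelang_enable_csa_indexer / dsv4_tilelang_enable_backward, but the current code uses csa_tilelang_backend / csa_tilelang_enable_indexer / csa_tilelang_enable_sparse_attn and no longer keeps dsv4_tilelang_enable_backward. This would mislead users into configuring the TileLang path with invalid fields. Suggested replacement for that paragraph: "The TileLang path is enabled through three TransformerConfig fields: csa_tilelang_backend / csa_tilelang_enable_indexer / csa_tilelang_enable_sparse_attn. It is off by default and does not affect existing behavior."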
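To illustrate why stale field names are a practical hazard, here is a small sketch of a check that flags the old names in a user-supplied config dict. The helper `check_tilelang_fields` is hypothetical, and the old-to-new pairing in `RENAMED_FIELDS` is an assumption inferred from the order of the names in the review comment, not something confirmed by the code.

```python
from typing import Dict, List

# Assumed pairing of old and new names (inferred, not confirmed by the PR).
RENAMED_FIELDS: Dict[str, str] = {
    "dsv4_tilelang_backend": "csa_tilelang_backend",
    "dsv4_tilelang_enable_csa_indexer": "csa_tilelang_enable_indexer",
}
# The review states this field is no longer kept.
REMOVED_FIELDS = {"dsv4_tilelang_enable_backward"}


def check_tilelang_fields(user_config: Dict[str, object]) -> List[str]:
    """Return one message per stale TileLang field found in user_config."""
    problems = []
    for name in user_config:
        if name in RENAMED_FIELDS:
            problems.append(f"{name} is stale; use {RENAMED_FIELDS[name]}")
        elif name in REMOVED_FIELDS:
            problems.append(f"{name} is no longer a config field")
    return problems
```

A check like this could run before the config is constructed, so a description-driven typo surfaces as a message instead of a silently disabled TileLang path.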
Keep dsa_indexer_use_sparse_loss=False scoped to the indexer loss by using the full compressed range only for TileLang KL loss computation. Trim the returned indices back to min(index_topk, n_compressed) before feeding main CSA sparse attention so Paddle and TileLang paths keep the same attention range. Add resolver coverage to guard the phase2 loss topk and attention topk split.
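The split between the loss top-k and the attention top-k described in this commit can be sketched as follows. The function names `resolve_topk` and `trim_indices` are hypothetical, the behavior when the sparse loss is enabled is an assumption, and the sketch also assumes the returned indices are ordered best-first so that trimming keeps the highest-scoring entries.

```python
def resolve_topk(index_topk, n_compressed, use_sparse_loss):
    """Return (loss_topk, attn_topk) for one layer.

    attn_topk follows the commit message: min(index_topk, n_compressed).
    With use_sparse_loss=False the KL loss sees the full compressed range.
    Using attn_topk for the loss when use_sparse_loss=True is an assumption.
    """
    attn_topk = min(index_topk, n_compressed)
    loss_topk = attn_topk if use_sparse_loss else n_compressed
    return loss_topk, attn_topk


def trim_indices(indices_row, attn_topk):
    """Keep only the leading attn_topk indices (assumes best-first order)."""
    return indices_row[:attn_topk]


# The loss ranks all 10 compressed positions; attention keeps only 4 of them.
loss_topk, attn_topk = resolve_topk(index_topk=4, n_compressed=10, use_sparse_loss=False)
attn_indices = trim_indices([7, 2, 9, 0, 5, 1, 3, 4, 6, 8], attn_topk)
```

Keeping the two values separate means the loss setting cannot widen the range that main sparse attention reads, which is the invariant the resolver tests in this commit guard.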
Attach TileLang CSA indexer loss gradients directly through the attention output autoscaler. This avoids building TileLang loss backward state during no-grad recompute forwards while keeping the indexer loss top-k independent from the main sparse attention top-k.
PaddleFleet Log Analysis
Log analysis report
Failed test cases:
Root-cause analysis: PR #1052( Qwen3VL MoE SFT test in
Suggested fix:
🔄 Automatically updated after each re-run
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-31 01:24:29
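📋 Review summary
PR overview: fixes the PyLayer contract issue in ColumnLinearWithGradAccum for detached inputs, and adds fused TileLang operators for the CSA compressed indexer
Change scope: tensor_parallel/layers.py, tilelang_ops/indexer/, transformer/csa_attention.py, transformer/transformer_config.py
Impact tags: TP OP Config
Issues
| Level | File | Summary |
|---|---|---|
| ❓ Question | PR Description | Config field names in the PR description do not match the actual code (3/3 names differ) |
📝 Description-code consistency: item 3 of the PR Description states that the TileLang path is controlled by the three fields dsv4_tilelang_backend / dsv4_tilelang_enable_csa_indexer / dsv4_tilelang_enable_backward, but the actual field names in the code are csa_tilelang_backend / csa_tilelang_enable_indexer / csa_tilelang_enable_sparse_attn, so none of them match. Update the corresponding paragraph of the PR Description to the actual field names to avoid confusion when the config is used later.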
Status of previous findings
| Finding | Issue | Status |
|---|---|---|
| F1 | tilelang_ops/__init__.py is missing a newline and a copyright header | ✅ Fixed |
| F2 | csa_indexer.py is missing the Apache License copyright header | ✅ Fixed |
| F3 | assert used for runtime validation in csa_indexer_bwd.py | |
| F4 | sys.path.insert hack in the test file | |
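📝 PR convention check
✓ The title format [New features][Bug fixes] ... follows the convention; the three required fields PR Category / PR Types / Description are all filled in.
Overall assessment
The bug fix logic is correct and has a matching regression test. The new TileLang CSA indexer module is clearly structured and well validated, and its config fields have reasonable defaults and checks. Recommend correcting the field names in the PR Description so that the documentation matches the code.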
Codecov Report
❌ Your patch status has failed because the patch coverage (82.09%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #1052 +/- ##
==========================================
Coverage ? 82.05%
==========================================
Files ? 11
Lines ? 808
Branches ? 167
==========================================
Hits ? 663
Misses ? 67
Partials ? 78
PR Category
Operator Mechanism
PR Types
New features, Bug fixes
Description
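1. Bug fix (tensor_parallel/layers.py): fixes a contract issue in the backward of the ColumnLinearWithGradAccum PyLayer. When the forward input has input.stop_gradient=True, backward must return None at the corresponding position (required by the Paddle PyLayer contract). The forward now caches an input_stop_gradient flag, and backward skips the grad_input computation and the input-side communication (all_reduce / reduce_scatter).
2. New feature (tilelang_ops/indexer/): adds fused TileLang operators for the DeepSeek V4 CSA compressed indexer:
   - csa_indexer_topk_fwd: streaming top-k forward; outputs indices and softmax probabilities
   - csa_indexer_bwd: fused backward; computes the gradients for IndexQ / Weights / IndexKComp
   - csa_attn_target_reducesum: attention target distribution for the indexer KL loss
   - TileLangCSAIndexerLoss PyLayer: fuses fwd/bwd/loss into a single operator
3. Integration (transformer/csa_attention.py): the TileLang path is enabled through three TransformerConfig fields: dsv4_tilelang_backend / dsv4_tilelang_enable_csa_indexer / dsv4_tilelang_enable_backward. It is off by default and does not affect existing behavior.
4. Simplification: removes attn_mask_startend_row_indices multi-document boundary support (not currently used in the DSv4 scenario).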
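
The control flow of the ColumnLinearWithGradAccum fix described above can be sketched without any framework. The class `ColumnLinearSketch` and the helpers below are hypothetical stand-ins written in plain Python; they model only the cached flag and the None return, not the real PyLayer, gradient accumulation, or tensor-parallel communication.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


class ColumnLinearSketch:
    """Framework-free stand-in, not the real ColumnLinearWithGradAccum."""

    def forward(self, x, w, input_stop_gradient):
        # Cache the operands and the stop_gradient flag for backward.
        self.x, self.w = x, w
        self.input_stop_gradient = input_stop_gradient
        return matmul(x, w)  # y = x @ w

    def backward(self, grad_out):
        # The weight gradient is always needed: dW = x^T @ dY.
        grad_w = matmul(transpose(self.x), grad_out)
        if self.input_stop_gradient:
            # Detached input: return None in the input slot and skip the
            # grad_input matmul (the real layer also skips the input-side
            # all_reduce / reduce_scatter at this point).
            return None, grad_w
        grad_x = matmul(grad_out, transpose(self.w))  # dX = dY @ w^T
        return grad_x, grad_w
```

Calling `forward(x, w, True)` followed by `backward(grad_out)` yields `None` in the first slot, while the weight gradient is identical in both modes.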