feat(dsv4/v3_2): mark L2 orch outputs with correct direction (pl.Out / pl.InOut)#659
YunjiQin wants to merge 1 commit into
📝 Walkthrough

This PR updates tensor parameter type annotations across DeepSeek v3.2 and v4 model files, changing cache and dispatch buffer parameters (`kv_cache`, `pe_cache`, `k_cache_idx`, `dispatch_buf`, `kv_state`, `score_state`, `out`) from plain `pl.Tensor[...]` to `pl.Out[pl.Tensor[...]]`, marking them as output/writable buffers in each function's signature.

Changes: `pl.Out` Annotation Updates
Estimated code review effort: 2 (Simple) | ~10 minutes
🚥 Pre-merge checks: 4 passed, 1 failed (1 warning)
Force-pushed from 0645675 to fc669c3.
🧹 Nitpick comments (1)
models/deepseek/v4/prefill_attention_hca.py (1)
Lines 87-108: 🗄️ Data Integrity & Integration | 🔵 Trivial | 💤 Low value

Align `kv_cache` with the other cache-writing inline kernels.

`prefill_attention_hca` mutates `kv_cache`, and the matching inline kernels in this tree annotate the same buffer as `pl.Out[...]`; keeping this one as plain `pl.Tensor[...]` makes the nested-inline contract inconsistent.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/deepseek/v4/prefill_attention_hca.py` around lines 87 - 108, the `prefill_attention_hca` signature treats `kv_cache` as an input tensor even though the kernel writes to it, which is inconsistent with the other cache-writing inline kernels. Update the `kv_cache` parameter annotation in `prefill_attention_hca` to use the output form used elsewhere in this module (the `pl.Out`-style buffer annotation) so the nested-inline contract matches the actual mutation behavior.
📒 Files selected for processing (14)
- models/deepseek/v3_2/deepseek_v3_2_decode_back.py
- models/deepseek/v3_2/deepseek_v3_2_decode_front.py
- models/deepseek/v3_2/deepseek_v3_2_prefill_back.py
- models/deepseek/v3_2/deepseek_v3_2_prefill_front_draft.py
- models/deepseek/v4/decode_attention_csa.py
- models/deepseek/v4/decode_attention_hca.py
- models/deepseek/v4/decode_attention_swa.py
- models/deepseek/v4/decode_fwd.py
- models/deepseek/v4/decode_layer.py
- models/deepseek/v4/prefill_attention_hca.py
- models/deepseek/v4/prefill_compressor_ratio128.py
- models/deepseek/v4/prefill_compressor_ratio4.py
- models/deepseek/v4/prefill_fwd.py
- models/deepseek/v4/prefill_indexer_compressor.py
…/ pl.InOut)

pypto requires orchestration-entry outputs to declare a direction (hw-native-sys/pypto#1901). An output left as a plain pl.Tensor is treated as In, so its device->host copy-back is skipped and the tensor silently reads back as all-zeros on the host, failing golden.

Annotate every golden-compared output on its orchestration entry (the @pl.jit entry, its @pl.jit.host L3 driver, or the @pl.function Opaque method) by direction, decided by the golden TensorSpec:

- is_output + init_value -> inout -> pl.InOut (in-place caches / state: kv_cache, kv_state, score_state, compress_state, cmp_kv, cmp_kv_cache, idx_kv_cache, and v3_2 decode_front kv/pe/k_cache_idx/dispatch_buf)
- is_output, no init_value -> pure output -> pl.Out (v3_2 back `out`, prefill_front_draft `dispatch_buf`, x_out / x_next / logits)

pl.InOut round-trips correctly as of hw-native-sys/pypto#1918 (the specializer previously dropped the wrapper).

Direction lives on the orchestration entry only; @pl.jit.inline sub-kernels are left as upstream (their direction tag is stripped at splice time).

Also normalizes pre-existing pl.Out-on-inout entries (prefill_attention_*, decode_compressor_*, decode_indexer*, prefill_indexer*) to pl.InOut so the whole codebase is consistent.

Verified on device (a2a3): decode/prefill attention (csa/hca/swa), compressor ratio4/128, indexer(_compressor), decode_layer, decode_fwd all compile and pass golden.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
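The direction rule in the commit message can be read as a small decision function. The sketch below is only an illustration of that rule in plain Python: `SpecSketch`, `direction_for`, and the input name `x` are hypothetical stand-ins, not the real golden `TensorSpec` or any pypto API, and it assumes an absent `init_value` is represented as `None`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SpecSketch:
    """Hypothetical stand-in for a golden tensor spec (not the real TensorSpec)."""
    name: str
    is_output: bool = False
    init_value: Optional[Any] = None  # assumption: None means "no init_value"


def direction_for(spec: SpecSketch) -> str:
    """Return the direction name implied by the rule in the commit message."""
    if not spec.is_output:
        # Not golden-compared: a plain tensor, treated as In.
        return "In"
    if spec.init_value is not None:
        # Seeded on the host and updated in place on the device.
        return "InOut"
    # Written by the kernel only; nothing meaningful to copy in.
    return "Out"


specs = [
    SpecSketch("x"),  # hypothetical input name
    SpecSketch("kv_cache", is_output=True, init_value=0.0),
    SpecSketch("logits", is_output=True),
]
directions = {s.name: direction_for(s) for s in specs}
print(directions)
```

Under these assumptions, `kv_cache` maps to `InOut` because it is both seeded and compared, while `logits` maps to `Out`.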
Force-pushed from fc669c3 to b9aab0c.
What
pypto requires orchestration-entry outputs to declare a direction (hw-native-sys/pypto#1901). An output left as a plain `pl.Tensor` is treated as `In`, so its device→host copy-back is skipped and the tensor silently reads back as all-zeros on the host, failing golden.

This PR annotates every golden-compared output on its orchestration entry with the correct direction, decided by the golden `TensorSpec`:

- `is_output=True` + `init_value` -> `pl.InOut`
- `is_output=True`, no `init_value` -> `pl.Out`

`pl.InOut` round-trips correctly as of hw-native-sys/pypto#1918 (the specializer previously dropped the wrapper → "missing type annotation"). pypto-lib CI builds against pypto `main`, which now includes that fix.

Scope / rules
- Direction lives on the orchestration entry only: the `@pl.jit` entry, its `@pl.jit.host` L3 driver, or the `@pl.function(type=Opaque)` method.
- `@pl.jit.inline` sub-kernels are left as upstream (their direction tag is stripped at splice time).
- Also normalizes pre-existing `pl.Out`-on-inout entries (`prefill_attention_*`, `decode_compressor_*`, `decode_indexer*`, `prefill_indexer*`) to `pl.InOut` so the whole codebase is consistent.

Changes (21 files, +36/−36: 33 `pl.InOut`, 3 `pl.Out`)
- `pl.InOut`: decode/prefill attention `kv_cache`; compressor `kv_state` / `score_state` / `compress_state` / `cmp_kv` / `cmp_kv_cache`; indexer `idx_kv_cache`; decode_layer / decode_fwd / prefill_fwd `kv_cache` (entry + L3 host); v3_2 decode_front `kv_cache` / `pe_cache` / `k_cache_idx` / `dispatch_buf`.
- `pl.Out`: v3_2 back `out`; prefill_front_draft `dispatch_buf` (x_out / x_next / logits already `pl.Out`).

Verification
- Every `is_output` spec (incl. `@pl.jit.host` params and conditional `is_output=name==…` specs) maps to a correctly-directioned entry param; no `@pl.jit.inline` param carries `pl.InOut`.

🤖 Generated with Claude Code
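The audit described under Verification could be approximated by comparing each output spec against the annotations on an entry function. The following is a minimal sketch under stated assumptions: `Tensor`, `Out`, and `InOut` are hypothetical stand-ins built with `typing.Annotated` rather than the real `pl.Tensor` / `pl.Out` / `pl.InOut`, and the entry functions and the spec dictionary are invented for illustration.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class Tensor:
    """Hypothetical stand-in for a plain tensor annotation."""


# Hypothetical direction wrappers; the real pl.Out / pl.InOut may differ.
Out = Annotated[Tensor, "Out"]
InOut = Annotated[Tensor, "InOut"]


def direction_of(hint) -> str:
    """Read the direction tag from an annotation; plain tensors count as In."""
    if get_origin(hint) is Annotated:
        return get_args(hint)[1]
    return "In"


def audit(entry, output_specs: dict[str, bool]) -> list[str]:
    """Check each output (name -> has init_value) against the entry signature."""
    hints = get_type_hints(entry, include_extras=True)
    problems = []
    for name, has_init in output_specs.items():
        expected = "InOut" if has_init else "Out"
        # A parameter absent from the signature is reported as In here.
        actual = direction_of(hints.get(name, Tensor))
        if actual != expected:
            problems.append(f"{name}: expected {expected}, found {actual}")
    return problems


def carries_inout(fn) -> bool:
    """True if any parameter of fn is annotated InOut."""
    hints = get_type_hints(fn, include_extras=True)
    return any(direction_of(h) == "InOut" for h in hints.values())


def good_entry(x: Tensor, kv_cache: InOut, logits: Out): ...
def bad_entry(x: Tensor, kv_cache: Tensor, logits: Out): ...
def inline_kernel(x: Tensor, kv_cache: Tensor): ...


spec = {"kv_cache": True, "logits": False}
print(audit(good_entry, spec))
print(audit(bad_entry, spec))
print(carries_inout(inline_kernel))
```

Here `bad_entry` models the failure the PR fixes: an in-place cache left as a plain tensor, which the audit flags as `In` where `InOut` was expected.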