perf: chunked-prefill prefix cache update for non-hybrid models #22

Open
LorrinWWW wants to merge 3 commits into main from jue/improve-prefix-cache

@LorrinWWW
Contributor

Summary

This PR generalizes InsertHybridCache, which was gated on hybrid_cache != nullptr and therefore ran only for Mamba/hybrid models, into InsertPrefixCache, which is gated on kv_prefix_cache != nullptr. Before this change, only hybrid models inserted mid-chunk KV pages into the radix tree during chunked prefill; plain transformer models never did. With the function renamed and the guard moved, non-hybrid models now also get prefix-cache updates per chunk.
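
The guard change described above can be illustrated with a minimal sketch. This is not the code from the diff: only InsertHybridCache, InsertPrefixCache, hybrid_cache and kv_prefix_cache come from the description, and everything else (the Engine struct, the KvPrefixCache::Insert signature, the CachedAfterFirstChunk helper) is a hypothetical stand-in. The radix tree is reduced to a single counter so the effect of each guard is easy to see.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the radix-tree prefix cache: it only tracks
// how many leading prompt tokens have had their KV pages inserted.
struct KvPrefixCache {
    std::size_t cached_tokens = 0;
    void Insert(std::size_t prefix_len) {
        cached_tokens = std::max(cached_tokens, prefix_len);
    }
};

// Hypothetical placeholder for the Mamba/hybrid state cache.
struct HybridCache {};

// Hypothetical owner of both caches; either pointer may be null.
struct Engine {
    KvPrefixCache* kv_prefix_cache = nullptr;
    HybridCache* hybrid_cache = nullptr;

    // Old guard (assumed shape): only hybrid models reach the insert.
    void InsertHybridCache(std::size_t prefilled) {
        if (hybrid_cache == nullptr) return;
        if (kv_prefix_cache != nullptr) kv_prefix_cache->Insert(prefilled);
    }

    // New guard (assumed shape): any model with a prefix cache reaches it.
    void InsertPrefixCache(std::size_t prefilled) {
        if (kv_prefix_cache == nullptr) return;
        kv_prefix_cache->Insert(prefilled);
    }
};

// Prefills the first chunk of a prompt and returns how many tokens the
// prefix cache holds afterwards, under either the old or the new guard.
std::size_t CachedAfterFirstChunk(bool hybrid, bool use_new_guard,
                                  std::size_t prompt_len, std::size_t chunk) {
    assert(chunk > 0);
    KvPrefixCache prefix;
    HybridCache hybrid_state;
    Engine engine;
    engine.kv_prefix_cache = &prefix;
    engine.hybrid_cache = hybrid ? &hybrid_state : nullptr;

    const std::size_t prefilled = std::min(chunk, prompt_len);
    if (use_new_guard) {
        engine.InsertPrefixCache(prefilled);
    } else {
        engine.InsertHybridCache(prefilled);
    }
    return prefix.cached_tokens;
}
```

In this sketch, a non-hybrid model with a 1000-token prompt and 256-token chunks has nothing cached after the first chunk under the old guard and 256 tokens cached under the new one, while a hybrid model behaves the same either way.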

Test Plan

@LorrinWWW LorrinWWW requested a review from a team as a code owner May 7, 2026 23:24
@LorrinWWW LorrinWWW requested review from XucSh and wangbo981016 May 7, 2026 23:24
@LorrinWWW LorrinWWW self-assigned this May 8, 2026
@LorrinWWW
Contributor Author

@XucSh Could you take a look at this one? Thanks!
