perf: chunked-prefill prefix cache update for non-hybrid models by LorrinWWW · Pull Request #22 · lightseekorg/tokenspeed

LorrinWWW · 2026-05-07T23:24:42Z

Summary

This commit broadens InsertHybridCache (gated on hybrid_cache != nullptr, so Mamba/hybrid models only) into InsertPrefixCache (gated on kv_prefix_cache != nullptr). Before this change, only hybrid models inserted mid-chunk KV pages into the radix tree during chunked prefill; plain transformer models never did. The function is renamed and the guard moved, so non-hybrid models now also get a prefix-cache update after each chunk.
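For reviewers, here is a minimal sketch of the guard change as the summary describes it. It is not the actual tokenspeed code: only the names InsertHybridCache, InsertPrefixCache, hybrid_cache and kv_prefix_cache come from this PR. The structs, the Insert method, the snapshot counter and SimulateChunkedPrefill are hypothetical stand-ins, and the bodies of both functions are assumptions made so that the effect of the guard is observable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the radix-tree prefix cache (not the real type).
struct KvPrefixCache {
    std::size_t inserted_tokens = 0;  // longest prefix registered so far

    void Insert(const std::vector<int>& tokens, std::size_t upto) {
        upto = std::min(upto, tokens.size());
        inserted_tokens = std::max(inserted_tokens, upto);
    }
};

// Hypothetical stand-in for the Mamba/hybrid state cache.
struct HybridCache {
    std::size_t snapshots = 0;
};

struct Scheduler {
    KvPrefixCache* kv_prefix_cache = nullptr;
    HybridCache* hybrid_cache = nullptr;

    // Old guard (as described): nothing happens unless the model is hybrid.
    void InsertHybridCache(const std::vector<int>& tokens, std::size_t prefilled) {
        if (hybrid_cache == nullptr) return;
        if (kv_prefix_cache != nullptr) kv_prefix_cache->Insert(tokens, prefilled);
        ++hybrid_cache->snapshots;  // assumed hybrid-only bookkeeping
    }

    // New guard (as described): any model with a prefix cache is updated.
    void InsertPrefixCache(const std::vector<int>& tokens, std::size_t prefilled) {
        if (kv_prefix_cache == nullptr) return;
        kv_prefix_cache->Insert(tokens, prefilled);
        if (hybrid_cache != nullptr) ++hybrid_cache->snapshots;
    }
};

// Runs a chunked prefill and returns how many tokens ended up in the
// prefix cache once every chunk has been processed.
std::size_t SimulateChunkedPrefill(bool broadened_guard, bool is_hybrid,
                                   std::size_t num_tokens, std::size_t chunk_size) {
    if (chunk_size == 0) return 0;  // avoid a loop that never advances
    KvPrefixCache prefix;
    HybridCache hybrid;
    Scheduler scheduler;
    scheduler.kv_prefix_cache = &prefix;
    scheduler.hybrid_cache = is_hybrid ? &hybrid : nullptr;

    const std::vector<int> tokens(num_tokens, 0);
    for (std::size_t done = 0; done < num_tokens;) {
        done = std::min(done + chunk_size, num_tokens);
        if (broadened_guard) {
            scheduler.InsertPrefixCache(tokens, done);
        } else {
            scheduler.InsertHybridCache(tokens, done);
        }
    }
    return prefix.inserted_tokens;
}
```

In this sketch, a non-hybrid run with the old guard leaves the prefix cache empty, while the broadened guard registers every chunk. Hybrid runs behave the same under both guards.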

Test Plan

LorrinWWW · 2026-05-11T02:54:47Z

@XucSh Could you take a look at this one? Thanks!

fix prefill cache update.

bccd750

LorrinWWW requested a review from a team as a code owner May 7, 2026 23:24

LorrinWWW requested review from XucSh and wangbo981016 May 7, 2026 23:24

LorrinWWW self-assigned this May 8, 2026

LorrinWWW added 2 commits May 9, 2026 16:05

Merge branch 'main' into jue/improve-prefix-cache

c7116d4

Merge branch 'main' into jue/improve-prefix-cache

ff8893e

LorrinWWW wants to merge 3 commits into main from jue/improve-prefix-cache
