Framework-portable KV cache request scheduling primitive.
bidkv is a zero-dependency Python package that addresses the victim-selection problem under KV cache pressure: when KV memory is exhausted, which request should be preempted?
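The core idea is to evict the request that frees the most KV space per unit of quality loss, i.e. the request that maximises the utility

$$U = \frac{r}{\delta + \epsilon}$$

where $r$ is the KV space freed by preempting the request, $\delta$ is the estimated quality loss of doing so, and $\epsilon$ is a small positive constant that keeps $U$ finite when $\delta = 0$.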
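This worked example of the formula scores three running requests and picks the one with the highest $U$. It is a minimal sketch. It assumes $r$ is counted in KV blocks and $\delta$ is an already-computed non-negative score. `Candidate`, `utility`, `pick_victim` and the value of `EPS` are hypothetical names, not bidkv's API.

```python
# Hypothetical sketch: pick the preemption victim with the highest U = r / (delta + eps).
# Candidate, utility, pick_victim and EPS are illustrative names, not bidkv's API.
from dataclasses import dataclass

EPS = 1e-6  # assumed small constant; keeps U finite when delta == 0


@dataclass(frozen=True)
class Candidate:
    request_id: str
    freed_blocks: int    # r: KV blocks released if this request is preempted (assumed unit)
    quality_loss: float  # delta: estimated quality cost of preempting it


def utility(c: Candidate, eps: float = EPS) -> float:
    """U = r / (delta + eps): KV space freed per unit of quality loss."""
    return c.freed_blocks / (c.quality_loss + eps)


def pick_victim(candidates: list[Candidate]) -> Candidate:
    """Return the candidate that frees the most KV per unit of quality loss."""
    return max(candidates, key=utility)


running = [
    Candidate("req-a", freed_blocks=120, quality_loss=0.60),  # U ~ 200
    Candidate("req-b", freed_blocks=40, quality_loss=0.05),   # U ~ 800
    Candidate("req-c", freed_blocks=200, quality_loss=2.00),  # U ~ 100
]
print(pick_victim(running).request_id)  # req-b
```

BidKV does not choose the largest occupant (`req-c`) here. Its high quality loss outweighs the space it would free, and this is where a quality-aware policy diverges from a capacity-greedy one.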
BidKV does not compress tokens — it only controls who gets preempted. The actual eviction is performed by the framework's native preempt + recompute path (vLLM) or RadixCache eviction (SGLang).
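One way to picture this division of labour is a scheduler hook that only selects a request ID and then calls back into the framework's own preemption path. This is a minimal sketch under assumed names. `Running`, `on_kv_exhausted`, the `preempt` callback, and the `active` flag (standing in for a gate such as `BidKVConfig.is_active`) are all illustrative. It is not the actual vLLM or SGLang adapter code.

```python
# Hypothetical sketch: bidkv only *chooses* the victim; the serving framework
# still performs the preemption. All names here are illustrative.
from dataclasses import dataclass
from typing import Callable, Sequence

EPS = 1e-6  # assumed small constant in U = r / (delta + eps)


@dataclass(frozen=True)
class Running:
    request_id: str
    kv_blocks: int
    quality_loss: float


def native_victim(running: Sequence[Running]) -> Running:
    """Stand-in for the framework's default choice (here: the last running request)."""
    return running[-1]


def bidkv_victim(running: Sequence[Running]) -> Running:
    """Quality-aware choice: maximise U = r / (delta + eps)."""
    return max(running, key=lambda r: r.kv_blocks / (r.quality_loss + EPS))


def on_kv_exhausted(
    running: Sequence[Running],
    preempt: Callable[[str], None],
    active: bool,
) -> None:
    """Scheduler hook: pick a victim, then hand it to the framework's own preempt path."""
    victim = bidkv_victim(running) if active else native_victim(running)
    preempt(victim.request_id)  # eviction itself stays native (recompute / cache eviction)


preempted: list[str] = []
reqs = [Running("a", 120, 0.60), Running("b", 40, 0.05), Running("c", 200, 2.00)]
on_kv_exhausted(reqs, preempted.append, active=True)
on_kv_exhausted(reqs, preempted.append, active=False)
print(preempted)  # ['b', 'c']
```

When `active` is false, the hook falls back to the native choice. Disabling the policy therefore leaves the framework's behaviour unchanged.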
| Module | Contents |
|---|---|
| `protocol/` | Core types: `CompressionBid`, `BidPool`, `BidAcceptance` |
| `scoring/` | `PositionalScoring` (attention-sink + recency heuristic) |
| `pool/` | `BidPoolManager` |
| `pressure/` | `PressureDetector` (KV pressure detection) |
| `solver/` | `GreedyBidSolver` (bid ranking + greedy selection) |
| `baselines/` | 6 baseline strategies + BidKV (see below) |
| `adapters/vllm/` | vLLM v1 adapter (scheduler hook + plugin) |
| `adapters/sglang/` | SGLang adapter (scheduler hook) |
| `experiments/` | Experiment runner, collector, analysis |
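This stand-alone sketch shows how a pressure detector and a greedy bid solver might fit together. A watermark check decides whether the KV pool is under pressure. Bids are then ranked by $U$ and accepted until the block deficit is covered. The watermark rule, the 5% threshold, and every name in the code (`Bid`, `is_under_pressure`, `greedy_select`) are assumptions for illustration. They are not the real interfaces of `PressureDetector` or `GreedyBidSolver`.

```python
# Hypothetical sketch of a pressure check followed by greedy victim selection.
# Bid, is_under_pressure, greedy_select and low_watermark are illustrative names.
from dataclasses import dataclass

EPS = 1e-6  # assumed small constant in U = r / (delta + eps)


@dataclass(frozen=True)
class Bid:
    request_id: str
    freed_blocks: int    # r
    quality_loss: float  # delta

    @property
    def utility(self) -> float:
        return self.freed_blocks / (self.quality_loss + EPS)


def is_under_pressure(free_blocks: int, total_blocks: int, low_watermark: float = 0.05) -> bool:
    """Assumed rule: pressure when the free fraction of KV blocks drops below a watermark."""
    return free_blocks < low_watermark * total_blocks


def greedy_select(bids: list[Bid], blocks_needed: int) -> list[Bid]:
    """Rank bids by utility (descending) and accept them until the deficit is covered."""
    accepted: list[Bid] = []
    freed = 0
    for bid in sorted(bids, key=lambda b: b.utility, reverse=True):
        if freed >= blocks_needed:
            break
        accepted.append(bid)
        freed += bid.freed_blocks
    return accepted


bids = [Bid("a", 120, 0.60), Bid("b", 40, 0.05), Bid("c", 200, 2.00)]
if is_under_pressure(free_blocks=20, total_blocks=600):
    victims = greedy_select(bids, blocks_needed=100)
    print([v.request_id for v in victims])  # ['b', 'a']
```

Choosing victims to cover a deficit at minimum quality loss is a knapsack-style problem. Greedy ranking by $U$ is therefore a heuristic and need not find the optimum, but it costs only a sort over the running requests.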
| Strategy name | Class | Scheduling logic |
|---|---|---|
| `preempt-evict` | `PreemptEvictStrategy` | vLLM native FCFS admission + LIFO eviction |
| `preempt-evict-sjf` | `PreemptEvictSJFStrategy` | SJF admission + LIFO eviction |
| `static-random` | `StaticRandomStrategy` | Random victim selection |
| `largest-first` | `LargestFirstStrategy` | Capacity-greedy: evict largest KV occupant first |
| `bidkv` | `BidKVStrategy` | Quality-aware: maximise U = r / (δ + ε) |
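The practical difference between these policies is which request each one picks from the same running set. This sketch re-implements each victim rule as a plain function over a hypothetical `Req` record. It does not use the bidkv strategy classes, and it assumes "LIFO eviction" means evicting the most recently admitted request.

```python
# Illustrative re-implementation of the victim rules in the table above.
# This is NOT the bidkv strategy classes; Req and the functions are hypothetical.
import random
from dataclasses import dataclass

EPS = 1e-6  # assumed small constant in U = r / (delta + eps)


@dataclass(frozen=True)
class Req:
    request_id: str
    arrival: int         # admission order (higher = admitted later)
    kv_blocks: int       # r: blocks freed if preempted
    quality_loss: float  # delta: estimated cost of preempting


def lifo_victim(reqs: list[Req]) -> Req:
    """Eviction side of preempt-evict / preempt-evict-sjf: newest admission goes first."""
    return max(reqs, key=lambda r: r.arrival)


def random_victim(reqs: list[Req], rng: random.Random) -> Req:
    """static-random: uniform choice among running requests."""
    return rng.choice(reqs)


def largest_first_victim(reqs: list[Req]) -> Req:
    """largest-first: evict the biggest KV occupant."""
    return max(reqs, key=lambda r: r.kv_blocks)


def bidkv_victim(reqs: list[Req]) -> Req:
    """bidkv: maximise U = r / (delta + eps)."""
    return max(reqs, key=lambda r: r.kv_blocks / (r.quality_loss + EPS))


reqs = [
    Req("a", arrival=2, kv_blocks=120, quality_loss=0.60),
    Req("b", arrival=0, kv_blocks=40, quality_loss=0.05),
    Req("c", arrival=1, kv_blocks=200, quality_loss=2.00),
]
print(lifo_victim(reqs).request_id)           # a
print(largest_first_victim(reqs).request_id)  # c
print(bidkv_victim(reqs).request_id)          # b
print(random_victim(reqs, random.Random(0)).request_id)
```

With these numbers, LIFO evicts `a`, largest-first evicts `c`, and BidKV evicts `b`. BidKV picks `b` because it frees the most KV per unit of estimated quality loss, even though it is the smallest occupant.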
```python
from bidkv import BidKVConfig

# Default: all bid logic bypassed (safe to import without activating)
config = BidKVConfig(enabled=False)

# Enable BidKV scheduling
config = BidKVConfig(enabled=True)
assert config.is_active

# Kill switch: immediately bypasses all logic even when enabled=True
config = BidKVConfig(enabled=True, kill_switch=True)
assert not config.is_active
```

```python
from bidkv import (
    BaselineRegistry,
    BidKVStrategy,
    PreemptEvictStrategy, LargestFirstStrategy,
    StaticRandomStrategy, PreemptEvictSJFStrategy,
)

# Register all built-in strategies at once
registry = BaselineRegistry()
registry.create_default_registry()

# Or register selectively
registry2 = BaselineRegistry()
registry2.register(BidKVStrategy())
registry2.register(PreemptEvictStrategy())

strategy = registry2.get("bidkv")
print(strategy.name)  # "bidkv"
print(registry2.list_strategies())  # ["bidkv", "preempt-evict"]
```

```bash
# vLLM: 5 strategies × mixed workload × 3 rates × 3 runs
HF_HUB_OFFLINE=1 python -m bidkv.experiments.vllm.runner \
  --strategies "preempt-evict,preempt-evict-sjf,static-random,largest-first,bidkv" \
  --workloads mixed \
  --mixed-rates 2.0,3.8,5.7 \
  --runs 3 \
  --output-dir results/vllm_experiment \
  --gpu-memory-utilization 0.5 \
  --num-gpu-blocks-override 600 \
  --max-num-seqs 32

# SGLang: 3 strategies
HF_HUB_OFFLINE=1 python -m bidkv.experiments.sglang.runner \
  --strategies "sglang_default,slack_aware,bidkv" \
  --workloads mixed \
  --runs 3 \
  --output-dir results/sglang_experiment
```

BidKV hooks into vLLM through the `vllm.general_plugins` entry point. Select the strategy with the `BIDKV_STRATEGY` environment variable before starting the server:
```bash
BIDKV_STRATEGY=bidkv python -m bidkv.experiments.vllm.serve \
  --model meta-llama/Llama-3.1-8B-Instruct --enforce-eager --port 8000
```

`bidkv` depends only on the Python standard library; it does not require torch, numpy, vllm, or sglang.
```bash
pip install -e .

# development mode, with the optional [dev] extras
pip install -e ".[dev]"
```

Run the test suite:

```bash
python -m pytest tests/ -v
```

License: Apache-2.0