BidKV

Framework-portable KV cache request scheduling primitive.

Overview

bidkv is a zero-dependency Python package that addresses the victim-selection problem under KV cache pressure: when KV memory is exhausted, which request should be preempted?

The core idea is to evict the request that frees the most KV space per unit of quality loss, maximising utility:

$$U(r, \delta) = \frac{r}{\delta + \varepsilon}, \quad \varepsilon = 10^{-3}$$

where $r$ is the number of tokens freed, $\delta$ is the surrogate disruption estimate, and $\varepsilon$ keeps the ratio finite when $\delta = 0$.

BidKV does not compress tokens; it only decides which request gets preempted. The eviction itself is performed by the framework's native preempt + recompute path (vLLM) or RadixCache eviction (SGLang).
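To make the formula concrete, here is a minimal sketch that evaluates $U(r, \delta)$ for a few made-up candidates and picks the one with the highest utility. The request ids and numbers are invented for illustration, and `utility` is a hypothetical helper, not a function exported by `bidkv`.

```python
EPSILON = 1e-3  # the ε from the utility formula


def utility(tokens_freed: int, disruption: float, eps: float = EPSILON) -> float:
    """U(r, δ) = r / (δ + ε): tokens freed per unit of estimated disruption."""
    return tokens_freed / (disruption + eps)


# Made-up candidates: (request id, tokens freed r, disruption estimate δ).
candidates = [
    ("req-a", 4096, 0.80),
    ("req-b", 1024, 0.05),
    ("req-c", 2048, 0.40),
]

scores = {rid: utility(r, d) for rid, r, d in candidates}
victim = max(scores, key=scores.get)

for rid, score in scores.items():
    print(f"{rid}: U = {score:.1f}")
print("victim:", victim)
```

With these numbers the smallest request, `req-b`, scores highest because its disruption estimate is far lower than the others. The ε term only matters when δ is close to zero, where it prevents a division by zero.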

Module Layout

| Module | Contents |
| --- | --- |
| `protocol/` | Core types: `CompressionBid`, `BidPool`, `BidAcceptance` |
| `scoring/` | `PositionalScoring` (attention-sink + recency heuristic) |
| `pool/` | `BidPoolManager` |
| `pressure/` | `PressureDetector` (KV pressure detection) |
| `solver/` | `GreedyBidSolver` (bid ranking + greedy selection) |
| `baselines/` | 6 baseline strategies + BidKV (see below) |
| `adapters/vllm/` | vLLM v1 adapter (scheduler hook + plugin) |
| `adapters/sglang/` | SGLang adapter (scheduler hook) |
| `experiments/` | Experiment runner, collector, analysis |
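The "bid ranking + greedy selection" step listed for `solver/` can be pictured roughly as follows. This is a sketch under assumptions, not the actual `GreedyBidSolver`: the `Bid` dataclass and `greedy_select` function are hypothetical stand-ins, and the stopping rule (accept bids until a token target is met) is assumed rather than taken from the source.

```python
from dataclasses import dataclass

EPSILON = 1e-3


@dataclass(frozen=True)
class Bid:
    """Hypothetical stand-in for a bid; not bidkv's CompressionBid."""
    request_id: str
    tokens_freed: int
    disruption: float

    @property
    def utility(self) -> float:
        return self.tokens_freed / (self.disruption + EPSILON)


def greedy_select(bids: list[Bid], tokens_needed: int) -> list[Bid]:
    """Accept bids in descending utility until enough tokens are freed."""
    accepted: list[Bid] = []
    freed = 0
    for bid in sorted(bids, key=lambda b: b.utility, reverse=True):
        if freed >= tokens_needed:
            break
        accepted.append(bid)
        freed += bid.tokens_freed
    return accepted


bids = [
    Bid("req-a", 4096, 0.80),
    Bid("req-b", 1024, 0.05),
    Bid("req-c", 2048, 0.40),
]
chosen = greedy_select(bids, tokens_needed=3000)
print([b.request_id for b in chosen])
```

Greedy selection is simple and fast but may free more tokens than strictly required, since it never revisits an accepted bid.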

Baseline Strategies

| Strategy name | Class | Scheduling logic |
| --- | --- | --- |
| `preempt-evict` | `PreemptEvictStrategy` | vLLM native FCFS admission + LIFO eviction |
| `preempt-evict-sjf` | `PreemptEvictSJFStrategy` | SJF admission + LIFO eviction |
| `static-random` | `StaticRandomStrategy` | Random victim selection |
| `largest-first` | `LargestFirstStrategy` | Capacity-greedy: evict largest KV occupant first |
| `bidkv` | `BidKVStrategy` | Quality-aware: maximise U = r / (δ + ε) |
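The eviction rules in the table can be contrasted on a toy set of running requests. The sketch below is illustrative only: the `Running` record and the four functions are hypothetical, "LIFO" is read here as "evict the most recently admitted request", and admission policies (FCFS, SJF) are not modelled.

```python
import random
from dataclasses import dataclass

EPSILON = 1e-3


@dataclass(frozen=True)
class Running:
    """Hypothetical view of a running request."""
    request_id: str
    arrival: int        # admission order, larger = admitted later
    kv_tokens: int      # KV tokens currently held
    disruption: float   # surrogate disruption estimate δ


def lifo(reqs):
    return max(reqs, key=lambda r: r.arrival)


def largest_first(reqs):
    return max(reqs, key=lambda r: r.kv_tokens)


def static_random(reqs, rng):
    return rng.choice(reqs)


def quality_aware(reqs):
    return max(reqs, key=lambda r: r.kv_tokens / (r.disruption + EPSILON))


reqs = [
    Running("req-a", arrival=0, kv_tokens=4096, disruption=0.80),
    Running("req-b", arrival=1, kv_tokens=1024, disruption=0.05),
    Running("req-c", arrival=2, kv_tokens=2048, disruption=0.40),
]

print("LIFO:         ", lifo(reqs).request_id)
print("largest-first:", largest_first(reqs).request_id)
print("random:       ", static_random(reqs, random.Random(0)).request_id)
print("quality-aware:", quality_aware(reqs).request_id)
```

On this input the three deterministic rules each pick a different victim, which is the kind of divergence the baselines are meant to expose.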

Configuration

from bidkv import BidKVConfig

# Default: all bid logic bypassed (safe to import without activating)
config = BidKVConfig(enabled=False)

# Enable BidKV scheduling
config = BidKVConfig(enabled=True)
assert config.is_active

# Kill switch: immediately bypasses all logic even when enabled=True
config = BidKVConfig(enabled=True, kill_switch=True)
assert not config.is_active
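The three cases above are consistent with `is_active` being true only when `enabled` is set and `kill_switch` is not. A minimal sketch of that gating, using a hypothetical `SketchConfig` stand-in rather than the real `BidKVConfig` (whose fields and defaults may differ):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for BidKVConfig; only models the two flags."""
    enabled: bool = False
    kill_switch: bool = False

    @property
    def is_active(self) -> bool:
        # The kill switch takes precedence over `enabled`.
        return self.enabled and not self.kill_switch


for enabled in (False, True):
    for kill_switch in (False, True):
        cfg = SketchConfig(enabled=enabled, kill_switch=kill_switch)
        print(f"enabled={enabled!s:5} kill_switch={kill_switch!s:5} -> {cfg.is_active}")
```

Keeping the kill switch as a separate flag lets an operator disable the logic without touching the `enabled` setting.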

Adding a Custom Strategy

from bidkv import (
    BaselineRegistry,
    BidKVStrategy,
    PreemptEvictStrategy, LargestFirstStrategy,
    StaticRandomStrategy, PreemptEvictSJFStrategy,
)

# Register all built-in strategies at once
registry = BaselineRegistry()
registry.create_default_registry()

# Or register selectively
registry2 = BaselineRegistry()
registry2.register(BidKVStrategy())
registry2.register(PreemptEvictStrategy())

strategy = registry2.get("bidkv")
print(strategy.name)              # "bidkv"
print(registry2.list_strategies())  # ["bidkv", "preempt-evict"]
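The snippet above registers built-in strategies only. As a rough sketch of what a custom strategy might look like, the following self-contained example deliberately does not import `bidkv`: `VictimStrategy`, `select_victim`, `SmallestFirstStrategy`, and `MiniRegistry` are all hypothetical names, and the real base class and method signatures in `baselines/` may differ. Only the `name` attribute and the `register` / `get` / `list_strategies` calls mirror what the example above shows.

```python
from typing import Protocol


class VictimStrategy(Protocol):
    """Hypothetical strategy interface (assumption, not bidkv's API)."""
    name: str

    def select_victim(self, kv_tokens: dict[str, int]) -> str: ...


class SmallestFirstStrategy:
    """Toy custom strategy: evict the request holding the fewest KV tokens."""
    name = "smallest-first"

    def select_victim(self, kv_tokens: dict[str, int]) -> str:
        return min(kv_tokens, key=kv_tokens.get)


class MiniRegistry:
    """Hypothetical name-keyed registry, modelled loosely on the calls above."""

    def __init__(self) -> None:
        self._strategies: dict[str, VictimStrategy] = {}

    def register(self, strategy: VictimStrategy) -> None:
        if strategy.name in self._strategies:
            raise ValueError(f"duplicate strategy name: {strategy.name}")
        self._strategies[strategy.name] = strategy

    def get(self, name: str) -> VictimStrategy:
        return self._strategies[name]

    def list_strategies(self) -> list[str]:
        return list(self._strategies)


registry = MiniRegistry()
registry.register(SmallestFirstStrategy())
strategy = registry.get("smallest-first")
print(strategy.select_victim({"req-a": 4096, "req-b": 1024}))
```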

Running Experiments

# vLLM: 5 strategies × mixed workload × 3 rates × 3 runs
HF_HUB_OFFLINE=1 python -m bidkv.experiments.vllm.runner \
    --strategies "preempt-evict,preempt-evict-sjf,static-random,largest-first,bidkv" \
    --workloads mixed \
    --mixed-rates 2.0,3.8,5.7 \
    --runs 3 \
    --output-dir results/vllm_experiment \
    --gpu-memory-utilization 0.5 \
    --num-gpu-blocks-override 600 \
    --max-num-seqs 32

# SGLang: 3 strategies
HF_HUB_OFFLINE=1 python -m bidkv.experiments.sglang.runner \
    --strategies "sglang_default,slack_aware,bidkv" \
    --workloads mixed \
    --runs 3 \
    --output-dir results/sglang_experiment
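For a sense of how large the vLLM sweep is, the sketch below enumerates the grid implied by the first command's comment (5 strategies × mixed workload × 3 rates × 3 runs). It assumes the runner covers the full Cartesian product of these options, which the source does not state explicitly.

```python
from itertools import product

strategies = [
    "preempt-evict",
    "preempt-evict-sjf",
    "static-random",
    "largest-first",
    "bidkv",
]
workloads = ["mixed"]
rates = [2.0, 3.8, 5.7]   # values passed to --mixed-rates
runs = range(3)           # --runs 3

grid = list(product(strategies, workloads, rates, runs))
print(len(grid))  # 5 * 1 * 3 * 3 = 45 individual runs
```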

Framework Integration (vLLM)

BidKV hooks into vLLM through the `vllm.general_plugins` entry point. Set the strategy with the `BIDKV_STRATEGY` environment variable before starting the server:

BIDKV_STRATEGY=bidkv python -m bidkv.experiments.vllm.serve \
    --model meta-llama/Llama-3.1-8B-Instruct --enforce-eager --port 8000
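A plugin loaded this way typically needs to read and validate the strategy name at startup. The sketch below shows one way that lookup could work. It is an assumption, not the adapter's actual code: `resolve_strategy`, the fallback to `preempt-evict`, and the error behaviour are all hypothetical; only the `BIDKV_STRATEGY` variable and the strategy names come from this README.

```python
import os
from collections.abc import Mapping

# Strategy names from the Baseline Strategies table.
KNOWN_STRATEGIES = (
    "preempt-evict",
    "preempt-evict-sjf",
    "static-random",
    "largest-first",
    "bidkv",
)


def resolve_strategy(
    environ: Mapping[str, str] = os.environ,
    default: str = "preempt-evict",  # assumed fallback, not confirmed by the source
) -> str:
    """Read BIDKV_STRATEGY and reject names that are not registered."""
    name = environ.get("BIDKV_STRATEGY", default)
    if name not in KNOWN_STRATEGIES:
        raise ValueError(
            f"unknown BIDKV_STRATEGY {name!r}; expected one of {KNOWN_STRATEGIES}"
        )
    return name


print(resolve_strategy({"BIDKV_STRATEGY": "bidkv"}))
```

Failing fast on an unknown name is a design choice in this sketch: a typo in the environment variable surfaces at server start instead of silently falling back to a different scheduler.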

Zero Dependencies

`bidkv` depends only on the Python standard library: it does not require torch, numpy, vllm, or sglang.

Install

pip install -e .

# editable install with the optional dev extras
pip install -e ".[dev]"

Testing

python -m pytest tests/ -v

License

Apache-2.0
