Smart compression · Semantic search · Token budgeting · MCP Server
Works with Claude Desktop, Cursor, Windsurf, and any AI agent.
Installation · Quick Start · MCP Integration · CLI · API · Benchmarks
AI agents hit context limits. Your 200K token window fills up fast — and most of it is irrelevant noise. ContextKit fixes this:
- 🔍 Semantic search — Find the 5% of context that actually matters
- 🗜️ Smart compression — Summarize old messages, save 60-80% tokens
- 💰 Token budgeting — Never overflow your context window again
- 🔌 MCP Server — Plug into Claude Desktop, Cursor, or Windsurf in one line
- 💾 Zero dependencies — Pure file storage, no databases required
pip install contextkit
# Core (token counting + compression)
pip install contextkit
# With LLM compression (OpenAI)
pip install contextkit[llm]
# With MCP server support
pip install contextkit[mcp]
# Everything
pip install contextkit[all]

- Python 3.9+
- No API keys needed for token counting, budgeting, and keyword search
- OpenAI API key required for: semantic search (embeddings), LLM compression
from contextkit import ContextManager
# Create a context manager
ctx = ContextManager(max_tokens=128000)
# Add messages
ctx.add("system", "You are a helpful assistant.")
ctx.add("user", "How do I sort a list in Python?")
ctx.add("assistant", "Use sorted() or list.sort().")
# Check your token budget
print(ctx.token_budget)
# {'total': 128000, 'used': 42, 'remaining': 127958, 'utilization': '0.0%'}
# Auto-compress when context is full
ctx.auto_compress()

ctx = ContextManager(
storage="./my_memory",
max_tokens=200000,
embedding_model="text-embedding-3-small", # Requires OpenAI key
)
# Add conversation history
ctx.add("user", "I prefer dark mode in VS Code")
ctx.add("assistant", "Noted! I'll keep that in mind.")
ctx.add("user", "Set up a new Python project")
# Search for relevant context
results = ctx.get_relevant("display preferences")
# → Returns the dark mode message with relevance score

# Session 1 — messages persist to disk
ctx = ContextManager(storage="./project_memory")
ctx.add("user", "Our API uses REST, not GraphQL")
ctx.add("assistant", "Got it, REST endpoints.")
# Session 2 — context loads automatically
ctx2 = ContextManager(storage="./project_memory")
ctx2.get_relevant("API protocol")
# → Finds the REST conversation from Session 1

ContextKit ships as a server for MCP (the Model Context Protocol), the standard for AI agent tool use. Connect it to Claude Desktop, Cursor, or Windsurf in seconds.
| Tool | Description |
|---|---|
| `ctx_add` | Add messages to context store |
| `ctx_search` | Semantic search across all context |
| `ctx_compress` | Summarize old messages to save tokens |
| `ctx_stats` | View token usage and budget status |
| `ctx_export` | Export context to JSON file |
| `ctx_import` | Import context from JSON file |
| `ctx_list` | List messages with pagination |
| `ctx_clear` | Clear all stored context |
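Over the wire, MCP tools are invoked as JSON-RPC 2.0 `tools/call` requests. A hypothetical invocation of `ctx_search` might look like the following (the argument names are illustrative, not ContextKit's exact schema):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "ctx_search",
    "arguments": { "query": "deployment configuration" }
  }
}
```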
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"contextkit": {
"command": "contextkit",
"args": ["mcp"],
"env": {
"OPENAI_API_KEY": "your-api-key"
}
}
}
}

Or copy the provided config:
cp mcp_config/claude_desktop.json ~/Library/Application\ Support/Claude/claude_desktop_config.json

Add to .cursor/mcp.json in your project:
{
"mcpServers": {
"contextkit": {
"command": "contextkit",
"args": ["mcp"],
"env": {
"OPENAI_API_KEY": "your-api-key"
}
}
}
}

Add to ~/.windsurf/mcp.json:
{
"mcpServers": {
"contextkit": {
"command": "contextkit",
"args": ["mcp"],
"env": {
"OPENAI_API_KEY": "your-api-key"
}
}
}
}

💡 No API key? ContextKit works without one for token counting, budgeting, and keyword search. Only semantic search and LLM compression need OpenAI.
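For the no-key path, keyword search can be as simple as term-overlap scoring. The sketch below illustrates the general idea, not ContextKit's actual ranking code:

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (illustrative scoring)."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    if not q_terms:
        return 0.0
    return len(q_terms & t_terms) / len(q_terms)

messages = [
    "Docker multi-stage build keeps images small",
    "Use sorted() to sort a list in Python",
]

# Rank stored messages against a query, best match first
ranked = sorted(messages, key=lambda m: keyword_score("docker build", m), reverse=True)
print(ranked[0])  # → Docker multi-stage build keeps images small
```

Overlap scoring needs no network calls, which is why it works offline; the trade-off is the accuracy gap shown in the benchmarks below.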
ContextKit ships with a full CLI for inspecting and managing context:
# View context statistics
contextkit stats ./my_context/
# Compress old messages
contextkit compress ./my_context/ --hours 2
# Search context
contextkit search ./my_context/ "deployment configuration"
# Export to JSON
contextkit export ./my_context/ ./backup.json
# Run benchmarks
contextkit bench
# Start MCP server
contextkit mcp
# Version info
contextkit version

$ contextkit stats ./my_context/
==================================================
ContextKit Stats: ./my_context/
==================================================
Messages: 47
Characters: 23,451
Est. Tokens: 5,862
Avg Tokens/msg: 124
Role Distribution:
assistant 18
system 2
user 27
Time Range: 2025-04-20 09:15 → 2025-04-25 14:30
==================================================
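The estimate above is consistent with the common chars/4 rule of thumb for English text (whether the CLI uses exactly this heuristic is an assumption):

```python
chars = 23_451
messages = 47

est_tokens = chars // 4          # rough rule of thumb: ~4 characters per token
print(est_tokens)                # → 5862, matching the Est. Tokens line
print(est_tokens // messages)    # → 124, matching the Avg Tokens/msg line
```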
from contextkit import ContextManager
ctx = ContextManager(
storage="./.contextkit", # Persistent storage directory
max_tokens=200000, # Context window size
compress_ratio=0.3, # Compress when 70% full
embedding_model="text-embedding-3-small", # Or None for keyword-only
compression_model="gpt-4o-mini", # For summarization
)
# Add messages
msg_id = ctx.add("user", "Hello!", metadata={"source": "chat"})
# Retrieve context
relevant = ctx.get_relevant("greeting", max_tokens=50000)
recent = ctx.get_recent(max_tokens=50000)
# Compress
ctx.summarize_older_than(hours=2)
ctx.auto_compress()
# Budget
print(ctx.token_budget)
# Persistence
ctx.export("./backup.json")
ctx.import_("./backup.json")

from contextkit.budget import TokenBudget
budget = TokenBudget(max_tokens=128000)
# Count tokens
tokens = budget.count_tokens("Hello, world!")
msg_tokens = budget.count_message_tokens("user", "Hello!")
# Budget status
status = budget.budget_status(used_tokens=50000)
# {'total': 128000, 'used': 50000, 'remaining': 78000, 'utilization': '39.1%'}
# Model-aware encoding
budget = TokenBudget.for_model("gpt-4o", max_tokens=128000)

from contextkit.compressor import ContextCompressor
compressor = ContextCompressor(model="gpt-4o-mini")
# Summarize messages
messages = [{"role": "user", "content": "..."}, ...]
summary = compressor.summarize(messages)
# Create summary message
summary_msg = compressor.create_summary_message(
summary=summary,
original_count=len(messages),
)┌─────────────────────────────────────────────────────┐
│ Your AI Agent │
├─────────────────────────────────────────────────────┤
│ │
│ ┌───────────┐ ┌───────────┐ ┌─────────────────┐ │
│ │ CLI Tool │ │ MCP Server│ │ Python Library │ │
│ └─────┬─────┘ └─────┬─────┘ └────────┬────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────────┘ │
│ ▼ │
│ ┌────────────────┐ │
│ │ ContextManager │ │
│ └───────┬────────┘ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ ┌────────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Compressor │ │ Indexer │ │ Budget │ │
│ │ (LLM + │ │ (Vector │ │ (tiktoken│ │
│ │ fallback) │ │ search) │ │ count) │ │
│ └────────────┘ └──────────┘ └──────────┘ │
│ │
├─────────────────────────────────────────────────────┤
│ File-based Storage (JSON + NumPy) │
└─────────────────────────────────────────────────────┘
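Because the storage layer is plain files, context survives restarts without a database. A stored message might look like the record below (a hypothetical schema for illustration; ContextKit's actual on-disk layout may differ):

```python
import json
import pathlib
import tempfile
import time

# Hypothetical message record; field names are illustrative
record = {
    "role": "user",
    "content": "Our API uses REST, not GraphQL",
    "timestamp": time.time(),
    "metadata": {"source": "chat"},
}

# Persist and reload as plain JSON — no database required
with tempfile.TemporaryDirectory() as d:
    path = pathlib.Path(d) / "messages.json"
    path.write_text(json.dumps([record]))
    loaded = json.loads(path.read_text())

print(loaded[0]["content"])  # → Our API uses REST, not GraphQL
```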
Measured on Apple M2, Python 3.12:
| Type | Characters | Tokens | Chars/Token |
|---|---|---|---|
| Short | 13 | 4 | 3.25 |
| Medium | 450 | 101 | 4.46 |
| Long | 5,700 | 1,001 | 5.69 |
| Code | 3,700 | 1,150 | 3.22 |
| Mixed (EN+ZH) | 700 | 300 | 2.33 |
| Context Size | Original | After Compress | Token Savings |
|---|---|---|---|
| Small (7 msgs) | 143 tokens | 74 tokens | 48.3% |
| Medium (13 msgs) | 872 tokens | 74 tokens | 91.5% |
| Large (11 msgs) | 1,506 tokens | 98 tokens | 93.5% |
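The savings column is the relative token reduction; for example, the medium case:

```python
def savings(before: int, after: int) -> float:
    """Percentage of tokens saved by compression."""
    return (1 - after / before) * 100

print(f"{savings(872, 74):.1f}%")  # → 91.5%, matching the Medium row above
```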
| Query | Expected Topic | Found? | Score |
|---|---|---|---|
| "sorting dictionaries python" | python_sorting | ❌ | 0.000 |
| "Promise.all JavaScript" | javascript_async | ❌ | 0.500 |
| "Docker multi-stage build" | docker_deploy | ✅ | 0.667 |
| "SQL query optimization" | sql_optimization | ✅ | 0.667 |
| "React state management" | react_state | ✅ | 0.333 |
Keyword search alone reaches 60% accuracy on this set; semantic search with OpenAI embeddings achieves near-perfect accuracy (95%+).
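Semantic search scores by vector similarity rather than shared words, which is why a paraphrase like "display preferences" can match a message about dark mode. A minimal cosine-similarity sketch with toy 3-dimensional vectors (real embeddings such as text-embedding-3-small have 1,536 dimensions):

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: paraphrases point in similar directions,
# unrelated text points elsewhere
query     = [0.9, 0.1, 0.0]  # "display preferences"
dark_mode = [0.8, 0.2, 0.1]  # "I prefer dark mode"
sorting   = [0.0, 0.1, 0.9]  # "how to sort a list"

print(cosine(query, dark_mode) > cosine(query, sorting))  # → True
```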
| Size | Add (per msg) | Search | Save | Load |
|---|---|---|---|---|
| 100 msgs | 0.04ms | <0.1ms | 0.3ms | 0.2ms |
| 500 msgs | 0.03ms | <0.1ms | 0.3ms | 0.1ms |
| 1,000 msgs | 0.02ms | <0.1ms | 0.3ms | 0.1ms |
| Platform | Integration | Status |
|---|---|---|
| Claude Desktop | MCP Server | ✅ Supported |
| Cursor | MCP Server | ✅ Supported |
| Windsurf | MCP Server | ✅ Supported |
| OpenAI Agents | Python Library | ✅ Supported |
| LangChain | Python Library | ✅ Supported |
| AutoGen | Python Library | ✅ Supported |
| CrewAI | Python Library | ✅ Supported |
| Custom Agents | Python Library + CLI | ✅ Supported |
- v0.2.0 — MCP Server, CLI, Benchmarks
- v0.3.0 — Embedding provider abstraction (Ollama, Cohere, local models)
- v0.3.0 — Streaming compression
- v0.4.0 — Multi-agent shared context
- v0.4.0 — Context versioning and diff
- v0.5.0 — Built-in evaluation metrics
- v0.5.0 — Plugin system for custom compressors
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
# Clone and setup
git clone https://github.com/seastarbot/contextkit.git
cd contextkit
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run benchmarks
contextkit bench
# Lint
ruff check src/contextkit/

MIT License — see LICENSE for details.
Built with ❤️ for AI agents everywhere