Smart compression · Semantic search · Token budgeting · MCP Server
Works with Claude Desktop, Cursor, Windsurf, and any AI agent.
Installation · Quick Start · MCP Integration · CLI · API · Benchmarks
AI agents hit context limits. Your 200K token window fills up fast – and most of it is irrelevant noise. ContextKit fixes this:
- 🔍 Semantic search – find the 5% of context that actually matters
- 🗜️ Smart compression – summarize old messages and save 60–80% of tokens
- 💰 Token budgeting – never overflow your context window again
- 🔌 MCP Server – plug into Claude Desktop, Cursor, or Windsurf in one line
- 💾 Zero dependencies – pure file storage, no databases required
```bash
pip install contextkit
```
```bash
# Core (token counting + compression)
pip install contextkit

# With LLM compression (OpenAI)
pip install contextkit[llm]

# With MCP server support
pip install contextkit[mcp]

# Everything
pip install contextkit[all]
```

- Python 3.9+
- No API keys needed for token counting, budgeting, and keyword search
- OpenAI API key required for: semantic search (embeddings), LLM compression
```python
from contextkit import ContextManager

# Create a context manager
ctx = ContextManager(max_tokens=128000)

# Add messages
ctx.add("system", "You are a helpful assistant.")
ctx.add("user", "How do I sort a list in Python?")
ctx.add("assistant", "Use sorted() or list.sort().")

# Check your token budget
print(ctx.token_budget)
# {'total': 128000, 'used': 42, 'remaining': 127958, 'utilization': '0.0%'}

# Auto-compress when context is full
ctx.auto_compress()
```

```python
ctx = ContextManager(
    storage="./my_memory",
    max_tokens=200000,
    embedding_model="text-embedding-3-small",  # Requires OpenAI key
)

# Add conversation history
ctx.add("user", "I prefer dark mode in VS Code")
ctx.add("assistant", "Noted! I'll keep that in mind.")
ctx.add("user", "Set up a new Python project")

# Search for relevant context
results = ctx.get_relevant("display preferences")
# → Returns the dark mode message with relevance score
```

```python
# Session 1: messages persist to disk
ctx = ContextManager(storage="./project_memory")
ctx.add("user", "Our API uses REST, not GraphQL")
ctx.add("assistant", "Got it, REST endpoints.")

# Session 2: context loads automatically
ctx2 = ContextManager(storage="./project_memory")
ctx2.get_relevant("API protocol")
# → Finds the REST conversation from Session 1
```

ContextKit ships as a server for MCP, the standard protocol for AI agent tool use. Connect it to Claude Desktop, Cursor, or Windsurf in seconds.
| Tool | Description |
|---|---|
| `ctx_add` | Add messages to context store |
| `ctx_search` | Semantic search across all context |
| `ctx_compress` | Summarize old messages to save tokens |
| `ctx_stats` | View token usage and budget status |
| `ctx_export` | Export context to JSON file |
| `ctx_import` | Import context from JSON file |
| `ctx_list` | List messages with pagination |
| `ctx_clear` | Clear all stored context |
Add to `~/Library/Application Support/Claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "contextkit": {
      "command": "contextkit",
      "args": ["mcp"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}
```

Or copy the provided config:

```bash
cp mcp_config/claude_desktop.json ~/Library/Application\ Support/Claude/claude_desktop_config.json
```

Add to `.cursor/mcp.json` in your project:
```json
{
  "mcpServers": {
    "contextkit": {
      "command": "contextkit",
      "args": ["mcp"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}
```

Add to `~/.windsurf/mcp.json`:
```json
{
  "mcpServers": {
    "contextkit": {
      "command": "contextkit",
      "args": ["mcp"],
      "env": {
        "OPENAI_API_KEY": "your-api-key"
      }
    }
  }
}
```

💡 No API key? ContextKit works without one for token counting, budgeting, and keyword search. Only semantic search and LLM compression need OpenAI.
ContextKit ships with a full CLI for inspecting and managing context:
```bash
# View context statistics
contextkit stats ./my_context/

# Compress old messages
contextkit compress ./my_context/ --hours 2

# Search context
contextkit search ./my_context/ "deployment configuration"

# Export to JSON
contextkit export ./my_context/ ./backup.json

# Run benchmarks
contextkit bench

# Start MCP server
contextkit mcp

# Version info
contextkit version
```

```text
$ contextkit stats ./my_context/
==================================================
ContextKit Stats: ./my_context/
==================================================
Messages:        47
Characters:      23,451
Est. Tokens:     5,862
Avg Tokens/Msg:  124

Role Distribution:
  assistant   18
  system       2
  user        27

Time Range: 2025-04-20 09:15 → 2025-04-25 14:30
==================================================
```
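The Est. Tokens figure above is consistent with a simple characters-per-token heuristic (roughly 4 characters per token for English text). Whether ContextKit uses exactly this estimator is an assumption here; a minimal sketch:

```python
def estimate_tokens(char_count: int) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # 23,451 characters // 4 = 5,862, matching the stats output above.
    return char_count // 4

print(estimate_tokens(23_451))  # 5862
```

For exact counts against a specific model's tokenizer, use the `TokenBudget` API described below.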
```python
from contextkit import ContextManager

ctx = ContextManager(
    storage="./.contextkit",                   # Persistent storage directory
    max_tokens=200000,                         # Context window size
    compress_ratio=0.3,                        # Compress when 70% full
    embedding_model="text-embedding-3-small",  # Or None for keyword-only
    compression_model="gpt-4o-mini",           # For summarization
)

# Add messages
msg_id = ctx.add("user", "Hello!", metadata={"source": "chat"})

# Retrieve context
relevant = ctx.get_relevant("greeting", max_tokens=50000)
recent = ctx.get_recent(max_tokens=50000)

# Compress
ctx.summarize_older_than(hours=2)
ctx.auto_compress()

# Budget
print(ctx.token_budget)

# Persistence
ctx.export("./backup.json")
ctx.import_("./backup.json")
```

```python
from contextkit.budget import TokenBudget

budget = TokenBudget(max_tokens=128000)

# Count tokens
tokens = budget.count_tokens("Hello, world!")
msg_tokens = budget.count_message_tokens("user", "Hello!")

# Budget status
status = budget.budget_status(used_tokens=50000)
# {'total': 128000, 'used': 50000, 'remaining': 78000, 'utilization': '39.1%'}

# Model-aware encoding
budget = TokenBudget.for_model("gpt-4o", max_tokens=128000)
```

```python
from contextkit.compressor import ContextCompressor

compressor = ContextCompressor(model="gpt-4o-mini")

# Summarize messages
messages = [{"role": "user", "content": "..."}, ...]
summary = compressor.summarize(messages)

# Create summary message
summary_msg = compressor.create_summary_message(
    summary=summary,
    original_count=len(messages),
)
```
```text
┌────────────────────────────────────────────────────┐
│                   Your AI Agent                    │
├────────────────────────────────────────────────────┤
│                                                    │
│  ┌───────────┐  ┌────────────┐  ┌────────────────┐ │
│  │ CLI Tool  │  │ MCP Server │  │ Python Library │ │
│  └─────┬─────┘  └─────┬──────┘  └───────┬────────┘ │
│        │              │                 │          │
│        └──────────────┼─────────────────┘          │
│                       ▼                            │
│               ┌────────────────┐                   │
│               │ ContextManager │                   │
│               └───────┬────────┘                   │
│         ┌─────────────┼───────────┐                │
│         ▼             ▼           ▼                │
│  ┌────────────┐ ┌──────────┐ ┌──────────┐          │
│  │ Compressor │ │ Indexer  │ │  Budget  │          │
│  │   (LLM +   │ │ (Vector  │ │(tiktoken │          │
│  │ fallback)  │ │  search) │ │  count)  │          │
│  └────────────┘ └──────────┘ └──────────┘          │
│                                                    │
├────────────────────────────────────────────────────┤
│         File-based Storage (JSON + NumPy)          │
└────────────────────────────────────────────────────┘
```
Measured on Apple M2, Python 3.12:
| Type | Characters | Tokens | Chars/Token |
|---|---|---|---|
| Short | 13 | 4 | 3.25 |
| Medium | 450 | 101 | 4.46 |
| Long | 5,700 | 1,001 | 5.69 |
| Code | 3,700 | 1,150 | 3.22 |
| Mixed (EN+ZH) | 700 | 300 | 2.33 |
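The Chars/Token column is simply characters divided by tokens. English prose averages around 4–5 characters per token, while code and mixed-language text pack fewer characters into each token:

```python
# Recompute the ratio column from the table's character and token counts
rows = [("Short", 13, 4), ("Medium", 450, 101), ("Long", 5700, 1001),
        ("Code", 3700, 1150), ("Mixed (EN+ZH)", 700, 300)]
for name, chars, tokens in rows:
    print(f"{name}: {chars / tokens:.2f} chars/token")
```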
| Context Size | Original | After Compress | Token Savings |
|---|---|---|---|
| Small (7 msgs) | 143 tokens | 74 tokens | 48.3% |
| Medium (13 msgs) | 872 tokens | 74 tokens | 91.5% |
| Large (11 msgs) | 1,506 tokens | 98 tokens | 93.5% |
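The savings column is the fraction of tokens eliminated, (original - compressed) / original:

```python
def savings_pct(original: int, compressed: int) -> float:
    # Percentage of tokens eliminated by compression
    return round(100 * (original - compressed) / original, 1)

for orig, comp in [(143, 74), (872, 74), (1506, 98)]:
    print(savings_pct(orig, comp))
# 48.3, 91.5, 93.5
```

Savings grow with context size because a summary stays roughly constant-length while the conversation it replaces keeps growing.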
| Query | Expected Topic | Found? | Score |
|---|---|---|---|
| "sorting dictionaries python" | python_sorting | ✗ | 0.000 |
| "Promise.all JavaScript" | javascript_async | ✓ | 0.500 |
| "Docker multi-stage build" | docker_deploy | ✓ | 0.667 |
| "SQL query optimization" | sql_optimization | ✓ | 0.667 |
| "React state management" | react_state | ✗ | 0.333 |
Keyword search tops out at 60% accuracy; semantic search with OpenAI embeddings achieves near-perfect accuracy (95%+).
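Scores like 0.667 are consistent with a simple term-overlap metric: the fraction of query terms that appear in a stored message. A minimal sketch of that idea (not necessarily ContextKit's exact scorer):

```python
def keyword_score(query: str, document: str) -> float:
    # Fraction of query terms that appear in the document text
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return round(len(query_terms & doc_terms) / len(query_terms), 3)

print(keyword_score(
    "sql query optimization",
    "add an index to speed up a slow sql query",
))  # 0.667: 2 of 3 query terms match
```

Embedding-based search replaces exact term overlap with similarity between vectors, which is why a paraphrased query like "display preferences" can find a "dark mode" message only in semantic mode.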
| Size | Add (per msg) | Search | Save | Load |
|---|---|---|---|---|
| 100 msgs | 0.04ms | <0.1ms | 0.3ms | 0.2ms |
| 500 msgs | 0.03ms | <0.1ms | 0.3ms | 0.1ms |
| 1,000 msgs | 0.02ms | <0.1ms | 0.3ms | 0.1ms |
| Platform | Integration | Status |
|---|---|---|
| Claude Desktop | MCP Server | ✅ Supported |
| Cursor | MCP Server | ✅ Supported |
| Windsurf | MCP Server | ✅ Supported |
| OpenAI Agents | Python Library | ✅ Supported |
| LangChain | Python Library | ✅ Supported |
| AutoGen | Python Library | ✅ Supported |
| CrewAI | Python Library | ✅ Supported |
| Custom Agents | Python Library + CLI | ✅ Supported |
- v0.2.0 – MCP Server, CLI, Benchmarks
- v0.3.0 – Embedding provider abstraction (Ollama, Cohere, local models)
- v0.3.0 – Streaming compression
- v0.4.0 – Multi-agent shared context
- v0.4.0 – Context versioning and diff
- v0.5.0 – Built-in evaluation metrics
- v0.5.0 – Plugin system for custom compressors
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
```bash
# Clone and setup
git clone https://github.com/seastarbot/contextkit.git
cd contextkit
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run benchmarks
contextkit bench

# Lint
ruff check src/contextkit/
```

MIT License – see LICENSE for details.
Built with ❤️ for AI agents everywhere