Track, analyze, and optimize your LLM API spending. Three lines of code to monitor every OpenAI and Anthropic call (cost, tokens, latency, and errors) in a self-hosted dashboard.
CostPilot is an open-source, self-hosted platform for LLM cost tracking and observability. Drop the lightweight Python SDK into your app and every call to OpenAI, Anthropic, Google, or DeepSeek is automatically tracked. You get real-time spend dashboards, per-model and per-feature cost breakdowns, budget alerts, cost forecasting, and optimization recommendations, all without sending your data to a third party.
If you've ever asked "how much are we actually spending on GPT-5.5 this month?" or "which feature is burning our token budget?", this is for you.
- Why CostPilot
- Features
- Quick start
- Python SDK
- Screenshots
- Architecture
- API reference
- CostPilot vs Helicone vs LangSmith
- FAQ
- Tech stack
- Development
- License
LLM bills are unpredictable and opaque. Costs are scattered across providers, hard to attribute to teams or features, and easy to let run away. Existing tools either lock your data in a SaaS, focus on tracing instead of cost, or are too heavy to drop into a side project.
CostPilot is deliberately narrow and self-hosted:
- SDK-first: instrument an app in 3 lines, with zero changes to your call sites.
- Cost-focused: built for FinOps on AI spend, not full request tracing.
- Self-hosted: your usage data stays in your own Postgres. Run it with one docker compose up.
- Zero overhead: the SDK batches events on a background thread and never blocks or crashes your application. If the backend is down, events are dropped silently.
It's the dashboard layer on top of code-level cost tracking: the full-stack evolution of TokenBudget.
- Real-time cost dashboard: total spend, cost over time, requests, average cost per request, and latency at a glance.
- Breakdowns by model, feature, and team member: see exactly where the money goes (cost by gpt-4o vs claude-sonnet-4-6, cost by feature:doc-summary, cost by user:alice).
- Budget management & alerts: set daily / weekly / monthly budgets per project and get warned at a threshold or when you blow past it.
- Cost forecasting: "At current rate, you'll spend $251/month" with trend detection (increasing / stable / decreasing).
- Optimization recommendations: automatic suggestions to downgrade overkill models, cut error waste, compress long prompts, and tackle cost concentration.
- API key management: generate and revoke project keys for the SDK; keys are SHA-256 hashed and shown only once.
- Team management: multiple members per project, usage tracked per user.
- Raw event log: paginated, filterable table of every tracked call with full token and tag detail.
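The forecasting feature above can be pictured with a small sketch. This is not CostPilot's actual algorithm, which the README does not describe; it assumes a plain daily-average projection and a half-versus-half trend comparison. The function name `forecast_monthly_spend` and the 10% tolerance are hypothetical.

```python
from statistics import mean


def forecast_monthly_spend(
    daily_costs: list[float],
    days_in_month: int = 30,
    tolerance: float = 0.10,  # assumed: changes within +/-10% count as "stable"
) -> dict:
    """Project monthly spend from recent daily totals and label the trend."""
    if not daily_costs:
        return {"projected_monthly": 0.0, "trend": "stable"}

    # Projection: average daily cost multiplied by the days in a month.
    projected = round(mean(daily_costs) * days_in_month, 2)

    # Trend: compare the newer half of the window with the older half.
    half = len(daily_costs) // 2
    if half == 0:
        trend = "stable"
    else:
        older = mean(daily_costs[:half])
        newer = mean(daily_costs[half:])
        if older == 0:
            trend = "increasing" if newer > 0 else "stable"
        else:
            change = (newer - older) / older
            if change > tolerance:
                trend = "increasing"
            elif change < -tolerance:
                trend = "decreasing"
            else:
                trend = "stable"

    return {"projected_monthly": projected, "trend": trend}
```

Feeding it the last week or two of daily totals gives a rough "at current rate" figure; a production forecaster would likely also account for weekday seasonality.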
You need Docker and Docker Compose. To run the full stack (Postgres + API + dashboard):
git clone https://github.com/aryanjp1/costpilot.git
cd costpilot
cp .env.example .env # set SECRET_KEY etc.
docker compose up -d --build # starts db, backend (:8787), frontend (:3000)
Open the dashboard at http://localhost:3000, register an account, and create a project.
Then install the SDK in your LLM app and start tracking:
pip install costpilot

import costpilot
import openai

costpilot.init(
    api_key="cp_proj_abc123def456",  # generate this in the dashboard
    endpoint="http://localhost:8787",
    default_tags={"service": "chatbot"},
)
client = costpilot.wrap_openai(openai.OpenAI())

# Use the client exactly as before; every call is now tracked.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Your calls show up in the dashboard within a few seconds.
Want demo data? Generate a project API key in the dashboard, then run
python scripts/generate_sample_data.py --api-key cp_proj_... --endpoint http://localhost:8787
to populate 30 days of realistic usage.
The SDK is dead simple and has a single dependency (httpx).
# OpenAI
client = costpilot.wrap_openai(openai.OpenAI())

# Anthropic
import anthropic
client = costpilot.wrap_anthropic(anthropic.Anthropic())
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

Attach tags to any call through the x-costpilot-tags header (a comma-separated list of key:value pairs):

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document"}],
    extra_headers={"x-costpilot-tags": "feature:doc-summary,user:alice"},
)

The dashboard then breaks spending down by feature, user, service, environment, or any tag you choose.
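To make the tag format concrete, here is a minimal sketch of how a comma-separated list of key:value pairs could be built and parsed. The helpers `format_tags` and `parse_tags` are hypothetical and not part of the costpilot SDK; skipping malformed entries is an assumption about reasonable behavior, not documented behavior.

```python
def format_tags(tags: dict[str, str]) -> str:
    """Build an x-costpilot-tags header value from a dict of tags."""
    return ",".join(f"{key}:{value}" for key, value in tags.items())


def parse_tags(header_value: str) -> dict[str, str]:
    """Parse 'key:value,key:value' into a dict, skipping malformed entries."""
    tags: dict[str, str] = {}
    for pair in header_value.split(","):
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = pair.partition(":")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            tags[key] = value
    return tags


header = format_tags({"feature": "doc-summary", "user": "alice"})
print(header)
print(parse_tags(header))
```

Building the header from a dict keeps tag keys consistent across call sites, which makes the per-tag breakdowns easier to read.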
- init() starts one background client with a daemon flush thread.
- Each wrapped call enqueues an event; the queue flushes on a timer or when a batch fills.
- Sending happens on throwaway threads, so your request path is never blocked and the SDK never raises into your application.
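The steps above can be sketched with the standard library alone. This is an illustration of the pattern, not the SDK's real code: the class name `EventBatcher`, its parameters, and the `send` callback are all hypothetical, and for brevity it sends on the flush thread rather than on separate throwaway threads.

```python
import queue
import threading
from typing import Callable


class EventBatcher:
    """Queue events in memory and flush them in batches from a daemon thread."""

    def __init__(
        self,
        send: Callable[[list[dict]], None],  # hypothetical transport callback
        batch_size: int = 50,
        flush_interval: float = 2.0,
        max_queue: int = 10_000,
    ) -> None:
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue = queue.Queue(maxsize=max_queue)
        self._wake = threading.Event()
        # Daemon thread: it never keeps the host application alive on exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, event: dict) -> None:
        """Called on the request path: never blocks, never raises."""
        try:
            self._queue.put_nowait(event)
        except queue.Full:
            return  # best-effort tracking: drop the event silently
        if self._queue.qsize() >= self._batch_size:
            self._wake.set()  # a batch has filled, flush early

    def flush(self) -> None:
        """Drain the queue in batches; swallow any transport error."""
        while True:
            batch = []
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            if not batch:
                return
            try:
                self._send(batch)
            except Exception:
                pass  # backend down: the batch is dropped

    def _run(self) -> None:
        while True:
            # Wake on the timer or as soon as a batch fills.
            self._wake.wait(timeout=self._flush_interval)
            self._wake.clear()
            self.flush()
```

The trade-off is deliberate: events still in memory are lost if the process exits or the backend is unreachable, which is the price of never putting tracking on the critical path.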
Per-model cost breakdown: cost, request count, latency, and token usage for each model.
Cost by feature + token usage: attribute spend to the features that drive it.
Budgets & alerts: set caps per project and get alerted before costs run away.
Optimization recommendations: actionable, dollar-quantified savings suggestions.
Event log: every tracked LLM call, paginated and filterable.
┌──────────────────┐
│   Your LLM App   │
│ + CostPilot SDK  │──(async HTTP)──▶ ┌──────────────┐     ┌──────────────┐
│ (3 lines added)  │                  │   FastAPI    │────▶│  PostgreSQL  │
└──────────────────┘                  │   Backend    │     │              │
                                      └──────┬───────┘     └──────────────┘
┌──────────────────┐                         │
│  React Frontend  │◀────────────────────────┘
│   (Dashboard)    │
└──────────────────┘
The SDK sends events asynchronously and never blocks the LLM call. The ingest endpoint authenticates with project API keys (not user JWTs), bulk-inserts events, and evaluates budgets in the background.
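As an illustration of the budget evaluation step, the sketch below classifies current spend against a budget. It assumes a budget is a dollar limit plus a warning threshold expressed as a fraction of that limit; the `Budget` dataclass, the `evaluate_budget` function, and the 80% default are hypothetical and do not reflect the backend's real schema.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    limit_usd: float
    alert_threshold: float = 0.8  # assumed default: warn at 80% of the limit


def evaluate_budget(budget: Budget, spent_usd: float) -> str:
    """Return 'ok', 'warning', or 'exceeded' for the current period's spend."""
    if spent_usd >= budget.limit_usd:
        return "exceeded"
    if spent_usd >= budget.limit_usd * budget.alert_threshold:
        return "warning"
    return "ok"


monthly = Budget(limit_usd=100.0)
for spent in (50.0, 85.0, 120.0):
    print(spent, evaluate_budget(monthly, spent))
```

Running a check like this after each ingest batch, off the request path, keeps alerts timely without slowing down event ingestion.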
All dashboard endpoints are under /api and authenticate with a JWT bearer token. The SDK ingest endpoint authenticates with a project API key.
# Auth
POST /api/auth/register
POST /api/auth/login
GET /api/auth/me
# Projects & members
GET /api/projects
POST /api/projects
GET /api/projects/{id}/members
POST /api/projects/{id}/members/invite
# API keys (for the SDK)
GET /api/projects/{id}/api-keys
POST /api/projects/{id}/api-keys # returns the full key once
DELETE /api/api-keys/{id}
# Ingest (SDK -> backend, API-key auth)
POST /api/ingest
# Analytics
GET /api/projects/{id}/analytics/overview?period=7d
GET /api/projects/{id}/analytics/cost-timeline?period=7d&granularity=hour
GET /api/projects/{id}/analytics/models?period=7d
GET /api/projects/{id}/analytics/tags?period=7d&tag_key=feature
GET /api/projects/{id}/analytics/latency?period=7d&granularity=hour
# Forecast, budgets, alerts, recommendations
GET /api/projects/{id}/forecast
GET /api/projects/{id}/budgets
POST /api/projects/{id}/budgets
GET /api/projects/{id}/alerts
GET /api/projects/{id}/recommendations
Interactive API docs are served at http://localhost:8787/docs.
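The API key endpoints return the full key only once because, as noted in the feature list, keys are stored as SHA-256 hashes. A minimal sketch of that pattern follows; the function names and the key length are assumptions, and only the cp_proj_ prefix is taken from the examples in this README.

```python
import hashlib
import hmac
import secrets


def generate_api_key(prefix: str = "cp_proj_") -> tuple[str, str]:
    """Return (raw_key, key_hash). Show raw_key once; persist only key_hash."""
    raw_key = prefix + secrets.token_hex(16)  # assumed length: 32 hex characters
    key_hash = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
    return raw_key, key_hash


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare it to the stored hash in constant time."""
    presented_hash = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)


raw_key, stored_hash = generate_api_key()
print(verify_api_key(raw_key, stored_hash))          # valid key
print(verify_api_key("cp_proj_wrong", stored_hash))  # invalid key
```

Because only the hash is stored, a lost key cannot be recovered from the database; the fix is to revoke it and generate a new one.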
| | CostPilot | Helicone | LangSmith |
|---|---|---|---|
| Self-hosted | Yes (your Postgres) | Partial | Partial |
| Primary focus | LLM cost analytics | Proxy + observability | Tracing + eval |
| Integration | SDK wrapper, 3 lines | Proxy / base URL | Callbacks / tracer |
| Cost forecasting | Built-in | Limited | No |
| Optimization tips | Built-in | No | No |
| Blocks your request path | Never | Proxy adds a hop | Depends |
| Data leaves your infra | No | Yes (hosted) | Yes (hosted) |
CostPilot doesn't try to be everything. If you want a lightweight, self-hosted tool focused purely on understanding and reducing AI spend, that's the niche it fills.
How do I track OpenAI API costs?
Install the costpilot SDK, call costpilot.init(...), and wrap your client with costpilot.wrap_openai(openai.OpenAI()). Every chat.completions.create call is tracked with model, tokens, cost, and latency.
How do I track Anthropic / Claude API costs?
Use costpilot.wrap_anthropic(anthropic.Anthropic()). Claude's messages.create calls are tracked the same way.
Does the SDK slow down my application?
No. Events are queued in memory and flushed by a background thread in batches. The tracking code never blocks your LLM call and is wrapped so it can never raise into your app.
What happens if the CostPilot backend is down?
The SDK silently drops events and your application keeps working normally. Tracking is best-effort and never a hard dependency.
Is my usage data private?
Yes. CostPilot is fully self-hosted: events are stored in your own PostgreSQL database. Nothing is sent to a third-party service.
Can I track cost per feature or per user?
Yes. Pass extra_headers={"x-costpilot-tags": "feature:doc-summary,user:alice"} on any call and the dashboard will break spend down by those tags.
Which models are supported for cost calculation?
GPT-4o / 4.1 / o3 / o4-mini, Claude Opus / Sonnet / Haiku, Gemini 2.5, and DeepSeek, with up-to-date per-million-token pricing. Unknown models are recorded at zero cost rather than guessed.
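Per-million-token pricing with a zero-cost fallback can be sketched as follows. The model names and prices in this table are placeholders made up for illustration, not real provider pricing, and `estimate_cost` is a hypothetical name rather than an SDK function.

```python
# Placeholder prices in USD per million tokens (NOT real provider pricing).
PRICING = {
    "example-large": {"input": 5.00, "output": 15.00},
    "example-small": {"input": 0.50, "output": 1.50},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one call in USD, or 0.0 when the model is unknown."""
    price = PRICING.get(model)
    if price is None:
        return 0.0  # unknown model: record zero cost instead of guessing
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


print(estimate_cost("example-small", input_tokens=2_000, output_tokens=1_000))
print(estimate_cost("not-in-table", input_tokens=2_000, output_tokens=1_000))
```

Recording zero for unknown models understates spend, but it makes gaps in the price table visible in the data instead of hiding them behind a guess.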
- SDK: Python 3.10+, httpx, zero other dependencies.
- Backend: FastAPI, SQLAlchemy 2.0 async (asyncpg), Pydantic v2, Alembic, JWT auth, PostgreSQL 16.
- Frontend: React 18 + TypeScript, Vite, TanStack Router & Query, Tailwind CSS, Recharts.
- Infra: Docker + Docker Compose.
Run the test suite (SDK + backend, 60+ tests):
# SDK
cd sdk/python && pip install -e ".[dev]" && pytest
# Backend
cd backend && pip install -e ".[dev]" && pytest
Run services individually:
make backend # uvicorn on :8787
make frontend # vite dev server on :3000
make test # SDK + backend tests
MIT © Aryan Sharma
Keywords: LLM cost tracking · OpenAI cost dashboard · Anthropic Claude cost monitoring · AI spend analytics · token usage tracking · LLM observability · self-hosted Helicone alternative · LLM FinOps · GPT-4o cost calculator · prompt cost optimization.





