Key architectural decisions and their rationale.
Status: Accepted
Date: 2026-05-29
ResGov needs a persistent store for budget tracking, API keys, and audit logs. We evaluated PostgreSQL, MySQL, and SQLite.
Use SQLite in WAL (Write-Ahead Logging) mode as the default storage backend.
- Zero-config deployment: No separate database service to manage, monitor, or back up.
pip installand go. - Sufficient performance: For the target use case (single-instance proxy serving one organization), SQLite handles thousands of concurrent reads with serialized writes. WAL mode allows concurrent readers without blocking.
- File-based backup: A single
.cpor.backupcommand creates a consistent snapshot. Nopg_dumpneeded. - Edge-friendly: Runs on resource-constrained infrastructure (IoT, edge servers, CI runners) where PostgreSQL is overkill.
- Migration path: The storage layer is abstracted behind the
BudgetEngineinterface. A Redis/Dragonfly or PostgreSQL backend can be added later (see Roadmap v0.5).
- Not suitable for horizontal multi-instance deployments. The WAL file is local to one process.
- Write throughput is limited to one writer at a time (serialized writes in WAL mode).
| Option | Rejected Because |
|---|---|
| PostgreSQL | Operational overhead disproportionate for single-instance use case |
| Redis | Volatile by default; persistence adds complexity |
| MongoDB | Document model doesn't match the relational budget/agent structure |
| In-memory only | No crash recovery; budgets lost on restart |
Status: Accepted
Date: 2026-05-29
Multiple agents may make concurrent API calls through ResGov. Each call's cost must be deducted from the remaining budget atomically. Two concurrent requests must not both succeed based on stale budget reads (double-spending).
Use a pessimistic pre-commit pattern:
- At stream start: acquire a DB lock, read current spend, calculate pessimistic
max_cost, reserve it atomically, release lock. - During stream: no lock held (the expensive network I/O phase).
- At stream end: acquire lock, read actual token usage, refund the difference between reserved and actual, release lock.
- Prevents double-spending: The lock ensures that two concurrent requests never both read the same remaining budget.
- Lock-free streaming: The lock is held for microseconds (a single DB write), not during the entire LLM response stream. This is critical because LLM streams last seconds to minutes.
- Pessimistic over optimistic: We chose pessimistic locking (reserve worst-case upfront) over optimistic concurrency control (retry on conflict) because:
- Retries are wasteful: an agent has already waited for the LLM response before being told "budget exceeded."
- The lock hold time is negligible (microseconds), so contention is minimal.
If the ResGov process crashes after reservation but before finalize, the reserved amount is "stuck." A background job (every 5 minutes) scans for reservations older than the expected maximum stream duration and refunds them.
| Option | Rejected Because |
|---|---|
| Optimistic concurrency (retry) | Wasteful: LLM response already streamed before rejection |
| Row-level advisory locks | SQLite doesn't support advisory locks |
| Redis-based distributed lock | Adds Redis dependency; unnecessary for single instance |
Status: Accepted
Date: 2026-05-29
Governance rules (budgets, model allowlists, tool restrictions) need a human-readable, VCS-friendly configuration format.
Use TOML, governed by a .rgf file (analogous to .gitignore, .env, .editorconfig).
- Human-readable: Lower cognitive overhead than YAML/JSON for non-developers.
- Trivially parseable: Python's
tomllib(stdlib since 3.11) handles it with zero dependencies. - Familiar convention: The
.rgfextension signals "this is a governance rules file" and is auto-detected by ResGov from the working directory. - Section-per-agent: TOML's
[table]syntax maps naturally to per-agent configuration.
- Only one
.rgffile per ResGov instance (no includes/imports). - For multi-instance setups, manage
.rgffiles per deployment (Ansible, Terraform, etc.).
| Option | Rejected Because |
|---|---|
| YAML | Indentation errors, larger spec, needs PyYAML dependency |
| JSON | No comments, painful for multi-line lists |
| Environment variables | Doesn't scale to per-agent sections |
| Database config | Circular dependency — DB needs auth, auth needs config |
Status: Accepted
Date: 2026-05-29
ResGov's core function is a transparent LLM proxy. It must work with existing frameworks (CrewAI, LangLang, LlamaIndex, custom code) without SDK changes.
Implement an OpenAI-compatible /v1/chat/completions endpoint. Frameworks switch by changing base_url only.
- Zero-friction adoption: Every major LLM framework supports configuring a custom
base_url. - No vendor lock-in: If a user outgrows ResGov, they revert the
base_url— no code changes. - Header-based routing: Agent ID and org ID are passed via HTTP headers (
X-ResGov-Agent-ID), keeping the request body untouched.
| Provider | Base URL | Auth |
|---|---|---|
| OpenRouter | https://openrouter.ai/api/v1 |
API Key |
| GitHub Copilot | https://api.githubcopilot.com |
OAuth Token |
| Custom OpenAI-compatible | Configurable | API Key |
Each provider adapter handles model name mapping, auth headers, and response normalization.
Status: Accepted
Date: 2026-05-29
ResGov must support multiple LLM providers (OpenRouter, GitHub Copilot, potentially direct OpenAI/Anthropic) with different auth methods, model naming conventions, and pricing structures.
Each provider is a separate module under src/providers.py implementing a common interface:
class LLMProvider(Protocol):
async def chat_completion(self, request: ChatRequest) -> ChatResponse: ...
def resolve_model(self, model_alias: str) -> str: ...
def estimate_cost(self, model: str, tokens: int) -> float: ...- Extensible: Adding a new provider = adding a new module, zero changes to core engine.
- Testable: Each provider can be mocked independently.
- Cost-aware: The
estimate_costmethod enables budget enforcement before the request is sent.
- Provider auto-detection is based on URL patterns. No magic.
- Price data is cached locally (
src/price_cache.py) with TTL to avoid runtime API calls.