FAQ
What is Nenya?
Nenya is a lightweight AI API Gateway that sits between your AI coding clients (OpenCode, Cursor, Aider) and upstream LLM providers. It adds secret redaction, agent-based routing with fallback chains, context management, MCP tool integration, and more — all with transparent SSE streaming.
It is not a model hosting service. It routes requests to your configured upstream providers (Anthropic, Gemini, DeepSeek, etc.).
Do I need Ollama?
No. Ollama is only needed for the optional content pipeline features (engine summarization, privacy filtering). Nenya works as a routing proxy without Ollama:
- If the engine is unreachable, requests proceed unchanged
- Regex-based secret redaction works independently
- Agent fallback, circuit breaker, and latency routing work without any engine
What is the difference between a model and an agent?
A model is a specific LLM (e.g., `gemini-2.5-flash`, `deepseek-chat`). An agent is a named routing configuration that can reference multiple models in a fallback chain.
When your client sends `model: "build"`, Nenya resolves the agent `build` and routes through its configured models, applying circuit-breaker and fallback logic.
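As an illustrative sketch only (the field names here are assumptions, not Nenya's documented schema), an agent with a fallback chain might be configured like:

```json
{
  "agents": {
    "build": {
      "models": ["deepseek-chat", "gemini-2.5-flash"]
    }
  }
}
```

Clients then request `model: "build"` and Nenya picks the first healthy model in the chain.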
How do I add a new provider?
Add an entry under `providers` in the config:

```json
{
  "providers": {
    "fireworks": {
      "url": "https://api.fireworks.ai/inference/v1/chat/completions",
      "auth_style": "bearer"
    }
  }
}
```

Add the API key under `provider_keys` in secrets.
For providers with a different wire format (Bedrock, Vertex), create a Go adapter implementing `ProviderAdapter`.
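For intuition, such an adapter might look roughly like this Go sketch; the interface name `ProviderAdapter` comes from the text above, but the method set shown here is an assumption, not Nenya's actual definition:

```go
package main

import "fmt"

// Hypothetical shape of a provider adapter: translate a request into the
// provider's wire format, and translate its response back. The real
// ProviderAdapter interface in Nenya may differ.
type ProviderAdapter interface {
	TranslateRequest(body []byte) ([]byte, error)
	TranslateResponse(body []byte) ([]byte, error)
}

// bedrockAdapter is a stub that passes payloads through unchanged.
type bedrockAdapter struct{}

func (bedrockAdapter) TranslateRequest(b []byte) ([]byte, error)  { return b, nil }
func (bedrockAdapter) TranslateResponse(b []byte) ([]byte, error) { return b, nil }

func main() {
	var a ProviderAdapter = bedrockAdapter{}
	out, _ := a.TranslateRequest([]byte(`{"model":"x"}`))
	fmt.Println(string(out)) // {"model":"x"}
}
```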
What happens when a content pipeline step fails?
The entire content pipeline is best-effort. If any step fails:
- A warning is logged
- The original payload is forwarded unchanged
- The client never sees an error from the pipeline
When `skip_on_engine_failure` is true (the default), even hard-limit payloads that fail engine summarization are forwarded unchanged instead of being truncated.
What does a config reload do?
`systemctl reload nenya`:
- Reloads config from the same path
- Re-discovers model catalogs from all providers
- Validates config structure
- Preserves usage counters, metrics, and thought signature cache across reloads
- On failure, continues serving with the old config
Can I run Nenya without MCP?
Yes. MCP is entirely optional. If no MCP servers are configured, there is zero overhead — no allocations, no goroutines, no tool injection.
Which upstream wire formats are supported?
- OpenAI Chat Completions (native, ~80% of providers)
- Anthropic Messages API (via format conversion adapter)
- Gemini API (via Gemini provider adapter)
Models can declare a `format` attribute for per-model wire format routing.
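A rough sketch of what such a per-model override might look like (the surrounding key names are guesses; only the `format` attribute is documented above):

```json
{
  "models": {
    "claude-sonnet-4-5": {
      "format": "anthropic"
    }
  }
}
```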
What does secret redaction catch?
The Tier-0 regex secret redaction runs on every request. Built-in patterns match:
- AWS access keys and secret keys
- GitHub personal access tokens
- Google OAuth credentials
- OpenAI/Anthropic API keys (`sk-`, `sk-ant-`)
- PEM private keys
- AWS credential file lines
- Password/key assignments
- Docker tokens and SendGrid keys
You can add custom regex patterns in the `bouncer.redact_patterns` config, or use a named preset via `bouncer.redact_preset` (`"credentials"`, `"pii"`).
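For example, a custom pattern might be added like this; `bouncer.redact_patterns` and `bouncer.redact_preset` come from the text above, while the regex itself and the surrounding structure are illustrative assumptions:

```json
{
  "bouncer": {
    "redact_preset": "credentials",
    "redact_patterns": ["internal-token-[A-Za-z0-9]{32}"]
  }
}
```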
How does model discovery work?
At startup and on `systemctl reload nenya`, Nenya fetches `/v1/models` from each configured provider (concurrently, with a 10s timeout). Discovered models are merged with the built-in registry using three-tier priority: config overrides > discovered > static.
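The three-tier merge can be sketched as follows; this is a simplified model of the described priority order, not Nenya's actual code:

```go
package main

import "fmt"

// mergeCatalog applies the three-tier priority described above:
// static built-ins are overwritten by discovered models, which are
// in turn overwritten by explicit config overrides.
func mergeCatalog(static, discovered, overrides map[string]string) map[string]string {
	merged := make(map[string]string)
	for _, tier := range []map[string]string{static, discovered, overrides} {
		for id, src := range tier {
			merged[id] = src // later tiers win
		}
	}
	return merged
}

func main() {
	merged := mergeCatalog(
		map[string]string{"a": "static", "b": "static"},
		map[string]string{"b": "discovered", "c": "discovered"},
		map[string]string{"c": "config"},
	)
	fmt.Println(merged["a"], merged["b"], merged["c"]) // static discovered config
}
```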
What deployment options are supported?
- Bare metal (systemd) — binary install, socket activation, hot reload
- Container (Podman/Docker Compose) — hardened, read-only root, mlock
- Kubernetes (Helm) — ConfigMap/Secret, ingress setup
How does access control work?
Nenya enforces role-based access control (RBAC) via per-API-key configuration.
Roles:
- `admin` — unrestricted access (bypasses all checks)
- `user` — access to configured agents and all non-admin endpoints
- `read-only` — GET requests only (e.g., `/v1/models`, `/healthz`, `/statsz`, `/metrics`)
Agent scoping:
- The `allowed_agents` list restricts which agents the key can access (empty = all agents)
- Admin keys bypass agent restrictions
Endpoint restrictions:
- The `allowed_endpoints` list enables fine-grained allowlisting (method + path)
- When set, it overrides default role-based permissions
- Admin keys bypass endpoint restrictions
See Secrets for configuration examples.
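As a rough illustration only (the key and field names are guesses; the Secrets page has the authoritative schema), a scoped API key entry might look like:

```json
{
  "api_keys": {
    "ci-key": {
      "role": "user",
      "allowed_agents": ["build"],
      "allowed_endpoints": ["POST /v1/chat/completions"]
    }
  }
}
```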
Can the same model be used from multiple providers?
Yes. When a model exists in multiple providers (e.g., `claude-sonnet-4-5` in both Anthropic and OpenCode Zen), Nenya can route to all of them. Configure the agent's model list to include the model, and both providers become candidates.
How does MCP interact with streaming?
When MCP is enabled:
- The upstream response is buffered (not streamed to the client yet)
- Tool calls are inspected
- MCP tools are executed locally
- The request is re-sent if needed
- Only the final response (no tool calls) is streamed to the client
This means MCP introduces a latency trade-off: the first response is delayed, but subsequent turns are faster since the LLM context is already populated with tool results.
See also:
- Quick Start — Install and run Nenya
- Troubleshooting — Common issues and solutions
- Configuration — Config reference