Rafael Gumieri edited this page May 15, 2026 · 3 revisions

FAQ

What is Nenya?

Nenya is a lightweight AI API Gateway that sits between your AI coding clients (OpenCode, Cursor, Aider) and upstream LLM providers. It adds secret redaction, agent-based routing with fallback chains, context management, MCP tool integration, and more — all with transparent SSE streaming.

It is not a model hosting service. It routes requests to your configured upstream providers (Anthropic, Gemini, DeepSeek, etc.).

Do I need Ollama?

No. Ollama is only needed for the optional content pipeline features (engine summarization, privacy filtering). Nenya works as a routing proxy without Ollama:

  • If the engine is unreachable, requests proceed unchanged
  • Regex-based secret redaction works independently
  • Agent fallback, circuit breaker, and latency routing work without any engine

What's the difference between an agent and a model?

A model is a specific LLM (e.g., gemini-2.5-flash, deepseek-chat). An agent is a named routing configuration that can reference multiple models in a fallback chain.

When your client sends model: "build", Nenya resolves the agent build and routes through its configured models, applying circuit breaker and fallback logic.
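As an illustration, an agent definition might look like the following. The key names here are hypothetical sketches of the schema, not the authoritative format:

```
{
  "agents": {
    "build": {
      "models": ["gemini-2.5-flash", "deepseek-chat"]
    }
  }
}
```

With a chain like this, `gemini-2.5-flash` is tried first and `deepseek-chat` serves as the fallback.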

How do I add a custom provider?

OpenAI-compatible (no code changes)

{
  "providers": {
    "fireworks": {
      "url": "https://api.fireworks.ai/inference/v1/chat/completions",
      "auth_style": "bearer"
    }
  }
}

Add the API key under provider_keys in secrets.
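The secrets entry might look like this (placeholder value; the exact schema is documented on the Secrets page):

```
{
  "provider_keys": {
    "fireworks": "YOUR_FIREWORKS_API_KEY"
  }
}
```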

Non-OpenAI format

For providers with different wire formats (Bedrock, Vertex), create a Go adapter implementing ProviderAdapter.

What happens if the engine fails?

The entire content pipeline is best-effort. If any step fails:

  1. A warning is logged
  2. The original payload is forwarded unchanged
  3. The client never sees an error from the pipeline

When skip_on_engine_failure is true (default), even hard-limit payloads that fail engine summarization are forwarded unchanged instead of being truncated.
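In config this might look like the following (the `content_pipeline` wrapper is illustrative; only the `skip_on_engine_failure` key is taken from the text above):

```
{
  "content_pipeline": {
    "skip_on_engine_failure": true
  }
}
```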

How do hot reloads work?

systemctl reload nenya:

  1. Reloads config from the same path
  2. Re-discovers model catalogs from all providers
  3. Validates config structure
  4. Preserves usage counters, metrics, and thought signature cache across reloads
  5. On failure, continues serving with the old config

Can I use Nenya without MCP?

Yes. MCP is entirely optional. If no MCP servers are configured, there is zero overhead — no allocations, no goroutines, no tool injection.

What wire formats are supported?

  • OpenAI Chat Completions (native, ~80% of providers)
  • Anthropic Messages API (via format conversion adapter)
  • Gemini API (via Gemini provider adapter)

Models can declare a format attribute for per-model wire format routing.
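A per-model format declaration might look like this (the `format` attribute name comes from the text above; the surrounding schema is illustrative):

```
{
  "models": {
    "claude-sonnet-4-5": { "format": "anthropic" },
    "gemini-2.5-flash": { "format": "gemini" }
  }
}
```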

How does secret redaction work?

The Tier-0 regex secret redaction runs on every request. Built-in patterns match:

  • AWS access keys and secret keys
  • GitHub personal access tokens
  • Google OAuth credentials
  • OpenAI/Anthropic API keys (sk-, sk-ant-)
  • PEM private keys
  • AWS credential file lines
  • Password/key assignments
  • Docker tokens and SendGrid keys

You can add custom regex patterns in the bouncer.redact_patterns config, or use a named preset via bouncer.redact_preset ("credentials", "pii").
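A preset and custom patterns can be combined, for example (the pattern shown is a made-up internal token format):

```
{
  "bouncer": {
    "redact_preset": "credentials",
    "redact_patterns": ["MYCO_INTERNAL_TOKEN_[A-Za-z0-9]{32}"]
  }
}
```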

How does model discovery work?

At startup and on systemctl reload nenya, Nenya fetches /v1/models from each configured provider (concurrently, 10s timeout). Discovered models are merged with the built-in registry using three-tier priority: config overrides > discovered > static.

What deployment options are available?

  • Bare metal (systemd) — binary install, socket activation, hot reload
  • Container (Podman/Docker Compose) — hardened, read-only root, mlock
  • Kubernetes (Helm) — ConfigMap/Secret, ingress setup

How does RBAC work?

Nenya enforces role-based access control (RBAC) via per-API key configuration:

Roles:

  • admin — Unrestricted access (bypasses all checks)
  • user — Access to configured agents, all non-admin endpoints
  • read-only — GET requests only (e.g., /v1/models, /healthz, /statsz, /metrics)

Agent Scoping:

  • allowed_agents list restricts which agents the key can access (empty = all agents)
  • Admin keys bypass agent restrictions

Endpoint Restrictions:

  • allowed_endpoints list allows fine-grained allowlisting (method + path)
  • Overrides default role-based permissions when set
  • Admin keys bypass endpoint restrictions
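Putting the pieces together, a key entry might look like this. The `api_keys` wrapper and field names are illustrative; the authoritative schema is on the Secrets page:

```
{
  "api_keys": {
    "ci-key": {
      "role": "user",
      "allowed_agents": ["build"],
      "allowed_endpoints": ["POST /v1/chat/completions"]
    }
  }
}
```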

See Secrets for configuration examples.

Can I use multiple providers for the same model?

Yes. When a model exists in multiple providers (e.g., claude-sonnet-4-5 in both Anthropic and OpenCode Zen), Nenya can route to all of them. List the model in the agent's model list, and both providers become candidates.
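For instance, with both providers configured, an agent chain could reference the shared model once (schema illustrative, as in the other config sketches on this page):

```
{
  "agents": {
    "code": {
      "models": ["claude-sonnet-4-5"]
    }
  }
}
```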

What happens to streaming when MCP is configured?

When MCP is enabled:

  1. The upstream response is buffered (not streamed to the client yet)
  2. Tool calls are inspected
  3. MCP tools are executed locally
  4. The request is re-sent if needed
  5. Only the final response (no tool calls) is streamed to the client

This means MCP introduces a latency trade-off: the first response is delayed, but subsequent turns are faster since the LLM context is already populated with tool results.

See Also

Getting Started

Core Concepts

Reference

Operations

  • Demo — Test all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting
