Rafael Gumieri edited this page May 15, 2026 · 3 revisions

FAQ

What is Nenya?

Nenya is a lightweight AI API Gateway that sits between your AI coding clients (OpenCode, Cursor, Aider) and upstream LLM providers. It adds secret redaction, agent-based routing with fallback chains, context management, MCP tool integration, and more — all with transparent SSE streaming.

It is not a model hosting service. It routes requests to your configured upstream providers (Anthropic, Gemini, DeepSeek, etc.).

Do I need Ollama?

No. Ollama is only needed for the optional content pipeline features (engine summarization, privacy filtering). Nenya works as a routing proxy without Ollama:

  • If the engine is unreachable, requests proceed unchanged
  • Regex-based secret redaction works independently
  • Agent fallback, circuit breaker, and latency routing work without any engine

What's the difference between an agent and a model?

A model is a specific LLM (e.g., gemini-2.5-flash, deepseek-chat). An agent is a named routing configuration that can reference multiple models in a fallback chain.

When your client sends model: "build", Nenya resolves the agent build and routes through its configured models, applying circuit breaker and fallback logic.
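As an illustration, an agent definition might look like the following. The key names here are hypothetical sketches of the schema, not the authoritative format:

```
{
  "agents": {
    "build": {
      "models": ["gemini-2.5-flash", "deepseek-chat"]
    }
  }
}
```

With a chain like this, `gemini-2.5-flash` is tried first and `deepseek-chat` serves as the fallback.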

How do I add a custom provider?

OpenAI-compatible (no code changes)

{
  "providers": {
    "fireworks": {
      "url": "https://api.fireworks.ai/inference/v1/chat/completions",
      "auth_style": "bearer"
    }
  }
}

Add the API key under provider_keys in secrets.
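The secrets entry might look like this (placeholder value; the exact schema is documented on the Secrets page):

```
{
  "provider_keys": {
    "fireworks": "YOUR_FIREWORKS_API_KEY"
  }
}
```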

Non-OpenAI format

For providers with different wire formats (Bedrock, Vertex), create a Go adapter implementing ProviderAdapter.

What happens if the engine fails?

The entire content pipeline is best-effort. If any step fails:

  1. A warning is logged
  2. The original payload is forwarded unchanged
  3. The client never sees an error from the pipeline

When skip_on_engine_failure is true (default), even hard-limit payloads that fail engine summarization are forwarded unchanged instead of being truncated.
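In config this might look like the following (the `content_pipeline` wrapper is illustrative; only the `skip_on_engine_failure` key is taken from the text above):

```
{
  "content_pipeline": {
    "skip_on_engine_failure": true
  }
}
```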

How do hot reloads work?

systemctl reload nenya:

  1. Reloads config from the same path
  2. Re-discovers model catalogs from all providers
  3. Validates config structure
  4. Preserves usage counters, metrics, and thought signature cache across reloads
  5. On failure, continues serving with the old config

Can I use Nenya without MCP?

Yes. MCP is entirely optional. If no MCP servers are configured, there is zero overhead — no allocations, no goroutines, no tool injection.

What wire formats are supported?

  • OpenAI Chat Completions (native, ~80% of providers)
  • Anthropic Messages API (via format conversion adapter)
  • Gemini API (via Gemini provider adapter)

Models can declare a format attribute for per-model wire format routing.
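A per-model format declaration might look like this (the `format` attribute name comes from the text above; the surrounding schema is illustrative):

```
{
  "models": {
    "claude-sonnet-4-5": { "format": "anthropic" },
    "gemini-2.5-flash": { "format": "gemini" }
  }
}
```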

How does secret redaction work?

The Tier-0 regex secret redaction runs on every request. Built-in patterns match:

  • AWS access keys and secret keys
  • GitHub personal access tokens
  • Google OAuth credentials
  • OpenAI/Anthropic API keys (sk-, sk-ant-)
  • PEM private keys
  • AWS credential file lines
  • Password/key assignments
  • Docker tokens and SendGrid keys

You can add custom regex patterns in the bouncer.redact_patterns config, or use a named preset via bouncer.redact_preset ("credentials", "pii").
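A preset and custom patterns can be combined, for example (the pattern shown is a made-up internal token format):

```
{
  "bouncer": {
    "redact_preset": "credentials",
    "redact_patterns": ["MYCO_INTERNAL_TOKEN_[A-Za-z0-9]{32}"]
  }
}
```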

How does model discovery work?

At startup and on systemctl reload nenya, Nenya fetches /v1/models from each configured provider (concurrently, 10s timeout). Discovered models are merged with the built-in registry using three-tier priority: config overrides > discovered > static.

What deployment options are available?

  • Bare metal (systemd) — binary install, socket activation, hot reload
  • Container (Podman/Docker Compose) — hardened, read-only root, mlock
  • Kubernetes (Helm) — ConfigMap/Secret, ingress setup

How does RBAC work?

Nenya enforces role-based access control (RBAC) via per-API key configuration:

Roles:

  • admin — Unrestricted access (bypasses all checks)
  • user — Access to configured agents, all non-admin endpoints
  • read-only — GET requests only (e.g., /v1/models, /healthz, /statsz, /metrics)

Agent Scoping:

  • allowed_agents list restricts which agents the key can access (empty = all agents)
  • Admin keys bypass agent restrictions

Endpoint Restrictions:

  • allowed_endpoints list allows fine-grained allowlisting (method + path)
  • Overrides default role-based permissions when set
  • Admin keys bypass endpoint restrictions
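Putting the pieces together, a key entry might look like this. The `api_keys` wrapper and field names are illustrative; the authoritative schema is on the Secrets page:

```
{
  "api_keys": {
    "ci-key": {
      "role": "user",
      "allowed_agents": ["build"],
      "allowed_endpoints": ["POST /v1/chat/completions"]
    }
  }
}
```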

See Secrets for configuration examples.

Can I use multiple providers for the same model?

Yes. When a model exists in multiple providers (e.g., claude-sonnet-4-5 in both Anthropic and OpenCode Zen), Nenya can route to all of them. List the model in the agent's model list, and both providers become candidates.
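For instance, with both providers configured, an agent chain could reference the shared model once (schema illustrative, as in the other config sketches on this page):

```
{
  "agents": {
    "code": {
      "models": ["claude-sonnet-4-5"]
    }
  }
}
```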

What happens to streaming when MCP is configured?

When MCP is enabled:

  1. The upstream response is buffered (not streamed to the client yet)
  2. Tool calls are inspected
  3. MCP tools are executed locally
  4. The request is re-sent if needed
  5. Only the final response (no tool calls) is streamed to the client

This means MCP introduces a latency trade-off: the first response is delayed, but subsequent turns are faster since the LLM context is already populated with tool results.

See Also

Getting Started

Core Concepts

Reference

Operations

  • Demo — Test all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting
