Home

Nenya AI Gateway

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming.


Features

Routing & Agents

22 built-in providers with specialized adapters for wire format differences
Dynamic model discovery — fetches live model catalogs from providers at startup
Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain
Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover
Latency-aware routing — auto-reorder targets by historical median response time
Per-agent system prompts — inline or file-based

Security & Privacy

Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.
3-Tier content pipeline — pass-through, engine summarization, or TF-IDF relevance-scored truncation
Context window compaction — sliding window summarization with configurable engine
Stale tool call pruning — compact old assistant+tool response pairs to save tokens
Graceful degradation — never blocks requests due to engine or pipeline failures
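To make the Tier-0 filter idea concrete, here is a minimal sketch of regex-based redaction. The three patterns and the replacement labels are assumptions for illustration only; they are not Nenya's actual rule set, which is likely broader.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a pattern with its replacement text.
type rule struct {
	re   *regexp.Regexp
	repl string
}

// rules is an illustrative set of patterns (assumed, not taken from Nenya).
var rules = []rule{
	// AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
	{regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`), "[REDACTED_AWS_KEY]"},
	// GitHub tokens with a ghp_/gho_/ghu_/ghs_/ghr_ prefix and 36 characters.
	{regexp.MustCompile(`\bgh[pousr]_[A-Za-z0-9]{36}\b`), "[REDACTED_GITHUB_TOKEN]"},
	// "password = value" or "password: value"; the key is kept, the value is masked.
	{regexp.MustCompile(`(?i)(password\s*[:=]\s*)\S+`), "${1}[REDACTED]"},
}

// redact applies every rule in order and returns the sanitized text.
func redact(s string) string {
	for _, r := range rules {
		s = r.re.ReplaceAllString(s, r.repl)
	}
	return s
}

func main() {
	fmt.Println(redact("aws_key=AKIAIOSFODNN7EXAMPLE password: hunter2"))
}
```

Because the filter runs on every request before anything is forwarded upstream, precompiling the patterns once at startup (as the package-level `rules` slice does) avoids recompiling them per request.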

Hardening (Deployment Security)

Non-root execution — runs as UID 65532 with dropped capabilities
Memory protection — IPC_LOCK for mlock; prevents secrets from swapping to disk
Read-only filesystem — immutable root + private /tmp
Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation
Zero-trust secrets — loaded via systemd credentials or container mounts, never to disk
Socket activation — seamless restarts with zero dropped connections

Reliability

Zero external dependencies — Go standard library only
Hot reload — systemctl reload nenya for zero-downtime config changes
Circuit breaker — per agent+provider+model with automatic failover and backoff
Rate limiting — per upstream host (RPM/TPM)
Response cache — in-memory LRU with SHA-256 fingerprinting
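The response cache item above can be sketched as an in-memory LRU keyed by a SHA-256 fingerprint. This is a simplified illustration: it assumes the fingerprint covers only the raw request body, and the `lru` type and its methods are hypothetical names. It is also not safe for concurrent use; a real gateway would guard it with a mutex.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint returns the hex-encoded SHA-256 digest of the request body.
// Assumption: a real cache key may also cover other fields or normalize the JSON.
func fingerprint(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// entry is the value stored in each list element.
type entry struct {
	key   string
	value []byte
}

// lru is a fixed-capacity cache; the front of order is the most recently used.
type lru struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks it as most recently used.
func (c *lru) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value and evicts the least recently used entry when over capacity.
func (c *lru) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := newLRU(2)
	key := fingerprint([]byte(`{"model":"gemini-2.5-flash","messages":[]}`))
	cache.Put(key, []byte("cached SSE payload"))
	if v, ok := cache.Get(key); ok {
		fmt.Printf("HIT %s... => %s\n", key[:8], v)
	}
}
```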

MCP Tool Integration

Tool discovery — connect to MCP servers for automatic tool injection
Multi-turn execution — intercept tool calls, execute against MCP servers, forward results
Auto-search — pre-fetch relevant context from MCP servers before forwarding
Auto-save — persist assistant responses to MCP memory servers

Request Flow

+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check                                 |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT => replay SSE)         |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Privacy / Context Pipeline (best-effort)     |
| - Tier-0 regex + entropy secret redaction    |
| - compaction / pruning / window mgmt         |
| - engine summarize (usually local Ollama)    |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Routing                                      |
|  A) Standard forwarding                      |
|     - fallback chain + circuit breaker + RL  |
|  B) MCP multi-turn tool loop (if enabled)    |
|     - buffer SSE, execute MCP tools, re-send |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                        |
                        |  SSE stream
                        v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - usage accounting + stream filter           |
| - flush + (optional) cache capture           |
| - (optional) MCP auto-save                   |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Client receives transparent SSE output       |
+----------------------------------------------+

Quick Start

1. Install

curl -fsSL https://raw.githubusercontent.com/gumieri/nenya/main/install.sh | sudo sh

2. Run with Podman

Create a minimal config and secrets:

mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-2.5-flash"]
    }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF

# The EOF delimiter is unquoted here so that $(openssl ...) is expanded by the shell
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF

Run the container:

podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest

Test it:

curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz

Navigation

Getting Started

Quick Start — Detailed installation and first run
Client Setup — Configure OpenCode, Cursor, and other clients
Deployment — Bare metal (systemd), container, and Kubernetes guides

Core Concepts

Configuration — Full config reference with examples
Providers — All 22 providers, capabilities, and special behaviors
Routing — Latency-aware routing and fallback chains
Architecture — Package overview, request lifecycle, circuit breaker
MCP Integration — Model Context Protocol server integration

Reference

Passthrough Proxy — Raw provider endpoint proxying
Secrets — Systemd credentials, env var fallback, container/K8s deployment
Model Discovery — Dynamic model catalog fetching
API Endpoints — Endpoint reference with auth requirements

Operations

Demo — Step-by-step testing of all pipeline tiers
Troubleshooting — Common issues and solutions
FAQ — Frequently asked questions
Security — Security policy and vulnerability reporting

Project

Roadmap — Planned features and improvements
Disclaimer — Legal disclaimer and usage terms

Nenya on GitHub | Apache 2.0 License
