Roadmap

Rafael Gumieri edited this page May 2, 2026 · 1 revision

Planned features and improvements for Nenya. Items are grouped by domain; implementation order depends on user demand and technical feasibility.

API Surface Expansion

Non-Streaming Chat Completions ✅ Completed

Synchronous non-streaming responses (stream: false): buffers the upstream SSE stream into a complete JSON response before returning.

Responses API ✅ Completed

Full lifecycle (/v1/responses) with GET/POST/DELETE support.

Embeddings (Enhanced) ✅ Completed

Token counting, rate limiting, and usage tracking for embeddings requests.

File Operations ✅ Completed

File CRUD (/v1/files): create, list, get, delete, content download.

Batch Processing ✅ Completed

Batch API (/v1/batches): submit, check status, cancel, retrieve results.

Passthrough Proxy ✅ Completed

/proxy/{provider}/*: passes through arbitrary HTTP methods, injecting provider auth and auto-detecting SSE streaming.

OpenAPI Specification 🔜 Planned

Auto-generated spec served at /openapi.json.

Intelligence & Routing

Model Discovery ✅ Completed

Dynamic fetch of /v1/models from providers at startup and on reload.

Semantic Caching 🔜 Planned

Vector-based caching using local embeddings and cosine similarity (zero-dep, in-memory).

Auto-Fallback Intelligence 🔜 Planned

Elo rank-based fallback with capability overlap scoring.
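
The roadmap does not specify how Elo rank and capability overlap would be combined. As a purely illustrative assumption, the sketch below multiplies a model's Elo rating by the fraction of required capabilities it offers and sorts fallback candidates by that score. The `Model` type, `overlap`, and `rankFallbacks` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Model is a hypothetical fallback candidate.
type Model struct {
	Name         string
	Elo          float64
	Capabilities map[string]bool
}

// overlap returns the fraction of required capabilities the model offers.
func overlap(required []string, m Model) float64 {
	if len(required) == 0 {
		return 1
	}
	hits := 0
	for _, c := range required {
		if m.Capabilities[c] {
			hits++
		}
	}
	return float64(hits) / float64(len(required))
}

// rankFallbacks returns a copy of candidates ordered by Elo weighted by
// capability overlap, best first. The weighting is an assumption.
func rankFallbacks(required []string, candidates []Model) []Model {
	ranked := append([]Model(nil), candidates...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return ranked[i].Elo*overlap(required, ranked[i]) >
			ranked[j].Elo*overlap(required, ranked[j])
	})
	return ranked
}

func main() {
	candidates := []Model{
		{Name: "model-a", Elo: 1300, Capabilities: map[string]bool{"tools": true}},
		{Name: "model-b", Elo: 1250, Capabilities: map[string]bool{"tools": true, "vision": true}},
	}
	for _, m := range rankFallbacks([]string{"tools", "vision"}, candidates) {
		fmt.Println(m.Name)
	}
}
```

In this example the lower-rated model ranks first because it covers both required capabilities, while the higher-rated one covers only half of them.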

Model Metadata 🔜 Planned

External model list with pricing, categories, and rankings, used for cost tracking.

Admin API (for External Dashboard)

Usage Analytics API 🔜 Planned

Detailed per-agent/model/provider usage breakdowns with time-series data.

Configuration Management API 🔜 Planned

CRUD via API with internal hot-reload — manage agents, providers, and keys without editing JSON.

Circuit Breaker Management API 🔜 Planned

Inspect and manually control circuit breaker state per target.

Non-Goals

These features are explicitly out of scope for Nenya's single-user, local-first design:

  • Multi-tenancy — Designed for single-user, local deployment
  • Per-key budgets — No multi-user isolation needed
  • Cluster mode — Single-node by design
  • Admin UI — Admin APIs provided; UI is a separate project
  • Semantic search — Not relevant for gateway use case
  • Workflow engine — Agents serve a similar purpose

Implementation Principles

  • Zero-dependency: All features maintain Go stdlib-only policy
  • Backward compatibility: All new features preserve existing streaming behavior
  • Security: Admin APIs require client_token auth
  • Testing: Each feature includes unit, integration, and fuzz tests

See Also

Operations

  • Demo — Test all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting
