
vectorless-engine — roadmap

Living document. Tick boxes as things land. Edit freely; this is the source of truth for "what's done, what's next."

Legend: [x] done · [~] in progress · [ ] not started · [?] idea, not committed · (opt) optional polish


Phase 0 — scaffold (shipped)

One-line: get a single Go binary that builds, boots, and serves an HTTP surface.

  • Go module (github.com/hallelx2/vectorless-engine, Go 1.25+)
  • cmd/engine entry point with graceful shutdown (signal.NotifyContext, 15s drain)
  • Structured logging (slog, JSON + console handlers)
  • config package — YAML + VLE_* env overrides + Validate()
  • HTTP layer (chi router, RequestID / RealIP / Recoverer middleware)
  • Pluggable interfaces: storage.Storage, queue.Queue, llm.Client, retrieval.Strategy
  • Driver stubs for: local / S3 storage · QStash / River / Asynq queue · Anthropic / OpenAI / Gemini LLM
  • tree package — core Tree / Section / View model
  • Dockerfile (multi-stage, distroless) + docker-compose.yml (Postgres / Redis / MinIO)
  • Apache 2.0 license, README with badges and SVG diagrams
  • GitHub repo created, main pushed, topics added

Phase 1 — ingest (shipped)

One-line: raw bytes → queryable, persisted tree.

  • Database layer

    • pgxpool wrapper in internal/db with Open() + ping
    • Embedded SQL migrations + auto-apply at boot (schema_migrations tracked)
    • Schema: documents (lifecycle: pending → parsing → summarizing → ready | failed) + sections (self-referential tree)
    • CRUD helpers: NewDocument, GetDocument, SetDocumentStatus, SetDocumentTitle, DeleteDocument, UpsertSection, UpdateSectionSummary, GetSection, ListSections, LoadTree, ListDocuments
    • (opt) sqlc migration — queries are hand-written right now; revisit once schema stabilizes
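The auto-apply step boils down to "embedded files minus schema_migrations rows, in version order". A minimal sketch of that diff (file names and the applied-set source are hypothetical; the real code reads an embed.FS and SELECTs from schema_migrations):

```go
package main

import "sort"

// pendingMigrations returns the embedded migration files not yet recorded
// in schema_migrations, in lexical (i.e. version-prefix) order.
func pendingMigrations(all []string, applied map[string]bool) []string {
	var out []string
	for _, name := range all {
		if !applied[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```
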
  • Parser subsystem

    • Parser interface + Registry that routes by content-type / extension
    • Markdown (goldmark, ATX+Setext headings → level-stack hierarchy)
    • HTML (golang.org/x/net/html, prefers <main>/<article>, strips chrome)
    • DOCX (stdlib archive/zip + encoding/xml, detects Heading 1…9 + Title styles — both Heading2 and Heading 2 spellings)
    • PDF (ledongthuc/pdf, pure Go no cgo)
      • Font-size heuristic for unstructured PDFs
      • /Outlines ground truth when bookmarks exist, with text-matching fallback (< 50% match ⇒ fall back)
      • (opt) OCR for scanned PDFs (Tesseract via shell-out, or LLM vision call)
      • (opt) Encrypted PDF support via NewReaderEncrypted
    • Plain Text single-section fallback
    • Table-driven smoke tests for all five (DOCX test assembles .docx in-memory — no binary fixtures)
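The Registry's routing logic is roughly the following (a sketch, assuming a simplified Parser interface — the real internal/parser signatures return a *tree.Tree and differ in detail): MIME type wins, extension is the tiebreaker, plain text is the floor.

```go
package main

import (
	"path/filepath"
	"strings"
)

// Parser is a stand-in for the real interface, which returns a *tree.Tree.
type Parser interface {
	Parse(data []byte) error
}

type stubParser struct{ name string }

func (s stubParser) Parse([]byte) error { return nil }

// Registry routes by MIME type first, then by file extension, and falls
// back to the plain-text parser so ingest never rejects a document outright.
type Registry struct {
	byMIME   map[string]Parser
	byExt    map[string]Parser
	fallback Parser
}

func (r *Registry) Lookup(contentType, filename string) Parser {
	// Strip any ";charset=..." suffix before matching.
	if mt := strings.TrimSpace(strings.Split(contentType, ";")[0]); mt != "" {
		if p, ok := r.byMIME[mt]; ok {
			return p
		}
	}
	if p, ok := r.byExt[strings.ToLower(filepath.Ext(filename))]; ok {
		return p
	}
	return r.fallback
}
```
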
  • Ingest pipeline

    • Pipeline orchestrates parse → persist tree → summarize (all stages idempotent)
    • Registered against queue.KindIngestDocument
    • Section content lands in object storage (storage.Storage); DB holds only outline + summaries
    • Summarizer calls the llm.Client with a terse one-sentence prompt
    • Graceful degradation when LLM is stubbed — falls back to truncated excerpt so ingest completes end-to-end in dev
    • (opt) Parallel summarization via errgroup + semaphore (today it's sequential)
    • (opt) Retry budget per section; surface summary errors on the document row
  • HTTP API (ingest side)

    • POST /v1/documents — multipart or JSON body, stores bytes, enqueues job, returns 202
    • GET /v1/documents — keyset pagination (?limit, ?cursor, ?status)
    • GET /v1/documents/{id} — metadata + status
    • GET /v1/documents/{id}/tree — compact View
    • GET /v1/sections/{id} — metadata + full content
    • DELETE /v1/documents/{id} — cascades to sections
    • (opt) GET /v1/documents/{id}/source — stream the original bytes back
    • (opt) Presigned URL passthrough when storage.SignedURL is supported
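The keyset ?cursor is easiest to keep opaque by base64-encoding the last row's sort key; the next page is then `WHERE id > $cursor ORDER BY id LIMIT $limit`. A sketch of the encoding half (the exact cursor format is an engine implementation detail and may differ):

```go
package main

import "encoding/base64"

// encodeCursor wraps the last returned ID so clients treat it as opaque.
func encodeCursor(lastID string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(lastID))
}

// decodeCursor recovers the ID to resume from; errors mean a bad ?cursor.
func decodeCursor(cursor string) (string, error) {
	b, err := base64.RawURLEncoding.DecodeString(cursor)
	return string(b), err
}
```
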
  • TLS

    • Plaintext by default (the recommended production setup terminates TLS at a reverse proxy)
    • Opt-in direct TLS via server.tls.{cert_file,key_file,min_version} + VLE_TLS_* env overrides
    • (opt) Autocert / Let's Encrypt integration for single-node deployments
  • Dev ergonomics

    • docker-compose with Postgres, Redis, MinIO
    • engine service gated behind profiles: ["engine"] for one-command containerised stack
    • .gitignore tightened so cmd/engine/main.go stops being ignored

Phase 2 — retrieval (shipped, minus benchmarks)

One-line: turn POST /v1/query from a 501 into the feature the engine exists for.

  • Live LLM clients — extracted to llmgate

    • Anthropic, OpenAI, Gemini all live via langchaingo under a shared llmgate.Client
    • Provider switching is pure config (llm.driver: anthropic | openai | gemini)
    • Retry with exponential backoff + jitter on 429 / 5xx (llmgate/middleware/retry)
    • Cost tracking per call (Usage.CostUSD populated from a static price table)
    • Error classification shared across providers (llmgate.Classify)
    • Real CountTokens via provider endpoints — currently heuristic in llmgate; tracked in the llmgate roadmap, not a blocker here
    • Streaming responses (SSE) — deferred to Phase 4
  • Retrieval strategies

    • SinglePass — real implementation
      • Build prompt from tree.View (titles + summaries + IDs, depth-aware indentation)
      • Request structured output (JSON list of section IDs + reasoning) — JSON-mode via prompt nudge + schema
      • Validate returned IDs against the tree; drop unknown ones (FilterKnownIDs)
      • Tolerate code fences / leading prose in model output (ParseSelection)
    • ChunkedTree — real implementation of the parallel map-reduce design
      • Splitter that slices the tree view into budget-sized chunks with breadcrumb + sibling summaries (structure-aware bin-packing, recurses into oversized subtrees)
      • errgroup + semaphore bounded by MaxParallelCalls (already in scaffold)
      • Merge policies: Union default (dedupe + sorted)
      • (opt) TopN(ranked), Vote(k-of-n) merges
      • Fall back to single slice when the tree fits the budget
      • Filter IDs per-slice so the model can't fabricate IDs from other slices
    • Unit tests with a mock llm.Client that returns canned IDs
      • Happy-path selection, unknown-ID filtering, code-fence tolerance, multi-slice split, ID-fabrication guard, splitter fast path
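The two defensive steps — tolerating fenced/prefixed output, then dropping fabricated IDs — can be sketched as below. Function names mirror ParseSelection/FilterKnownIDs but the real signatures also carry reasoning fields:

```go
package main

import (
	"encoding/json"
	"errors"
	"strings"
)

// parseSelection extracts a JSON array of section IDs from model output
// that may be wrapped in code fences or preceded by prose. Naive slicing
// on the outermost brackets; good enough for a sketch.
func parseSelection(raw string) ([]string, error) {
	start := strings.Index(raw, "[")
	end := strings.LastIndex(raw, "]")
	if start < 0 || end < start {
		return nil, errors.New("no JSON array in model output")
	}
	var ids []string
	if err := json.Unmarshal([]byte(raw[start:end+1]), &ids); err != nil {
		return nil, err
	}
	return ids, nil
}

// filterKnownIDs drops IDs the tree (or the current slice) doesn't
// contain, so the model can't fabricate or leak cross-slice sections.
func filterKnownIDs(ids []string, known map[string]bool) []string {
	out := ids[:0]
	for _, id := range ids {
		if known[id] {
			out = append(out, id)
		}
	}
	return out
}
```
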
  • POST /v1/query handler

    • Parse body { document_id, query, model?, max_tokens?, reserved_for_prompt?, max_parallel_calls?, max_sections? }
    • Load tree via db.LoadTree
    • Run the configured retrieval.Strategy
    • Fetch picked sections' content from storage
    • Return { sections: [...], strategy, model, elapsed_ms }
    • (opt) Include tokens_in / tokens_out in response (Response struct already tracks them — just needs plumbing)
    • (opt) SSE streaming variant for progressively revealing sections as they're picked
  • Benchmarks vs. traditional RAG

    • Pick a corpus (e.g. 50 technical docs + hand-written QA pairs)
    • Baseline: pgvector + OpenAI embeddings + top-K=5
    • Metrics: precision@k, recall, citation correctness, $ per query, p50/p95 latency
    • Publish in benchmarks/README.md
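The two headline metrics are simple set overlaps; a pure-function sketch that a future benchmarks/ harness could reuse (names are placeholders):

```go
package main

// precisionAtK: fraction of the top-k retrieved section IDs that are relevant.
func precisionAtK(retrieved, relevant []string, k int) float64 {
	if k > len(retrieved) {
		k = len(retrieved)
	}
	if k == 0 {
		return 0
	}
	rel := make(map[string]bool, len(relevant))
	for _, id := range relevant {
		rel[id] = true
	}
	hits := 0
	for _, id := range retrieved[:k] {
		if rel[id] {
			hits++
		}
	}
	return float64(hits) / float64(k)
}

// recall: fraction of relevant section IDs that were retrieved at all.
func recall(retrieved, relevant []string) float64 {
	if len(relevant) == 0 {
		return 0
	}
	got := make(map[string]bool, len(retrieved))
	for _, id := range retrieved {
		got[id] = true
	}
	hits := 0
	for _, id := range relevant {
		if got[id] {
			hits++
		}
	}
	return float64(hits) / float64(len(relevant))
}
```
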

Phase 3 — ecosystem (soon)

One-line: the engine becomes useful beyond a single go run on a laptop.

  • Queue drivers — flesh out the stubs

    • River live (Postgres-backed, uses the same DB as the data plane)
    • Asynq live (Redis-backed, higher throughput path)
    • QStash webhook signature verification in handleQueueWebhook
    • Dead-letter surface (document row records last error + retry count)
  • Storage drivers

    • S3-compatible live (AWS S3, Cloudflare R2, MinIO, Backblaze B2, DigitalOcean Spaces)
    • SignedURL for providers that support it
    • (opt) GCS driver
    • (opt) Azure Blob driver
  • SDKs (separate repos)

    • @vectorless/sdk-ts — TypeScript, targets node + edge runtimes
    • vectorless Python package — targets 3.10+
    • github.com/hallelx2/vectorless-go — Go client
    • OpenAPI 3 spec generated from route handlers, SDKs generated from it
  • Packaging / deploy

    • GitHub Actions: build + test + lint matrix
    • GHCR image publish on tag (:latest, :vX.Y.Z, :sha-<short>)
    • Release binaries via goreleaser (linux/darwin/windows × amd64/arm64)
    • Helm chart (charts/vectorless-engine)
    • Terraform module (terraform/) for one-click cloud deploys
    • systemd unit file for bare-metal installs

Phase 4 — scale (later)

One-line: push the engine past the "one doc, one query" comfort zone.

  • Multi-document queries — reason across N trees in one call, merge across docs
  • Streaming answers — SSE on /v1/query, tokens as they come
  • Tree caching — cache the View prompt per document+model so repeated queries skip rebuilding
  • Tree compaction — merge adjacent leaf sections with tiny token counts for more efficient reasoning
  • Incremental re-ingest — detect changed sections in a re-uploaded doc, keep stable section IDs for unchanged ones
  • Access control — per-document ACLs + API key scoping (the control-plane's job, but engine needs hooks)
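For the tree-caching idea, the cache key is (document, model) since the View prompt depends on both. A minimal in-process sketch — invalidation on re-ingest is the hard part and is not shown here:

```go
package main

import "sync"

// promptCache memoises the rendered View prompt per (documentID, model).
// Hypothetical names; a real version needs invalidation hooks from ingest.
type promptCache struct {
	mu sync.RWMutex
	m  map[[2]string]string
}

func newPromptCache() *promptCache {
	return &promptCache{m: make(map[[2]string]string)}
}

func (c *promptCache) Get(docID, model string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[[2]string{docID, model}]
	return v, ok
}

func (c *promptCache) Put(docID, model, prompt string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[[2]string{docID, model}] = prompt
}
```
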

Cross-cutting — always on

Observability

  • OpenTelemetry tracing (HTTP + queue jobs + LLM calls)
  • Prometheus metrics endpoint (/metrics): request counters, queue depth, ingest latency, LLM token usage, error rates
  • Structured error wrapping everywhere (sentinel errors + errors.Is)

Security

  • API key auth middleware (pluggable; the control-plane supplies keys)
  • Rate limiting per key
  • Request size limits (already 32MB on multipart; review)
  • SBOM generation + supply-chain signing (cosign on images)

Developer docs

  • docs/API.md — full OpenAPI-driven reference
  • docs/CONTRIBUTING.md — conventions, commit style, local dev loop
  • docs/ADR/ — architecture decision records as we go
  • docs/BENCHMARKS.md — live numbers, updated per release

Testing

  • Unit test coverage ≥ 70% on internal/retrieval, internal/ingest, internal/db, internal/parser
  • Integration test suite that spins docker-compose and runs end-to-end ingest → query
  • Fuzz tests on parsers (malformed markdown, malformed HTML, truncated PDFs)
  • Load test harness with k6 or vegeta scripts

Known issues / deferred

  • Windows CRLF handling in git — benign warnings on every git add
  • PDF parser doesn't handle scanned (image-only) PDFs — needs OCR
  • DOCX parser loses inline formatting (bold/italic/links) — plain text only for now
  • Summarizer is sequential; large trees (> 100 sections) take too long to summarize
  • handleQueueWebhook is a no-op stub; needed when queue.driver=qstash

How to use this doc

  • Before starting a task: flip its box to [~] in a tiny commit so collaborators see it's claimed.
  • On merge: flip to [x] in the same PR that delivers the work.
  • New ideas: drop them under the right phase with [?] — it means "plausible, not committed yet."
  • Removals: if a task turns out not to make sense, delete it rather than leaving a zombie checkbox.