Living document. Tick boxes as things land. Edit freely; this is the source of truth for "what's done, what's next."
Legend: [x] done · [~] in progress · [ ] not started · [?] idea, not committed · (opt) optional polish
One-line: get a single Go binary that builds, boots, and serves an HTTP surface.
- Go module (`github.com/hallelx2/vectorless-engine`, Go 1.25+)
- `cmd/engine` entry point with graceful shutdown (`signal.NotifyContext`, 15s drain; sketch after this list)
- Structured logging (`slog`, JSON + console handlers)
- `config` package — YAML + `VLE_*` env overrides + `Validate()`
- HTTP layer (chi router, RequestID / RealIP / Recoverer middleware)
- Pluggable interfaces: `storage.Storage`, `queue.Queue`, `llm.Client`, `retrieval.Strategy`
- Driver stubs for: local / S3 storage · QStash / River / Asynq queue · Anthropic / OpenAI / Gemini LLM
- `tree` package — core `Tree` / `Section` / `View` model
- Dockerfile (multi-stage, distroless) + `docker-compose.yml` (Postgres / Redis / MinIO)
- Apache 2.0 license, README with badges and SVG diagrams
- GitHub repo created, `main` pushed, topics added
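For reference, a minimal sketch of the boot/shutdown shape, assuming a plain `net/http` server; only `signal.NotifyContext` and the 15s drain window come from the list above, the rest is illustrative.

```go
package main

import (
	"context"
	"errors"
	"log/slog"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	// Cancel the root context on SIGINT/SIGTERM.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	srv := &http.Server{Addr: ":8080"}

	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			slog.Error("listen", "err", err)
			os.Exit(1)
		}
	}()

	<-ctx.Done() // block until a shutdown signal arrives

	// Give in-flight requests up to 15s to drain before exiting.
	drainCtx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()
	if err := srv.Shutdown(drainCtx); err != nil {
		slog.Error("shutdown", "err", err)
	}
}
```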
One-line: raw bytes → queryable, persisted tree.
- Database layer
  - `pgxpool` wrapper in `internal/db` with `Open()` + ping (sketch after this list)
  - Embedded SQL migrations + auto-apply at boot (`schema_migrations` tracked)
  - Schema: `documents` (lifecycle: pending → parsing → summarizing → ready | failed) + `sections` (self-referential tree)
  - CRUD helpers: `NewDocument`, `GetDocument`, `SetDocumentStatus`, `SetDocumentTitle`, `DeleteDocument`, `UpsertSection`, `UpdateSectionSummary`, `GetSection`, `ListSections`, `LoadTree`, `ListDocuments`
  - (opt) sqlc migration — queries are hand-written right now; revisit once schema stabilizes
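A minimal sketch of what the `Open()` + ping wrapper might look like; the exact signature in `internal/db` is an assumption.

```go
package db

import (
	"context"

	"github.com/jackc/pgx/v5/pgxpool"
)

// DB wraps the pool so higher layers depend on this package, not pgx directly.
type DB struct {
	Pool *pgxpool.Pool
}

// Open connects and verifies connectivity with a ping before returning,
// so a bad DSN fails at boot rather than on the first query.
func Open(ctx context.Context, dsn string) (*DB, error) {
	pool, err := pgxpool.New(ctx, dsn)
	if err != nil {
		return nil, err
	}
	if err := pool.Ping(ctx); err != nil {
		pool.Close()
		return nil, err
	}
	return &DB{Pool: pool}, nil
}
```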
- Parser subsystem
  - `Parser` interface + `Registry` that routes by content-type / extension (sketch after this list)
  - Markdown (goldmark, ATX + Setext headings → level-stack hierarchy)
  - HTML (`golang.org/x/net/html`, prefers `<main>` / `<article>`, strips chrome)
  - DOCX (stdlib `archive/zip` + `encoding/xml`, detects `Heading 1…9` + `Title` styles — both `Heading2` and `Heading 2` spellings)
  - PDF (`ledongthuc/pdf`, pure Go, no cgo)
    - Font-size heuristic for unstructured PDFs
    - `/Outlines` as ground truth when bookmarks exist, with text-matching fallback (< 50% match ⇒ fall back)
    - (opt) OCR for scanned PDFs (Tesseract via shell-out, or LLM vision call)
    - (opt) Encrypted PDF support via `NewReaderEncrypted`
  - Plain-text single-section fallback
  - Table-driven smoke tests for all five (DOCX test assembles `.docx` in-memory — no binary fixtures)
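A sketch of the `Parser` + `Registry` shape; the routing order (content type, then extension, then the plain-text fallback) follows the list above, while everything beyond the `Parser` and `Registry` names is an assumption.

```go
package parser

import "strings"

// Section stands in for the core tree model from the tree package.
type Section struct {
	Title    string
	Content  string
	Children []*Section
}

// Parser turns raw bytes into a section tree.
type Parser interface {
	Parse(data []byte) (*Section, error)
}

// Registry routes an upload to a parser by content type, then extension.
type Registry struct {
	byContentType map[string]Parser
	byExtension   map[string]Parser
	fallback      Parser // plain-text single-section parser
}

func NewRegistry(fallback Parser) *Registry {
	return &Registry{
		byContentType: map[string]Parser{},
		byExtension:   map[string]Parser{},
		fallback:      fallback,
	}
}

func (r *Registry) Register(p Parser, contentTypes, extensions []string) {
	for _, ct := range contentTypes {
		r.byContentType[strings.ToLower(ct)] = p
	}
	for _, ext := range extensions {
		r.byExtension[strings.ToLower(ext)] = p
	}
}

// Lookup prefers the declared content type, then the file extension,
// then the plain-text fallback so routing never hard-fails.
func (r *Registry) Lookup(contentType, filename string) Parser {
	if p, ok := r.byContentType[strings.ToLower(contentType)]; ok {
		return p
	}
	if i := strings.LastIndex(filename, "."); i >= 0 {
		if p, ok := r.byExtension[strings.ToLower(filename[i+1:])]; ok {
			return p
		}
	}
	return r.fallback
}
```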
- Ingest pipeline
  - `Pipeline` orchestrates parse → persist tree → summarize (all stages idempotent)
  - Registered against `queue.KindIngestDocument`
  - Section content lands in object storage (`storage.Storage`); DB holds only the outline + summaries
  - Summarizer calls the `llm.Client` with a terse one-sentence prompt
  - Graceful degradation when the LLM is stubbed — falls back to a truncated excerpt so ingest completes end-to-end in dev
  - (opt) Parallel summarization via errgroup + semaphore (today it's sequential; sketch after this list)
  - (opt) Retry budget per section; surface summary errors on the document row
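A sketch of the (opt) parallel-summarization item, assuming summaries are independent per section; the function names are illustrative, only the errgroup-with-cap shape comes from the list.

```go
package ingest

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// summarizeAll fans summaries out across sections, capped at maxParallel
// concurrent LLM calls; the first error cancels the remaining work.
func summarizeAll(ctx context.Context, sections []string, maxParallel int,
	summarize func(context.Context, string) (string, error)) ([]string, error) {

	out := make([]string, len(sections))
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(maxParallel) // errgroup's built-in semaphore

	for i, section := range sections {
		i, section := i, section // capture loop variables
		g.Go(func() error {
			summary, err := summarize(ctx, section)
			if err != nil {
				return err
			}
			out[i] = summary // each goroutine writes its own slot: no lock needed
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return out, nil
}
```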
- HTTP API (ingest side)
  - `POST /v1/documents` — multipart or JSON body, stores bytes, enqueues job, returns 202
  - `GET /v1/documents` — keyset pagination (`?limit`, `?cursor`, `?status`; sketch after this list)
  - `GET /v1/documents/{id}` — metadata + status
  - `GET /v1/documents/{id}/tree` — compact `View`
  - `GET /v1/sections/{id}` — metadata + full content
  - `DELETE /v1/documents/{id}` — cascades to sections
  - (opt) `GET /v1/documents/{id}/source` — stream the original bytes back
  - (opt) Presigned URL passthrough when `storage.SignedURL` is supported
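A sketch of the keyset query behind the `?cursor` parameter, paging on `(created_at, id)` so new inserts never shift pages; the column names and the cursor's decoded form are assumptions.

```go
package db

import (
	"context"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

type DocumentRow struct {
	ID        string
	Status    string
	CreatedAt time.Time
}

// ListDocuments returns the page strictly before the cursor row. The caller
// encodes (afterCreated, afterID) of the last row into the opaque ?cursor.
func ListDocuments(ctx context.Context, pool *pgxpool.Pool,
	afterCreated time.Time, afterID string, limit int) ([]DocumentRow, error) {

	rows, err := pool.Query(ctx, `
		SELECT id, status, created_at
		FROM documents
		WHERE (created_at, id) < ($1, $2)
		ORDER BY created_at DESC, id DESC
		LIMIT $3`, afterCreated, afterID, limit)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var page []DocumentRow
	for rows.Next() {
		var d DocumentRow
		if err := rows.Scan(&d.ID, &d.Status, &d.CreatedAt); err != nil {
			return nil, err
		}
		page = append(page, d)
	}
	return page, rows.Err()
}
```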
- TLS
  - Plaintext by default (behind a reverse proxy — the recommended production setup)
  - Opt-in direct TLS via `server.tls.{cert_file,key_file,min_version}` + `VLE_TLS_*` env overrides (sketch after this list)
  - (opt) Autocert / Let's Encrypt integration for single-node deployments
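A sketch of the opt-in path, assuming the config fields map straight onto `crypto/tls`; only the `server.tls.{cert_file,key_file,min_version}` names come from the list.

```go
package main

import (
	"crypto/tls"
	"net/http"
)

// TLSConfig mirrors server.tls.* from the YAML config.
type TLSConfig struct {
	CertFile   string
	KeyFile    string
	MinVersion string // "1.2" (default) or "1.3"
}

func serve(srv *http.Server, cfg *TLSConfig) error {
	if cfg == nil || cfg.CertFile == "" {
		// Plaintext default: TLS terminates at the reverse proxy.
		return srv.ListenAndServe()
	}
	min := uint16(tls.VersionTLS12)
	if cfg.MinVersion == "1.3" {
		min = tls.VersionTLS13
	}
	srv.TLSConfig = &tls.Config{MinVersion: min}
	return srv.ListenAndServeTLS(cfg.CertFile, cfg.KeyFile)
}
```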
- Dev ergonomics
  - `docker-compose` with Postgres, Redis, MinIO
  - `engine` service gated behind `profiles: ["engine"]` for a one-command containerised stack
  - `.gitignore` tightened so `cmd/engine/main.go` stops being ignored
One-line: turn `POST /v1/query` from a 501 into the feature the engine exists for.
- Live LLM clients — extracted to `llmgate`
  - Anthropic, OpenAI, Gemini all live via langchaingo under a shared `llmgate.Client`
  - Provider switching is pure config (`llm.driver: anthropic | openai | gemini`)
  - Retry with exponential backoff + jitter on 429 / 5xx (`llmgate/middleware/retry`; sketch after this list)
  - Cost tracking per call (`Usage.CostUSD` populated from a static price table)
  - Error classification shared across providers (`llmgate.Classify`)
  - Real `CountTokens` via provider endpoints — currently heuristic in llmgate; tracked in the llmgate roadmap, not a blocker here
  - Streaming responses (SSE) — deferred to Phase 4
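A sketch of the retry shape: exponential backoff with full jitter, retrying only when the classifier says the error is retryable (429 / 5xx). The helper's signature is an assumption, not `llmgate`'s actual API.

```go
package retry

import (
	"context"
	"math/rand"
	"time"
)

// Do retries call up to attempts times, sleeping a random duration in
// [0, base*2^i) between tries ("full jitter") to avoid thundering herds.
// base must be positive.
func Do(ctx context.Context, attempts int, base time.Duration,
	retryable func(error) bool, call func(context.Context) error) error {

	var err error
	for i := 0; i < attempts; i++ {
		if err = call(ctx); err == nil || !retryable(err) {
			return err
		}
		backoff := time.Duration(rand.Int63n(int64(base << uint(i))))
		select {
		case <-time.After(backoff):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}
```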
- Retrieval strategies
  - `SinglePass` — real implementation
    - Build prompt from `tree.View` (titles + summaries + IDs, depth-aware indentation)
    - Request structured output (JSON list of section IDs + reasoning) — JSON mode via prompt nudge + schema
    - Validate returned IDs against the tree; drop unknown ones (`FilterKnownIDs`)
    - Tolerate code fences / leading prose in model output (`ParseSelection`; sketch after this list)
  - `ChunkedTree` — real implementation of the parallel map-reduce design
    - `Splitter` that slices the tree view into budget-sized chunks with breadcrumb + sibling summaries (structure-aware bin-packing, recurses into oversized subtrees)
    - `errgroup` + semaphore bounded by `MaxParallelCalls` (already in scaffold)
    - `Merge` policies: `Union` default (dedupe + sorted)
    - (opt) `TopN` (ranked), `Vote` (k-of-n) merges
    - Fall back to a single slice when the tree fits the budget
    - Filter IDs per-slice so the model can't fabricate IDs from other slices
  - Unit tests with a mock `llm.Client` that returns canned IDs
    - Happy-path selection, unknown-ID filtering, code-fence tolerance, multi-slice split, ID-fabrication guard, splitter fast path
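A sketch of the output-tolerance pair named above; the JSON field names are assumptions, while the behaviour (strip fences and prose, then drop unknown IDs) is from the list.

```go
package retrieval

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Selection is what the model is asked to return; field names are assumed.
type Selection struct {
	SectionIDs []string `json:"section_ids"`
	Reasoning  string   `json:"reasoning"`
}

// ParseSelection tolerates fenced or prose-wrapped output by cutting down to
// the outermost JSON object before unmarshalling.
func ParseSelection(raw string) (*Selection, error) {
	start := strings.Index(raw, "{")
	end := strings.LastIndex(raw, "}")
	if start < 0 || end <= start {
		return nil, fmt.Errorf("no JSON object in model output")
	}
	var sel Selection
	if err := json.Unmarshal([]byte(raw[start:end+1]), &sel); err != nil {
		return nil, err
	}
	return &sel, nil
}

// FilterKnownIDs drops IDs the tree doesn't contain, filtering in place.
func FilterKnownIDs(ids []string, known map[string]bool) []string {
	kept := ids[:0]
	for _, id := range ids {
		if known[id] {
			kept = append(kept, id)
		}
	}
	return kept
}
```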
- `POST /v1/query` handler (flow sketched after this list)
  - Parse body `{ document_id, query, model?, max_tokens?, reserved_for_prompt?, max_parallel_calls?, max_sections? }`
  - Load tree via `db.LoadTree`
  - Run the configured `retrieval.Strategy`
  - Fetch picked sections' content from storage
  - Return `{ sections: [...], strategy, model, elapsed_ms }`
  - (opt) Include `tokens_in` / `tokens_out` in the response (the Response struct already tracks them — just needs plumbing)
  - (opt) SSE streaming variant for progressively revealing sections as they're picked
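The whole flow, sketched as one handler. Request/response field names match the list; the three function fields are stand-ins for `db.LoadTree`, the configured `retrieval.Strategy`, and the storage fetch, with assumed signatures.

```go
package api

import (
	"context"
	"encoding/json"
	"net/http"
	"time"
)

type Tree struct{} // stand-in for the tree package's model

// QueryServer holds the dependencies the handler needs, injected as funcs.
type QueryServer struct {
	LoadTree func(ctx context.Context, documentID string) (*Tree, error)
	Retrieve func(ctx context.Context, t *Tree, query string) ([]string, error)
	Fetch    func(ctx context.Context, sectionIDs []string) ([]json.RawMessage, error)
	Strategy string
}

type queryRequest struct {
	DocumentID string `json:"document_id"`
	Query      string `json:"query"`
	Model      string `json:"model,omitempty"`
}

type queryResponse struct {
	Sections  []json.RawMessage `json:"sections"`
	Strategy  string            `json:"strategy"`
	Model     string            `json:"model"`
	ElapsedMS int64             `json:"elapsed_ms"`
}

func (s *QueryServer) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	start := time.Now()

	var req queryRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	tree, err := s.LoadTree(r.Context(), req.DocumentID)
	if err != nil {
		http.Error(w, err.Error(), http.StatusNotFound)
		return
	}
	ids, err := s.Retrieve(r.Context(), tree, req.Query)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	sections, err := s.Fetch(r.Context(), ids)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(queryResponse{
		Sections:  sections,
		Strategy:  s.Strategy,
		Model:     req.Model,
		ElapsedMS: time.Since(start).Milliseconds(),
	})
}
```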
- Benchmarks vs. traditional RAG
  - Pick a corpus (e.g. 50 technical docs + hand-written QA pairs)
  - Baseline: pgvector + OpenAI embeddings + top-K=5
  - Metrics: precision@k, recall, citation correctness, $ per query, p50/p95 latency (metric helper sketched after this list)
  - Publish in `benchmarks/README.md`
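For the first two metrics these are the standard definitions; a small helper, assuming each QA pair's relevance labels are a set of section IDs.

```go
package benchmarks

// precisionRecallAtK computes precision@k (hits over k) and recall (hits
// over total relevant) for one query's ranked retrieval result.
func precisionRecallAtK(retrieved []string, relevant map[string]bool, k int) (precision, recall float64) {
	if k > len(retrieved) {
		k = len(retrieved)
	}
	hits := 0
	for _, id := range retrieved[:k] {
		if relevant[id] {
			hits++
		}
	}
	if k > 0 {
		precision = float64(hits) / float64(k)
	}
	if len(relevant) > 0 {
		recall = float64(hits) / float64(len(relevant))
	}
	return precision, recall
}
```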
One-line: the engine becomes useful beyond a single `go run` on a laptop.
- Queue drivers — flesh out the stubs
  - River live (Postgres-backed, uses the same DB as the data plane)
  - Asynq live (Redis-backed, higher-throughput path)
  - QStash webhook signature verification in `handleQueueWebhook`
  - Dead-letter surface (document row records last error + retry count)
- Storage drivers
  - S3-compatible live (AWS S3, Cloudflare R2, MinIO, Backblaze B2, DigitalOcean Spaces)
  - `SignedURL` for providers that support it (capability-check sketch after this list)
  - (opt) GCS driver
  - (opt) Azure Blob driver
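A sketch of how `SignedURL` can stay optional per driver: an extra interface that only capable backends implement, discovered with a type assertion. The exact interface shapes are assumptions.

```go
package storage

import (
	"context"
	"io"
	"time"
)

// Storage is the capability every driver must provide.
type Storage interface {
	Put(ctx context.Context, key string, r io.Reader) error
	Get(ctx context.Context, key string) (io.ReadCloser, error)
	Delete(ctx context.Context, key string) error
}

// Signer is implemented only by drivers that can mint presigned URLs
// (S3 and friends); the local-disk driver simply doesn't implement it.
type Signer interface {
	SignedURL(ctx context.Context, key string, expiry time.Duration) (string, error)
}

// SignedURLIfSupported lets handlers offer presigned passthrough without
// caring which driver is configured.
func SignedURLIfSupported(ctx context.Context, s Storage, key string, expiry time.Duration) (string, bool) {
	signer, ok := s.(Signer)
	if !ok {
		return "", false
	}
	url, err := signer.SignedURL(ctx, key, expiry)
	if err != nil {
		return "", false
	}
	return url, true
}
```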
- SDKs (separate repos)
  - `@vectorless/sdk-ts` — TypeScript, targets Node + edge runtimes
  - `vectorless` Python package — targets 3.10+
  - `github.com/hallelx2/vectorless-go` — Go client
  - OpenAPI 3 spec generated from route handlers, SDKs generated from it
- Packaging / deploy
  - GitHub Actions: build + test + lint matrix
  - GHCR image publish on tag (`:latest`, `:vX.Y.Z`, `:sha-<short>`)
  - Release binaries via `goreleaser` (linux/darwin/windows × amd64/arm64)
  - Helm chart (`charts/vectorless-engine`)
  - Terraform module (`terraform/`) for one-click cloud deploys
  - systemd unit file for bare-metal installs
One-line: push the engine past the "one doc, one query" comfort zone.
- Multi-document queries — reason across N trees in one call, merge across docs
- Streaming answers — SSE on `/v1/query`, tokens as they come
- Tree caching — cache the `View` prompt per document + model so repeated queries skip rebuilding
- Tree compaction — merge adjacent leaf sections with tiny token counts for more efficient reasoning
- Incremental re-ingest — detect changed sections in a re-uploaded doc, keep stable section IDs for unchanged ones
- Access control — per-document ACLs + API key scoping (the control-plane's job, but the engine needs hooks)
- OpenTelemetry tracing (HTTP + queue jobs + LLM calls)
- Prometheus metrics endpoint (`/metrics`): request counters, queue depth, ingest latency, LLM token usage, error rates
- Structured error wrapping everywhere (sentinel errors + `errors.Is`; convention sketched after this list)
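A sketch of the sentinel convention, assuming `db` owns the sentinels: wrap with `%w` at the call site so callers branch with `errors.Is` instead of string matching.

```go
package db

import (
	"errors"
	"fmt"
)

// Sentinels callers are allowed to branch on.
var ErrNotFound = errors.New("not found")

func getDocument(id string) error {
	// ... query elided; on a miss, wrap the sentinel with context:
	return fmt.Errorf("document %q: %w", id, ErrNotFound)
}

// Caller side, e.g. in the HTTP layer:
//
//	if errors.Is(err, db.ErrNotFound) {
//		http.Error(w, "no such document", http.StatusNotFound)
//		return
//	}
```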
- API key auth middleware (pluggable; the control-plane supplies keys; sketch after this list)
- Rate limiting per key
- Request size limits (already 32MB on multipart; review)
- SBOM generation + supply-chain signing (cosign on images)
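A sketch of the pluggable key middleware in the chi style the HTTP layer already uses; the `Bearer` header convention and the lookup callback are assumptions.

```go
package api

import (
	"net/http"
	"strings"
)

// APIKeyAuth rejects requests whose bearer token the control-plane-supplied
// lookup doesn't recognise.
func APIKeyAuth(validKey func(key string) bool) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := strings.TrimPrefix(r.Header.Get("Authorization"), "Bearer ")
			if key == "" || !validKey(key) {
				http.Error(w, "unauthorized", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}
```

Wired onto the router with `r.Use(APIKeyAuth(lookup))`, so per-key rate limiting can layer on the same lookup later.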
- `docs/API.md` — full OpenAPI-driven reference
- `docs/CONTRIBUTING.md` — conventions, commit style, local dev loop
- `docs/ADR/` — architecture decision records as we go
- `docs/BENCHMARKS.md` — live numbers, updated per release
- Unit test coverage ≥ 70% on `internal/retrieval`, `internal/ingest`, `internal/db`, `internal/parser`
- Integration test suite that spins up `docker-compose` and runs end-to-end ingest → query
- Fuzz tests on parsers (malformed markdown, malformed HTML, truncated PDFs; sketch after this list)
- Load test harness with `k6` or `vegeta` scripts
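A sketch using Go's native fuzzing (Go 1.18+); the parser entry point is stubbed here, and the property checked is simply "never panic, never return a nil tree without an error".

```go
package parser_test

import "testing"

// parseMarkdown is a stand-in for the real internal/parser entry point.
func parseMarkdown(data []byte) (*struct{}, error) { return &struct{}{}, nil }

func FuzzMarkdownParser(f *testing.F) {
	// Seed corpus: one well-formed doc, one deliberately malformed one.
	f.Add([]byte("# Title\n\nbody\n\n## Section\nmore body"))
	f.Add([]byte("####### too deep\n|broken|table\n[unclosed](link"))

	f.Fuzz(func(t *testing.T, data []byte) {
		tree, err := parseMarkdown(data)
		if err == nil && tree == nil {
			t.Fatal("nil tree without an error")
		}
	})
}
```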
- Windows CRLF handling in git — benign warnings on every `git add`
- PDF parser doesn't handle scanned (image-only) PDFs — needs OCR
- DOCX parser loses inline formatting (bold/italic/links) — plain text only for now
- Summarizer is sequential; large trees (> 100 sections) take too long when summarized one at a time
- `handleQueueWebhook` is a no-op stub; needed when `queue.driver=qstash`
- Before starting a task: flip its box to `[~]` in a tiny commit so collaborators see it's claimed.
- On merge: flip to `[x]` in the same PR that delivers the work.
- New ideas: drop them under the right phase with `[?]` — it means "plausible, not committed yet."
- Removals: if a task turns out not to make sense, delete it rather than leaving a zombie checkbox.