
vectorless-engine

A retrieval engine that reasons over document structure — not embeddings.
No chunking. No top-K. No vector database. Just a tree, an LLM, and full sections.


Why · How it works · Quick start · Architecture · Configuration · Roadmap


Why vectorless

Vector RAG works — until you hit the parts where it doesn't. Chunks lose structure. Top-K is a guess. Embeddings drift. You maintain a second database just to do approximate similarity on bits of text you cut out of context.

vectorless-engine takes a different path: at ingest, it builds a structured tree of the document (titles, summaries, metadata) — essentially an LLM-friendly table of contents. At query time, an LLM reads that tree and picks the exact section IDs it needs. The engine returns those sections in full, with their narrative intact.
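
For intuition, a node in that tree might look roughly like this. This is an illustrative sketch, not the engine's actual schema; the field names are assumptions (the real data model lives in pkg/tree):

package main

import "fmt"

// Section is one node of the ingest-time tree. Field names here are
// illustrative assumptions, not the engine's schema (see pkg/tree).
type Section struct {
    ID       string     // stable ID used for citations, e.g. "sec_..."
    Title    string     // heading text recovered by the parser
    Summary  string     // short summary written by a cheap model at ingest
    Children []*Section // nested subsections
}

// toc renders what the query-time LLM actually reads: IDs, titles, and
// summaries only. Full section bodies stay behind in object storage.
func toc(s *Section, depth int) {
    fmt.Printf("%*s[%s] %s: %s\n", depth*2, "", s.ID, s.Title, s.Summary)
    for _, c := range s.Children {
        toc(c, depth+1)
    }
}

func main() {
    root := &Section{ID: "sec_root", Title: "Whitepaper", Summary: "overview of the system",
        Children: []*Section{
            {ID: "sec_api", Title: "API stability", Summary: "versioning and deprecation policy"},
        }}
    toc(root, 0)
}

The model only ever sees this rendered table of contents; full bodies are fetched from storage after it picks section IDs.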

Traditional RAG vs. vectorless

What you get:

  • No embeddings — nothing to recompute when you swap models, nothing to drift.
  • No vector database — Postgres + object storage is enough.
  • No top-K tuning — the model picks 1 section or 8, as needed.
  • Full context preserved — sections are returned whole, not as fragments.
  • Citations for free — every returned section has a stable ID.
  • Provider-agnostic — Anthropic, OpenAI, Gemini all plug in behind the same interface.

How it works

1. Ingest — build a structured tree

Upload a document; the engine parses it, splits it along semantic section boundaries (not blind chunks), summarizes each section with a cheap model, and persists the tree. All asynchronous, driven by the queue of your choice.

Ingest pipeline
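
Sketched in Go, the stages compose roughly as below. Everything here is stubbed and the names (parse, summarize) are illustrative, not pkg/ingest's exports; in the real engine parsing is format-aware and summaries come from a cheap model:

package main

import (
    "context"
    "fmt"
    "strings"
)

// A stubbed sketch of the ingest stages. Function names are illustrative,
// not pkg/ingest's exports; the real pipeline is format-aware and async.

type section struct{ title, body, summary string }

// parse stands in for a format-specific parser splitting along headings.
func parse(doc string) []section {
    var out []section
    for _, part := range strings.Split(doc, "\n# ") {
        title, body, _ := strings.Cut(part, "\n")
        out = append(out, section{title: strings.TrimPrefix(title, "# "), body: body})
    }
    return out
}

// summarize stands in for the cheap-model summary call.
func summarize(ctx context.Context, s section) string {
    return s.body[:min(40, len(s.body))] + "…"
}

func main() {
    doc := "# Intro\nVectorless reasons over structure.\n# API\nRoutes are versioned under /v1."
    for _, s := range parse(doc) {
        s.summary = summarize(context.Background(), s)
        fmt.Printf("%s: %s\n", s.title, s.summary) // persisting the tree would follow here
    }
}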

2. Query — the LLM reasons over the tree

For small documents the whole tree fits in one prompt. For large documents the engine splits the tree into budget-sized slices, fires parallel LLM calls, and merges the results — so you're never bottlenecked on a single model's context window.

Chunked-tree parallel reasoning
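
A minimal sketch of that map-reduce shape, with a substring match standing in for the model's reasoning over each slice (names are illustrative; the real strategies live in pkg/retrieval):

package main

import (
    "context"
    "fmt"
    "strings"
    "sync"
)

// selectSections sketches the chunked-tree strategy: one (stubbed) LLM call
// per budget-sized slice of the tree, run in parallel, results merged.
func selectSections(ctx context.Context, slices [][]string, query string) []string {
    results := make([][]string, len(slices))
    var wg sync.WaitGroup
    for i, slice := range slices {
        wg.Add(1)
        go func(i int, slice []string) { // map: one parallel call per slice
            defer wg.Done()
            for _, entry := range slice { // entry ≈ "sec_id: title · summary"
                if strings.Contains(strings.ToLower(entry), query) {
                    results[i] = append(results[i], strings.SplitN(entry, ":", 2)[0])
                }
            }
        }(i, slice)
    }
    wg.Wait()
    var merged []string // reduce: concatenate the picks from every slice
    for _, r := range results {
        merged = append(merged, r...)
    }
    return merged
}

func main() {
    slices := [][]string{
        {"sec_01: Intro · why vectorless", "sec_02: API · stability guarantees"},
        {"sec_03: Config · YAML plus env overlay"},
    }
    fmt.Println(selectSections(context.Background(), slices, "api")) // → [sec_02]
}

Each slice fits the model's context budget, so document size is bounded by parallelism, not by any one context window.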

3. Return — full sections, not fragments

The engine fetches the selected sections from object storage and returns them intact. Your downstream model (or agent) gets coherent, cite-able text — not a bag of chunks.

Architecture

vectorless-engine architecture

The engine is a single Go binary with four pluggable boundaries:

Boundary    Implementations shipped
Storage     Local filesystem · S3-compatible (AWS S3, Cloudflare R2, MinIO, Backblaze B2, DigitalOcean Spaces) — GCS / Azure planned
Queue       QStash (serverless) · River (Postgres) · Asynq (Redis)
LLM         Anthropic · OpenAI · Gemini — all behind one llm.Client interface
Retrieval   single-pass (small trees, one call) · chunked-tree (big trees, parallel map-reduce)

Everything else — the control plane, dashboard, MCP server, SDKs — lives outside this repo and talks to the engine over its HTTP API. Run the engine standalone; run it behind your own control plane; embed it in your product. Up to you.
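
As a rough sketch, the four seams could be pictured as interfaces like these. The method sets are assumptions for exposition, not the engine's actual signatures (see pkg/storage, pkg/queue, pkg/retrieval, and llmgate for the real ones):

// Illustrative shapes for the four seams. Method sets are assumptions for
// exposition; see pkg/storage, pkg/queue, pkg/retrieval, and llmgate.
package boundaries

import "context"

// Storage: local filesystem or any S3-compatible object store.
type Storage interface {
    Put(ctx context.Context, key string, data []byte) error
    Get(ctx context.Context, key string) ([]byte, error)
}

// Queue: QStash, River, or Asynq behind the same seam.
type Queue interface {
    Enqueue(ctx context.Context, job []byte) error
}

// LLM: Anthropic, OpenAI, or Gemini via llmgate.
type LLM interface {
    Complete(ctx context.Context, prompt string) (string, error)
}

// Retrieval: single-pass or chunked-tree section selection.
type Retrieval interface {
    Select(ctx context.Context, tree string, query string) ([]string, error)
}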

Quick start

Prerequisites

  • Go 1.25+
  • Postgres 15+ (for the job queue + document metadata)
  • An API key from Anthropic, OpenAI, or Google

Run locally

git clone https://github.com/hallelx2/vectorless-engine.git
cd vectorless-engine

cp config.example.yaml config.yaml
# edit config.yaml — set your LLM API key and database URL

docker compose up -d postgres
go run ./cmd/engine --config config.yaml

Or run the whole stack containerised:

export ANTHROPIC_API_KEY=sk-ant-...
docker compose --profile engine up --build
# engine → http://localhost:8080

The engine listens on :8080 by default:

curl http://localhost:8080/v1/health
# {"status":"ok"}

Ingest and query

# upload a document
curl -X POST http://localhost:8080/v1/documents \
  -F "file=@whitepaper.pdf"
# → {"document_id":"doc_01H...","status":"pending"}

# poll until ready (status: ready)
curl http://localhost:8080/v1/documents/doc_01H...

# query it
curl -X POST http://localhost:8080/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "document_id": "doc_01H...",
    "query": "what are the API stability guarantees?"
  }'
# → {"sections": [{"id":"sec_...","title":"...","content":"..."}]}

HTTP API (v1)

Method   Path                       Purpose
GET      /v1/health                 Liveness probe
GET      /v1/version                Engine version
GET      /v1/documents              List documents (paginated; ?status, ?limit, ?cursor)
POST     /v1/documents              Ingest a document (async, returns 202)
GET      /v1/documents/{id}         Document metadata + status
DELETE   /v1/documents/{id}         Delete a document
GET      /v1/documents/{id}/tree    Full structured tree
POST     /v1/query                  Query — returns relevant sections
GET      /v1/sections/{id}          Fetch a single section in full

Routes are versioned under /v1 from day one. Breaking changes ship under /v2 with a deprecation window.

Configuration

The engine reads its config from the YAML file passed via --config, then overlays environment variables prefixed with VLE_. Environment variables always win.

Minimal config.yaml:

server:
  addr: ":8080"

database:
  url: "postgres://vle:vle@localhost:5432/vectorless?sslmode=disable"

storage:
  driver: local          # local | s3
  local:
    root: "./data"

queue:
  driver: river          # qstash | river | asynq
  river:
    num_workers: 8

llm:
  driver: anthropic      # anthropic | openai | gemini
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-sonnet-4-5"
    reasoning_model: "claude-opus-4-5"

retrieval:
  strategy: chunked-tree # single-pass | chunked-tree

log:
  level: info            # debug | info | warn | error
  format: json           # json | console

See config.example.yaml for the full reference.

TLS

The engine serves plaintext HTTP by default — the recommended production setup is to terminate TLS at a reverse proxy (Caddy, nginx, an ALB, a Kubernetes ingress, Cloudflare) so cert rotation lives outside the binary. For single-node, homelab, or direct-to-internet deployments you can opt into direct TLS:

server:
  addr: ":8443"
  tls:
    cert_file: "/etc/vectorless/cert.pem"
    key_file:  "/etc/vectorless/key.pem"
    min_version: "1.2"       # "1.2" | "1.3"

Or via environment variables: VLE_TLS_CERT_FILE, VLE_TLS_KEY_FILE.

Supported document formats

Format     Parser                               Notes
Markdown   goldmark                             ATX + Setext headings become section boundaries
HTML       golang.org/x/net/html                Prefers <main>/<article>; skips nav/footer/script
DOCX       stdlib archive/zip + encoding/xml    Heading 1…9 styles become section boundaries
PDF        ledongthuc/pdf                       Font-size heuristic recovers headings from unstructured PDFs
Text       stdlib                               Single-section fallback

New parsers drop in behind a one-method Parser interface — see pkg/parser/.
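
For illustration, such an interface could look like the sketch below; the actual types and signature live in pkg/parser:

package parser

import "context"

// Section is what a driver emits: a heading, its body, and nesting depth.
// Field names are illustrative; see pkg/parser for the real types.
type Section struct {
    Title string
    Body  string
    Depth int // heading level, drives tree nesting at ingest
}

// Parser is the one-method seam a new format driver implements.
type Parser interface {
    Parse(ctx context.Context, data []byte) ([]Section, error)
}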

Features

  • ✅ Structured tree retrieval — no embeddings, no ANN index
  • ✅ Pluggable LLM providers (Anthropic, OpenAI, Gemini)
  • ✅ Pluggable queue backends (QStash, River, Asynq)
  • ✅ Pluggable storage (Local, S3-compatible)
  • ✅ Parallel map-reduce over big trees (context-budget-aware)
  • ✅ Versioned HTTP API (/v1) with OpenAPI spec (coming)
  • ✅ Graceful shutdown, structured logging, request IDs
  • ✅ Postgres schema + embedded migrations (pgx v5)
  • ✅ Document parsers: Markdown · HTML · DOCX · PDF · Text
  • ✅ Optional direct TLS (opt-in; default is plaintext behind a reverse proxy)
  • 🚧 Official SDKs — TypeScript, Python, Go (separate repos)
  • 🚧 Dockerfile + Helm chart
  • 🚧 Benchmarks vs. traditional RAG

Roadmap

  • Phase 0 — scaffold ✅ — interfaces, HTTP layer, local + QStash + Anthropic stubs
  • Phase 1 — ingest ✅ — parsers (MD/HTML/DOCX/PDF/TXT), tree builder, summarizer, Postgres migrations, TLS, docker
  • Phase 2 — retrieval 🚧 — single-pass and chunked-tree live, real LLM clients, benchmarks
  • Phase 3 — ecosystem ⏭ — River + Asynq live, S3 live, SDKs, Helm, goreleaser
  • Phase 4 — scale ⏭ — multi-document queries, streaming, caching, tree compaction

→ See ROADMAP.md for the full task list with subtasks and checkboxes.

Track progress in GitHub Issues and Projects.

Project layout

cmd/engine/            # main binary entry point
internal/
  api/                 # chi HTTP router, v1 routes (private to the binary)
pkg/
  config/              # YAML + env config with validation
  db/                  # pgx pool, embedded migrations, CRUD helpers
  ingest/              # parse → persist → summarize pipeline
  parser/              # Parser interface + MD / HTML / DOCX / PDF / TXT drivers
  queue/               # Queue interface + QStash / River / Asynq drivers
  retrieval/           # Strategy interface + single-pass / chunked-tree
  storage/             # Storage interface + local / S3 drivers
  tree/                # core tree / section data model
docs/                  # API spec, architecture notes, images

LLM provider access lives in a separate module, llmgate, which the engine imports as github.com/hallelx2/llmgate. That's where Anthropic / OpenAI / Gemini clients, retry / budget / cache middleware, and the cost table live.
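
Those middleware layers follow the usual decorator shape: each one wraps the client interface and adds a single concern. A hedged sketch of a retry wrapper, assuming a Complete-style method that llmgate may or may not expose under this name:

package llmmw

import (
    "context"
    "fmt"
    "time"
)

// Client is a stand-in for llmgate's client seam; the real interface and
// method names may differ.
type Client interface {
    Complete(ctx context.Context, prompt string) (string, error)
}

// Retrying wraps any Client with naive exponential backoff. Budget and
// cache middleware would wrap the same interface the same way.
type Retrying struct {
    Next    Client
    Retries int
}

func (r Retrying) Complete(ctx context.Context, prompt string) (string, error) {
    var err error
    for attempt := 0; attempt <= r.Retries; attempt++ {
        var out string
        if out, err = r.Next.Complete(ctx, prompt); err == nil {
            return out, nil
        }
        select { // back off, but respect cancellation
        case <-time.After(time.Duration(100<<attempt) * time.Millisecond):
        case <-ctx.Done():
            return "", ctx.Err()
        }
    }
    return "", fmt.Errorf("llm: giving up after %d retries: %w", r.Retries, err)
}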

Contributing

Contributions are very welcome — especially parsers, benchmarks, and new LLM / storage drivers. Please open an issue first for anything non-trivial so we can align on the design.

  • Run tests: go test ./...
  • Build binary: go build -o engine ./cmd/engine
  • Lint: go vet ./...

Related projects

  • vectorless-dashboard (private) — web UI + control plane built on top of this engine
  • vectorless-mcp (private) — Model Context Protocol server for agents
  • @vectorless/sdk-* (open source, coming soon) — TS / Python / Go SDKs

Acknowledgements

Inspired by prior work on tree-structured retrieval (RAPTOR), the llms.txt proposal, and the broader movement toward reasoning-native retrieval.

License

Licensed under the Apache License, Version 2.0.
