An addressable hierarchical memory system for LLMs. Compress a 10k-token context into 2k active tokens without losing information — by making the rest addressable on demand.
MindCity is a research proof-of-concept that explores a new direction for LLM long-context memory: instead of trying to fit everything into the context window, make everything addressable at the right granularity, on demand.
The core thesis:
The "lossless" compression of a long LLM context does not mean fitting everything into fewer tokens. It means making the full content addressable on demand while exposing, by default, only what is relevant at the relevant level of granularity.
MindCity combines:
- Global entity deduplication — a shared dictionary of entities and concepts across documents, strictly reversible
- Hierarchical spatial metaphor — a "city" structure (districts, buildings, apartments, rooms, drawers) that gives the LLM a stable mental model for navigation
- A minimal navigation DSL — around ten verbs (`enter`, `list`, `zoom`, `follow`, `search_local`, etc.) exposed via standard tool calls, no fine-tuning required
- A zero-copy binary storage layer — Apache Arrow + Kùzu + LanceDB for physical speed
The LLM never sees the binary. It walks through the city, zooming in where it needs detail. Dense summaries at each level prevent it from being overwhelmed.
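To make the navigation loop concrete, here is a minimal sketch in Python. The verb names (`enter`, `list`, `zoom`) come from the README above; the nested-dict "city", the path convention, and the handler signatures are illustrative assumptions, not the real DSL defined in `src/mindcity/dsl/`.

```python
# A toy "city": each node carries a dense summary plus named children.
# Structure and field names are hypothetical, for illustration only.
CITY = {
    "summary": "2 districts: work, personal",
    "children": {
        "work": {
            "summary": "Projects and meetings",
            "children": {
                "project-x": {"summary": "Launch planning notes", "children": {}},
            },
        },
        "personal": {"summary": "Travel and finances", "children": {}},
    },
}

def resolve(path):
    """Walk a /-separated path from the city root down to a node."""
    node = CITY
    for part in filter(None, path.split("/")):
        node = node["children"][part]
    return node

# Each verb returns only summary-level text, never raw content, so the
# LLM sees the minimum it needs to decide its next move.
VERBS = {
    "enter": lambda path: resolve(path)["summary"],
    "list": lambda path: sorted(resolve(path)["children"]),
    "zoom": lambda path: {k: v["summary"] for k, v in resolve(path)["children"].items()},
}

def call_tool(verb, path):
    """Dispatch one tool call, as an LLM tool-use runtime would."""
    return VERBS[verb](path)
```

For example, `call_tool("list", "/work")` returns `["project-x"]`, and a follow-up `zoom` on that path exposes one more level of summaries — detail is paid for only where the model asks for it.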
This is a research proof-of-concept, not a production system.
Current phase: Phase 1 — Ingestion & entity dictionary (3/5 workstreams complete).
What's done:
- Full project scaffolding (config, logging, types, tests, CI)
- Synthetic corpus generator with 10 diverse topic templates
- `corpus_tiny` (10 conversations) generated and committed
- Ingestion pipeline: loader (multi-format JSON), normalizer (MindCity/Claude/ChatGPT exports), chunker (1 message = 1 chunk)
- Entity system: spaCy NER + pattern matching extractor, LMDB-backed entity dictionary, mention resolver with alias tracking
- 39 unit tests passing, lint clean (ruff + black)
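The shape of the entity system above can be sketched in a few lines. This is an in-memory stand-in for the LMDB-backed dictionary — the class name, ID format, and method signatures are assumptions for illustration, not MindCity's actual API.

```python
# In-memory stand-in for the LMDB-backed entity dictionary: every surface
# form (or registered alias) maps to one canonical entity ID, so repeated
# mentions across documents collapse to a single shared entry.
class EntityDictionary:
    def __init__(self):
        self.entities = {}  # entity_id -> {"canonical": str, "aliases": set}
        self.index = {}     # lowercased surface form -> entity_id
        self._next = 0

    def resolve(self, mention, aliases=()):
        """Return the entity ID for a mention, creating it on first sight."""
        key = mention.lower()
        if key not in self.index:
            ent_id = f"ent:{self._next:04x}"
            self._next += 1
            self.entities[ent_id] = {"canonical": mention, "aliases": set()}
            self.index[key] = ent_id
        ent_id = self.index[key]
        for alias in aliases:  # register aliases so later mentions dedupe too
            self.index.setdefault(alias.lower(), ent_id)
            self.entities[ent_id]["aliases"].add(alias)
        return ent_id
```

Resolving "Apache Arrow" with alias "Arrow" and later resolving the bare mention "arrow" both yield the same ID — which is what lets the dictionary be shared globally across documents.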
What's next: entity pointer encoding/decoding (`@ent:xxx`), full pipeline orchestration, and Phase 1 exit criteria validation.
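Only the `@ent:xxx` pointer syntax appears in the source; everything else below — the function names, the regex, the longest-mention-first ordering — is a hypothetical sketch of what "strictly reversible" encoding could look like.

```python
import re

# Hypothetical sketch of @ent:xxx pointer substitution: encoding replaces
# known entity mentions with compact pointers; decoding is the exact
# inverse, which is what makes the compression strictly reversible.
def encode(text, dictionary):
    """Replace each known mention with its @ent:xxx pointer (longest first,
    so 'Apache Arrow' wins over a bare 'Arrow' substring)."""
    for mention, ent_id in sorted(dictionary.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(mention, f"@{ent_id}")
    return text

def decode(text, dictionary):
    """Expand every @ent:xxx pointer back to its canonical surface form."""
    reverse = {ent_id: mention for mention, ent_id in dictionary.items()}
    return re.sub(r"@(ent:[0-9a-f]+)", lambda m: reverse[m.group(1)], text)
```

A round trip (`decode(encode(text, d), d) == text`) is the natural unit test for the reversibility claim; a production version would also need to handle aliases that share an ID and mentions that overlap token boundaries.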
See CLAUDE.md for the current state of each phase and PLAN.md for the roadmap.
Start here, in order:
| Document | Purpose |
|---|---|
| VISION.md | The thesis, the three levers, the v1/v2 scope |
| BENCHMARK.md | Evaluation protocol, metrics, baselines |
| PLAN.md | Implementation roadmap, stack, phase-by-phase milestones |
| CLAUDE.md | Living context file, updated every session |
| paper/main.tex | The accompanying research paper (drafted alongside the code) |
MindCity is structured around seven explicit research questions (see VISION.md section 9):
- RQ1 — Compression. What effective compression ratio does MindCity achieve versus a naive RAG, at equivalent answer quality?
- RQ2 — Quality. At a fixed token budget, does MindCity's answer quality match or exceed naive RAG?
- RQ3 — Addressability. What fraction of hard questions require explicit zooming? Does the LLM learn zero-shot to zoom at the right moment?
- RQ4 — Hierarchy. Does the spatial metaphor (city/district/building/…) improve navigation over abstract hierarchies?
- RQ5 — Deduplication. What gain does the entity dictionary bring in isolation? Does it combine linearly with the hierarchy?
- RQ6 — Latency. How many tool calls does MindCity need per query on average?
- RQ7 — Scaling. How do metrics evolve from 100 to 10,000 conversations?
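For RQ1, one plausible reading of the metric — an assumption, since BENCHMARK.md defines the protocol — is the ratio between the tokens a naive baseline puts in context and the active tokens MindCity actually exposes (summaries, zoomed spans, tool traffic) at equal answer quality:

```python
# Assumed RQ1-style metric, not taken from BENCHMARK.md: effective
# compression = baseline context tokens / MindCity's active tokens.
def effective_compression_ratio(baseline_tokens, active_tokens):
    if active_tokens <= 0:
        raise ValueError("active token count must be positive")
    return baseline_tokens / active_tokens

# The README's headline claim: a 10k-token context served as 2k active tokens.
ratio = effective_compression_ratio(10_000, 2_000)  # -> 5.0
```

Counting tool-call traffic inside `active_tokens` matters: it is what keeps the metric honest against RQ6 (a system could otherwise "compress" by shifting tokens into navigation overhead).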
| System | Structure | Compression | LLM-driven navigation |
|---|---|---|---|
| Naive RAG | flat chunks | top-k | no |
| LLMLingua | prompt-level | perplexity pruning | no |
| Gist tokens / ICAE | fine-tuned memory slots | learned | no |
| MemGPT / Letta | paginated memory | coarse | yes (page-level) |
| GraphRAG | hierarchical communities | multi-level summaries | no |
| HippoRAG | concept graph + PageRank | none (retrieval only) | no |
| MindCity | spatial hierarchy | deduplication + addressable zoom | yes, fine-grained |
MindCity is the only system combining global deduplication, LLM-driven hierarchical navigation, and a cognitive metaphor that the LLM can use zero-shot.
```bash
# Clone and install
git clone https://github.com/berch-t/mindcity.git
cd mindcity
uv venv --python 3.12 .venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Verify everything works
make lint   # ruff + black
make test   # 39 unit tests

# Generate the synthetic test corpus
python scripts/generate_synthetic_corpus.py --size tiny

# (Coming soon) Ingest and benchmark
make ingest
make benchmark
```

```
mindcity/
├── VISION.md, BENCHMARK.md, PLAN.md, PROMPT.md, CLAUDE.md  # research docs
├── src/mindcity/     # core code
│   ├── ingestion/    # loading, normalizing, chunking
│   ├── entities/     # deduplication dictionary (lever 1)
│   ├── hierarchy/    # city construction (lever 2)
│   ├── storage/      # Kùzu + LanceDB + Arrow wrappers
│   ├── dsl/          # navigation verbs + LLM loop
│   └── api/          # FastAPI exposure
├── baselines/        # raw context, RAG, BM25, GraphRAG-lite
├── benchmarks/       # questions, judge, metrics
├── scripts/          # run_benchmark, generate_corpus, etc.
├── tests/            # pytest suite
├── data/             # corpora (synthetic committed, real gitignored)
├── results/          # benchmark outputs, committed
└── paper/            # LaTeX research paper, drafted along the way
```
This is a research project in active development. Issues, discussions, and pull requests are welcome, especially around:
- Alternative clustering strategies for hierarchy construction
- Improvements to the DSL specification
- New baselines to compare against
- Additional corpora for robustness evaluation
See CLAUDE.md for the current state and open questions.
If you use or reference MindCity in your research, please cite the paper (when available) or this repository:
```bibtex
@misc{mindcity2026,
  author = {Berchet, Thomas},
  title  = {MindCity: An Addressable Hierarchical Memory System for LLMs},
  year   = {2026},
  url    = {https://github.com/berch-t/mindcity}
}
```

MIT — see LICENSE.
This project builds conceptually on the excellent prior work of LLMLingua (Microsoft), GraphRAG (Microsoft Research), MemGPT / Letta, HippoRAG, and the broader LLM long-context research community. MindCity's contribution is to combine addressability with cognitive metaphor and global deduplication in a single evaluated system.
"Quand on veut, on peut." — "Where there's a will, there's a way."