meomory

A memory prosthesis for LLM agents that learns from usage. Not better search — associative memory that improves with every interaction.

Blog (Chinese): I had three AIs compete to evolve, and two days later they invented an algorithm I can't understand | Zhihu

What is this

LLM agents are brilliant amnesiacs — every conversation starts from scratch. Existing memory systems (RAG, vector search) are libraries: they store things but always return the same results for the same query.

meomory replaces static retrieval with online-learning associative memory: a 256×256 weight matrix that adjusts after every feedback signal, so the more you use it, the better it gets.
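To make "adjusts after every feedback signal" concrete, here is a minimal sketch of an online associative memory. It assumes a plain delta-rule update, which is not the project's DGD layer or the evolved algorithm, and the names `AssociativeMemory`, `recall`, `feedback`, and `squared_error` are hypothetical. It uses a small dimension so it runs instantly in pure Python.

```python
import random


class AssociativeMemory:
    """Toy linear associative memory: one dim x dim weight matrix W.

    Assumption: a plain delta-rule update, not the project's DGD or
    evolved algorithm.
    """

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr
        # Start from the identity so an untrained memory returns the query as-is.
        self.W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def recall(self, query):
        """Return W @ query."""
        return [sum(w * q for w, q in zip(row, query)) for row in self.W]

    def feedback(self, query, target):
        """Move W @ query toward target: W += lr * (target - W q) q^T / (q . q)."""
        pred = self.recall(query)
        norm_sq = sum(q * q for q in query) or 1.0
        for i in range(self.dim):
            step = self.lr * (target[i] - pred[i]) / norm_sq
            for j in range(self.dim):
                self.W[i][j] += step * query[j]


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8  # the README uses 256; kept small here for speed
    query = [rng.gauss(0, 1) for _ in range(dim)]
    target = [rng.gauss(0, 1) for _ in range(dim)]
    memory = AssociativeMemory(dim)
    print("error before feedback:", squared_error(memory.recall(query), target))
    for _ in range(5):
        memory.feedback(query, target)
    print("error after feedback: ", squared_error(memory.recall(query), target))
```

With this normalized update, each feedback call shrinks the residual for that query by a factor of `1 - lr`, which is the sense in which the memory improves for queries it has seen.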

The algorithm wasn't written by a human. Three AI models discovered it through evolutionary search.

Results

Embedding: qwen3-embedding 8B → random projection to 256d. The DGD learning layer is a 256×256 weight matrix on top.
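The projection step can be sketched as follows. This assumes a Gaussian random projection, which the README does not confirm. The input width used here is a small hypothetical value rather than the embedding model's real output size, and `make_projection`, `project`, and `cosine` are illustrative names.

```python
import math
import random


def make_projection(in_dim, out_dim, seed=0):
    """Gaussian random projection matrix of shape (out_dim, in_dim).

    Entries are drawn from N(0, 1/out_dim), so squared norms are
    preserved in expectation.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vector):
    """Return matrix @ vector as a list of length out_dim."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    in_dim, out_dim = 512, 256  # in_dim is a hypothetical stand-in
    P = make_projection(in_dim, out_dim)
    rng = random.Random(1)
    a = [rng.gauss(0, 1) for _ in range(in_dim)]
    b = [x + 0.3 * rng.gauss(0, 1) for x in a]  # a noisy copy of a
    print("cosine before projection:", cosine(a, b))
    print("cosine after projection: ", cosine(project(P, a), project(P, b)))
```

The point of the sketch is that a fixed random matrix roughly preserves cosine similarity, so the 256d vectors stay usable as a retrieval baseline while keeping the learned matrix at 256×256.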

Train=Test evaluation (early, flawed)

Our initial evaluation trained and tested on the same questions. The numbers looked impressive but did not reflect real-world generalization:

Method	LoCoMo P@1 (1976 Q)	LongMemEval P@1 (470 Q)
Cosine similarity (no learning)	22.4%	41.7%
BM25 keyword matching	30.9%	48.3%
Hand-written DGD	32.8%	56.0%
AI-evolved algorithm (train=test)	60.1%	95.5%

When we tested on unseen questions (train on 500, test on the remaining 1476), the evolved algorithm scored 0.9%, far below the 22.4% cosine baseline. The 60.1% was a fitness illusion caused by overfitting to the evaluation set.

Online streaming evaluation (current, honest)

We redesigned the evaluation: each question is tested before the algorithm learns from it. Questions are shuffled across conversations to expose cross-topic interference.
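A test-then-learn loop of this kind can be sketched as below. This is an illustration of the protocol only, not the code in src/bench/. The names `online_p_at_1` and `Memorizer`, and the `rank` and `learn` callables, are assumptions.

```python
import random


def online_p_at_1(questions, rank, learn, seed=0):
    """Test-then-learn P@1 over a shuffled question stream.

    questions: list of (query, gold_id) pairs.
    rank(query) -> list of candidate ids, best first.
    learn(query, gold_id) -> None, called only AFTER the question is scored.
    """
    order = list(questions)
    random.Random(seed).shuffle(order)  # mix conversations and topics together
    hits = 0
    for query, gold_id in order:
        ranked = rank(query)
        if ranked and ranked[0] == gold_id:  # score first ...
            hits += 1
        learn(query, gold_id)  # ... then let the algorithm learn
    return hits / len(order) if order else 0.0


class Memorizer:
    """Toy learner that only remembers exact queries it has already seen."""

    def __init__(self):
        self.seen = {}

    def rank(self, query):
        return [self.seen[query]] if query in self.seen else []

    def learn(self, query, gold_id):
        self.seen[query] = gold_id


if __name__ == "__main__":
    learner = Memorizer()
    stream = [("q%d" % i, i) for i in range(10)]  # every question is new
    print("online P@1:", online_p_at_1(stream, learner.rank, learner.learn))
```

A pure memorizer would score perfectly under train=test but gets no credit here on questions it has never seen, which is the failure mode the redesigned evaluation is meant to expose.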

Method	Online P@1 (1976 Q, shuffled)
Cosine similarity (no learning)	22.4%
AI-evolved algorithm (v6)	26.0%

The real gain from online learning is +3.6pp over cosine — modest but genuine. The evolved algorithm (gen107) discovered experience replay, row normalization, and learning rate decay to mitigate catastrophic interference.
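For readers unfamiliar with the three techniques named above, here is a rough sketch of how they can be combined in one update step. It is not gen107: the base update is an assumed delta rule, and the class name `ReplayMemory` and all hyperparameters are hypothetical.

```python
import math
import random


class ReplayMemory:
    """Toy online memory with replay, row normalization, and lr decay."""

    def __init__(self, dim, lr0=0.5, decay=0.01, buffer_size=32, replay_k=4, seed=0):
        self.lr0 = lr0
        self.decay = decay
        self.buffer_size = buffer_size
        self.replay_k = replay_k
        self.steps = 0
        self.buffer = []  # recent (query, target) pairs
        self.rng = random.Random(seed)
        self.W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def current_lr(self):
        # Learning-rate decay: lr_t = lr0 / (1 + decay * t)
        return self.lr0 / (1.0 + self.decay * self.steps)

    def _delta_step(self, query, target, lr):
        pred = [sum(w * q for w, q in zip(row, query)) for row in self.W]
        norm_sq = sum(q * q for q in query) or 1.0
        for i, row in enumerate(self.W):
            step = lr * (target[i] - pred[i]) / norm_sq
            for j in range(len(row)):
                row[j] += step * query[j]

    def _normalize_rows(self):
        # Row normalization: rescale every non-zero row to unit L2 norm.
        for row in self.W:
            norm = math.sqrt(sum(w * w for w in row))
            if norm > 0.0:
                for j in range(len(row)):
                    row[j] /= norm

    def feedback(self, query, target):
        lr = self.current_lr()
        self._delta_step(query, target, lr)
        # Experience replay: re-apply a few stored pairs so older
        # associations are not simply overwritten by the new one.
        k = min(self.replay_k, len(self.buffer))
        for old_query, old_target in self.rng.sample(self.buffer, k):
            self._delta_step(old_query, old_target, lr)
        self.buffer.append((query, target))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(0)  # drop the oldest pair
        self._normalize_rows()
        self.steps += 1
```

Replay revisits old pairs, normalization keeps any single row from growing without bound, and the decaying rate makes later updates gentler. All three limit how much one new example can disturb what was learned earlier.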

What we learned

Shared weight matrices cause catastrophic interference: learning query A degrades retrieval for query B
Fitness function design matters more than algorithm design: train=test rewards memorization, online streaming rewards generalization
The evolved algorithms are useful for repeated query patterns, such as the same user asking similar questions over time. They are not useful for completely unseen queries.
Evolution independently discovered experience replay, residual connections, and adaptive learning rates — not because it "invented" them, but because selection pressure guided it toward known effective techniques.
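The first point, interference in a shared weight matrix, can be reproduced in a few lines. The sketch below assumes a delta-rule update on a 2×2 matrix, which is far simpler than the project's setup. The function names are hypothetical.

```python
def delta_update(W, key, value, lr=1.0):
    """In-place delta rule: W += lr * (value - W key) key^T / (key . key)."""
    pred = [sum(w * k for w, k in zip(row, key)) for row in W]
    norm_sq = sum(k * k for k in key)
    for i, row in enumerate(W):
        step = lr * (value[i] - pred[i]) / norm_sq
        for j in range(len(row)):
            row[j] += step * key[j]


def recall_error(W, key, value):
    """Euclidean distance between W @ key and the stored value."""
    pred = [sum(w * k for w, k in zip(row, key)) for row in W]
    return sum((p - v) ** 2 for p, v in zip(pred, value)) ** 0.5


def error_on_a_after_learning_b(key_b):
    """Store pair A, then pair B in the same W, and re-check pair A."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    key_a, value_a = [1.0, 0.0], [1.0, 0.0]
    value_b = [0.0, 1.0]
    delta_update(W, key_a, value_a)  # A is now recalled exactly
    delta_update(W, key_b, value_b)  # B is written into the same matrix
    return recall_error(W, key_a, value_a)


if __name__ == "__main__":
    print("B orthogonal to A:", error_on_a_after_learning_b([0.0, 1.0]))
    print("B overlaps with A:", error_on_a_after_learning_b([0.6, 0.8]))
```

When the second key is orthogonal to the first, the stored pair is untouched. When the keys overlap, writing B shifts the output for A. Embeddings of real queries are rarely orthogonal, so every update leaks into neighbouring queries.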

Evolution

Three AI models evolving unsupervised on a server:

qwen3.5:9b (30%)   — mass exploration, local model
DeepSeek (20%)      — precise optimization
Opus 4.6 (50%)     — highest quality, primary driver
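One way to read the percentages above is as the probability that each model is asked to propose the next candidate. That reading is an assumption, and the sketch below is not taken from scripts/run_funsearch.py. `MODEL_WEIGHTS` and `pick_model` are hypothetical names.

```python
import random

# Assumed interpretation: share of generation requests sent to each model.
MODEL_WEIGHTS = {
    "qwen3.5:9b": 0.30,
    "DeepSeek": 0.20,
    "Opus 4.6": 0.50,
}


def pick_model(rng):
    """Draw one model name with probability proportional to its weight."""
    names = list(MODEL_WEIGHTS)
    weights = [MODEL_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [pick_model(rng) for _ in range(1000)]
    for name in MODEL_WEIGHTS:
        print(name, draws.count(name) / len(draws))
```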

v4 (train=test): From 26.8% to 87% in two hours — fast but illusory. Evolved Adam + Momentum + dual-channel + BatchNorm.

v6 (online streaming + shuffled): From 22% to 26% overnight — slow but real. Evolved experience replay + row normalization + residual connections. Completely different algorithm direction under different selection pressure.

Usage

As a reranker plugin (plug into any memory system):

from src.mem0_integration.dgd_reranker import DGDReranker

reranker = DGDReranker({"dim": 256})

# Rerank after retrieval
results = your_memory_system.search(query)
reranked = reranker.rerank(query, results, top_k=5)

# Feedback: tell it which memory was correct
reranker.feedback(query, correct_memory)
# Next time, similar queries rank better

Run evolutionary search (discover better algorithms):

python scripts/run_funsearch.py \
  --ollama-host http://localhost:11434 \
  --deepseek-key sk-xxx \
  --iterations 0 --rounds 1 --samples 12 \
  --seed-from multi-arch

Project Structure

src/funsearch/        — Evolution framework (island model + multi-model + sandbox + numpy)
src/evolution/        — Prompt evolution (Judge prompt 28.9% → 90.9%)
src/mem0_integration/ — Mem0 reranker plugin
src/dgd.py           — Original DGD associative memory
src/bench/            — Benchmark framework (65 experiments, 2446 questions)
scripts/dashboard.py  — Real-time evolution dashboard
experiments/          — All experiment results + evolution population data

Inspiration

This project grew from my cognitive science knowledge base (49 concept cards). An AI read my notes on memory, forgetting, and preconscious activation, then auto-linked them to the DGD algorithm from the Hope/Nested Learning paper (NeurIPS 2025). I had reproduced that paper but focused on a different module — the AI found the piece I overlooked.

Name

meomory = me + memory. It started as a typo, and it stuck.

Zhiyu Fang
