High-performance long-term memory for AI agents — production-grade pipeline with semantic compression, lifecycle reconciliation, full CRUD, hybrid retrieval, and persistent vector storage, written in Rust.
meme implements a production-grade memory pipeline with a Rust core: (1) Semantic Structured Compression extracts lossless, disambiguated memory entries from dialogues or raw facts via LLM, (2) Lifecycle Reconciliation deduplicates and manages ADD/UPDATE/DELETE/NOOP via LLM-driven conflict resolution at write time, and (3) Intent-Aware Retrieval Planning combines semantic, lexical (FTS), and structured metadata search with LLM-driven reflection. Memory is stored persistently on disk via LanceDB with full change history tracking.
Shell (macOS / Linux):

```sh
curl -fsSL https://sh.qntx.fun/meme | sh
```

PowerShell (Windows):

```powershell
irm https://sh.qntx.fun/meme/ps | iex
```

```sh
# Initialize configuration
meme init

# Add dialogues
meme add -s Alice "I'll be in Tokyo next Monday for the conference."
meme add -s Bob "Let's meet at Shibuya station at 3pm."

# Add raw facts (no speaker needed)
meme add "Alice prefers coffee over tea"

# Import from JSONL file
meme add --file conversation.jsonl

# Ask questions
meme ask "Where will Alice and Bob meet?"

# Semantic search
meme search "Alice travel plans"

# CRUD operations
meme get <uuid>
meme update <uuid> "Updated content here"
meme delete <uuid>

# View change history
meme history <uuid>

# List stored memories
meme list
meme list --json --limit 50

# Export / import
meme export -o memories.json
meme import memories.json
```

```rust
use meme::MemeBuilder;

let meme = MemeBuilder::new()
    .api_key("sk-...")
    .model("gpt-4.1-mini")
    .build()
    .await?;

// Dialogue-based ingestion — automatically extracted into structured memory entries.
meme.add_dialogue("Alice", "Let's meet at 2pm tomorrow", None).await?;
meme.add_dialogue("Bob", "Sure, I'll bring the Q3 report", None).await?;
meme.finalize().await?;

// Direct fact ingestion — bypasses dialogue windowing.
meme.add("Alice prefers coffee over tea").await?;

// CRUD operations.
let results = meme.search("Alice meeting").await?;
let entry = meme.get(results[0].id).await?;
meme.update(results[0].id, "Alice prefers tea over coffee").await?;
meme.delete(results[0].id).await?;

// Change history tracking.
let events = meme.history(results[0].id).await?;

// Q&A — hybrid retrieval + LLM answer generation.
let answer = meme.ask("When will Alice meet?").await?;
```

See `examples/` for more: basic usage and batch import.
| Feature | Default | Description |
|---|---|---|
| `api-embedding` | yes | Remote OpenAI-compatible embedding API |
| `onnx` | no | Local ONNX embedding via fastembed — auto-downloads models from Hugging Face Hub |
No configuration file is required. The library is configured entirely through `MemeBuilder`:

```rust
let meme = MemeBuilder::new()
    .api_key("sk-...")
    .model("gpt-4.1-mini")
    .base_url("https://api.openai.com/v1")
    .user_id("alice")          // multi-tenant isolation
    .session_id("session-001") // multi-session isolation
    .build()
    .await?;
```

For full control, pass a `Config` struct directly:

```rust
use meme::config::{Config, LlmConfig, EmbeddingConfig, StoreConfig, PipelineConfig};

let config = Config {
    llm: LlmConfig { api_key: Some("sk-...".into()), ..Default::default() },
    embedding: EmbeddingConfig { model: "text-embedding-3-small".into(), dimension: 1536, ..Default::default() },
    store: StoreConfig { lancedb_path: "/custom/path/lancedb".into(), ..Default::default() },
    pipeline: PipelineConfig { semantic_top_k: 25, enable_reflection: true, ..Default::default() },
};
let meme = MemeBuilder::new().config(config).build().await?;
```

The CLI tool (`meme-cli`) optionally reads `~/.meme/config.toml`. Environment variables override any file or default values:
| Env Var | Overrides | Default |
|---|---|---|
| `MEME_LLM_API_KEY` | `llm.api_key` | (required) |
| `MEME_LLM_BASE_URL` | `llm.base_url` | `https://api.openai.com/v1` |
| `MEME_LLM_MODEL` | `llm.model` | `gpt-4.1-mini` |
| `MEME_EMBEDDING_PROVIDER` | `embedding.provider` | `api` |
Full `config.toml` reference:

```toml
[llm]
api_key = "sk-..."
base_url = "https://api.openai.com/v1"
model = "gpt-4.1-mini"
temperature = 0.1
max_retries = 3

[embedding]
provider = "api"                   # "api" or "onnx"
model = "text-embedding-3-small"   # API model name or fastembed model code
dimension = 1024                   # vector dimension (auto-detected for onnx)

[store]
lancedb_path = "~/.meme/lancedb"
table_name = "memories"

[pipeline]
window_size = 40          # dialogues per extraction window
overlap_size = 2          # overlap between consecutive windows
semantic_top_k = 25       # max semantic search results
keyword_top_k = 5         # max keyword search results
structured_top_k = 5      # max structured search results
enable_planning = true    # LLM-driven query analysis
enable_reflection = true  # iterative completeness checking
max_reflection_rounds = 2
max_build_workers = 16    # parallel extraction workers
max_retrieval_workers = 8 # parallel search workers
enable_rerank = false     # LLM-based reranking
# custom_extraction_prompt = "..."  # override built-in extraction prompt
# custom_answer_prompt = "..."      # override built-in answer prompt
```

```mermaid
flowchart TB
    subgraph Write["Write Path"]
        D["Dialogues / Facts"] --> W[Windowing]
        W --> LLM1["LLM Extraction<br/><i>Semantic Structured Compression</i>"]
        LLM1 --> E[MemoryEntry]
        E --> EMB1[Embedding]
        EMB1 --> RC{"LLM Reconciliation<br/><i>ADD / UPDATE / DELETE / NOOP</i>"}
        RC --> VS[(VectorStore<br/>LanceDB)]
        RC --> HS[(HistoryStore<br/>Change Tracking)]
    end

    subgraph CRUD["CRUD API"]
        GA["get(id)"] --> VS
        UA["update(id, content)"] --> VS
        DA["delete(id)"] --> VS
        SA["search(query)"] --> VS
        HA["history(id)"] --> HS
    end

    subgraph Read["Read Path"]
        Q[Query] --> P["LLM Planning<br/><i>Intent-Aware Retrieval</i>"]
        P --> S1[Semantic Search<br/>dense vectors]
        P --> S2[Keyword Search<br/>FTS / Tantivy]
        P --> S3[Structured Search<br/>metadata filters]
        S1 & S2 & S3 --> M[Merge + Deduplicate]
        M --> R{Reflection}
        R -->|incomplete| P
        R -->|complete| G["LLM Answer Generation"]
    end

    VS -.-> S1 & S2 & S3
```
Each `MemoryEntry` is a self-contained, unambiguous unit of knowledge stored with three index layers:
| Index Layer | Type | Purpose | Implementation |
|---|---|---|---|
| Semantic | Dense vector | Conceptual similarity | 1024-d embeddings via OpenAI or local ONNX |
| Lexical | Inverted index | Exact term matching | FTS (Tantivy) + BM25-style keywords |
| Symbolic | Structured metadata | Filtered lookup | Timestamp, location, persons, entities, topic |
Raw dialogues (or direct facts via `add()`) are split into overlapping windows and sent to an LLM. The LLM extracts atomic, self-contained memory entries — each entry is a complete, independent fact with all pronouns resolved and all timestamps converted to absolute ISO 8601 format.
Each entry contains:
- Lossless restatement — complete sentence (no pronouns, no relative time)
- Keywords — core terms for BM25-style lexical matching
- Structured metadata — ISO 8601 timestamp, location, person names, entity names, topic phrase
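A natural mental model for one extracted entry is a plain struct carrying those three parts. The struct and field names below are illustrative assumptions for exposition, not the crate's actual types:

```rust
// Illustrative sketch of an extracted entry; names are assumptions.
#[derive(Debug, Clone)]
struct MemoryEntry {
    content: String,           // lossless restatement: no pronouns, absolute time
    keywords: Vec<String>,     // core terms for BM25-style lexical matching
    timestamp: Option<String>, // ISO 8601
    location: Option<String>,
    persons: Vec<String>,
    entities: Vec<String>,
    topic: String,
}

// "I'll be in Tokyo next Monday for the conference", said by Alice on
// 2025-05-27 (a Tuesday), could come out as:
fn example_entry() -> MemoryEntry {
    MemoryEntry {
        content: "Alice will be in Tokyo on 2025-06-02 for the conference.".into(),
        keywords: vec!["Alice".into(), "Tokyo".into(), "conference".into()],
        timestamp: Some("2025-06-02".into()),
        location: Some("Tokyo".into()),
        persons: vec!["Alice".into()],
        entities: vec![],
        topic: "travel plans".into(),
    }
}

fn main() {
    let e = example_entry();
    // The pronoun "I" is resolved and "next Monday" is made absolute.
    assert!(e.content.contains("Alice") && e.content.contains("2025-06-02"));
    println!("{e:?}");
}
```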
New entries are reconciled against existing memories in a single LLM call. For each new fact, the LLM decides:
| Action | When | Effect |
|---|---|---|
| ADD | Genuinely new information | Store the new entry |
| UPDATE | Supersedes an existing memory | Delete old + store new |
| DELETE | Contradicts an existing memory | Remove the obsolete entry |
| NOOP | Duplicate of existing memory | Skip (no storage) |
All write operations are tracked in the HistoryStore for audit and debugging.
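The write-time effects of the four actions can be sketched over an in-memory map standing in for the store. The `Action` enum and `apply` function here are illustrative assumptions, not the crate's API:

```rust
use std::collections::HashMap;

// Sketch of how reconciliation decisions translate into store mutations.
#[derive(Debug, PartialEq)]
enum Action {
    Add,                    // genuinely new information
    Update { old_id: u64 }, // supersedes an existing memory: delete old + store new
    Delete { old_id: u64 }, // contradicts an existing memory: remove it
    Noop,                   // duplicate: skip storage
}

fn apply(store: &mut HashMap<u64, String>, new_id: u64, content: &str, action: Action) {
    match action {
        Action::Add => { store.insert(new_id, content.to_string()); }
        Action::Update { old_id } => {
            store.remove(&old_id);
            store.insert(new_id, content.to_string());
        }
        Action::Delete { old_id } => { store.remove(&old_id); }
        Action::Noop => {}
    }
}

fn main() {
    let mut store = HashMap::new();
    apply(&mut store, 1, "Alice prefers coffee over tea", Action::Add);
    // A newer, conflicting fact arrives: the LLM would emit UPDATE.
    apply(&mut store, 2, "Alice prefers tea over coffee", Action::Update { old_id: 1 });
    assert_eq!(store.len(), 1);
    assert!(store[&2].contains("tea over coffee"));
}
```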
A single LLM call analyzes the user's query and produces a unified retrieval plan:
- Query analysis — extract keywords, person names, entities, time expressions, and question type
- Search planning — generate 1–3 targeted search queries for semantic retrieval
- Information requirements — identify what specific facts are needed for a complete answer
The plan drives parallel execution of all three search layers (semantic, keyword, structured). Results are merged via ID-based deduplication.
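ID-based deduplication across the three layers amounts to a first-seen-wins merge. A minimal sketch, where `(u64, f32)` pairs stand in for (entry id, score) — an assumption for illustration, not the crate's result type:

```rust
use std::collections::HashSet;

// Merge results from the semantic, keyword, and structured layers,
// keeping only the first hit seen for each entry id.
fn merge(layers: Vec<Vec<(u64, f32)>>) -> Vec<(u64, f32)> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for layer in layers {
        for hit in layer {
            if seen.insert(hit.0) {
                merged.push(hit);
            }
        }
    }
    merged
}

fn main() {
    let semantic = vec![(1, 0.92), (2, 0.80)];
    let keyword = vec![(2, 0.75), (3, 0.60)];
    let structured = vec![(3, 1.0)];
    let merged = merge(vec![semantic, keyword, structured]);
    // Entry 2 and 3 each appear once, from the layer that found them first.
    assert_eq!(merged.iter().map(|h| h.0).collect::<Vec<_>>(), vec![1, 2, 3]);
}
```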
When reflection is enabled, the system iteratively assesses completeness: if retrieved context is insufficient, additional targeted queries are generated and executed until the information requirement is satisfied or the max reflection rounds are reached.
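Structurally, reflection reduces to a bounded retrieve-and-check cycle. In this sketch `retrieve` and `complete` are stand-ins for the LLM-driven search and completeness check, not the crate's API:

```rust
// Keep retrieving until the context satisfies the completeness check
// or the round budget is exhausted.
fn reflect(
    mut retrieve: impl FnMut(usize) -> Vec<String>,
    complete: impl Fn(&[String]) -> bool,
    max_rounds: usize,
) -> Vec<String> {
    let mut context = Vec::new();
    for round in 0..=max_rounds {
        // In the real pipeline, later rounds use additional LLM-generated
        // targeted queries rather than the original one.
        context.extend(retrieve(round));
        if complete(&context) {
            break; // information requirement satisfied: stop early
        }
    }
    context
}

fn main() {
    let facts = [
        "Alice will be in Tokyo on 2025-06-02.",
        "Bob suggested Shibuya station at 3pm.",
    ];
    // Each round surfaces one more fact; two facts answer the question.
    let ctx = reflect(
        |round| vec![facts[round.min(1)].to_string()],
        |ctx| ctx.len() >= 2,
        2,
    );
    assert_eq!(ctx.len(), 2);
}
```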
meme-bench evaluates memory quality using the LOCOMO benchmark format:

```sh
MEME_LLM_API_KEY=sk-... meme-bench run --dataset locomo10.json
meme-bench run --dataset data.json --model gpt-4.1-mini --output report.json
meme-bench sample -o sample_bench.json   # generate sample dataset
```

Metrics: token-level F1, precision, recall, and exact match — per question category (single-hop, temporal, commonsense, open-domain, adversarial).
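Token-level F1 in this style of evaluation is the harmonic mean of precision and recall over multiset token overlap between the predicted and gold answers. A self-contained sketch (not the benchmark's actual code):

```rust
use std::collections::HashMap;

// Token-level F1: count overlapping tokens (with multiplicity) between
// prediction and gold, then combine precision and recall.
fn token_f1(pred: &str, gold: &str) -> f64 {
    let toks = |s: &str| {
        let mut m: HashMap<String, usize> = HashMap::new();
        for t in s.to_lowercase().split_whitespace() {
            *m.entry(t.to_string()).or_default() += 1;
        }
        m
    };
    let (p, g) = (toks(pred), toks(gold));
    let overlap: usize = p
        .iter()
        .map(|(t, &c)| c.min(*g.get(t).unwrap_or(&0)))
        .sum();
    if overlap == 0 {
        return 0.0;
    }
    let precision = overlap as f64 / p.values().sum::<usize>() as f64;
    let recall = overlap as f64 / g.values().sum::<usize>() as f64;
    2.0 * precision * recall / (precision + recall)
}

fn main() {
    assert_eq!(token_f1("shibuya station", "shibuya station"), 1.0);
    assert_eq!(token_f1("tokyo", "shibuya station"), 0.0);
    // Both predicted tokens match 2 of 4 gold tokens:
    // P = 2/2, R = 2/4, F1 = 2·1·0.5 / 1.5 = 2/3.
    let f1 = token_f1("at shibuya", "meet at shibuya station");
    assert!((f1 - 2.0 / 3.0).abs() < 1e-9);
}
```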
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project shall be dual-licensed as above, without any additional terms or conditions.