meme

High-performance long-term memory for AI agents — production-grade pipeline with semantic compression, lifecycle reconciliation, full CRUD, hybrid retrieval, and persistent vector storage, written in Rust.

meme implements a production-grade memory pipeline with a Rust core: (1) Semantic Structured Compression extracts lossless, disambiguated memory entries from dialogues or raw facts via LLM, (2) Lifecycle Reconciliation deduplicates and manages ADD/UPDATE/DELETE/NOOP via LLM-driven conflict resolution at write time, and (3) Intent-Aware Retrieval Planning combines semantic, lexical (FTS), and structured metadata search with LLM-driven reflection. Memory is stored persistently on disk via LanceDB with full change history tracking.

Quick Start

Install the CLI

Shell (macOS / Linux):

curl -fsSL https://sh.qntx.fun/meme | sh

PowerShell (Windows):

irm https://sh.qntx.fun/meme/ps | iex

CLI

# Initialize configuration
meme init

# Add dialogues
meme add -s Alice "I'll be in Tokyo next Monday for the conference."
meme add -s Bob "Let's meet at Shibuya station at 3pm."

# Add raw facts (no speaker needed)
meme add "Alice prefers coffee over tea"

# Import from JSONL file
meme add --file conversation.jsonl

# Ask questions
meme ask "Where will Alice and Bob meet?"

# Semantic search
meme search "Alice travel plans"

# CRUD operations
meme get <uuid>
meme update <uuid> "Updated content here"
meme delete <uuid>

# View change history
meme history <uuid>

# List stored memories
meme list
meme list --json --limit 50

# Export / import
meme export -o memories.json
meme import memories.json
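The JSONL schema expected by `meme add --file` is not documented above; a plausible shape, assuming one JSON object per line with hypothetical `speaker` and `content` fields (verify the actual field names against your meme-cli version):

```jsonl
{"speaker": "Alice", "content": "I'll be in Tokyo next Monday for the conference."}
{"speaker": "Bob", "content": "Let's meet at Shibuya station at 3pm."}
```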

Library

use meme::MemeBuilder;

let meme = MemeBuilder::new()
    .api_key("sk-...")
    .model("gpt-4.1-mini")
    .build()
    .await?;

// Dialogue-based ingestion — automatically extracted into structured memory entries.
meme.add_dialogue("Alice", "Let's meet at 2pm tomorrow", None).await?;
meme.add_dialogue("Bob", "Sure, I'll bring the Q3 report", None).await?;
meme.finalize().await?;

// Direct fact ingestion — bypasses dialogue windowing.
meme.add("Alice prefers coffee over tea").await?;

// CRUD operations.
let results = meme.search("Alice meeting").await?;
let entry = meme.get(results[0].id).await?;
meme.update(results[0].id, "Alice prefers tea over coffee").await?;
meme.delete(results[0].id).await?;

// Change history tracking.
let events = meme.history(results[0].id).await?;

// Q&A — hybrid retrieval + LLM answer generation.
let answer = meme.ask("When will Alice and Bob meet?").await?;

See examples/ for more: basic, batch import.

Feature Flags

Feature Default Description
api-embedding yes Remote OpenAI-compatible embedding API
onnx no Local ONNX embedding via fastembed — auto-downloads models from Hugging Face Hub

Configuration

No configuration file is required. The library is configured entirely through MemeBuilder:

let meme = MemeBuilder::new()
    .api_key("sk-...")
    .model("gpt-4.1-mini")
    .base_url("https://api.openai.com/v1")
    .user_id("alice")           // multi-tenant isolation
    .session_id("session-001")  // multi-session isolation
    .build()
    .await?;

For full control, pass a Config struct directly:

use meme::config::{Config, LlmConfig, EmbeddingConfig, StoreConfig, PipelineConfig};

let config = Config {
    llm: LlmConfig { api_key: Some("sk-...".into()), ..Default::default() },
    embedding: EmbeddingConfig { model: "text-embedding-3-small".into(), dimension: 1536, ..Default::default() },
    store: StoreConfig { lancedb_path: "/custom/path/lancedb".into(), ..Default::default() },
    pipeline: PipelineConfig { semantic_top_k: 25, enable_reflection: true, ..Default::default() },
};

let meme = MemeBuilder::new().config(config).build().await?;

The CLI tool (meme-cli) optionally reads ~/.meme/config.toml. Environment variables override any file or default values:

Env Var Overrides Default
MEME_LLM_API_KEY llm.api_key (required)
MEME_LLM_BASE_URL llm.base_url https://api.openai.com/v1
MEME_LLM_MODEL llm.model gpt-4.1-mini
MEME_EMBEDDING_PROVIDER embedding.provider api

Full config.toml reference

[llm]
api_key = "sk-..."
base_url = "https://api.openai.com/v1"
model = "gpt-4.1-mini"
temperature = 0.1
max_retries = 3

[embedding]
provider = "api"                        # "api" or "onnx"
model = "text-embedding-3-small"        # API model name or fastembed model code
dimension = 1024                        # vector dimension (auto-detected for onnx)

[store]
lancedb_path = "~/.meme/lancedb"
table_name = "memories"

[pipeline]
window_size = 40                        # dialogues per extraction window
overlap_size = 2                        # overlap between consecutive windows
semantic_top_k = 25                     # max semantic search results
keyword_top_k = 5                       # max keyword search results
structured_top_k = 5                    # max structured search results
enable_planning = true                  # LLM-driven query analysis
enable_reflection = true                # iterative completeness checking
max_reflection_rounds = 2
max_build_workers = 16                  # parallel extraction workers
max_retrieval_workers = 8               # parallel search workers
enable_rerank = false                   # LLM-based reranking
# custom_extraction_prompt = "..."      # override built-in extraction prompt
# custom_answer_prompt = "..."          # override built-in answer prompt

Architecture

flowchart TB
    subgraph Write["Write Path"]
        D["Dialogues / Facts"] --> W[Windowing]
        W --> LLM1["LLM Extraction<br/><i>Semantic Structured Compression</i>"]
        LLM1 --> E[MemoryEntry]
        E --> EMB1[Embedding]
        EMB1 --> RC{"LLM Reconciliation<br/><i>ADD / UPDATE / DELETE / NOOP</i>"}
        RC --> VS[(VectorStore<br/>LanceDB)]
        RC --> HS[(HistoryStore<br/>Change Tracking)]
    end

    subgraph CRUD["CRUD API"]
        GA["get(id)"] --> VS
        UA["update(id, content)"] --> VS
        DA["delete(id)"] --> VS
        SA["search(query)"] --> VS
        HA["history(id)"] --> HS
    end

    subgraph Read["Read Path"]
        Q[Query] --> P["LLM Planning<br/><i>Intent-Aware Retrieval</i>"]
        P --> S1[Semantic Search<br/>dense vectors]
        P --> S2[Keyword Search<br/>FTS / Tantivy]
        P --> S3[Structured Search<br/>metadata filters]
        S1 & S2 & S3 --> M[Merge + Deduplicate]
        M --> R{Reflection}
        R -->|incomplete| P
        R -->|complete| G["LLM Answer Generation"]
    end

    VS -.-> S1 & S2 & S3

Each MemoryEntry is a self-contained, unambiguous unit of knowledge stored with three index layers:

Index Layer Type Purpose Implementation
Semantic Dense vector Conceptual similarity 1024-d embeddings via OpenAI or local ONNX
Lexical Inverted index Exact term matching FTS (Tantivy) + BM25-style keywords
Symbolic Structured metadata Filtered lookup Timestamp, location, persons, entities, topic

Pipeline

Stage 1: Semantic Structured Compression

Raw dialogues (or direct facts via add()) are split into overlapping windows and sent to an LLM. The LLM extracts atomic, self-contained memory entries — each entry is a complete, independent fact with all pronouns resolved and all timestamps converted to absolute ISO 8601 format.

Each entry contains:

  • Lossless restatement — complete sentence (no pronouns, no relative time)
  • Keywords — core terms for BM25-style lexical matching
  • Structured metadata — ISO 8601 timestamp, location, person names, entity names, topic phrase
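The overlapping-window split can be sketched in a few lines. This is an illustrative stand-in for the pipeline's `window_size` / `overlap_size` behavior, not the crate's actual implementation:

```rust
/// Split `items` into overlapping windows, mirroring the pipeline's
/// `window_size` / `overlap_size` parameters.
fn windows<T: Clone>(items: &[T], window_size: usize, overlap: usize) -> Vec<Vec<T>> {
    assert!(window_size > overlap, "window must be larger than the overlap");
    let step = window_size - overlap;
    let mut out = Vec::new();
    let mut start = 0;
    while start < items.len() {
        let end = (start + window_size).min(items.len());
        out.push(items[start..end].to_vec());
        if end == items.len() {
            break; // final (possibly short) window reached
        }
        start += step;
    }
    out
}

fn main() {
    // 6 dialogues, window of 4, overlap of 2 → windows [0..4] and [2..6].
    let dialogues: Vec<u32> = (0..6).collect();
    let w = windows(&dialogues, 4, 2);
    assert_eq!(w, vec![vec![0, 1, 2, 3], vec![2, 3, 4, 5]]);
}
```

With the defaults above (window_size = 40, overlap_size = 2), consecutive windows share their last two dialogues so facts spanning a window boundary are not lost.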

Stage 2: Lifecycle Reconciliation

New entries are reconciled against existing memories in a single LLM call. For each new fact, the LLM decides:

Action When Effect
ADD Genuinely new information Store the new entry
UPDATE Supersedes an existing memory Delete old + store new
DELETE Contradicts an existing memory Remove the obsolete entry
NOOP Duplicate of existing memory Skip (no storage)

All write operations are tracked in the HistoryStore for audit and debugging.
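The decision table above maps naturally onto an enum. A sketch of how a caller might apply each action to a store — the type and field names here are illustrative, not the crate's public API:

```rust
use std::collections::HashMap;

/// The four reconciliation outcomes from the table above.
#[derive(Debug, PartialEq)]
enum ReconcileAction {
    Add,
    Update { target_id: String },
    Delete { target_id: String },
    Noop,
}

/// Apply one LLM decision to an in-memory stand-in for the vector store
/// (id -> content).
fn apply(store: &mut HashMap<String, String>, new_id: &str, new_content: &str,
         action: ReconcileAction) {
    match action {
        // Genuinely new information: store the new entry.
        ReconcileAction::Add => {
            store.insert(new_id.to_string(), new_content.to_string());
        }
        // Supersedes an existing memory: delete old + store new.
        ReconcileAction::Update { target_id } => {
            store.remove(&target_id);
            store.insert(new_id.to_string(), new_content.to_string());
        }
        // Contradicts an existing memory: remove the obsolete entry.
        ReconcileAction::Delete { target_id } => {
            store.remove(&target_id);
        }
        // Duplicate: skip, nothing is stored.
        ReconcileAction::Noop => {}
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert("m1".to_string(), "Alice prefers coffee over tea".to_string());
    apply(&mut store, "m2", "Alice prefers tea over coffee",
          ReconcileAction::Update { target_id: "m1".into() });
    assert!(!store.contains_key("m1"));
    assert_eq!(store["m2"], "Alice prefers tea over coffee");
}
```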

Stage 3: Intent-Aware Retrieval Planning

A single LLM call analyzes the user's query and produces a unified retrieval plan:

  1. Query analysis — extract keywords, person names, entities, time expressions, and question type
  2. Search planning — generate 1–3 targeted search queries for semantic retrieval
  3. Information requirements — identify what specific facts are needed for a complete answer

The plan drives parallel execution of all three search layers (semantic, keyword, structured). Results are merged via ID-based deduplication.

When reflection is enabled, the system iteratively assesses completeness: if retrieved context is insufficient, additional targeted queries are generated and executed until the information requirement is satisfied or the max reflection rounds are reached.
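The ID-based merge step can be sketched as follows, assuming simple `(id, content)` pairs in place of the crate's internal result types:

```rust
use std::collections::HashSet;

/// Merge results from the three search layers, keeping the first occurrence
/// of each id — the ID-based deduplication described above.
fn merge_dedup(layers: Vec<Vec<(u64, &str)>>) -> Vec<(u64, &str)> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for layer in layers {
        for (id, content) in layer {
            // `insert` returns false if an earlier layer already contributed this id.
            if seen.insert(id) {
                merged.push((id, content));
            }
        }
    }
    merged
}

fn main() {
    let semantic = vec![(1, "fact a"), (2, "fact b")];
    let keyword = vec![(2, "fact b"), (3, "fact c")];
    let structured = vec![(1, "fact a")];
    let merged = merge_dedup(vec![semantic, keyword, structured]);
    let ids: Vec<u64> = merged.iter().map(|(id, _)| *id).collect();
    assert_eq!(ids, vec![1, 2, 3]);
}
```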

Benchmark

meme-bench evaluates memory quality using the LOCOMO benchmark format:

MEME_LLM_API_KEY=sk-... meme-bench run --dataset locomo10.json
meme-bench run --dataset data.json --model gpt-4.1-mini --output report.json
meme-bench sample -o sample_bench.json  # generate sample dataset

Metrics: token-level F1, precision, recall, exact match — per question category (single-hop, temporal, commonsense, open-domain, adversarial).
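Token-level F1 is the standard SQuAD-style overlap metric; a sketch, assuming whitespace tokenization and lowercasing (meme-bench's exact normalization may differ):

```rust
use std::collections::HashMap;

/// Token-level F1 between a predicted and a gold answer: precision and
/// recall over the multiset intersection of their tokens.
fn token_f1(pred: &str, gold: &str) -> f64 {
    let count = |s: &str| {
        let mut m: HashMap<String, u32> = HashMap::new();
        for t in s.split_whitespace() {
            *m.entry(t.to_lowercase()).or_insert(0) += 1;
        }
        m
    };
    let (p, g) = (count(pred), count(gold));
    // Multiset intersection size.
    let overlap: u32 = p.iter()
        .map(|(t, &c)| c.min(*g.get(t).unwrap_or(&0)))
        .sum();
    if overlap == 0 {
        return 0.0;
    }
    let precision = overlap as f64 / p.values().sum::<u32>() as f64;
    let recall = overlap as f64 / g.values().sum::<u32>() as f64;
    2.0 * precision * recall / (precision + recall)
}

fn main() {
    // 2 shared tokens, 3 predicted, 2 gold → P = 2/3, R = 1, F1 = 0.8.
    let f1 = token_f1("at Shibuya station", "Shibuya station");
    assert!((f1 - 0.8).abs() < 1e-9);
}
```

Exact match is the degenerate case: it scores 1 only when the normalized strings are identical.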

License

Licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT License (LICENSE-MIT)

at your option.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project shall be dual-licensed as above, without any additional terms or conditions.


A QNTX open-source project.

Code is law. We write both.
