Vector Index Graph Memory

Important

This is a prototype focused on memory architecture, not a production-ready agent platform. The extraction pipeline is intentionally lightweight, the embedding model is deterministic and local, and the graph writes are designed to demonstrate identity-preserving memory behavior with minimal external dependencies. Use this as a learning reference or starting scaffold, not a drop-in production system.

This project is a graph-native memory system that solves a fundamental problem with how modern AI agents store and recall information. A flat vector index is excellent at finding similar text, but it has no way to know whether two similar-sounding things are actually the same entity. This repository demonstrates an architecture where identity is managed explicitly in a graph and semantic similarity is used only as a search signal, not as a definition of truth.

The system combines FastAPI, Neo4j 5 vector indexes, a deterministic local embedding service, lightweight entity extraction, and a conservative resolution gate. Each of these pieces addresses a specific weakness in purely vector-based memory systems. If you are evaluating whether this approach makes sense for your use case, the sections below provide enough detail to understand both the benefits and the tradeoffs.

The architecture material in this README is mirrored by the companion document at docs/architecture.md. The README gives the broader project narrative; the doc keeps a tighter architecture-only reference.

What Is A Vector Index?
What Is Graph Memory?
How Does This Compare To Other Approaches?
Is This Optimal - And For What Situations?
Why This Project Exists
What The System Does
Tech Stack And Why It Was Chosen
Architecture Overview
How Does Ingest Work Step By Step?
Memory Tiers
Identity Resolution Strategy
Retrieval Strategy
Repository Structure
Quick Start
Configuration
API Reference
Generated API Response Examples
Example Workflows
Testing And Validation
Current Constraints And Tradeoffs
When Should You Use This vs Something Else?
Summary

What Is A Vector Index?

A vector index is a data structure that stores numerical representations of text - called embeddings - and lets you find the most similar ones quickly. When you embed a sentence, you convert it into a list of numbers (a vector) where sentences with similar meanings end up close together in multi-dimensional space. The classic example: "The cat sat on the mat" and "A feline rested on the rug" would produce vectors that are close to each other even though the words are different.

Vector indexes are used everywhere in modern AI: in retrieval-augmented generation (RAG), semantic search, recommendation systems, and agent memory. They are fast, scalable, and require no manual schema design. The most common operations are insert (add a new vector) and approximate nearest neighbor search (find the top-k most similar vectors to a query).
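The two operations named above can be sketched with a brute-force index. This is a minimal illustration, not how a production index works: real vector indexes use approximate structures to avoid scanning every vector, and the class name `TinyVectorIndex` and the three-dimensional toy vectors are assumptions made for the example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


class TinyVectorIndex:
    """Hypothetical exact (brute-force) index: insert plus top-k search."""

    def __init__(self):
        self._items = []  # list of (key, unit_vector)

    def insert(self, key, vector):
        self._items.append((key, normalize(vector)))

    def search(self, query, k=3):
        q = normalize(query)
        scored = [
            (sum(x * y for x, y in zip(q, vec)), key)
            for key, vec in self._items
        ]
        scored.sort(reverse=True)  # highest cosine similarity first
        return [(key, score) for score, key in scored[:k]]


index = TinyVectorIndex()
index.insert("cat on mat", [0.9, 0.1, 0.0])
index.insert("feline on rug", [0.8, 0.2, 0.1])
index.insert("stock prices", [0.0, 0.1, 0.9])

results = index.search([1.0, 0.0, 0.0], k=2)
print([key for key, _ in results])
```

Note that the index returns both cat-related entries as neighbors but says nothing about whether they describe the same thing.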

Note

Popular vector stores include the databases Pinecone, Weaviate, Chroma, and Qdrant, as well as the FAISS library. Neo4j 5 added native vector index support, so a graph database can now store both relationships and embeddings on the same nodes, eliminating the need for a separate vector store.

The limitation of a pure vector index is that it treats every stored chunk as isolated. There is no native concept of "this chunk and that chunk are about the same entity." Two different descriptions of the same person will be stored as two separate vectors. Two mentions of a company under different names will never be linked. Retrieval might surface both, but no identity logic exists to say they refer to the same thing. That is the problem this project addresses.

What Is Graph Memory?

Graph memory is a pattern where knowledge is stored as a property graph - a network of nodes and edges - rather than as a flat list of chunks or rows in a table. Each node represents an entity (a person, place, organization, event, or object). Each edge represents a relationship between two entities (one person works at one company, one event happened at one location).

The power of graph memory over flat storage is that it preserves structure. When you ask "what do we know about Anthropic?" a graph can not only return the Anthropic entity, but also traverse its edges to find related companies, people who work there, products it has created, and events it has been involved in - all without having to embed and search each of those separately.
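The traversal idea can be sketched with a plain adjacency map and a breadth-first walk. The edge data below is toy data invented for illustration (including the `mentioned_in` predicate and the `message:42` identifier), and the function name `neighborhood` is hypothetical; the project itself performs traversal in Neo4j.

```python
from collections import deque

# Toy property-graph edges: source -> list of (predicate, target).
EDGES = {
    "Anthropic": [("developed", "Claude"), ("developed", "Claude Code")],
    "Claude": [],
    "Claude Code": [("mentioned_in", "message:42")],
}


def neighborhood(start, max_hops=2):
    """Collect (source, predicate, target) triples within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    triples = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for predicate, target in EDGES.get(node, []):
            triples.append((node, predicate, target))
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return triples


one_hop = neighborhood("Anthropic", max_hops=1)
two_hop = neighborhood("Anthropic", max_hops=2)
print(two_hop)
```

No embedding is computed at any point: related context is reached by following edges from a single starting node.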

Note

The term "memory" in this context refers to persistent structured knowledge that an AI agent can read and write over time, as opposed to the in-context window which only holds the current conversation. Graph memory is a form of long-term external memory.

In this project, graph memory is implemented using Neo4j 5, a mature graph database that adds native vector index support in its 5.x releases. That means the same Neo4j instance stores both the graph structure (nodes, edges, properties) and the vector embeddings (stored as node properties and indexed for similarity search). This is architecturally significant: you do not need a separate vector store. Identity, relationships, and semantic search all live together.
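As a rough sketch of what "graph and vectors in one database" can look like, the Cypher statements below create a uniqueness constraint and a vector index on the same node label, then query that index. The `Entity` label, the `id` and `embedding` property names, the index name `entity_embedding_index`, and the helper `ensure_schema_sketch` are assumptions for illustration, not taken from the repository. The `CREATE VECTOR INDEX` form shown requires a recent Neo4j 5.x release.

```python
# Hypothetical schema statements; names are illustrative assumptions.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT entity_id_unique IF NOT EXISTS "
    "FOR (e:Entity) REQUIRE e.id IS UNIQUE",
    "CREATE VECTOR INDEX entity_embedding_index IF NOT EXISTS "
    "FOR (e:Entity) ON (e.embedding) "
    "OPTIONS {indexConfig: {`vector.dimensions`: 256, "
    "`vector.similarity_function`: 'cosine'}}",
]

# Top-k similarity lookup against the index created above.
TOP_K_QUERY = (
    "CALL db.index.vector.queryNodes('entity_embedding_index', $k, $embedding) "
    "YIELD node, score "
    "RETURN node.id AS id, node.name AS name, score"
)


def ensure_schema_sketch(run):
    """Apply each statement through any callable that executes Cypher.

    With the official Python driver, `run` could be a session's `run` method.
    """
    for statement in SCHEMA_STATEMENTS:
        run(statement)


# Dry run: collect the statements instead of sending them to a database.
executed = []
ensure_schema_sketch(executed.append)
print(len(executed), "statements prepared")
```

Because the index lives on the same nodes that carry relationships, a similarity hit can be expanded by traversal without leaving the database.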

The graph also stores reasoning traces - records of which entities were touched during a retrieval operation and why. This is provenance: the ability to explain where retrieved context came from and retrace the reasoning path later.

How Does This Compare To Other Approaches?

There are several common ways to build memory for AI agents, each with different strengths. Understanding the tradeoff space helps you decide which approach fits your situation.

Note

The table below is a comparison across five memory patterns. None of these is universally best. The right choice depends on whether you need identity preservation, multi-hop traversal, operational simplicity, or semantic recall quality.

#	Approach	Identity Modeling	Relationship Traversal	Semantic Recall	Operational Complexity	Best For
1	Flat vector store (Chroma, Pinecone)	None - chunks are isolated	None	Excellent with modern embeddings	Low - single service	Pure document retrieval with no entity identity needs
2	Relational DB (Postgres, SQLite)	Strong via primary keys and foreign keys	Poor for deep traversal	Requires extension or separate vector store	Medium - familiar tooling	Structured data with well-defined schemas
3	In-memory object graph (Python dicts)	Moderate - depends on implementation	Good within a single session	None unless you add embeddings manually	Very low	Short sessions, rapid prototyping, no durability needed
4	Knowledge graph only (RDF, SPARQL)	Excellent - formal ontology	Excellent - multi-hop queries	Poor - requires bolted-on embedding layer	High - formal schema and query language	Formal knowledge bases with strict ontology
5	This project - hybrid graph plus vector	Explicit merge gate with conservative thresholds	Good via RELATED_TO and SAME_AS edges	Modest in this prototype - deterministic hash embeddings, swappable for a learned model	Medium - Neo4j plus FastAPI	Agent memory where identity drift is a risk

Tip

If you are building a simple RAG pipeline over static documents, a flat vector store is faster to set up and sufficient. This project is most valuable when your agent accumulates knowledge over time and you need stable references to the same entities across many conversations.

A critical distinction is what happens when the same entity appears under different names or descriptions. In a flat vector store, "OpenAI" and "Open AI" and "the company that made GPT-4" become three separate vectors with no link. In this system, all three would be candidates for merging into a single canonical entity node, with the decision made by the resolution gate rather than silently assumed.

Is This Optimal - And For What Situations?

This is one of the most important questions to answer honestly. The short answer: this system is optimal for a specific problem profile, not universally optimal.

Important

If semantic recall quality over large document corpora is your primary goal, a dedicated vector store with a strong embedding model will outperform this prototype. The deterministic hash embeddings used here sacrifice recall quality for reproducibility and zero external dependencies.

The system is well-suited when all of the following are true:

You have an AI agent that accumulates knowledge over time across many sessions
Entities (people, companies, tools, events) appear repeatedly under different names or descriptions
Incorrect merges are costly - you need a human review pathway for ambiguous cases
You need to explain why certain context was retrieved, not just what was retrieved
You want to store relationships between entities, not just similarity scores

The system is less well-suited when:

You need the highest possible semantic recall quality over large document sets
You have no entities - just raw text chunks with no identity semantics
Operational simplicity is a hard requirement (Neo4j adds infrastructure overhead)
You cannot tolerate the conservative merge policy creating duplicate nodes

#	Situation	Recommended Approach	Why This Project Fits or Does Not
1	Static document search over PDFs and articles	Flat vector store with strong embeddings	No identity management needed; vector recall is the whole problem
2	Agent with growing personal knowledge base	This project or similar hybrid	Entity identity and relationship traversal matter as knowledge grows
3	Customer support bot with product catalog	Relational DB plus vector search	Schema is well-defined; relational constraints are valuable
4	Research assistant tracking people and orgs over time	This project is a good fit	Same entities appear under many names; merge decisions need human review
5	Real-time chat with no persistent memory	In-context window only	Persistence overhead is wasted if memory does not outlive the session

Note

The most honest framing is this: this project trades peak recall performance for identity safety. It is the right trade when you are building a system where wrong merges cause real harm, and where explainability matters more than raw retrieval speed.

Why This Project Exists

A flat vector index is excellent for fuzzy retrieval, but it tends to blur identity boundaries. If two mentions are semantically close, a naive system may treat them as the same thing even when they should remain distinct. That becomes a memory bug, not just a retrieval bug. And memory bugs in AI agents are insidious: they accumulate silently, affect all future reasoning that draws on the corrupted memory, and are hard to detect because the agent will answer confidently based on the wrong merged entity.

The classic failure mode looks like this: an agent hears "Sam Altman leads OpenAI" and "Sam Adams is a beer brand." Because both involve someone named Sam, a pure similarity-based system with a low merge threshold might conflate these into one entity. Future queries about Sam Altman might return beer facts. The agent has no way to know this happened.

In this repository, identity lives in the graph and similarity remains a signal. That distinction is the central design choice. It makes the system more conservative than a pure semantic search stack, but it also makes it safer for agent memory where stable references matter over time.

What Does "Identity Lives In The Graph" Mean?

The graph in this sentence is a property graph - a data structure made of nodes (records with properties) and edges (directed, typed connections between nodes). It is not an abstract graph-theory object studied for its own sake, and it is not a neural network computation graph. It is a database model, the same kind used by Neo4j, Amazon Neptune, and similar graph databases.

In this system, every named entity that the system has ever seen becomes a node in that graph. Conceptually, a node looks like this (an illustration of the stored properties, not Neo4j's on-disk format):

Node {
  id:          "entity:anthropic"
  name:        "Anthropic"
  entity_type: "Organization"
  aliases:     ["Anthropic PBC", "Anthropic AI"]
  description: "AI safety company that develops Claude"
  embedding:   [0.031, -0.012, 0.044, ..., 0.019]  -- 256 floats
}

When a relationship is extracted from text - for example "Anthropic developed Claude Code" - the system creates an edge between the Anthropic node and the Claude Code node:

(Anthropic) --[RELATED_TO {predicate: "developed"}]--> (Claude Code)

"Identity lives in the graph" means that the node is the identity. The Anthropic node is the single canonical record for Anthropic. It is not a chunk of text, not a row in a table, and not a vector. It is a named, addressable object in the graph with stable edges pointing to related objects. When you later ingest "Anthropic released a new model", the system does not create a second Anthropic node - it finds the existing one by running the resolution gate and merges new information into it. The identity is preserved.

Compare this to a flat vector store: if you embed and store "Anthropic, an AI safety company" and then later store "Anthropic PBC develops Claude", you now have two separate vectors. There is no "Anthropic node" - there are two anonymous chunks that happen to be similar. Future searches may return both, or only one, or neither, depending on the query. The identity is not preserved anywhere.
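To make the contrast concrete, here is a minimal sketch of a canonical node that absorbs a new mention instead of spawning a duplicate. The `EntityNode` class and its `absorb` method are hypothetical stand-ins for what the repository does with Neo4j writes; only the field names mirror the node shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class EntityNode:
    """Hypothetical in-memory stand-in for a canonical entity node."""

    id: str
    name: str
    entity_type: str
    aliases: list[str] = field(default_factory=list)
    description: str = ""

    def known_names(self) -> set[str]:
        return {n.casefold() for n in [self.name, *self.aliases]}

    def absorb(self, mention_name: str, description: str = "") -> None:
        """Fold a new mention into this node rather than creating another."""
        if mention_name.casefold() not in self.known_names():
            self.aliases.append(mention_name)
        if description and not self.description:
            self.description = description


node = EntityNode("entity:anthropic", "Anthropic", "Organization", ["Anthropic PBC"])
node.absorb("Anthropic AI", "AI safety company that develops Claude")
node.absorb("anthropic")  # already known, so nothing changes
print(node.aliases)
```

However many mentions arrive, there is still exactly one object with one `id` for downstream edges to point at.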

What Does "Similarity Remains A Signal" Mean?

A signal in this context is a scalar number between 0.0 and 1.0 - nothing more. It is not a matrix, not a gradient, and not a vector. It is a single floating-point score that measures how alike two things are.

When a new entity candidate arrives - say, a mention of "Anthropic AI" extracted from a new document - the system computes three signals against each existing Organization node:

#	Signal Name	How It Is Computed	Example Output	Data Type
1	Exact match	Case-insensitive string comparison of name and aliases	1.0 if "anthropic ai" is in aliases, else 0.0	float, range 0-1
2	Fuzzy match	`difflib.SequenceMatcher(None, a, b).ratio()`	0.86 for "Anthropic AI" vs "Anthropic"	float, range 0-1
3	Semantic match	Cosine similarity between two 256-dimensional vectors	0.91 for closely related descriptions	float, range 0-1

Those three numbers are combined into one final score using a weighted formula:

$$ \text{score} = \max\!\bigl(\text{exact},\ 0.45 \times \text{fuzzy} + 0.55 \times \text{semantic}\bigr) $$

The result is a single number like 0.89. That number is the signal. It says: "this candidate and this existing node are probably about the same thing, with confidence 0.89."

The crucial point is what happens next. The signal does not automatically rewrite the graph. Instead it is fed into a decision gate with two thresholds:

score >= 0.95  ->  merge  (update the existing node, absorb new aliases)
0.85 <= score < 0.95  ->  pending  (create new node, flag SAME_AS for human review)
score < 0.85  ->  create  (treat as genuinely new entity)

The signal informs the decision. The graph owns the decision. Two things scoring as similar is never enough on its own to rewrite the graph: an automatic merge happens only at or above the 0.95 threshold, and in the ambiguous middle band a human must confirm. That is the separation the phrase captures.
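The formula and the gate described above fit in a few lines. This is a minimal sketch using the weights and thresholds stated in this section; the function names and the constant name `REVIEW_THRESHOLD` are assumptions made for the example.

```python
AUTO_MERGE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.85  # hypothetical name for the lower bound of the pending band


def combined_score(exact: float, fuzzy: float, semantic: float) -> float:
    """An exact match wins outright; otherwise blend fuzzy and semantic."""
    return max(exact, 0.45 * fuzzy + 0.55 * semantic)


def gate(score: float) -> str:
    """Map a scalar score onto one of the three actions."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "pending"
    return "create"


print(gate(combined_score(1.0, 0.857, 0.91)))  # exact alias hit
print(gate(combined_score(0.0, 0.91, 0.87)))   # close, but no exact hit
print(gate(combined_score(0.0, 0.50, 0.60)))   # weak on both signals
```

The blended term alone needs both fuzzy and semantic to be very high to clear 0.95, which is what keeps automatic merges rare without an exact name or alias match.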

Does This System Have A Vanishing Gradient Problem?

No - this system has no gradients at all. The vanishing gradient problem is a training-time pathology in deep neural networks. It occurs when backpropagation computes gradients that shrink exponentially as they flow backward through many layers, causing early layers to receive near-zero gradient updates and stop learning. It has nothing to do with this project.

This system does not train any neural network. There is no backpropagation, no loss function, no parameter update step, and no learning loop. The components of this system are:

Hash-based embedding - a deterministic mathematical function that converts text into a vector using SHA-256 hash arithmetic. No weights, no training, no gradients.
Cosine similarity - a geometric dot product between two unit vectors. No gradients.
String matching - difflib.SequenceMatcher. No gradients.
Threshold comparisons - if/else logic on scalar scores. No gradients.
Neo4j Cypher writes - database operations. No gradients.

The word "gradient" has no role even in the cosine similarity math. Cosine similarity does require the vectors to be normalized, but normalization is a fixed arithmetic step on fixed vectors, not a training operation.

If you swap the HashEmbeddingService for a pre-trained sentence transformer model like all-MiniLM-L6-v2, the inference call to that model also computes no gradients (inference mode, not training mode). Whatever gradient issues the model faced were handled during its own training, before it was packaged. You use it as a frozen function, not a trainable layer.

Note

The vanishing gradient problem would only be relevant if you were trying to fine-tune the embedding model end-to-end with a signal derived from the resolution decisions. That is a valid research direction (training an embedding model to optimize entity resolution quality), but it is not what this prototype does. Here the embedding model is fixed and the resolution logic is rule-based.

What Does A Signal Look Like Concretely?

Here is the complete data flow for a single resolution decision, showing every intermediate value:

Input: New candidate extracted from text - "Anthropic AI", type Organization

Step 1 - Retrieve existing candidates from Neo4j:

existing_entities = [
    ExistingEntity(
        id="entity:anthropic",
        name="Anthropic",
        aliases=["Anthropic PBC", "Anthropic AI"],
        embedding=[0.031, -0.012, 0.044, ...]  # 256 floats
    )
]

Step 2 - Embed the candidate:

candidate_embedding = hash_embed("Organization Anthropic AI an AI safety company")
# Returns a list of 256 floats, e.g.:
# [0.028, -0.009, 0.041, 0.003, -0.017, ...]

The embedding vector is not a matrix. It is a one-dimensional list of 256 floating-point numbers. Each number is derived deterministically from SHA-256 hashes of fragments of the input text (character n-grams) - not from any learned weights. Two texts that share character patterns hash those shared fragments to the same values, which is why the similarity measure carries any signal at all. It is a crude approximation of semantic similarity, not a learned representation.

Step 3 - Compute three scalar signals:

from difflib import SequenceMatcher

# "anthropic ai" matches the alias "Anthropic AI" case-insensitively -> 1.0
# (no match would give 0.0)
exact = 1.0

# ratio() is 2 * matching characters / total characters = 2 * 9 / (12 + 9)
fuzzy = SequenceMatcher(None, "anthropic ai", "anthropic").ratio()  # 0.857

# Cosine similarity of two unit vectors reduces to their dot product.
existing_embedding = existing_entities[0].embedding
semantic = sum(a * b for a, b in zip(candidate_embedding, existing_embedding))
# e.g. 0.91 - a single scalar, not a matrix or vector

Step 4 - Combine into one score:

score = max(exact, 0.45 * fuzzy + 0.55 * semantic)
#     = max(1.0,   0.45 * 0.857 + 0.55 * 0.91)
#     = max(1.0,   0.3857 + 0.5005)
#     = max(1.0,   0.8862)
#     = 1.0

Step 5 - Apply decision gate:

# score 1.0 >= AUTO_MERGE_THRESHOLD 0.95
decision = ResolutionDecision(
    action="merge",
    confidence=1.0,
    matched_entity_id="entity:anthropic",
    matched_name="Anthropic",
    reason="exact=1.00, fuzzy=0.86, semantic=0.91"
)

The signal 1.0 told the gate to merge. The graph folds the candidate's information into the existing Anthropic node instead of creating a second one. No new node is created. Identity is preserved.
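The exact and fuzzy signals from the walkthrough can be reproduced with the standard library alone. This is a small sketch; the helper names `exact_signal` and `fuzzy_signal` are assumptions, and the comparison against the node name only (rather than every alias) follows the worked example rather than any confirmed implementation detail.

```python
from difflib import SequenceMatcher


def exact_signal(candidate: str, name: str, aliases: list[str]) -> float:
    """1.0 when the candidate equals the name or any alias, ignoring case."""
    known = {name.casefold(), *(alias.casefold() for alias in aliases)}
    return 1.0 if candidate.casefold() in known else 0.0


def fuzzy_signal(candidate: str, name: str) -> float:
    """SequenceMatcher ratio: 2 * matching characters / total characters."""
    return SequenceMatcher(None, candidate.casefold(), name.casefold()).ratio()


fuzzy = fuzzy_signal("Anthropic AI", "Anthropic")
with_alias = exact_signal("Anthropic AI", "Anthropic", ["Anthropic PBC", "Anthropic AI"])
without_alias = exact_signal("Anthropic AI", "Anthropic", ["Anthropic PBC"])

print(round(fuzzy, 3), with_alias, without_alias)
```

The alias list matters a great deal here: without the alias, exact drops to 0.0 and the same candidate scores about 0.886 (0.45 x 0.857 + 0.55 x 0.91), which lands in the pending band rather than triggering a merge.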

Tip

The reason string in every ResolutionDecision is the human-readable version of these three scalar signals. When you see exact=0.00, fuzzy=0.91, semantic=0.87 in the API response, you are reading the three numbers described above. They are not hidden inside a black-box model - they are explicit, inspectable, and auditable.

#	Decision Area	Chosen Approach	Typical Alternative	Why This Helps
1	Primary memory store	Neo4j graph with vector indexes	Standalone vector database	Keeps identity, relationships, and embeddings on the same node set
2	Entity identity	Explicit node identity with merge gate	Similarity-only matching	Reduces accidental collapse of near matches into one memory
3	Reasoning retention	Reasoning traces stored in the graph	Prompt-only transient chain of thought	Preserves provenance about how context was assembled
4	Retrieval model	Hybrid graph plus vector retrieval	Top-k embedding recall only	Combines semantic recall with neighborhood expansion and provenance
5	Dedup behavior	Merge, pending review, or create	Always merge when similar enough	Adds a safety band for ambiguous cases that need human judgment

Note

The conservative deduplication gate is the key architectural difference from simpler systems. It exists because memory systems often fail gradually through incorrect merges, and those errors are harder to recover from than missed links. A missed link means one extra node; a wrong merge corrupts an identity and every later fact attached to it, until someone notices and untangles it.

What The System Does

At a high level, the service ingests text, extracts candidate entities and relationships from it, resolves those candidates against existing graph nodes using a layered scoring approach, stores the message in short-term memory, stores entities in long-term memory with their embeddings and relationships, and then retrieves context by mixing semantic search with graph traversal. During chat requests, it also records a reasoning trace that points back to the message and the entities that were touched during retrieval.

This is deliberately different from a typical RAG pipeline. In a typical RAG system, text is chunked, embedded, stored, and retrieved by similarity. There is no entity extraction, no identity resolution, no relationship storage, and no provenance. This system adds all four layers on top of the embedding foundation.

#	Capability	What It Does	Why It Is Needed
1	Document ingest	Stores a message, extracts entities, resolves duplicates, and writes relationships	Turns raw notes into structured graph memory with stable identity
2	Chat context retrieval	Stores a user message and returns message hits, entity hits, neighbors, and related reasoning	Lets a downstream assistant retrieve grounded, multi-tier context
3	Duplicate review	Confirms or rejects pending SAME_AS links	Provides a human checkpoint for ambiguous identity cases before they corrupt memory
4	Health reporting	Checks whether Neo4j is reachable and the service is running	Separates service availability from storage connectivity for clean monitoring
5	Graph statistics	Returns counts for conversations, messages, entities, traces, and pending duplicates	Gives a fast operational snapshot of memory growth and pending review items

Tip

The pending duplicates count in the stats response is a useful health indicator. If it grows without bound, it means the extraction pipeline is generating many ambiguous candidates that are not being reviewed. That warrants tuning the resolution thresholds or improving the reviewer workflow.

Tech Stack And Why It Was Chosen

The technology choices in this repository are intentional and each one serves the architecture rather than being chosen for familiarity or popularity. Understanding why each piece is here helps you know what to replace if your requirements differ.

FastAPI was chosen because it provides typed request and response models via Pydantic, generates interactive OpenAPI documentation automatically, handles dependency injection cleanly, and uses Python type hints throughout which makes the codebase easier to read. The alternative would be Flask or Django, but neither provides the same out-of-the-box Pydantic integration.

Neo4j 5.x was chosen because it natively supports both labeled property graphs with typed edges and vector similarity indexes on the same nodes. This eliminates the need for a separate vector store while keeping the graph structure. Earlier versions of Neo4j required a plugin for vector search; the 5.x series makes it a first-class feature.

Local deterministic hash embeddings were chosen to remove all external dependencies and make tests reproducible. A hash-based embedding function always produces the same vector for the same input string, which means tests can assert on exact embedding values and similarity scores without mocking an API. The tradeoff is lower semantic quality compared to a trained model like text-embedding-3-small, but this can be swapped behind the same service interface.

Pydantic v2 was chosen for all schema definitions because it provides fast validation, clear error messages, and automatic JSON serialization. Every request and response shape in the API is a Pydantic model, which means the contract is self-documenting and the serialization is handled automatically.

#	Layer	Technology	Why It Was Chosen	Practical Consequence
1	API layer	FastAPI	Typed request models, automatic OpenAPI docs, simple dependency wiring	Interactive docs at /docs with no extra boilerplate
2	Data validation	Pydantic v2	Keeps API contracts explicit, fast validation, automatic JSON handling	Request and response shapes are self-documenting and testable
3	Graph store	Neo4j 5.x	Supports graph traversal and vector index queries in the same database	No separate vector service needed; identity and embeddings coexist
4	Embedding service	Local deterministic hash embedding	Removes external API cost and makes tests deterministic	Semantic quality is lower than learned models but reproducibility is high
5	Extraction strategy	Regex and heuristic extraction	Keeps the prototype easy to run and inspect without NLP dependencies	Coverage is limited - a scaffold for replacing with spaCy or a fine-tuned model
6	Test stack	Pytest plus FastAPI TestClient	Supports narrow deterministic tests without requiring a live Neo4j	Core behavior can be validated locally before doing full end-to-end runs
7	Container runtime	Docker Compose	Provides a one-command local Neo4j instance with predictable configuration	No manual Neo4j installation needed to start working

Note

The local hash embedding is the most significant quality compromise in the current implementation. Hash-based embeddings map text to vectors using character n-gram frequencies rather than learned semantic representations. They capture some surface similarity but miss conceptual relationships. Replacing this with a real embedding model is the highest-leverage improvement you can make to this system.
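For intuition, here is one way a deterministic hash embedding over character n-grams could be built. This is an illustrative sketch under stated assumptions, not the repository's HashEmbeddingService: the trigram size, the signed-bucket scheme, and the function names `char_ngrams`, `hash_embed`, and `cosine` are all choices made for the example.

```python
import hashlib
import math

DIMENSIONS = 256


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split case-folded, space-padded text into overlapping character n-grams."""
    padded = f" {text.casefold()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_embed(text: str, dimensions: int = DIMENSIONS) -> list[float]:
    """Hash each n-gram into a signed bucket, then normalize to unit length."""
    vector = [0.0] * dimensions
    for gram in char_ngrams(text):
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimensions
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))


close = cosine(hash_embed("Anthropic AI"), hash_embed("Anthropic"))
far = cosine(hash_embed("Anthropic AI"), hash_embed("Sam Adams beer"))
print(close > far)
```

The sketch shows both the strength and the weakness described above: strings that share spelling score high, but two differently worded descriptions of the same concept would share few n-grams and score low.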

Architecture Overview

The architecture is organized around a clear layered boundary. Requests enter through FastAPI routes, which delegate all business logic to MemoryService. The service layer coordinates four subordinate services: embedding, extraction, resolution, and the graph repository. The repository layer owns all Cypher queries and Neo4j interactions. Nothing outside the repository layer touches the database directly.

This separation matters because it makes each layer independently testable. The resolution service can be tested with mock entities and mock embeddings. The extraction service can be tested on raw strings. The API routes can be tested with a mock memory service. The only layer that requires a real Neo4j is the repository.
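The testability claim rests on dependency injection, which can be sketched as follows. The `MemoryService` here is a stripped-down stand-in with a single hypothetical method, `store_message`; the `Embedder` and `Repository` protocols and the fake classes are assumptions for illustration, with `create_message` shaped like the call in the ingest sequence diagram.

```python
from dataclasses import dataclass
from typing import Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class Repository(Protocol):
    def create_message(
        self, session_id: str, content: str, embedding: list[float]
    ) -> str: ...


@dataclass
class MemoryService:
    """Stand-in service: depends on interfaces, never on Neo4j directly."""

    embedder: Embedder
    repository: Repository

    def store_message(self, session_id: str, content: str) -> str:
        embedding = self.embedder.embed(content)
        return self.repository.create_message(session_id, content, embedding)


class FakeEmbedder:
    def embed(self, text: str) -> list[float]:
        return [float(len(text)), 0.0]  # trivial deterministic vector


class InMemoryRepository:
    def __init__(self) -> None:
        self.messages: list[tuple[str, str, list[float]]] = []

    def create_message(self, session_id, content, embedding) -> str:
        self.messages.append((session_id, content, embedding))
        return f"message:{len(self.messages)}"


repo = InMemoryRepository()
service = MemoryService(FakeEmbedder(), repo)
message_id = service.store_message("session-1", "hello")
print(message_id, repo.messages)
```

Swapping `InMemoryRepository` for a class that issues Cypher changes nothing in the service, which is the property that lets most tests run without a database.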

If Mermaid does not render in your viewer, the static fallback image shows the same control flow. The architecture reference in docs/architecture.md reuses the same SVG.

flowchart TD
    subgraph Client["Client Layer"]
        U([HTTP Client])
    end
    subgraph API["FastAPI Layer"]
        R[Routes - api.py]
    end
    subgraph Services["Service Layer"]
        MS[MemoryService]
        ES[ExtractionService]
        RS[ResolutionService]
        HS[HashEmbeddingService]
    end
    subgraph Persistence["Persistence Layer"]
        GR[GraphRepository]
        N4[(Neo4j 5.x)]
    end
    subgraph Memory["Memory Tiers in Neo4j"]
        ST[Short-term - Conversations and Messages]
        LT[Long-term - Entity nodes]
        RM[Reasoning - ReasoningTrace and ReasoningStep]
    end

    U --> R
    R --> MS
    MS --> ES
    MS --> RS
    MS --> HS
    MS --> GR
    GR --> N4
    N4 --> ST
    N4 --> LT
    N4 --> RM

Note

The MemoryService is the only class that touches all four subordinate services. No route handler calls the graph repository directly. This means the API layer has zero knowledge of Cypher, embeddings, or resolution logic - it only speaks request and response models.

The sequence diagram below shows the complete ingest flow end to end, including the per-entity resolution loop that is the heart of the system.

sequenceDiagram
    participant U as User
    participant A as FastAPI
    participant M as MemoryService
    participant X as ExtractionService
    participant R as ResolutionService
    participant H as HashEmbeddingService
    participant G as GraphRepository
    participant N as Neo4j

    U->>A: POST /api/documents
    A->>M: ingest_document(request)
    M->>G: ensure_schema()
    M->>H: embed(content)
    H-->>M: content_embedding
    M->>G: create_message(session_id, content, embedding)
    M->>X: extract(content)
    X-->>M: entities[], relations[]
    loop for each extracted entity
        M->>G: find_existing_entities(entity_type)
        G-->>M: existing_entities[]
        M->>R: decide(candidate, existing_entities)
        R->>H: embed(candidate description)
        H-->>R: candidate_embedding
        R-->>M: ResolutionDecision(action, confidence, reason)
        alt action == merge
            M->>G: merge_entity(matched_id, candidate_payload)
        else action == pending
            M->>G: create_entity(candidate_payload, embedding)
            M->>G: create_pending_same_as(new_id, matched_id, confidence)
        else action == create
            M->>G: create_entity(candidate_payload, embedding)
        end
    end
    M->>G: connect_message_mentions(message_id, entity_ids)
    M->>G: connect_entities(relations, resolved_ids)
    G->>N: persist all graph changes
    M-->>A: IngestResult
    A-->>U: 200 OK with JSON

Important

The loop in the sequence diagram above runs once per extracted entity. If a document mentions ten entities, the system makes ten resolution decisions. For each one it queries Neo4j for same-type existing entities, computes scores, and decides whether to merge, flag as pending, or create new. This is intentionally synchronous and conservative.

How Does Ingest Work Step By Step?

This section explains the ingest pipeline in detail because it is the most complex flow in the system and understanding it is necessary to evaluate whether the architecture is right for your use case.

Step 1 - Schema setup. Before any write, the service calls ensure_schema(), which creates the necessary Neo4j constraints and vector indexes if they do not exist. This is idempotent: when a constraint or index is already present, the call changes nothing, so it is safe to run on every request.

Step 2 - Message embedding and storage. The raw content text is embedded using the hash embedding service to produce a 256-dimensional vector. The message is then stored as a Message node attached to the Conversation node for this session, with the embedding stored as a node property and indexed in the message_embedding_index. This makes the message retrievable by semantic similarity in future chat calls.

Step 3 - Entity and relation extraction. The extraction service runs regex and heuristic patterns over the content to produce a list of EntityCandidate objects (each with a name, type, aliases, and description) and a list of RelationCandidate objects (each with a subject name, predicate, and object name).

Step 4 - Entity resolution loop. For each extracted entity, the system queries Neo4j for all existing entities of the same top-level type. It then runs the resolution logic: exact name match, fuzzy string match, and semantic embedding similarity. The three signals are combined into a single score. Depending on the score and configured thresholds, the entity is merged into an existing node, created as a new node with a pending SAME_AS edge to the closest match, or created as a new isolated node.

Step 5 - Graph connection. After all entities are resolved, the service connects the message node to each entity node with MENTIONS edges, and connects entity pairs from the extracted relations with RELATED_TO edges labeled with the extracted predicate.
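The idempotent schema setup from Step 1 can be sketched as a list of Cypher statements. This is a minimal sketch, not the repository's ensure_schema(): the index names come from this README, while the constraint names, the id property, and the embedding property name are assumptions made for illustration.

```python
def schema_statements(dimensions: int = 256) -> list[str]:
    """Return Cypher statements that are safe to run on every startup."""
    statements = [
        # Uniqueness constraints; constraint names and the `id` property are assumed.
        "CREATE CONSTRAINT entity_id IF NOT EXISTS "
        "FOR (e:Entity) REQUIRE e.id IS UNIQUE",
        "CREATE CONSTRAINT message_id IF NOT EXISTS "
        "FOR (m:Message) REQUIRE m.id IS UNIQUE",
    ]
    # One vector index per embedded label; the `embedding` property name is assumed.
    for index_name, label in [
        ("message_embedding_index", "Message"),
        ("entity_embedding_index", "Entity"),
    ]:
        statements.append(
            f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS "
            f"FOR (n:{label}) ON (n.embedding) "
            "OPTIONS {indexConfig: {"
            f"`vector.dimensions`: {dimensions}, "
            "`vector.similarity_function`: 'cosine'}}"
        )
    return statements


for statement in schema_statements():
    print(statement)
```

Because every statement carries IF NOT EXISTS, running the whole list a second time leaves the schema unchanged, which is what makes calling it before each write safe.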

Note

The resolution decision is the only step in the pipeline that can change or link to entities written by earlier requests. A merge permanently modifies an existing entity node. A pending decision creates a new SAME_AS edge that will remain in the graph until a reviewer acts on it. Create decisions are the safest because they add a new node without touching anything existing.

flowchart LR
    A([Raw text content]) --> B[HashEmbeddingService]
    B --> C[Message stored in graph]
    A --> D[ExtractionService]
    D --> E{Entity candidates}
    D --> F{Relation candidates}
    E --> G[Resolution loop per entity]
    G --> H{Score vs thresholds}
    H -->|score >= 0.95| I[Merge into existing node]
    H -->|0.85 to 0.95| J[Create new + pending SAME_AS]
    H -->|score < 0.85| K[Create new isolated node]
    I --> L[connect_message_mentions]
    J --> L
    K --> L
    F --> M[connect_entities with RELATED_TO]
    L --> N([IngestResult returned])
    M --> N

Tip

If you want to understand what the resolution service is actually doing on a given ingest, look at the resolutions array in the IngestResult response. Each entry shows the action taken, the confidence score, and the human-readable reason string that explains the exact score breakdown (e.g., exact=0.00, fuzzy=0.91, semantic=0.87).

Memory Tiers

The system deliberately separates storage into three distinct tiers rather than mixing everything into one undifferentiated index. This separation is essential for keeping different kinds of memory from contaminating each other. A message in a conversation from three weeks ago should not be treated the same as a stable fact about a well-known entity. A reasoning trace from a previous retrieval session should not be confused with an entity relationship.

Short-term memory holds Conversation and Message nodes. A conversation groups messages from the same session. Messages are linked in sequence with NEXT edges and each message has an embedding stored for vector retrieval. This tier is temporal and session-scoped.

Long-term memory holds Entity nodes with POLE+O type labels (Person, Object, Location, Event, Organization). These nodes accumulate canonical names, aliases, descriptions, embeddings, and typed edges. This tier persists across sessions and is the main knowledge base of the system.

Reasoning memory holds ReasoningTrace and ReasoningStep nodes. When a chat request triggers a retrieval query, the system records which message initiated the trace, which steps were taken, and which entities were touched. This makes retrieval behavior inspectable and auditable over time.

#	Tier	Main Node Types	Edges Used	Lifetime	Why It Exists As A Separate Tier
1	Short-term	Conversation, Message	HAS_MESSAGE, NEXT, MENTIONS	Session-scoped, grows with each ingest	Conversational flow is temporal and should not pollute entity identity
2	Long-term	Entity with POLE+O labels	RELATED_TO, SAME_AS	Persistent across all sessions	Entity knowledge must be stable and addressable by canonical name
3	Reasoning	ReasoningTrace, ReasoningStep	INITIATED_BY, HAS_STEP, TOUCHED	Permanent audit log	Provenance about retrieval behavior must be separate from entity state

graph TD
    subgraph ShortTerm["Short-term Memory"]
        CV[Conversation]
        M1[Message 1]
        M2[Message 2]
        M3[Message 3]
    end
    subgraph LongTerm["Long-term Memory"]
        EP[Entity: Person]
        EO[Entity: Object]
        EG[Entity: Organization]
    end
    subgraph Reasoning["Reasoning Memory"]
        RT[ReasoningTrace]
        RS1[ReasoningStep]
        RS2[ReasoningStep]
    end

    CV -->|HAS_MESSAGE| M1
    M1 -->|NEXT| M2
    M2 -->|NEXT| M3
    M2 -->|MENTIONS| EP
    M2 -->|MENTIONS| EO
    EP -->|RELATED_TO| EG
    EO -->|RELATED_TO| EG
    RT -->|INITIATED_BY| M3
    RT -->|HAS_STEP| RS1
    RT -->|HAS_STEP| RS2
    RT -->|TOUCHED| EP
    RT -->|TOUCHED| EO

Note

POLE+O stands for Person, Object, Location, Event, Organization. This is a coarse ontology that covers the most common entity categories in natural language text. It is used as the top-level type filter during resolution: the system only compares an entity candidate against existing entities of the same POLE+O type. This prevents a person named "London" from being matched against the city "London."
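The type gate described above can be illustrated in a few lines. This is a sketch under stated assumptions: the Entity dataclass and the same_type_candidates helper are hypothetical names for illustration, not the project's models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    entity_type: str  # one of Person, Object, Location, Event, Organization


def same_type_candidates(candidate: Entity, existing: list[Entity]) -> list[Entity]:
    """Keep only existing entities that share the candidate's POLE+O type."""
    return [e for e in existing if e.entity_type == candidate.entity_type]


existing = [Entity("London", "Location"), Entity("Jack London", "Person")]
candidate = Entity("London", "Person")

# The city is filtered out before any scoring happens.
matches = same_type_candidates(candidate, existing)
print([e.name for e in matches])
```

Only the entities that survive this filter are scored, so a cross-type pair never reaches the merge thresholds regardless of how similar the names are.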

Identity Resolution Strategy

Identity resolution is the central algorithmic challenge in this system. The question being answered for each extracted entity is: "Does this candidate refer to something that already exists in our graph, or is it something new?" Getting this wrong in either direction has costs. A false merge corrupts existing entity data. A false split creates a duplicate node that fragments knowledge.

The system addresses this by layering three signals rather than relying on any one of them alone. Exact string matching catches the obvious cases. Fuzzy string matching catches spelling variants and minor formatting differences. Semantic cosine similarity catches conceptual relatedness when surface forms differ substantially. The three signals are combined with a weighted formula and the result is mapped to one of three outcomes.

Signal 1 - Exact match compares the candidate name and all its aliases against the existing entity's name and aliases, case-insensitively. An exact match produces a score of 1.0 and immediately triggers the merge check.

Signal 2 - Fuzzy match uses Python's difflib.SequenceMatcher to compute a ratio between the candidate name and the existing entity's name and aliases. This handles cases like "GPT4" vs "GPT-4" or "Sam Altman" vs "Samuel Altman."

Signal 3 - Semantic match embeds the candidate using the description and type context (not just the name) and computes cosine similarity against the existing entity's embedding. This captures cases where surface names differ substantially but the contextual descriptions are similar.
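The fuzzy signal can be reproduced with the standard library alone. The sketch below assumes the comparison is case-insensitive and takes the best ratio across the existing name and aliases; the fuzzy_score helper name is hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(candidate_name: str, known_names: list[str]) -> float:
    """Best SequenceMatcher ratio between the candidate and any known name or alias."""
    target = candidate_name.casefold()
    return max(
        (SequenceMatcher(None, target, name.casefold()).ratio() for name in known_names),
        default=0.0,
    )


print(round(fuzzy_score("GPT4", ["GPT-4"]), 2))
print(round(fuzzy_score("Sam Altman", ["Samuel Altman", "S. Altman"]), 2))
```

Both pairs score in the high 0.8s (about 0.89 and 0.87). That is strong surface similarity, but on its own it stays below the 0.95 auto-merge threshold, which is why the fuzzy signal is blended with the semantic one rather than used alone.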

#	Signal	Implementation	Captures	Weight In Formula	Why This Weight
1	Exact match	Case-insensitive name and alias comparison	Literal identity agreement	Overrides others if 1.0	Exact matches should always merge regardless of other signals
2	Fuzzy match	difflib.SequenceMatcher ratio	Surface-form similarity and typos	0.45	Less reliable than semantics for capturing meaning differences
3	Semantic match	Cosine similarity over hash embeddings	Contextual resemblance	0.55	More reliable for catching entity equivalence across different phrasings
4	Type filter	Same entity_type only	Coarse ontology guardrail	N/A - acts as gate before scoring	Prevents cross-category false matches entirely

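The scoring formula for each same-type candidate is shown below. Here exact is 1.0 on an exact match and 0.0 otherwise, so the weighted term only decides the outcome for non-exact candidates: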

$$ score = \max\left(exact,\ 0.45 \times fuzzy + 0.55 \times semantic\right) $$

Cosine similarity for two unit vectors $u$ and $v$ with dimension $n$:

$$ \mathrm{cosine}(u, v) = \sum_{i=1}^{n} u_i v_i $$
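To make the unit-vector shortcut concrete, here is a minimal sketch of a deterministic hash embedding and the dot-product similarity. The token-hashing scheme is an assumption chosen for illustration (signed feature hashing over whitespace tokens); it is not necessarily how HashEmbeddingService is implemented.

```python
import hashlib
import math


def hash_embed(text: str, dimensions: int = 256) -> list[float]:
    """Map text to a deterministic, L2-normalized vector via token hashing."""
    vector = [0.0] * dimensions
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dimensions
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vector[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vector))
    # An empty or fully cancelled vector stays all zeros.
    return [x / norm for x in vector] if norm else vector


def cosine(u: list[float], v: list[float]) -> float:
    """Plain dot product; equals cosine similarity because inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


a = hash_embed("Anthropic developed Claude Code")
b = hash_embed("Claude Code competes with Codex")
print(len(a), cosine(a, b))
```

Normalizing at embedding time means the similarity check at query time needs no square roots or divisions, and the same text always produces the same vector because no randomness or model state is involved.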

The decision thresholds are:

$$ \text{merge if } score \ge 0.95 $$

$$ \text{pending review if } 0.85 \le score < 0.95 $$

$$ \text{create new if } score < 0.85 $$

#	Score Band	Graph Action	Human Review Required	Why This Band Exists
1	score >= 0.95	Merge into canonical node, absorb aliases	No - automatic	Only very strong evidence should auto-collapse identity
2	0.85 to 0.95	Create new node, add pending SAME_AS edge	Yes - via /api/duplicates/review	Ambiguous cases need a human decision rather than silent auto-merge
3	score < 0.85	Create new isolated entity node	No - treated as new	Weak evidence should not rewrite existing identity
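The formula and thresholds above translate directly into code. This is a sketch of the decision logic as documented here, not the ResolutionService source; the function names are hypothetical, and the default thresholds mirror AUTO_MERGE_THRESHOLD and PENDING_MATCH_THRESHOLD.

```python
def combined_score(exact: float, fuzzy: float, semantic: float) -> float:
    """An exact match wins outright; otherwise blend fuzzy and semantic signals."""
    return max(exact, 0.45 * fuzzy + 0.55 * semantic)


def decide(score: float, auto_merge: float = 0.95, pending: float = 0.85) -> str:
    """Map a score to one of the three graph actions."""
    if score >= auto_merge:
        return "merge"
    if score >= pending:
        return "pending"
    return "create"


score = combined_score(exact=0.0, fuzzy=0.91, semantic=0.87)
print(f"{score:.2f}", decide(score))  # 0.89 pending
```

The inputs are the ones shown in the example reason string (exact=0.00, fuzzy=0.91, semantic=0.87): 0.45 x 0.91 + 0.55 x 0.87 = 0.888, which rounds to the 0.89 confidence reported in the sample ingest response and falls in the pending band.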

Important

The source field on DocumentIngestRequest is accepted at the API boundary but is not yet stored in the graph in this prototype. This is a known gap documented in the constraints section. The graph writes use session_id and content but do not attach a source provenance label to the message node.

Warning

Lowering AUTO_MERGE_THRESHOLD below 0.90 significantly increases the risk of incorrect merges. The 0.95 default was chosen because hash-based embeddings have lower semantic fidelity than learned models, so a higher bar compensates for noisy similarity scores. If you replace the embedding service with a stronger model, you may be able to safely lower this threshold.

Retrieval Strategy

The retrieval strategy is designed to return multiple types of context in a single response rather than a flat list of similar chunks. This is what makes graph-backed memory qualitatively different from vector-only retrieval.

When a chat request arrives, the system does four things in sequence. First, it stores the current message as a Message node with its embedding, so future queries will find it. Second, it runs a vector similarity search over all messages in the current session to find the most relevant prior messages. Third, it runs a vector similarity search over all entity nodes globally to find the most relevant long-term knowledge. Fourth, it traverses the RELATED_TO edges from each returned entity to pull in graph neighbors, then looks up any ReasoningTrace nodes that previously touched the returned entities.

The result is a ContextResponse that carries four kinds of context: message_hits (semantically relevant messages), entities (relevant entity nodes with similarity scores), related_names (graph neighbors, listed on each hit entity), and reasoning (prior retrieval queries that touched these entities).

#	Retrieval Step	Index Used	What It Returns	Why It Is Included
1	Message vector search	message_embedding_index	Relevant messages from the same session	Session history gives temporal context for the current question
2	Entity vector search	entity_embedding_index	Entity nodes with similarity scores	Long-term knowledge about named entities crosses session boundaries
3	Neighbor expansion	RELATED_TO graph traversal	Entities one hop away from the hit set	Adds structural context; not just isolated hits but their connections
4	Reasoning trace lookup	ReasoningTrace to Entity links	Prior retrieval queries that touched similar entities	Provides provenance and shows how the agent has previously used this knowledge
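The four steps in the table can be sketched as a pure function over already-fetched results. This is an illustration only: the assemble_context helper, the in-memory related_to map, and the traces list stand in for the Neo4j queries, and the "Who founded Acme?" trace is made-up sample data. The output shape follows the chat response shown in the API reference.

```python
def assemble_context(query, session_id, message_hits, entity_hits, related_to, traces):
    """Combine the four retrieval results into one response-shaped dict.

    related_to maps an entity id to neighbor names (one RELATED_TO hop).
    traces is a list of (query_text, touched_entity_ids) pairs.
    """
    hit_ids = {hit["id"] for hit in entity_hits}
    entities = [
        {**hit, "related_names": list(related_to.get(hit["id"], []))}
        for hit in entity_hits
    ]
    # Keep only traces that touched at least one of the returned entities.
    reasoning = [text for text, touched in traces if hit_ids & set(touched)]
    return {
        "query": query,
        "session_id": session_id,
        "message_hits": list(message_hits),
        "entities": entities,
        "reasoning": reasoning,
    }


context = assemble_context(
    query="What do we know about Claude Code?",
    session_id="demo",
    message_hits=["Anthropic developed Claude Code."],
    entity_hits=[{"id": "entity:claude-code", "name": "Claude Code",
                  "entity_type": "Object", "score": 0.97}],
    related_to={"entity:claude-code": ["Anthropic", "Codex"]},
    traces=[("What do we know about Claude Code?", ["entity:claude-code"]),
            ("Who founded Acme?", ["entity:acme"])],
)
print(context["entities"][0]["related_names"], context["reasoning"])
```

Neighbor names arrive through the graph lookup rather than the vector search, so Anthropic and Codex appear even though only Claude Code was a vector hit.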

Note

The neighbor expansion step is what distinguishes hybrid graph-vector retrieval from pure vector retrieval. If you ask "what do we know about Anthropic?", pure vector search returns nodes similar to "Anthropic". Neighbor expansion additionally returns Claude, Claude Code, and any other entities connected to Anthropic by RELATED_TO edges - even if those entities were not semantically close to your query.

flowchart LR
    Q([Chat request: message + session_id]) --> A[Embed message with HashEmbeddingService]
    A --> B[Store as Message node]
    A --> C[Vector search: message_embedding_index]
    A --> D[Vector search: entity_embedding_index]
    C --> E[message_hits - top similar messages this session]
    D --> F[entity_hits - top similar entities globally]
    F --> G[Graph traversal: RELATED_TO neighbors]
    G --> H[related_names - neighbor entity names]
    F --> I[Trace lookup: ReasoningTrace TOUCHED entity]
    I --> J[reasoning - prior queries touching these entities]
    E --> K([ContextResponse assembled])
    H --> K
    J --> K

Repository Structure

The repository layout mirrors the conceptual architecture. Code is not grouped by file type but by responsibility. This makes it easier to find the code responsible for a specific behavior and to replace individual layers without touching others.

#	Path	Responsibility	What Lives Here
1	app/main.py	Application assembly and dependency wiring	FastAPI app creation, service instantiation, config loading
2	app/routes/api.py	HTTP route definitions	All endpoint handlers, request validation, error handling
3	app/services/memory.py	Ingest and chat orchestration	MemoryService coordinating all subordinate services
4	app/services/embedding.py	Vector generation and similarity	HashEmbeddingService with cosine similarity
5	app/services/extraction.py	Entity and relation extraction	ExtractionService with regex and heuristic patterns
6	app/services/resolution.py	Duplicate detection and merge decisions	ResolutionService with multi-signal scoring
7	app/repositories/graph.py	All Neo4j interactions	GraphRepository with schema, writes, and retrieval Cypher
8	app/models/schemas.py	Pydantic data models	All request, response, and internal data schemas
9	tests/	Unit and integration tests	Deterministic tests without Neo4j; API client tests with mocks
10	docs/	Architecture reference	Companion architecture.md and static SVG diagram

The same component boundaries are summarized in docs/architecture.md.

Quick Start

The setup favors local reproducibility over cloud dependency. Docker Compose provides Neo4j with a single command, and the application itself only needs a Python virtual environment. The entire local stack requires no cloud accounts, no API keys, and no paid services.

Step 1 - Start Neo4j

docker compose up -d

This starts neo4j:5.26 in a container and exposes the Bolt interface on port 7687 and the browser UI on port 7474. Neo4j will take a few seconds to initialize. You can check readiness by visiting http://localhost:7474 in a browser.

Step 2 - Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate   # Linux / macOS
# .venv\Scripts\activate    # Windows

Step 3 - Install the package and development dependencies

pip install -e ".[dev]"

The -e flag installs in editable mode so you can edit source files and see changes without reinstalling. The [dev] extras include pytest and the test dependencies.

Step 4 - Run the API server

uvicorn app.main:app --reload

The --reload flag watches for source file changes and restarts the server automatically. This is useful during development but should not be used in production.

Step 5 - Open the interactive docs

http://127.0.0.1:8000/docs

FastAPI generates a full interactive Swagger UI from the Pydantic models. You can send requests directly from the browser without needing curl or a separate API client.

#	Runtime Dependency	Required Version	Why It Is Needed	Default Source
1	Python	3.12 or newer	Project metadata and type annotations require 3.12+	System Python or pyenv
2	Docker or Podman	Any current version	Runs Neo4j locally with configured ports	docker-compose.yml in repo root
3	Neo4j	5.x (via Docker)	Graph persistence and vector index support	neo4j:5.26 from docker-compose.yml
4	Virtual environment	Any	Isolates app and test dependencies from system Python	Local .venv directory

Tip

If the API starts but /api/health returns a degraded status, check whether Neo4j is reachable at bolt://localhost:7687 and whether the password matches the configured environment variables. The most common cause is the container not yet having finished its startup sequence.

Configuration

The application loads all runtime settings from environment variables. Sensible defaults are provided for every variable so the service runs out of the box in a local development environment without any .env file. The defaults are designed to match the docker-compose.yml configuration in the repository.

#	Variable	Default Value	What It Controls	When To Change It
1	NEO4J_URI	bolt://localhost:7687	Neo4j Bolt connection endpoint	Point to a remote or Docker-networked Neo4j instance
2	NEO4J_USERNAME	neo4j	Database username	Match your hosted Neo4j credentials
3	NEO4J_PASSWORD	change-this-password	Database password	Always change in any non-local environment
4	MEMORY_EMBEDDING_DIMENSIONS	256	Length of generated embedding vectors	Change if swapping in a different embedding model with different dimensions
5	AUTO_MERGE_THRESHOLD	0.95	Minimum score for automatic entity merge	Raise to be more conservative; lower if using a stronger embedding model
6	PENDING_MATCH_THRESHOLD	0.85	Minimum score to flag a match for human review	Lower to increase the number of matches flagged for review

Example local .env file that matches the defaults:

NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=change-this-password
MEMORY_EMBEDDING_DIMENSIONS=256
AUTO_MERGE_THRESHOLD=0.95
PENDING_MATCH_THRESHOLD=0.85
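A settings loader for these variables might look like the sketch below. The variable names and defaults are taken from the table above; the Settings dataclass, the load_settings function, and the ordering check between the two thresholds are assumptions added for illustration and may not match the project's own config code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    embedding_dimensions: int
    auto_merge_threshold: float
    pending_match_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the documented defaults."""
    settings = Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_username=env.get("NEO4J_USERNAME", "neo4j"),
        neo4j_password=env.get("NEO4J_PASSWORD", "change-this-password"),
        embedding_dimensions=int(env.get("MEMORY_EMBEDDING_DIMENSIONS", "256")),
        auto_merge_threshold=float(env.get("AUTO_MERGE_THRESHOLD", "0.95")),
        pending_match_threshold=float(env.get("PENDING_MATCH_THRESHOLD", "0.85")),
    )
    # Assumed sanity check: the pending band must sit below the merge threshold.
    if settings.pending_match_threshold > settings.auto_merge_threshold:
        raise ValueError("PENDING_MATCH_THRESHOLD must not exceed AUTO_MERGE_THRESHOLD")
    return settings


print(load_settings({}))
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real environment.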

Warning

The default password change-this-password is only safe for local development. If you expose the Neo4j port outside your local machine, change this immediately. Neo4j does not restrict connections by IP by default.

API Reference

The API surface is small and intentional. Each endpoint corresponds to exactly one part of the memory lifecycle. There are no bulk endpoints, no admin endpoints, and no authentication middleware in this prototype.

GET /api/health - Check service and database connectivity

Purpose: Returns the operational status of the API service and whether the Neo4j database is reachable. This endpoint does not require any request body.

Response shape:

{
  "status": "ok",
  "neo4j": "connected"
}

When to use: Call this before running any operations to confirm the service is up and the database connection is healthy. Use it as a readiness probe in deployment environments.

Note: If neo4j returns "disconnected" the API is running but cannot reach the database. Check your NEO4J_URI and confirm the container is healthy.

POST /api/documents - Ingest text and build graph memory

Purpose: Takes a raw text document, extracts entities and relationships, resolves candidates against existing graph nodes, and persists the results. This is the main write operation in the system.

Request body:

{
  "content": "Anthropic developed Claude Code. Claude Code competes with Codex.",
  "source": "example-note",
  "session_id": "demo"
}

Response shape:

{
  "message_id": "9ca7c7b5-8a96-4f81-a5ff-0e1d5b991c2e",
  "entity_count": 3,
  "relation_count": 2,
  "resolutions": [
    {
      "action": "create",
      "confidence": 0.0,
      "matched_entity_id": null,
      "matched_name": null,
      "reason": "No same-type candidates exist yet."
    },
    {
      "action": "pending",
      "confidence": 0.89,
      "matched_entity_id": "entity:claude-code",
      "matched_name": "Claude Code",
      "reason": "exact=0.00, fuzzy=0.91, semantic=0.87"
    }
  ]
}

When to use: Call this whenever new text should be added to the agent's memory. Each call creates a message node, resolves entities, and builds relationships.

Note: The resolutions array tells you exactly what the system decided for each extracted entity and why. Check action: "pending" entries to see which entities need human review.
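Picking out the entries that need review is a one-line filter over the response. The sketch below uses an abbreviated copy of the sample response above; pending_reviews is a hypothetical helper name.

```python
def pending_reviews(ingest_result: dict) -> list[dict]:
    """Return the resolution entries that were flagged for human review."""
    return [
        resolution
        for resolution in ingest_result.get("resolutions", [])
        if resolution.get("action") == "pending"
    ]


ingest_result = {
    "message_id": "9ca7c7b5-8a96-4f81-a5ff-0e1d5b991c2e",
    "entity_count": 3,
    "relation_count": 2,
    "resolutions": [
        {"action": "create", "confidence": 0.0, "matched_entity_id": None,
         "matched_name": None, "reason": "No same-type candidates exist yet."},
        {"action": "pending", "confidence": 0.89,
         "matched_entity_id": "entity:claude-code", "matched_name": "Claude Code",
         "reason": "exact=0.00, fuzzy=0.91, semantic=0.87"},
    ],
}

for item in pending_reviews(ingest_result):
    print(item["matched_name"], item["confidence"], item["reason"])
```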

POST /api/chat - Retrieve hybrid context for a user message

Purpose: Stores the user's message in the graph and returns a multi-tier context response combining message history, entity knowledge, and reasoning provenance.

Request body:

{
  "message": "What do we know about Claude Code?",
  "session_id": "demo"
}

Response shape:

{
  "query": "What do we know about Claude Code?",
  "session_id": "demo",
  "message_hits": [
    "Anthropic developed Claude Code.",
    "Claude Code competes with Codex."
  ],
  "entities": [
    {
      "id": "entity:claude-code",
      "name": "Claude Code",
      "entity_type": "Object",
      "score": 0.97,
      "related_names": ["Anthropic", "Codex"]
    },
    {
      "id": "entity:anthropic",
      "name": "Anthropic",
      "entity_type": "Organization",
      "score": 0.88,
      "related_names": ["Claude Code"]
    }
  ],
  "reasoning": [
    "What do we know about Claude Code?"
  ]
}

When to use: Call this on every user message turn to retrieve relevant context for your LLM. Pass the returned context as part of the system prompt or user context window.
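One way to pass the returned context to an LLM is to flatten it into a short text block. This is a sketch of one possible layout; the format_context helper and the section headings it emits are illustrative choices, not part of the API.

```python
def format_context(ctx: dict) -> str:
    """Render a chat context response as plain text for a prompt."""
    lines = ["Relevant prior messages:"]
    lines += [f"- {message}" for message in ctx.get("message_hits", [])]
    lines.append("Known entities:")
    for entity in ctx.get("entities", []):
        related = ", ".join(entity.get("related_names", [])) or "none"
        lines.append(f"- {entity['name']} ({entity['entity_type']}); related: {related}")
    lines.append("Earlier queries touching these entities:")
    lines += [f"- {query}" for query in ctx.get("reasoning", [])]
    return "\n".join(lines)


ctx = {
    "message_hits": ["Anthropic developed Claude Code."],
    "entities": [
        {"name": "Claude Code", "entity_type": "Object", "score": 0.97,
         "related_names": ["Anthropic", "Codex"]},
    ],
    "reasoning": ["What do we know about Claude Code?"],
}
print(format_context(ctx))
```

Keeping the three kinds of context under separate headings lets the model distinguish what was said in the conversation from what is stored as long-term entity knowledge.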

POST /api/duplicates/review - Confirm or reject a pending identity link

Purpose: Resolves a pending SAME_AS edge between two entity nodes. A confirmation merges the entities; a rejection marks the edge as rejected and keeps the entities separate.

Request body:

{
  "left_id": "entity:claude-code-v2",
  "right_id": "entity:claude-code",
  "confirm": true,
  "reviewer": "human-reviewer-id"
}

When to use: Call this when reviewing the pending_duplicates count from the stats endpoint. Each pending duplicate represents an ambiguous identity match that the system flagged for human judgment.

Note: Confirming a duplicate merges the left entity into the right entity and rewrites the SAME_AS edge status to confirmed. Rejecting marks it as rejected and ensures the two entities remain permanently separate, which prevents the resolution service from flagging them as candidates again.
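A review call can be issued from a script with the standard library. The sketch below only builds the request; the base URL assumes the local uvicorn server from the Quick Start, and build_review_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_review_request(left_id: str, right_id: str, confirm: bool, reviewer: str,
                         base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request for /api/duplicates/review without sending it."""
    payload = {
        "left_id": left_id,
        "right_id": right_id,
        "confirm": confirm,
        "reviewer": reviewer,
    }
    return urllib.request.Request(
        f"{base_url}/api/duplicates/review",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_review_request(
    "entity:claude-code-v2", "entity:claude-code", confirm=True, reviewer="human-reviewer-id"
)
print(request.get_method(), request.full_url)
```

With the API running, urllib.request.urlopen(request) sends it. Setting confirm to False sends a rejection instead, which keeps the two entities separate.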

GET /api/stats - Get current graph memory statistics

Purpose: Returns a snapshot of the current graph state including counts for every major node type and the number of pending duplicate reviews.

Response shape:

{
  "conversations": 4,
  "messages": 18,
  "entities": 9,
  "traces": 6,
  "pending_duplicates": 1,
  "checked_at": "2026-06-03T12:00:00Z"
}

When to use: Use this endpoint to monitor memory growth over time and to check whether pending duplicates are accumulating. High pending_duplicates values indicate that the resolution thresholds may need tuning or that more frequent human review is needed.

Tip

The FastAPI-generated docs at http://127.0.0.1:8000/docs give you an interactive way to test every endpoint without curl. Each endpoint shows its full schema, required fields, and example values derived from the Pydantic models.

Generated API Response Examples

These response examples were generated directly from the actual Pydantic models in the codebase. They match the current response shapes exactly rather than being hand-authored approximations.

Health check

{
  "status": "ok",
  "neo4j": "connected"
}

The health response is minimal by design. Its purpose is to give a deployment environment a lightweight reachability check that does not require any graph state.

Stats snapshot

{
  "conversations": 4,
  "messages": 18,
  "entities": 9,
  "traces": 6,
  "pending_duplicates": 1,
  "checked_at": "2026-06-03T12:00:00Z"
}

The stats response reflects the actual growth of the graph. The pending_duplicates field is the most operationally significant: it tells you how many entity pairs are waiting for human review before they can be merged or separated.

Document ingest

{
  "message_id": "9ca7c7b5-8a96-4f81-a5ff-0e1d5b991c2e",
  "entity_count": 3,
  "relation_count": 2,
  "resolutions": [
    {
      "action": "create",
      "confidence": 0.0,
      "matched_entity_id": null,
      "matched_name": null,
      "reason": "No same-type candidates exist yet."
    },
    {
      "action": "pending",
      "confidence": 0.89,
      "matched_entity_id": "entity:claude-code",
      "matched_name": "Claude Code",
      "reason": "exact=0.00, fuzzy=0.91, semantic=0.87"
    }
  ]
}

The ingest response exposes the internal decision logic for every extracted entity. The reason string shows the exact score breakdown so you can see how the combined signal was computed and why the threshold boundary was crossed.

Chat context

{
  "query": "What do we know about Claude Code?",
  "session_id": "demo",
  "message_hits": [
    "Anthropic developed Claude Code.",
    "Claude Code competes with Codex."
  ],
  "entities": [
    {
      "id": "entity:claude-code",
      "name": "Claude Code",
      "entity_type": "Object",
      "score": 0.97,
      "related_names": ["Anthropic", "Codex"]
    },
    {
      "id": "entity:anthropic",
      "name": "Anthropic",
      "entity_type": "Organization",
      "score": 0.88,
      "related_names": ["Claude Code"]
    }
  ],
  "reasoning": [
    "What do we know about Claude Code?"
  ]
}

The chat response combines four retrieval signals into one payload. A downstream LLM can use message_hits for conversational context, entities for structured facts, related_names for graph neighborhood context, and reasoning for provenance about how this context was previously used.

Example Workflows

Workflow 1: Ingesting a note with multiple entities

You post a document: "Anthropic developed Claude Code. Claude Code competes with Codex." The system stores the message, extracts three entities (Anthropic, Claude Code, Codex) and two relations (Anthropic developed Claude Code; Claude Code competes with Codex). If no entities of these types exist yet, all three are created as new isolated nodes with embeddings. Two RELATED_TO edges are created between them. The response shows three action: "create" resolution decisions.

The next time you post "OpenAI made Codex, a code completion tool", the extraction service finds Codex and OpenAI. The resolution service compares Codex against existing entities of type Object and finds the existing Codex node. Because the name matches exactly, the exact signal is 1.0 and the entity merges. For a candidate without an exact match, a score of 0.95 or higher merges and a score from 0.85 up to 0.95 creates a pending review. The response shows one resolution decision for each entity with the full score breakdown.

Workflow 2: Asking a question across sessions

A user in session demo asks "What do we know about Claude Code?" The system stores this message, embeds it, and runs four retrieval operations. The message vector search finds the two previously ingested messages about Claude Code with high similarity scores. The entity vector search finds the Claude Code entity node and the Anthropic entity node. Neighbor expansion adds Codex as a related entity. The reasoning trace lookup finds any prior queries that touched these entities. All four sets of results are assembled into a single ContextResponse.

A downstream LLM receives this structured context and can ground its answer in it without needing to re-derive the relationships from scratch.

Workflow 3: Handling a duplicate detection

The system ingests "Claude Code by Anthropic" and "Claude Code - the AI coding assistant". The first mention creates a new Object entity named "Claude Code". The second mention is resolved against that node. If the extracted name matches exactly, the exact signal is 1.0 and the entity merges automatically. If the extracted name differs and the combined score is, say, 0.89, the system creates a new node with a pending SAME_AS edge and returns action: "pending" in the resolutions array.

A human reviewer calls GET /api/stats, sees pending_duplicates: 1, calls POST /api/duplicates/review with confirm: true, and the entities are merged. The SAME_AS edge is updated to confirmed. Future extractions of "Claude Code" will merge into the canonical node automatically if the score is at or above the auto-merge threshold.

#	Workflow	Services Involved	Graph Changes	Key Output Field
1	First-time ingest of new entities	MemoryService, ExtractionService, ResolutionService, GraphRepository	Message node, N entity nodes, M RELATED_TO edges	resolutions[*].action == "create"
2	Re-ingest with merge decisions	Same plus HashEmbeddingService for scoring	Existing entity updated or new node plus pending SAME_AS	resolutions[*].action in merge, pending
3	Chat retrieval across memory tiers	MemoryService, HashEmbeddingService, GraphRepository	New Message node, new ReasoningTrace node	message_hits, entities, reasoning
4	Human duplicate review	GraphRepository only	SAME_AS edge status updated, entity merged or separated	HTTP 200 on success

Testing And Validation

The test suite is designed to validate deterministic behavior without requiring a running Neo4j instance. This is possible because the embedding service is hash-based (deterministic), the extraction service is regex-based (deterministic), and the API routes are tested with a mock MemoryService that returns predictable responses.

The five test cases cover the core invariants of the system: embedding stability, similarity ordering, extraction correctness, API health shape, and API stats shape. These tests are not comprehensive integration tests; they are narrow unit tests that confirm the most important behaviors have not regressed.

#	Test File	Test Name	What It Asserts	Why This Assertion Matters
1	test_embedding.py	test_embed_is_deterministic	Same text always produces the same vector	Resolution scoring depends on stable embeddings
2	test_embedding.py	test_similarity_prefers_related	Related text scores higher similarity than unrelated text	Validates that the hash embedding captures at least surface similarity
3	test_extraction.py	test_extraction_finds_expected	Known entity names and relation patterns are detected	Confirms the extraction pipeline produces usable candidates for the resolver
4	test_api.py	test_health_endpoint	Health route returns correct structure with mocked repository	API contract is stable regardless of database state
5	test_api.py	test_stats_endpoint	Stats route returns expected fields with zero counts	Operational reporting shape is stable and testable without a graph

Run all tests:

source .venv/bin/activate
pytest

Run with verbose output to see each test name:

pytest -v

Note

The unit tests do not require a running Neo4j instance. End-to-end validation of graph writes, schema creation, vector index queries, and the full ingest-to-retrieval pipeline still requires a live Neo4j 5.x environment started with docker compose up -d.

Tip

To observe the full memory lifecycle end to end, start the API with uvicorn app.main:app --reload, post a few documents to /api/documents, then call /api/chat with a related question. The response will show message hits, entity hits, and reasoning traces that grew from the ingested documents. You can also open the Neo4j browser at http://localhost:7474 and run MATCH (n) RETURN n LIMIT 50 to see the graph structure visually.

Current Constraints And Tradeoffs

Honest documentation explains limitations, not only strengths. The following constraints are known, intentional, and represent scaffolding choices rather than fundamental architectural limits. Each one can be addressed independently without restructuring the whole system.

#	Constraint	Current Behavior	Why It Was Acceptable	Recommended Upgrade Path
1	Embedding quality	Uses deterministic hash embeddings with 256 dimensions	Removes external dependencies and makes tests fully deterministic	Swap HashEmbeddingService for OpenAI, Cohere, or sentence-transformers behind the same interface
2	Entity extraction coverage	Regex and heuristic patterns only	Makes the pipeline transparent and runnable without NLP libraries	Replace ExtractionService with spaCy NER or a fine-tuned token classifier
3	Ontology depth	POLE+O top-level types only	Keeps resolution comparison simple and type-safe at a coarse level	Add sub-types, taxonomies, and richer relation predicates
4	Duplicate review merge logic	Merges name and aliases but does not rewrite incoming edges	Enough to demonstrate the review pathway without complex migration logic	Add full edge canonicalization: every edge that pointed to the merged-away entity should be re-pointed to the canonical node
5	Source field persistence	Accepted on request but not written to graph	Prototype focused on identity and retrieval, not provenance labeling	Add source as a property on Message nodes or as a dedicated Evidence edge
6	Authentication	No auth middleware on any endpoint	Local prototype with no multi-user requirement	Add OAuth2 or API key middleware before any deployment beyond localhost
7	Concurrent write safety	No locking or transaction management on the resolution loop	Single-user prototype with no concurrent write scenarios tested	Add Neo4j transaction management with retry on lock contention
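The "same interface" idea behind the first upgrade path can be sketched with a typing Protocol. The EmbeddingService protocol, its embed method name, and the ConstantEmbedding stand-in are assumptions for illustration; the actual interface of HashEmbeddingService may differ.

```python
from typing import Protocol


class EmbeddingService(Protocol):
    """Anything with an embed method can act as the embedding backend."""

    def embed(self, text: str) -> list[float]: ...


class ConstantEmbedding:
    """Toy stand-in backend: returns the same unit vector for any text."""

    def __init__(self, dimensions: int = 256) -> None:
        self.dimensions = dimensions

    def embed(self, text: str) -> list[float]:
        vector = [0.0] * self.dimensions
        vector[0] = 1.0
        return vector


def embed_all(service: EmbeddingService, texts: list[str]) -> list[list[float]]:
    """Callers depend only on the protocol, never on a concrete backend."""
    return [service.embed(text) for text in texts]


vectors = embed_all(ConstantEmbedding(dimensions=8), ["Anthropic", "Codex"])
print(len(vectors), len(vectors[0]))
```

If the orchestration code depends only on a protocol like this, a learned model can replace the hash embedding without changes to resolution or retrieval, provided MEMORY_EMBEDDING_DIMENSIONS and the vector indexes are updated to the new vector length.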

Tip

If you want to evolve this prototype, the highest-leverage improvements in order are: (1) swap in a real embedding model to dramatically improve resolution quality, (2) add a proper NER pipeline for better extraction coverage, (3) add full edge rewriting on duplicate confirmation. These three changes improve the quality of all three memory tiers without changing the graph shape or API contract.

When Should You Use This vs Something Else?

This is the question the README needs to answer honestly. The system is well-suited for a specific profile of use case and poorly suited for others. Rather than claiming universal optimality, the goal here is to map the system to the problem shapes it was designed for.

Use this system when:

You are building an agent that accumulates knowledge over many sessions and needs to recall entities by name across those sessions
You expect the same entity to appear under different names, abbreviations, or descriptions across documents
Incorrect identity merges are costly and you want a human review step before ambiguous cases are resolved
You want to know not just what context is retrieved but which prior reasoning sessions touched it (provenance)
You want graph structure around retrieved entities, not just the entities themselves

Do not use this system when:

You are building a simple document search pipeline over a static corpus with no entity identity concerns
You need the highest possible semantic recall quality and can accept an external embedding API dependency
You cannot operate Neo4j and prefer a fully managed single-service solution
Your entities are well-defined, stable, and enumerated in a fixed schema that a relational database handles cleanly
You need sub-100ms retrieval at scale; this prototype is not optimized for latency

#	Use Case	Fit	Reasoning
1	Personal AI assistant with growing knowledge base	Excellent	Entity drift over time is the exact problem this architecture addresses
2	Research assistant tracking people and organizations	Very good	Same entities appear under many names; provenance is valuable
3	RAG over static PDF document collection	Poor	No identity management needed; a flat vector store is faster and simpler
4	Customer support bot with product catalog	Moderate	Better served by a relational DB plus vector search unless product entities are ambiguous
5	Multi-agent collaboration memory layer	Good candidate	Shared graph identity prevents different agents from creating conflicting entity records
6	Real-time chat with ephemeral context	Poor	In-context window is sufficient; graph persistence overhead is wasted

Note

This system is not competing with Pinecone or Chroma for semantic retrieval benchmarks. It is competing with the problem of memory correctness in agents that accumulate knowledge over time. The right comparison is not "which system returns the most similar chunks?" but "which system maintains the most accurate identity model as an agent learns more about the world?"

Summary

The central thesis of this repository is straightforward: vector similarity helps memory retrieval, but identity should be managed explicitly. That single principle explains every design decision in the codebase. The graph exists to own identity. The vector indexes exist to power retrieval. The resolution gate exists to protect identity from similarity's overreach. The memory tiers exist to keep temporal, durable, and provenance data separate. The reasoning traces exist so retrieval behavior is auditable.

This is not the right architecture for every problem. If you need pure semantic recall over a large static corpus, use a dedicated vector store with a strong embedding model. If you need strict relational constraints, use a relational database. If you need a formal ontology with SPARQL queries, use a dedicated knowledge graph stack.

This architecture earns its complexity when you have an agent that learns over time, when the same entities appear under different names, when identity mistakes compound into reasoning errors, and when you need to explain why certain context was retrieved. In those situations, the conservative merge gate, the hybrid retrieval strategy, and the three memory tiers each contribute something that simpler approaches cannot provide.
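
As a rough illustration of what a conservative merge gate can look like, the sketch below treats similarity as a signal that feeds a three-way decision rather than as proof of identity. Everything here is an assumption for illustration: the `Decision`, `Candidate`, and `resolve` names are hypothetical, and the threshold values are placeholders, not the project's configured values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    MERGE = "merge"    # confident enough to reuse the existing entity
    REVIEW = "review"  # ambiguous; queue for human confirmation
    CREATE = "create"  # no plausible match; create a new entity


@dataclass(frozen=True)
class Candidate:
    entity_id: str
    entity_type: str
    similarity: float


def resolve(
    mention_type: str,
    candidates: list[Candidate],
    merge_threshold: float = 0.95,   # placeholder value (assumption)
    review_threshold: float = 0.80,  # placeholder value (assumption)
) -> tuple[Decision, Optional[Candidate]]:
    """Pick the best same-type candidate and gate it by similarity."""
    # Never compare across top-level types, however similar the text is.
    same_type = [c for c in candidates if c.entity_type == mention_type]
    if not same_type:
        return Decision.CREATE, None
    best = max(same_type, key=lambda c: c.similarity)
    if best.similarity >= merge_threshold:
        return Decision.MERGE, best
    if best.similarity >= review_threshold:
        return Decision.REVIEW, best
    return Decision.CREATE, None
```

The middle band is the important part of the design: a candidate that is similar but not near-certain is neither merged nor discarded, it is handed to the review pathway.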

Note

For the full architecture reference including component boundaries, Cypher patterns, and synchronization notes, see docs/architecture.md.
