ai-memory-architecture

Memories are only useful if retrieval is better than dumping the last hundred messages back into a prompt.

This repo is a small memory system for agent runtimes:

  • store typed memories
  • embed them at write time
  • rank them at recall time with similarity, recency, and importance
  • return the top slice that should re-enter the prompt

The current implementation is intentionally local and inspectable. There is no vector database hiding the ranking path. The retrieval score is visible and reproducible.

What Runs Here

  • typed memory records: episodic, semantic, procedural, preference
  • Azure embedding backend
  • in-memory storage layer
  • blended ranking: 0.70*similarity + 0.15*recency + 0.15*importance (see the sketch after this list)
  • CLI for replaying memory scenarios
  • checked-in live demo artifacts for support-memory recall
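The ranking weights above are the entire scoring path. As a minimal sketch of what that blend looks like, assuming cosine similarity over the stored embeddings and an exponential half-life for the recency term (the decay form and the function names here are illustrative assumptions, not necessarily what ranking.py does):

import math
import time

SIMILARITY_WEIGHT = 0.70
RECENCY_WEIGHT = 0.15
IMPORTANCE_WEIGHT = 0.15

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; no vector database involved.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recency_score(created_at: float, now: float | None = None, half_life_s: float = 86_400.0) -> float:
    # Assumed exponential decay: 1.0 at write time, 0.5 one half-life later.
    now = time.time() if now is None else now
    age = max(0.0, now - created_at)
    return 0.5 ** (age / half_life_s)

def blended_score(query_vec, memory_vec, created_at, importance, now=None) -> float:
    # The weighted blend named in the list above.
    return (
        SIMILARITY_WEIGHT * cosine_similarity(query_vec, memory_vec)
        + RECENCY_WEIGHT * recency_score(created_at, now)
        + IMPORTANCE_WEIGHT * importance
    )

Because the weights and the terms are plain Python, a score like 0.7304 can be recomputed by hand from the stored record.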

Live Demo

The checked-in demo stores four support memories and recalls the top three for this query:

How should I explain billing API timeout incidents to an enterprise customer?

Artifacts:

  • demo/input/support-recall.json
  • demo/output/support-recall.json
  • demo/output/demo-summary.json

Rendered recall captures:

  • Memory recall summary
  • Top recall matches

Observed summary:

{
  "embedding_model": "text-embedding-3-small",
  "stored_count": 4,
  "top_match": "semantic",
  "top_blended_score": 0.7304
}

Top match shape:

{
  "memory_kind": "semantic",
  "similarity_score": 0.6363,
  "blended_score": 0.7304,
  "metadata": {
    "topic": "billing_api"
  }
}
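These numbers line up with the blend above, assuming the recency term is close to 1.0 for a memory stored moments before recall: 0.70 × 0.6363 ≈ 0.4454, plus 0.15 × 1.0 for recency and 0.15 × 0.9 for the importance set in the Python example, gives ≈ 0.7304.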

Python API

from ai_memory_architecture import MemorySystem, Settings
from ai_memory_architecture.embedding_backend import AzureEmbeddingBackend

memory = MemorySystem(AzureEmbeddingBackend(Settings.from_env()))

memory.remember(
    agent_id="support_bot",
    memory_kind="semantic",
    content="Billing API timeout incidents usually come from stale cache invalidation.",
    importance=0.9,
    metadata={"topic": "billing_api"},
)

matches, trace = memory.recall(
    agent_id="support_bot",
    query="How should I explain the timeout to the customer?",
    limit=3,
)

print(trace.ranking_formula)
print(matches[0].content)

CLI

Install:

uv sync --extra dev

Replay the checked-in scenario:

export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<key>"
export AZURE_OPENAI_API_VERSION="2025-04-01-preview"
export AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"

uv run aimem \
  --scenario-file demo/input/support-recall.json \
  --out /tmp/support-recall.json

Regenerate the live demo:

uv run python scripts/run_live_demo.py

Design Notes

  • embeddings are provider-backed
  • ranking is app-owned
  • memory strength is not a black-box score from a hosted vector service
  • the returned matches are ready to be inserted into a downstream prompt builder
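Since each returned match already carries its content, kind, and blended score (as in the demo output above), handing them to a prompt builder is mostly string assembly. A minimal sketch; the render_memory_block helper and the max_chars budget are illustrative assumptions, not part of the package:

def render_memory_block(matches, max_chars: int = 2000) -> str:
    """Format recalled memories as a prompt section, highest blended score first."""
    lines = ["Relevant memories:"]
    used = 0
    for match in sorted(matches, key=lambda m: m.blended_score, reverse=True):
        line = f"- [{match.memory_kind}] {match.content}"
        if used + len(line) > max_chars:
            break  # stay inside the prompt budget
        lines.append(line)
        used += len(line)
    return "\n".join(lines)

# e.g. system_prompt = base_instructions + "\n\n" + render_memory_block(matches)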

Files Worth Reading

  • src/ai_memory_architecture/system.py
  • src/ai_memory_architecture/ranking.py
  • src/ai_memory_architecture/embedding_backend.py
  • scripts/run_live_demo.py
  • docs/architecture.md
  • docs/azure-foundry.md

Tests

uv run pytest -q
