ai-memory-architecture

Memories are only useful if retrieval is better than dumping the last hundred messages back into a prompt.

This repo is a small memory system for agent runtimes:

  • store typed memories
  • embed them at write time
  • rank them at recall time with similarity, recency, and importance
  • return the top slice that should re-enter the prompt

The current implementation is intentionally local and inspectable. There is no vector database hiding the ranking path. The retrieval score is visible and reproducible.

What Runs Here

  • typed memory records: episodic, semantic, procedural, preference
  • Azure embedding backend
  • in-memory storage layer
  • blended ranking: 0.70*similarity + 0.15*recency + 0.15*importance (see the sketch after this list)
  • CLI for replaying memory scenarios
  • checked-in live demo artifacts for support-memory recall
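The ranking weights above are the entire scoring path. As a minimal sketch of what that blend looks like, assuming cosine similarity over the stored embeddings and an exponential half-life for the recency term (the decay form and the function names here are illustrative assumptions, not necessarily what ranking.py does):

import math
import time

SIMILARITY_WEIGHT = 0.70
RECENCY_WEIGHT = 0.15
IMPORTANCE_WEIGHT = 0.15

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Plain cosine similarity; no vector database involved.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recency_score(created_at: float, now: float | None = None, half_life_s: float = 86_400.0) -> float:
    # Assumed exponential decay: 1.0 at write time, 0.5 one half-life later.
    now = time.time() if now is None else now
    age = max(0.0, now - created_at)
    return 0.5 ** (age / half_life_s)

def blended_score(query_vec, memory_vec, created_at, importance, now=None) -> float:
    # The weighted blend named in the list above.
    return (
        SIMILARITY_WEIGHT * cosine_similarity(query_vec, memory_vec)
        + RECENCY_WEIGHT * recency_score(created_at, now)
        + IMPORTANCE_WEIGHT * importance
    )

Because the weights and the terms are plain Python, a score like 0.7304 can be recomputed by hand from the stored record.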

Live Demo

The checked-in demo stores four support memories and recalls the top three for this query:

How should I explain billing API timeout incidents to an enterprise customer?

Artifacts:

  • demo/input/support-recall.json
  • demo/output/support-recall.json
  • demo/output/demo-summary.json

Rendered recall captures:

  • Memory recall summary
  • Top recall matches

Observed summary:

{
  "embedding_model": "text-embedding-3-small",
  "stored_count": 4,
  "top_match": "semantic",
  "top_blended_score": 0.7304
}

Top match shape:

{
  "memory_kind": "semantic",
  "similarity_score": 0.6363,
  "blended_score": 0.7304,
  "metadata": {
    "topic": "billing_api"
  }
}
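These numbers line up with the blend above, assuming the recency term is close to 1.0 for a memory stored moments before recall: 0.70 × 0.6363 ≈ 0.4454, plus 0.15 × 1.0 for recency and 0.15 × 0.9 for the importance set in the Python example, gives ≈ 0.7304.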

Python API

from ai_memory_architecture import MemorySystem, Settings
from ai_memory_architecture.embedding_backend import AzureEmbeddingBackend

memory = MemorySystem(AzureEmbeddingBackend(Settings.from_env()))

memory.remember(
    agent_id="support_bot",
    memory_kind="semantic",
    content="Billing API timeout incidents usually come from stale cache invalidation.",
    importance=0.9,
    metadata={"topic": "billing_api"},
)

matches, trace = memory.recall(
    agent_id="support_bot",
    query="How should I explain the timeout to the customer?",
    limit=3,
)

print(trace.ranking_formula)
print(matches[0].content)

CLI

Install:

uv sync --extra dev

Replay the checked-in scenario:

export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<key>"
export AZURE_OPENAI_API_VERSION="2025-04-01-preview"
export AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"

uv run aimem \
  --scenario-file demo/input/support-recall.json \
  --out /tmp/support-recall.json

Regenerate the live demo:

uv run python scripts/run_live_demo.py

Design Notes

  • embeddings are provider-backed
  • ranking is app-owned
  • memory strength is not a black-box score from a hosted vector service
  • the returned matches are ready to be inserted into a downstream prompt builder
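Since each returned match already carries its content, kind, and blended score (as in the demo output above), handing them to a prompt builder is mostly string assembly. A minimal sketch; the render_memory_block helper and the max_chars budget are illustrative assumptions, not part of the package:

def render_memory_block(matches, max_chars: int = 2000) -> str:
    """Format recalled memories as a prompt section, highest blended score first."""
    lines = ["Relevant memories:"]
    used = 0
    for match in sorted(matches, key=lambda m: m.blended_score, reverse=True):
        line = f"- [{match.memory_kind}] {match.content}"
        if used + len(line) > max_chars:
            break  # stay inside the prompt budget
        lines.append(line)
        used += len(line)
    return "\n".join(lines)

# e.g. system_prompt = base_instructions + "\n\n" + render_memory_block(matches)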

Files Worth Reading

  • src/ai_memory_architecture/system.py
  • src/ai_memory_architecture/ranking.py
  • src/ai_memory_architecture/embedding_backend.py
  • scripts/run_live_demo.py
  • docs/architecture.md
  • docs/azure-foundry.md

Tests

uv run pytest -q
