Memories are only useful if retrieval is better than dumping the last hundred messages back into a prompt.
This repo is a small memory system for agent runtimes:
- store typed memories
- embed them at write time
- rank them at recall time with similarity, recency, and importance
- return the top slice that should re-enter the prompt
The current implementation is intentionally local and inspectable. There is no vector database hiding the ranking path. The retrieval score is visible and reproducible.
- typed memory records: episodic, semantic, procedural, preference
- Azure embedding backend
- in-memory storage layer
- blended ranking: `0.70*similarity + 0.15*recency + 0.15*importance` (see the sketch after this list)
- CLI for replaying memory scenarios
- checked-in live demo artifacts for support-memory recall
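The blend is a plain weighted sum, so it can be reimplemented and checked outside the repo. A minimal sketch, assuming cosine similarity already scaled to [0, 1] and an exponential half-life decay for recency; the `ScoredMemory` fields and the decay model are illustrative assumptions, and only the 0.70/0.15/0.15 weights come from the list above:

```python
import math
import time
from dataclasses import dataclass

@dataclass
class ScoredMemory:
    similarity: float   # cosine similarity to the query embedding, scaled to [0, 1]
    created_at: float   # unix timestamp of the write
    importance: float   # caller-supplied weight in [0, 1]

def blended_score(m: ScoredMemory, now: float, half_life_s: float = 7 * 24 * 3600) -> float:
    # Exponential decay is one reasonable recency model; the repo fixes the
    # weights, not how recency itself is computed.
    recency = math.exp(-math.log(2) * (now - m.created_at) / half_life_s)
    return 0.70 * m.similarity + 0.15 * recency + 0.15 * m.importance

now = time.time()
memories = [
    ScoredMemory(similarity=0.64, created_at=now - 3600, importance=0.9),
    ScoredMemory(similarity=0.71, created_at=now - 30 * 24 * 3600, importance=0.2),
]
# Recall = sort by blended score, keep the top slice.
top = sorted(memories, key=lambda m: blended_score(m, now), reverse=True)[:3]
```

In this sketch the fresh, important memory outranks the staler one despite lower similarity, which is the point of blending.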
The checked-in demo stores four support memories and recalls the top three for this query:

> How should I explain billing API timeout incidents to an enterprise customer?

Artifacts:

- `demo/input/support-recall.json`
- `demo/output/support-recall.json`
- `demo/output/demo-summary.json`
Rendered recall captures:

Observed summary:

```json
{
  "embedding_model": "text-embedding-3-small",
  "stored_count": 4,
  "top_match": "semantic",
  "top_blended_score": 0.7304
}
```

Top match shape:

```json
{
  "memory_kind": "semantic",
  "similarity_score": 0.6363,
  "blended_score": 0.7304,
  "metadata": {
    "topic": "billing_api"
  }
}
```

Quickstart:

```python
from ai_memory_architecture import MemorySystem, Settings
from ai_memory_architecture.embedding_backend import AzureEmbeddingBackend

memory = MemorySystem(AzureEmbeddingBackend(Settings.from_env()))

# Store one semantic memory for the support agent.
memory.remember(
    agent_id="support_bot",
    memory_kind="semantic",
    content="Billing API timeout incidents usually come from stale cache invalidation.",
    importance=0.9,
    metadata={"topic": "billing_api"},
)

# Recall the top three memories; the trace exposes the ranking used.
matches, trace = memory.recall(
    agent_id="support_bot",
    query="How should I explain the timeout to the customer?",
    limit=3,
)

print(trace.ranking_formula)
print(matches[0].content)
```
Install:

```bash
uv sync --extra dev
```

Replay the checked-in scenario:
```bash
export AZURE_OPENAI_ENDPOINT="https://<resource>.openai.azure.com/"
export AZURE_OPENAI_API_KEY="<key>"
export AZURE_OPENAI_API_VERSION="2025-04-01-preview"
export AZURE_OPENAI_EMBEDDING_DEPLOYMENT="text-embedding-3-small"

uv run aimem \
  --scenario-file demo/input/support-recall.json \
  --out /tmp/support-recall.json
```
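To sanity-check a replay, inspect the written artifact directly. A short sketch, assuming the output mirrors the top-match shape above and nests results under a `matches` key (the key name is an assumption, not confirmed by this README):

```python
import json

with open("/tmp/support-recall.json") as f:
    result = json.load(f)

# "matches" as the top-level key is an assumption; compare with the
# checked-in demo/output/support-recall.json before relying on it.
for match in result["matches"]:
    print(match["memory_kind"], match["blended_score"])
```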
Regenerate the live demo:

```bash
uv run python scripts/run_live_demo.py
```

Design notes:

- embeddings are provider-backed
- ranking is app-owned
- memory strength is not a black box score from a hosted vector service
- the returned matches are ready to be inserted into a downstream prompt builder
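Since the matches are meant to feed a prompt builder, here is a hedged sketch of that handoff, assuming each match exposes `memory_kind`, `blended_score`, and `content` as in the top-match artifact above; `Match` and `render_memory_context` are hypothetical names, not part of this repo:

```python
from dataclasses import dataclass

@dataclass
class Match:
    # Hypothetical stand-in mirroring the top-match artifact shape; the
    # repo's actual match type may differ.
    memory_kind: str
    blended_score: float
    content: str

def render_memory_context(matches: list[Match]) -> str:
    """Format recalled memories as a prompt section, strongest first."""
    lines = ["Relevant memories (highest blended score first):"]
    for m in matches:
        lines.append(f"- [{m.memory_kind}, score={m.blended_score:.2f}] {m.content}")
    return "\n".join(lines)

matches = [
    Match("semantic", 0.7304,
          "Billing API timeout incidents usually come from stale cache invalidation."),
]
print(render_memory_context(matches))
```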
Key files:

- `src/ai_memory_architecture/system.py`
- `src/ai_memory_architecture/ranking.py`
- `src/ai_memory_architecture/embedding_backend.py`
- `scripts/run_live_demo.py`
- `docs/architecture.md`
- `docs/azure-foundry.md`
Run the tests:

```bash
uv run pytest -q
```