upsidelab · Davsooonowy · May 14, 2026
diff --git a/docs/content/docs/agents/agent-config.md b/docs/content/docs/agents/agent-config.md
@@ -96,7 +96,12 @@ class AgentConfig(ArbitraryTypeBaseModel, Generic[InjectorT]):
   - `LLMToolConfig`: AI-powered operations with language models
   - `AgentToolConfig`: Tools that use other agents
 
-#### 9. **agent_callback_handler**
+#### 9. **memory_compactor_enabled**
+- **Type**: `bool`
+- **Required**: No (default: `False`)
+- **Description**: Enables the memory compactor for this agent. When `True`, an LLM-generated summary of the conversation is persisted every 10 human messages and injected as a `SystemMessage` at the start of each agent call. See [Memory](./memory.md) for details.
+
+#### 10. **agent_callback_handler**
 - **Type**: `Optional[AgentCallbackHandlerConfig]`
 - **Required**: No
 - **Description**: Callback handler for agent-specific events and monitoring
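
Note on `memory_compactor_enabled`: the description above says a summary is persisted every 10 human messages, and the Memory page says later compactions are incremental. A minimal sketch of that behaviour follows. It rests on several assumptions: `SketchCompactor`, the `(role, content)` tuple format, and the `summarize` callable (a stand-in for the LLM call) are all hypothetical. Unlike `BaseMemoryCompactor.compact_if_needed()`, which takes no arguments, this version receives the message list directly so that it can run on its own. Counting human messages since the previous compaction is one plausible reading of "every 10 human messages", not a confirmed detail.

```python
from typing import Callable, List, Optional, Tuple

# Simplified stand-in for real message objects: (role, content).
Message = Tuple[str, str]
# Hypothetical LLM hook: takes the previous summary and the new messages.
Summarizer = Callable[[Optional[str], List[Message]], str]


class SketchCompactor:
    """Illustrative compactor: re-summarises after every `interval` new human messages."""

    def __init__(self, summarize: Summarizer, interval: int = 10) -> None:
        self._summarize = summarize
        self._interval = interval
        self._summary: Optional[str] = None
        self._compacted_upto = 0  # count of messages already folded into the summary

    def get_summary(self) -> Optional[str]:
        """Return the stored summary, or None if no compaction has happened yet."""
        return self._summary

    def compact_if_needed(self, messages: List[Message]) -> None:
        """Summarise only when enough new human messages have accumulated."""
        pending = messages[self._compacted_upto:]
        human_count = sum(1 for role, _ in pending if role == "human")
        if human_count < self._interval:
            return
        # Incremental step: send the previous summary plus the pending messages only.
        self._summary = self._summarize(self._summary, pending)
        self._compacted_upto = len(messages)
```

Because each compaction sees only the pending messages and the previous summary, the input size depends on the interval rather than on the total length of the conversation.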

diff --git a/docs/content/docs/agents/memory.md b/docs/content/docs/agents/memory.md
@@ -2,134 +2,118 @@
 sidebar_position: 3
 ---
 
-# Memory Types
+# Memory
 
-Enthusiast provides by default two memory management strategies to help agents maintain context and conversation history. Each memory type is designed for different use cases and performance requirements.
+Enthusiast manages conversation memory through two complementary mechanisms: **persistent chat history** and an optional **memory compactor**. Together they ensure agents have accurate, token-efficient access to conversation context.
 
 ## Overview
 
 Memory in Enthusiast serves two main purposes:
-1. **Conversation Persistence**: Storing and retrieving chat history from the database
-2. **Context Management**: Providing relevant conversation context to agents while managing token limits
+1. **Conversation Persistence**: Storing and retrieving chat history from the database via `PersistentChatHistory`
+2. **Context Management**: Assembling the right messages for each LLM call via `ContextWindowBuilder`
 
-## Available Memory Types
+On every agent invocation, the context window is built as follows:
 
-### 1. Summary Chat Memory
+```
+[SystemMessage: summary of earlier conversation]   ← only when compactor has a summary
+[trim_messages: most recent 3000 tokens of history]
+[new HumanMessage]
+```
+
+## Chat History
 
-**Class**: `SummaryChatMemory`  
-**Base**: `ConversationSummaryBufferMemory` with intermediate steps persistence
+**Class**: `PersistentChatHistory`  
+**Base**: `BaseChatMessageHistory`
 
-**Description**: This memory type automatically summarizes conversation history when it exceeds the token limit, ensuring agents always have relevant context without hitting token constraints.
+`PersistentChatHistory` stores all conversation messages in the database and is the single source of truth for chat history. It is injected into every agent via the `chat_history` property on `BaseInjector`.
 
 **Key Features**:
-- Automatically summarizes long conversations
-- Persists intermediate agent steps (tool calls, observations)
-- Configurable token limit (default: 3000 tokens)
-- Maintains conversation flow while optimizing memory usage
+- Persists all message types: human, AI, tool calls, tool results, intermediate steps
+- Interleaves parallel tool calls with their results for correct reconstruction
+- Provides `messages_after(message_id)` for incremental compaction
+- Accessible via `self._injector.chat_history` inside an agent
+
+## Context Window
+
+**Class**: `ContextWindowBuilder`
 
-**Best For**:
-- Long-running conversations
-- Agents that need to remember key points from extended discussions
-- Scenarios where conversation context is important but token efficiency is required
+`ContextWindowBuilder` assembles the list of messages passed to the LLM on each agent call. It trims history to fit the token budget and optionally prepends a summary from the memory compactor.
+
+**Key Features**:
+- Trims history to the most recent 3000 tokens using `strategy="last"`
+- Prepends a `SystemMessage` with the compacted summary when one is available
+- Token counting uses an approximate counter, so the 3000-token budget is an estimate rather than an exact tokenizer count
 
 **Configuration**:
 ```python
-SummaryChatMemory(
-    llm=language_model,
-    memory_key="chat_history",
-    return_messages=True,
-    max_token_limit=3000,
-    output_key="output",
-    chat_memory=persistent_history
-)
+MAX_HISTORY_TOKENS = 3000  # configurable in base_tool_calling_agent.py
+
+context_messages = ContextWindowBuilder(
+    chat_history=self._injector.chat_history,
+    memory_compactor=self._injector.memory_compactor,
+).build(max_tokens=MAX_HISTORY_TOKENS)
 ```
 
-### 2. Limited Chat Memory
+## Memory Compactor
 
-**Class**: `LimitedChatMemory`  
-**Base**: `ConversationTokenBufferMemory` with intermediate steps persistence
+The memory compactor is an optional mechanism that generates an LLM-based summary of the conversation and persists it to the database. It ensures that context beyond the token-trimming window is not silently lost.
 
-**Description**: This memory type maintains a fixed token limit for conversation history, automatically truncating older messages when the limit is exceeded.
+### How it works
 
-**Key Features**:
-- Fixed token limit for conversation context
-- Persists intermediate agent steps
-- Configurable token limit (default: 3000 tokens)
-- Predictable memory usage
+Every **10 human messages**, the compactor invokes the LLM to produce a summary of the conversation so far. The summary is stored on the `Conversation` record and injected as a `SystemMessage` at the start of each subsequent agent call:
 
-**Best For**:
-- Real-time applications with strict token budgets
-- Scenarios where recent context is more important than historical context
-- High-frequency chat applications
+```
+[SystemMessage: summary of earlier conversation]
+[trim_messages: most recent 3000 tokens]
+[new HumanMessage]
+```
+
+Summarisation is **incremental**: subsequent compactions send only the new messages alongside the existing summary, so the cost of each compaction stays roughly constant regardless of conversation length.
+
+### Enabling the compactor
+
+The compactor is opt-in per agent type via `memory_compactor_enabled` in `AgentConfig`. Set it to `True` in the `AgentConfigWithDefaults` returned from your agent's `get_config()`:
 
-**Configuration**:
 ```python
-LimitedChatMemory(
-    llm=language_model,
-    memory_key="chat_history",
-    return_messages=True,
-    max_token_limit=3000,
-    output_key="output",
-    chat_memory=persistent_history
-)
+# your_agent/config.py
+
+def get_config() -> AgentConfigWithDefaults:
+    return AgentConfigWithDefaults(
+        agent_class=MyAgent,
+        system_prompt="...",
+        tools=[...],
+        memory_compactor_enabled=True,
+    )
 ```
 
-## Configuration
+No agent gets a compactor unless it explicitly opts in.
 
-### Default Settings
+## Customization
 
-- **Token Limit**: 3000 tokens (configurable)
-- **Memory Key**: "chat_history"
-- **Output Key**: "output"
-- **Message Return**: True (returns structured messages)
+### Custom Injector
 
-### Customization
+`BaseInjector` exposes `chat_history` (required) and `memory_compactor` (optional). Custom injectors must implement `chat_history`:
 
-In need of customization, those classes may be changed inside builder's methods responsible for creating it.
 ```python
-    def _build_chat_summary_memory(self) -> SummaryChatMemory:
-        history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id)
-        return SummaryChatMemory(
-            llm=self._llm,
-            memory_key="chat_history",
-            return_messages=True,
-            max_token_limit=3000,
-            output_key="output",
-            chat_memory=history,
-        )
-
-    def _build_chat_limited_memory(self) -> LimitedChatMemory:
-        history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id)
-        return LimitedChatMemory(
-            llm=self._llm,
-            memory_key="chat_history",
-            return_messages=True,
-            max_token_limit=3000,
-            output_key="output",
-            chat_memory=history,
-        )
-```
+from enthusiast_common.injectors import BaseInjector
+from enthusiast_common.memory import BaseMemoryCompactor
+from langchain_core.chat_history import BaseChatMessageHistory
+from typing import Optional
 
-## Additional memory
-In order to add additional type of memory:
-Create custom memory class and then, build custom Injector based on enthusiast-common interface - `BaseInjector`:
-```python
 class Injector(BaseInjector):
     def __init__(
         self,
         document_retriever: BaseRetriever,
         product_retriever: BaseRetriever,
         repositories: RepositoriesInstances,
-        chat_summary_memory: SummaryChatMemory,
-        chat_limited_memory: LimitedChatMemory,
-        additional_memory: AdditionalMemoryClass,
+        chat_history: BaseChatMessageHistory,
+        memory_compactor: Optional[BaseMemoryCompactor] = None,
     ):
         super().__init__(repositories)
         self._document_retriever = document_retriever
         self._product_retriever = product_retriever
-        self._chat_summary_memory = chat_summary_memory
-        self._chat_limited_memory = chat_limited_memory
-        self._additional_memory = additional_memory
+        self._chat_history = chat_history
+        self._memory_compactor = memory_compactor
 
     @property
     def document_retriever(self) -> BaseRetriever:
@@ -140,63 +124,72 @@ class Injector(BaseInjector):
         return self._product_retriever
 
     @property
-    def chat_summary_memory(self) -> SummaryChatMemory:
-        return self._chat_summary_memory
+    def chat_history(self) -> BaseChatMessageHistory:
+        return self._chat_history
 
     @property
-    def chat_limited_memory(self) -> LimitedChatMemory:
-        return self._chat_limited_memory
-
-    @property
-    def additional_memory(self) -> AdditionalMemory:
-        return self.additional_memory
+    def memory_compactor(self) -> Optional[BaseMemoryCompactor]:
+        return self._memory_compactor
 ```
-Add method to build memory class instance inside Builder:
+
+### Custom Memory Compactor
+
+To implement a custom compactor, subclass `BaseMemoryCompactor` from `enthusiast-common`:
+
 ```python
-    def _build_additional_memory(self) -> AdditionalMemory:
-        history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id)
-        return AdditionalMemory(
-            llm=self._llm,
-            memory_key="chat_history",
-            return_messages=True,
-            max_token_limit=3000,
-            output_key="output",
-            chat_memory=history,
-        )
+from enthusiast_common.memory import BaseMemoryCompactor
+from typing import Optional
+
+class MyMemoryCompactor(BaseMemoryCompactor):
+    def get_summary(self) -> Optional[str]:
+        """Return the persisted summary, or None if not yet generated."""
+        ...
+
+    def compact_if_needed(self) -> None:
+        """Generate and persist a new summary when the threshold is reached."""
+        ...
 ```
-Add it to injector:
+
+Build it in your `AgentBuilder` and pass it to the injector:
 
 ```python
+class MyAgentBuilder(BaseAgentBuilder):
     def _build_injector(self) -> BaseInjector:
-        document_retriever = self._build_document_retriever()
-        product_retriever = self._build_product_retriever()
-        chat_summary_memory = self._build_chat_summary_memory()
-        chat_limited_memory = self._build_chat_limited_memory()
-        additional_memory = self._build_additional_memory()
+        chat_history = self._build_chat_history()
+        memory_compactor = MyMemoryCompactor(...) if self._config.memory_compactor_enabled else None
         return self._config.injector(
-            product_retriever=product_retriever,
-            document_retriever=document_retriever,
+            product_retriever=self._build_product_retriever(),
+            document_retriever=self._build_document_retriever(),
             repositories=self._repositories,
-            chat_summary_memory=chat_summary_memory,
-            chat_limited_memory=chat_limited_memory,
-            additional_memory=additional_memory
+            chat_history=chat_history,
+            memory_compactor=memory_compactor,
         )
 ```
+
 ## Usage Examples
 
-### Basic Memory Usage
-All memory instances are accessible inside Agent class via `self.injector`
+### Accessing chat history in an agent
+
+`chat_history` is accessible via `self._injector` inside any agent class:
+
 ```python
 from enthusiast_common.agents import BaseAgent
-from langchain.agents import AgentExecutor, create_tool_calling_agent
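+from langchain_core.messages import HumanMessage
+# ContextWindowBuilder must also be imported from the module that defines it in your installation.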
 
 class MyAgent(BaseAgent):
-    def _build_agent_executor(self) -> AgentExecutor:
-        tools = self._build_tools()
-        agent = create_tool_calling_agent(
-            tools=tools,
-            llm=self._llm,
-            prompt=self._prompt,
-        )
-        return AgentExecutor(agent=agent, tools=tools, memory=self.injector.chat_limited_memory)
+    def get_answer(self, input_text: str) -> str:
+        history = self._injector.chat_history
+        context = ContextWindowBuilder(
+            chat_history=history,
+            memory_compactor=self._injector.memory_compactor,
+        ).build(max_tokens=3000)
+
+        input_messages = context + [HumanMessage(content=input_text)]
+        result = self._agent.invoke({"messages": input_messages})
+        history.add_messages(result["messages"][len(context):])
+
+        if self._injector.memory_compactor:
+            self._injector.memory_compactor.compact_if_needed()
+
+        return result["output"]
 ```
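
Note on the context window layout: the three-part ordering described in `memory.md` (optional summary, trimmed recent history, new human message) can be illustrated without any framework. The sketch below is not the `ContextWindowBuilder` implementation. The names `approx_tokens`, `trim_last`, and `build_context` are hypothetical, messages are plain `(role, content)` tuples, and the four-characters-per-token estimate is an assumed heuristic standing in for the approximate counter the docs mention.

```python
from typing import List, Optional, Tuple

# Simplified stand-in for real message objects: (role, content).
Message = Tuple[str, str]


def approx_tokens(text: str) -> int:
    """Rough estimate: about four characters per token, never less than one."""
    return max(1, len(text) // 4)


def trim_last(history: List[Message], max_tokens: int) -> List[Message]:
    """Keep the most recent messages whose estimated token total fits the budget."""
    kept: List[Message] = []
    used = 0
    for role, content in reversed(history):
        cost = approx_tokens(content)
        if used + cost > max_tokens:
            break  # stop at the first message that would exceed the budget
        kept.append((role, content))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def build_context(
    history: List[Message],
    summary: Optional[str],
    new_input: str,
    max_tokens: int = 3000,
) -> List[Message]:
    """Assemble: optional summary, then trimmed history, then the new human message."""
    context: List[Message] = []
    if summary:
        context.append(("system", summary))
    context.extend(trim_last(history, max_tokens))
    context.append(("human", new_input))
    return context
```

With a budget smaller than the full history, the oldest messages drop out of the trimmed window, which is the gap the summary is meant to cover.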