diff --git a/docs/content/docs/agents/agent-config.md b/docs/content/docs/agents/agent-config.md index 619d3c90..8239d929 100644 --- a/docs/content/docs/agents/agent-config.md +++ b/docs/content/docs/agents/agent-config.md @@ -96,7 +96,12 @@ class AgentConfig(ArbitraryTypeBaseModel, Generic[InjectorT]): - `LLMToolConfig`: AI-powered operations with language models - `AgentToolConfig`: Tools that use other agents -#### 9. **agent_callback_handler** +#### 9. **memory_compactor_enabled** +- **Type**: `bool` +- **Required**: No (default: `False`) +- **Description**: Enables the memory compactor for this agent. When `True`, an LLM-generated summary of the conversation is persisted every 10 human messages and injected as a `SystemMessage` at the start of each agent call. See [Memory](./memory.md) for details. + +#### 10. **agent_callback_handler** - **Type**: `Optional[AgentCallbackHandlerConfig]` - **Required**: No - **Description**: Callback handler for agent-specific events and monitoring diff --git a/docs/content/docs/agents/memory.md b/docs/content/docs/agents/memory.md index ae13bf10..dca959eb 100644 --- a/docs/content/docs/agents/memory.md +++ b/docs/content/docs/agents/memory.md @@ -2,134 +2,118 @@ sidebar_position: 3 --- -# Memory Types +# Memory -Enthusiast provides by default two memory management strategies to help agents maintain context and conversation history. Each memory type is designed for different use cases and performance requirements. +Enthusiast manages conversation memory through two complementary mechanisms: **persistent chat history** and an optional **memory compactor**. Together they ensure agents have accurate, token-efficient access to conversation context. ## Overview Memory in Enthusiast serves two main purposes: -1. **Conversation Persistence**: Storing and retrieving chat history from the database -2. **Context Management**: Providing relevant conversation context to agents while managing token limits +1. **Conversation Persistence**: Storing and retrieving chat history from the database via `PersistentChatHistory` +2. **Context Management**: Assembling the right messages for each LLM call via `ContextWindowBuilder` -## Available Memory Types +On every agent invocation the context window is built as follows: -### 1. Summary Chat Memory +``` +[SystemMessage: summary of earlier conversation] ← only when compactor has a summary +[trim_messages: most recent 3000 tokens of history] +[new HumanMessage] +``` + +## Chat History -**Class**: `SummaryChatMemory` -**Base**: `ConversationSummaryBufferMemory` with intermediate steps persistence +**Class**: `PersistentChatHistory` +**Base**: `BaseChatMessageHistory` -**Description**: This memory type automatically summarizes conversation history when it exceeds the token limit, ensuring agents always have relevant context without hitting token constraints. +`PersistentChatHistory` stores all conversation messages in the database and is the single source of truth for chat history. It is injected into every agent via the `chat_history` property on `BaseInjector`. **Key Features**: -- Automatically summarizes long conversations -- Persists intermediate agent steps (tool calls, observations) -- Configurable token limit (default: 3000 tokens) -- Maintains conversation flow while optimizing memory usage +- Persists all message types: human, AI, tool calls, tool results, intermediate steps +- Interleaves parallel tool calls with their results for correct reconstruction +- Provides `messages_after(message_id)` for incremental compaction +- Accessible via `self._injector.chat_history` inside an agent + +## Context Window + +**Class**: `ContextWindowBuilder` -**Best For**: -- Long-running conversations -- Agents that need to remember key points from extended discussions -- Scenarios where conversation context is important but token efficiency is required +`ContextWindowBuilder` assembles the list of messages passed to the LLM on each agent call. It trims history to fit the token budget and optionally prepends a summary from the memory compactor. + +**Key Features**: +- Trims history to the most recent 3000 tokens using `strategy="last"` +- Prepends a `SystemMessage` with the compacted summary when one is available +- Token counting uses an approximate counter **Configuration**: ```python -SummaryChatMemory( - llm=language_model, - memory_key="chat_history", - return_messages=True, - max_token_limit=3000, - output_key="output", - chat_memory=persistent_history -) +MAX_HISTORY_TOKENS = 3000 # configurable in base_tool_calling_agent.py + +context_messages = ContextWindowBuilder( + chat_history=self._injector.chat_history, + memory_compactor=self._injector.memory_compactor, +).build(max_tokens=MAX_HISTORY_TOKENS) ``` -### 2. Limited Chat Memory +## Memory Compactor -**Class**: `LimitedChatMemory` -**Base**: `ConversationTokenBufferMemory` with intermediate steps persistence +The memory compactor is an optional mechanism that generates an LLM-based summary of the conversation and persists it to the database. It ensures that context beyond the token-trimming window is not silently lost. -**Description**: This memory type maintains a fixed token limit for conversation history, automatically truncating older messages when the limit is exceeded. +### How it works -**Key Features**: -- Fixed token limit for conversation context -- Persists intermediate agent steps -- Configurable token limit (default: 3000 tokens) -- Predictable memory usage +Every **10 human messages**, the compactor invokes the LLM to produce a summary of the conversation so far. The summary is stored on the `Conversation` record and injected as a `SystemMessage` at the start of each subsequent agent call: -**Best For**: -- Real-time applications with strict token budgets -- Scenarios where recent context is more important than historical context -- High-frequency chat applications +``` +[SystemMessage: summary of earlier conversation] +[trim_messages: most recent 3000 tokens] +[new HumanMessage] +``` + +Summarisation is **incremental** — subsequent compactions send only the new messages alongside the existing summary, so the cost of each compaction stays constant regardless of conversation length. + +### Enabling the compactor + +The compactor is opt-in per agent type via `memory_compactor_enabled` in `AgentConfig`. Set it to `True` in the `AgentConfigWithDefaults` returned from your agent's `get_config()`: -**Configuration**: ```python -LimitedChatMemory( - llm=language_model, - memory_key="chat_history", - return_messages=True, - max_token_limit=3000, - output_key="output", - chat_memory=persistent_history -) +# your_agent/config.py + +def get_config() -> AgentConfigWithDefaults: + return AgentConfigWithDefaults( + agent_class=MyAgent, + system_prompt="...", + tools=[...], + memory_compactor_enabled=True, + ) ``` -## Configuration +No agent gets a compactor unless it explicitly opts in. -### Default Settings +## Customization -- **Token Limit**: 3000 tokens (configurable) -- **Memory Key**: "chat_history" -- **Output Key**: "output" -- **Message Return**: True (returns structured messages) +### Custom Injector -### Customization +`BaseInjector` exposes `chat_history` (required) and `memory_compactor` (optional). Custom injectors must implement `chat_history`: -In need of customization, those classes may be changed inside builder's methods responsible for creating it. ```python - def _build_chat_summary_memory(self) -> SummaryChatMemory: - history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id) - return SummaryChatMemory( - llm=self._llm, - memory_key="chat_history", - return_messages=True, - max_token_limit=3000, - output_key="output", - chat_memory=history, - ) - - def _build_chat_limited_memory(self) -> LimitedChatMemory: - history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id) - return LimitedChatMemory( - llm=self._llm, - memory_key="chat_history", - return_messages=True, - max_token_limit=3000, - output_key="output", - chat_memory=history, - ) -``` +from enthusiast_common.injectors import BaseInjector +from enthusiast_common.memory import BaseMemoryCompactor +from langchain_core.chat_history import BaseChatMessageHistory +from typing import Optional -## Additional memory -In order to add additional type of memory: -Create custom memory class and then, build custom Injector based on enthusiast-common interface - `BaseInjector`: -```python class Injector(BaseInjector): def __init__( self, document_retriever: BaseRetriever, product_retriever: BaseRetriever, repositories: RepositoriesInstances, - chat_summary_memory: SummaryChatMemory, - chat_limited_memory: LimitedChatMemory, - additional_memory: AdditionalMemoryClass, + chat_history: BaseChatMessageHistory, + memory_compactor: Optional[BaseMemoryCompactor] = None, ): super().__init__(repositories) self._document_retriever = document_retriever self._product_retriever = product_retriever - self._chat_summary_memory = chat_summary_memory - self._chat_limited_memory = chat_limited_memory - self._additional_memory = additional_memory + self._chat_history = chat_history + self._memory_compactor = memory_compactor @property def document_retriever(self) -> BaseRetriever: @@ -140,63 +124,72 @@ class Injector(BaseInjector): return self._product_retriever @property - def chat_summary_memory(self) -> SummaryChatMemory: - return self._chat_summary_memory + def chat_history(self) -> BaseChatMessageHistory: + return self._chat_history @property - def chat_limited_memory(self) -> LimitedChatMemory: - return self._chat_limited_memory - - @property - def additional_memory(self) -> AdditionalMemory: - return self.additional_memory + def memory_compactor(self) -> Optional[BaseMemoryCompactor]: + return self._memory_compactor ``` -Add method to build memory class instance inside Builder: + +### Custom Memory Compactor + +To implement a custom compactor, subclass `BaseMemoryCompactor` from `enthusiast-common`: + ```python - def _build_additional_memory(self) -> AdditionalMemory: - history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id) - return AdditionalMemory( - llm=self._llm, - memory_key="chat_history", - return_messages=True, - max_token_limit=3000, - output_key="output", - chat_memory=history, - ) +from enthusiast_common.memory import BaseMemoryCompactor +from typing import Optional + +class MyMemoryCompactor(BaseMemoryCompactor): + def get_summary(self) -> Optional[str]: + """Return the persisted summary, or None if not yet generated.""" + ... + + def compact_if_needed(self) -> None: + """Generate and persist a new summary when the threshold is reached.""" + ... ``` -Add it to injector: + +Build it in your `AgentBuilder` and pass it to the injector: ```python +class MyAgentBuilder(BaseAgentBuilder): def _build_injector(self) -> BaseInjector: - document_retriever = self._build_document_retriever() - product_retriever = self._build_product_retriever() - chat_summary_memory = self._build_chat_summary_memory() - chat_limited_memory = self._build_chat_limited_memory() - additional_memory = self._build_additional_memory() + chat_history = self._build_chat_history() + memory_compactor = MyMemoryCompactor(...) if self._config.memory_compactor_enabled else None return self._config.injector( - product_retriever=product_retriever, - document_retriever=document_retriever, + product_retriever=self._build_product_retriever(), + document_retriever=self._build_document_retriever(), repositories=self._repositories, - chat_summary_memory=chat_summary_memory, - chat_limited_memory=chat_limited_memory, - additional_memory=additional_memory + chat_history=chat_history, + memory_compactor=memory_compactor, ) ``` + ## Usage Examples -### Basic Memory Usage -All memory instances are accessible inside Agent class via `self.injector` +### Accessing chat history in an agent + +`chat_history` is accessible via `self._injector` inside any agent class: + ```python from enthusiast_common.agents import BaseAgent -from langchain.agents import AgentExecutor, create_tool_calling_agent +from langchain_core.messages import HumanMessage class MyAgent(BaseAgent): - def _build_agent_executor(self) -> AgentExecutor: - tools = self._build_tools() - agent = create_tool_calling_agent( - tools=tools, - llm=self._llm, - prompt=self._prompt, - ) - return AgentExecutor(agent=agent, tools=tools, memory=self.injector.chat_limited_memory) + def get_answer(self, input_text: str) -> str: + history = self._injector.chat_history + context = ContextWindowBuilder( + chat_history=history, + memory_compactor=self._injector.memory_compactor, + ).build(max_tokens=3000) + + input_messages = context + [HumanMessage(content=input_text)] + result = self._agent.invoke({"messages": input_messages}) + history.add_messages(result["messages"][len(context):]) + + if self._injector.memory_compactor: + self._injector.memory_compactor.compact_if_needed() + + return result["output"] ```