Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion docs/content/docs/agents/agent-config.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,12 @@ class AgentConfig(ArbitraryTypeBaseModel, Generic[InjectorT]):
- `LLMToolConfig`: AI-powered operations with language models
- `AgentToolConfig`: Tools that use other agents

#### 9. **agent_callback_handler**
#### 9. **memory_compactor_enabled**
- **Type**: `bool`
- **Required**: No (default: `False`)
- **Description**: Enables the memory compactor for this agent. When `True`, an LLM-generated summary of the conversation is persisted every 10 human messages and injected as a `SystemMessage` at the start of each agent call. See [Memory](./memory.md) for details.

#### 10. **agent_callback_handler**
- **Type**: `Optional[AgentCallbackHandlerConfig]`
- **Required**: No
- **Description**: Callback handler for agent-specific events and monitoring
Expand Down
249 changes: 121 additions & 128 deletions docs/content/docs/agents/memory.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,134 +2,118 @@
sidebar_position: 3
---

# Memory Types
# Memory

Enthusiast provides by default two memory management strategies to help agents maintain context and conversation history. Each memory type is designed for different use cases and performance requirements.
Enthusiast manages conversation memory through two complementary mechanisms: **persistent chat history** and an optional **memory compactor**. Together they ensure agents have accurate, token-efficient access to conversation context.

## Overview

Memory in Enthusiast serves two main purposes:
1. **Conversation Persistence**: Storing and retrieving chat history from the database
2. **Context Management**: Providing relevant conversation context to agents while managing token limits
1. **Conversation Persistence**: Storing and retrieving chat history from the database via `PersistentChatHistory`
2. **Context Management**: Assembling the right messages for each LLM call via `ContextWindowBuilder`

## Available Memory Types
On every agent invocation the context window is built as follows:

### 1. Summary Chat Memory
```
[SystemMessage: summary of earlier conversation] ← only when compactor has a summary
[trim_messages: most recent 3000 tokens of history]
[new HumanMessage]
```

## Chat History

**Class**: `SummaryChatMemory`
**Base**: `ConversationSummaryBufferMemory` with intermediate steps persistence
**Class**: `PersistentChatHistory`
**Base**: `BaseChatMessageHistory`

**Description**: This memory type automatically summarizes conversation history when it exceeds the token limit, ensuring agents always have relevant context without hitting token constraints.
`PersistentChatHistory` stores all conversation messages in the database and is the single source of truth for chat history. It is injected into every agent via the `chat_history` property on `BaseInjector`.

**Key Features**:
- Automatically summarizes long conversations
- Persists intermediate agent steps (tool calls, observations)
- Configurable token limit (default: 3000 tokens)
- Maintains conversation flow while optimizing memory usage
- Persists all message types: human, AI, tool calls, tool results, intermediate steps
- Interleaves parallel tool calls with their results for correct reconstruction
- Provides `messages_after(message_id)` for incremental compaction
- Accessible via `self._injector.chat_history` inside an agent

## Context Window

**Class**: `ContextWindowBuilder`

**Best For**:
- Long-running conversations
- Agents that need to remember key points from extended discussions
- Scenarios where conversation context is important but token efficiency is required
`ContextWindowBuilder` assembles the list of messages passed to the LLM on each agent call. It trims history to fit the token budget and optionally prepends a summary from the memory compactor.

**Key Features**:
- Trims history to the most recent 3000 tokens using `strategy="last"`
- Prepends a `SystemMessage` with the compacted summary when one is available
- Token counting uses an approximate counter

**Configuration**:
```python
SummaryChatMemory(
llm=language_model,
memory_key="chat_history",
return_messages=True,
max_token_limit=3000,
output_key="output",
chat_memory=persistent_history
)
MAX_HISTORY_TOKENS = 3000 # configurable in base_tool_calling_agent.py

context_messages = ContextWindowBuilder(
chat_history=self._injector.chat_history,
memory_compactor=self._injector.memory_compactor,
).build(max_tokens=MAX_HISTORY_TOKENS)
```

### 2. Limited Chat Memory
## Memory Compactor

**Class**: `LimitedChatMemory`
**Base**: `ConversationTokenBufferMemory` with intermediate steps persistence
The memory compactor is an optional mechanism that generates an LLM-based summary of the conversation and persists it to the database. It ensures that context beyond the token-trimming window is not silently lost.

**Description**: This memory type maintains a fixed token limit for conversation history, automatically truncating older messages when the limit is exceeded.
### How it works

**Key Features**:
- Fixed token limit for conversation context
- Persists intermediate agent steps
- Configurable token limit (default: 3000 tokens)
- Predictable memory usage
Every **10 human messages**, the compactor invokes the LLM to produce a summary of the conversation so far. The summary is stored on the `Conversation` record and injected as a `SystemMessage` at the start of each subsequent agent call:

**Best For**:
- Real-time applications with strict token budgets
- Scenarios where recent context is more important than historical context
- High-frequency chat applications
```
[SystemMessage: summary of earlier conversation]
[trim_messages: most recent 3000 tokens]
[new HumanMessage]
```

Summarisation is **incremental** — subsequent compactions send only the new messages alongside the existing summary, so the cost of each compaction stays constant regardless of conversation length.

### Enabling the compactor

The compactor is opt-in per agent type via `memory_compactor_enabled` in `AgentConfig`. Set it to `True` in the `AgentConfigWithDefaults` returned from your agent's `get_config()`:

**Configuration**:
```python
LimitedChatMemory(
llm=language_model,
memory_key="chat_history",
return_messages=True,
max_token_limit=3000,
output_key="output",
chat_memory=persistent_history
)
# your_agent/config.py

def get_config() -> AgentConfigWithDefaults:
return AgentConfigWithDefaults(
agent_class=MyAgent,
system_prompt="...",
tools=[...],
memory_compactor_enabled=True,
)
```

## Configuration
No agent gets a compactor unless it explicitly opts in.

### Default Settings
## Customization

- **Token Limit**: 3000 tokens (configurable)
- **Memory Key**: "chat_history"
- **Output Key**: "output"
- **Message Return**: True (returns structured messages)
### Custom Injector

### Customization
`BaseInjector` exposes `chat_history` (required) and `memory_compactor` (optional). Custom injectors must implement `chat_history`:

In need of customization, those classes may be changed inside builder's methods responsible for creating it.
```python
def _build_chat_summary_memory(self) -> SummaryChatMemory:
history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id)
return SummaryChatMemory(
llm=self._llm,
memory_key="chat_history",
return_messages=True,
max_token_limit=3000,
output_key="output",
chat_memory=history,
)

def _build_chat_limited_memory(self) -> LimitedChatMemory:
history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id)
return LimitedChatMemory(
llm=self._llm,
memory_key="chat_history",
return_messages=True,
max_token_limit=3000,
output_key="output",
chat_memory=history,
)
```
from enthusiast_common.injectors import BaseInjector
from enthusiast_common.memory import BaseMemoryCompactor
from langchain_core.chat_history import BaseChatMessageHistory
from typing import Optional

## Additional memory
In order to add additional type of memory:
Create custom memory class and then, build custom Injector based on enthusiast-common interface - `BaseInjector`:
```python
class Injector(BaseInjector):
def __init__(
self,
document_retriever: BaseRetriever,
product_retriever: BaseRetriever,
repositories: RepositoriesInstances,
chat_summary_memory: SummaryChatMemory,
chat_limited_memory: LimitedChatMemory,
additional_memory: AdditionalMemoryClass,
chat_history: BaseChatMessageHistory,
memory_compactor: Optional[BaseMemoryCompactor] = None,
):
super().__init__(repositories)
self._document_retriever = document_retriever
self._product_retriever = product_retriever
self._chat_summary_memory = chat_summary_memory
self._chat_limited_memory = chat_limited_memory
self._additional_memory = additional_memory
self._chat_history = chat_history
self._memory_compactor = memory_compactor

@property
def document_retriever(self) -> BaseRetriever:
Expand All @@ -140,63 +124,72 @@ class Injector(BaseInjector):
return self._product_retriever

@property
def chat_summary_memory(self) -> SummaryChatMemory:
return self._chat_summary_memory
def chat_history(self) -> BaseChatMessageHistory:
return self._chat_history

@property
def chat_limited_memory(self) -> LimitedChatMemory:
return self._chat_limited_memory

@property
def additional_memory(self) -> AdditionalMemory:
return self.additional_memory
def memory_compactor(self) -> Optional[BaseMemoryCompactor]:
return self._memory_compactor
```
Add method to build memory class instance inside Builder:

### Custom Memory Compactor

To implement a custom compactor, subclass `BaseMemoryCompactor` from `enthusiast-common`:

```python
def _build_additional_memory(self) -> AdditionalMemory:
history = PersistentChatHistory(self._repositories.conversation, self._config.conversation_id)
return AdditionalMemory(
llm=self._llm,
memory_key="chat_history",
return_messages=True,
max_token_limit=3000,
output_key="output",
chat_memory=history,
)
from enthusiast_common.memory import BaseMemoryCompactor
from typing import Optional

class MyMemoryCompactor(BaseMemoryCompactor):
def get_summary(self) -> Optional[str]:
"""Return the persisted summary, or None if not yet generated."""
...

def compact_if_needed(self) -> None:
"""Generate and persist a new summary when the threshold is reached."""
...
```
Add it to injector:

Build it in your `AgentBuilder` and pass it to the injector:

```python
class MyAgentBuilder(BaseAgentBuilder):
def _build_injector(self) -> BaseInjector:
document_retriever = self._build_document_retriever()
product_retriever = self._build_product_retriever()
chat_summary_memory = self._build_chat_summary_memory()
chat_limited_memory = self._build_chat_limited_memory()
additional_memory = self._build_additional_memory()
chat_history = self._build_chat_history()
memory_compactor = MyMemoryCompactor(...) if self._config.memory_compactor_enabled else None
return self._config.injector(
product_retriever=product_retriever,
document_retriever=document_retriever,
product_retriever=self._build_product_retriever(),
document_retriever=self._build_document_retriever(),
repositories=self._repositories,
chat_summary_memory=chat_summary_memory,
chat_limited_memory=chat_limited_memory,
additional_memory=additional_memory
chat_history=chat_history,
memory_compactor=memory_compactor,
)
```

## Usage Examples

### Basic Memory Usage
All memory instances are accessible inside Agent class via `self.injector`
### Accessing chat history in an agent

`chat_history` is accessible via `self._injector` inside any agent class:

```python
from enthusiast_common.agents import BaseAgent
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.messages import HumanMessage

class MyAgent(BaseAgent):
def _build_agent_executor(self) -> AgentExecutor:
tools = self._build_tools()
agent = create_tool_calling_agent(
tools=tools,
llm=self._llm,
prompt=self._prompt,
)
return AgentExecutor(agent=agent, tools=tools, memory=self.injector.chat_limited_memory)
def get_answer(self, input_text: str) -> str:
history = self._injector.chat_history
context = ContextWindowBuilder(
chat_history=history,
memory_compactor=self._injector.memory_compactor,
).build(max_tokens=3000)

input_messages = context + [HumanMessage(content=input_text)]
result = self._agent.invoke({"messages": input_messages})
history.add_messages(result["messages"][len(context):])

if self._injector.memory_compactor:
self._injector.memory_compactor.compact_if_needed()

return result["output"]
```
Loading