Labels: `component: frontend` (React/TypeScript UI), `epic`, `llm: inference` (model loading/inference), `priority: low` (nice to have), `size: l` (1-3 days), `type: feature` (new functionality or enhancement)
## Summary
The agentic loop's context management uses character-based budgeting (`MAX_CONTEXT_CHARS = 180_000` in `agentLoop.ts`) to prune conversation history. Characters are a rough proxy for tokens and don't account for tokenizer-specific encoding. Additionally, the main agentic loop lacks the sophisticated context compression available in the deep research loop (`roundSummaries`, multi-round pruning).
## Current Implementation

Character-based budget (`agentLoop.ts`):
```ts
export const MAX_CONTEXT_CHARS = 180_000;
export const KEEP_LAST_TOOL_MESSAGES = 10;
export const TOOL_RESULT_SNIPPET_CHARS = 4_000;

function totalChars(messages: ChatMessage[]): number {
  return messages.reduce((acc, m) => acc + (m.content?.length ?? 0), 0);
}

export function pruneForBudget(messages: ChatMessage[]): ChatMessage[] {
  if (totalChars(messages) <= MAX_CONTEXT_CHARS) return messages;
  // ... drop old tool messages, then drop all but last 12 turns
}
```

Tool result truncation:
```ts
export function summarizeToolResult(_name: string, res: ToolResult): string {
  if (!res.success) {
    return `ERROR: ${res.error}`.slice(0, TOOL_RESULT_SNIPPET_CHARS);
  }
  const raw = stableStringify(res.data);
  return raw.slice(0, TOOL_RESULT_SNIPPET_CHARS); // ← naive truncation
}
```

## Problems
- **Chars ≠ tokens** — 180K chars might be ~45K tokens or ~90K tokens depending on content (code vs. prose vs. JSON)
- **Naive truncation** — `slice(0, 4000)` can cut JSON mid-object, producing invalid data that confuses the model
- **No summarization** — the deep research loop summarizes completed rounds, but the main agent loop doesn't summarize completed tool interactions
- **Fixed budget** — doesn't adapt to the actual model's context window size
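To make the truncation problem concrete, here is a minimal standalone demo (not code from the repo) showing that a character slice of serialized JSON almost always ends mid-object, leaving the model with unparseable data:

```typescript
// Naive character slicing of serialized JSON, the same pattern as
// raw.slice(0, TOOL_RESULT_SNIPPET_CHARS) above, cuts mid-object.
const data = {
  results: Array.from({ length: 100 }, (_, i) => ({ id: i, name: `item-${i}` })),
};
const raw = JSON.stringify(data);

const truncated = raw.slice(0, 200); // cut at an arbitrary character boundary

let parseable = true;
try {
  JSON.parse(truncated);
} catch {
  parseable = false;
}
console.log(parseable); // → false: the slice ends in the middle of an element
```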
## Proposed Solution

### Phase 1: Token-approximate budgeting

Replace the character budget with a simple token approximation:
```ts
function estimateTokens(text: string): number {
  // GPT-style: ~4 chars per token for English, ~3 for code/JSON
  // This is intentionally conservative (overestimates)
  return Math.ceil(text.length / 3);
}

export const MAX_CONTEXT_TOKENS = 32_000; // Default; should be configurable per model
```

Make the budget configurable based on the active model's known context window.
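One way the model-aware budget could look — a sketch only; `ModelInfo` and `resolveContextBudget` are hypothetical names, not existing code:

```typescript
// Hypothetical sketch: derive the token budget from the active model's
// known context window, falling back to the fixed default.
const MAX_CONTEXT_TOKENS = 32_000; // default when the window is unknown

interface ModelInfo {
  name: string;
  contextWindowTokens?: number; // from model metadata, if available
}

function resolveContextBudget(model: ModelInfo): number {
  if (model.contextWindowTokens === undefined) return MAX_CONTEXT_TOKENS;
  // Reserve ~25% headroom for the system prompt and the model's reply.
  return Math.floor(model.contextWindowTokens * 0.75);
}

console.log(resolveContextBudget({ name: "local-8k", contextWindowTokens: 8_192 })); // → 6144
console.log(resolveContextBudget({ name: "unknown-model" })); // → 32000
```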
### Phase 2: Smart tool result truncation

Replace the naive `slice()` with structure-preserving truncation:
```ts
function truncateToolResult(data: unknown, maxChars: number): string {
  const raw = stableStringify(data);
  if (raw.length <= maxChars) return raw;
  // For arrays: keep first and last elements, indicate truncation
  if (Array.isArray(data) && data.length > 2) {
    const first = stableStringify(data[0]);
    const last = stableStringify(data[data.length - 1]);
    return `[${first}, ... (${data.length - 2} items omitted), ${last}]`;
  }
  // For objects: keep keys, truncate long values
  // For strings: truncate with "..." indicator
  return raw.slice(0, maxChars - 20) + '... (truncated)';
}
```

### Phase 3: Lift round-summary compression from deep research
Port the `createRoundSummary` pattern from deep research into the main agent loop:
- After every N tool iterations (e.g., 5), summarize completed tool interactions into a compressed working memory entry
- Replace individual tool result messages with the summary
- This keeps the context lean during long agent sessions
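The replacement step above could be sketched like this — `compressToolHistory` and `SUMMARIZE_EVERY_N` are illustrative names, not existing code, and the summary text itself would come from a model call:

```typescript
// Hypothetical sketch of periodic compression in the main agent loop:
// drop old tool-result messages and append one summary message instead.
interface ChatMessage {
  role: string;
  content: string;
}

const SUMMARIZE_EVERY_N = 5; // summarize after every 5 tool iterations

function compressToolHistory(messages: ChatMessage[], summary: string): ChatMessage[] {
  const kept = messages.filter((m) => m.role !== "tool");
  return [
    ...kept,
    { role: "system", content: `Summary of earlier tool interactions:\n${summary}` },
  ];
}

const history: ChatMessage[] = [
  { role: "user", content: "Find the bug" },
  { role: "tool", content: "grep output..." },
  { role: "tool", content: "file contents..." },
];
const compressed = compressToolHistory(history, "Searched repo; bug is in agentLoop.ts");
console.log(compressed.length); // → 2 (one user turn plus the summary)
```

A real implementation would likely keep the last `KEEP_LAST_TOOL_MESSAGES` tool results intact and only compress older ones, as the existing pruning already does.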
## Files to Modify

| File | Change |
|---|---|
| `src/hooks/useGglibRuntime/agentLoop.ts` | Replace `MAX_CONTEXT_CHARS` with token-based budget; improve `summarizeToolResult` |
| `src/hooks/useGglibRuntime/agentLoop.ts` | Add `estimateTokens()` utility |
| `src/hooks/useGglibRuntime/agentLoop.ts` | Port round-summary compression from deep research |
| `src/hooks/useGglibRuntime/runAgenticLoop.ts` | Use configurable budget based on model context window |
| `src/config/` | Add agent loop configuration (max tokens, summarization interval) |
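A possible shape for the proposed `src/config/` entry — field names here are illustrative assumptions, not an existing interface:

```typescript
// Hypothetical agent loop configuration object; values mirror the
// defaults proposed elsewhere in this issue.
export interface AgentLoopConfig {
  maxContextTokens: number;          // fallback token budget when window is unknown
  summarizeEveryNIterations: number; // how often to compress old tool interactions
  toolResultSnippetChars: number;    // per-result truncation limit
}

export const defaultAgentLoopConfig: AgentLoopConfig = {
  maxContextTokens: 32_000,
  summarizeEveryNIterations: 5,
  toolResultSnippetChars: 4_000,
};
```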
## Acceptance Criteria

- Context budget uses token approximation instead of raw character count
- Budget is configurable and adapts to model's context window when known
- Tool result truncation preserves JSON structure (no mid-object cuts)
- Array results show first/last elements with count of omitted items
- Periodic summarization compresses old tool interactions
- Long agent sessions (20+ iterations) don't overflow context window
- `estimateTokens()` has unit tests with known inputs
- No regression for short conversations that fit within budget