feat(agent): Token-based context budgeting and smart tool result truncation #249

@mmogr

Description

Summary

The agentic loop's context management uses character-based budgeting (MAX_CONTEXT_CHARS = 180_000 in agentLoop.ts) for pruning conversation history. Characters are a rough proxy for tokens and don't account for tokenizer-specific encoding. Additionally, the main agentic loop lacks the sophisticated context compression available in the deep research loop (roundSummaries, multi-round pruning).

Current Implementation

Character-based budget (agentLoop.ts):

export const MAX_CONTEXT_CHARS = 180_000;
export const KEEP_LAST_TOOL_MESSAGES = 10;
export const TOOL_RESULT_SNIPPET_CHARS = 4_000;

function totalChars(messages: ChatMessage[]): number {
  return messages.reduce((acc, m) => acc + (m.content?.length ?? 0), 0);
}

export function pruneForBudget(messages: ChatMessage[]): ChatMessage[] {
  if (totalChars(messages) <= MAX_CONTEXT_CHARS) return messages;
  // ... drop old tool messages, then drop all but last 12 turns
}

Tool result truncation:

export function summarizeToolResult(_name: string, res: ToolResult): string {
  if (!res.success) {
    return `ERROR: ${res.error}`.slice(0, TOOL_RESULT_SNIPPET_CHARS);
  }
  const raw = stableStringify(res.data);
  return raw.slice(0, TOOL_RESULT_SNIPPET_CHARS); // ← naive truncation
}

Problems

  1. Chars ≠ tokens — 180K chars might be 45K tokens or 90K tokens depending on content (code vs prose vs JSON)
  2. Naive truncation — slice(0, 4000) can cut JSON mid-object, producing invalid data that confuses the model
  3. No summarization — the deep research loop summarizes completed rounds, but the main agent loop doesn't summarize completed tool interactions
  4. Fixed budget — doesn't adapt to the actual model's context window size

Proposed Solution

Phase 1: Token-approximate budgeting

Replace character budget with a simple token approximation:

function estimateTokens(text: string): number {
  // GPT-style: ~4 chars per token for English, ~3 for code/JSON
  // This is intentionally conservative (overestimates)
  return Math.ceil(text.length / 3);
}

export const MAX_CONTEXT_TOKENS = 32_000; // Default; should be configurable per model

Make the budget configurable based on the active model's known context window.
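
A minimal sketch of what that could look like. The model names, window sizes, and the `RESPONSE_HEADROOM` reservation below are illustrative assumptions, not values from the codebase:

```typescript
// Sketch: per-model token budget. Model names/window sizes are illustrative.
const KNOWN_CONTEXT_WINDOWS: Record<string, number> = {
  "gpt-4o": 128_000,
  "llama-3.1-8b": 131_072,
};

const DEFAULT_CONTEXT_TOKENS = 32_000; // fallback when the model is unknown
const RESPONSE_HEADROOM = 4_096;       // reserve room for the model's reply

function estimateTokens(text: string): number {
  // Conservative ~3 chars/token; overestimates for English prose.
  return Math.ceil(text.length / 3);
}

function contextBudgetFor(model: string): number {
  const window = KNOWN_CONTEXT_WINDOWS[model] ?? DEFAULT_CONTEXT_TOKENS;
  return window - RESPONSE_HEADROOM;
}
```

Reserving headroom means the pruned prompt can never consume the entire window, so the model always has room to respond.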

Phase 2: Smart tool result truncation

Replace naive slice() with structure-preserving truncation:

function truncateToolResult(data: unknown, maxChars: number): string {
  const raw = stableStringify(data);
  if (raw.length <= maxChars) return raw;
  
  // For arrays: keep first and last elements, indicate truncation
  if (Array.isArray(data) && data.length > 2) {
    const first = stableStringify(data[0]);
    const last = stableStringify(data[data.length - 1]);
    return `[${first}, ... (${data.length - 2} items omitted), ${last}]`;
  }
  
  // For objects: keep keys, truncate long values
  // For strings: truncate with "..." indicator
  return raw.slice(0, maxChars - 20) + '... (truncated)';
}
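
The object branch above currently falls through to the raw slice. One way to keep the output parseable is to truncate string values inside their quotes so the result remains valid JSON. This is a sketch; `truncateObjectResult` and `truncateValue` are hypothetical helpers, and `JSON.stringify` stands in for the codebase's `stableStringify`:

```typescript
// Sketch: keep every key, cap each value, and stay valid JSON throughout.
function truncateValue(value: unknown, maxChars: number): string {
  const raw = JSON.stringify(value) ?? "null";
  if (raw.length <= maxChars) return raw;
  if (typeof value === "string") {
    // Truncate inside the quotes so the output still parses.
    return JSON.stringify(value.slice(0, maxChars) + "…(truncated)");
  }
  // Oversized non-string values collapse to a typed placeholder.
  return JSON.stringify(
    `[${Array.isArray(value) ? "array" : typeof value} truncated]`
  );
}

function truncateObjectResult(
  obj: Record<string, unknown>,
  maxValueChars: number
): string {
  const entries = Object.entries(obj).map(
    ([k, v]) => `${JSON.stringify(k)}: ${truncateValue(v, maxValueChars)}`
  );
  return `{${entries.join(", ")}}`;
}
```

Unlike `raw.slice(...)`, the output here can always be round-tripped through `JSON.parse`, so the model never sees a dangling brace.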

Phase 3: Lift round-summary compression from deep research

Port the createRoundSummary pattern from deep research into the main agent loop:

  • After every N tool iterations (e.g., 5), summarize completed tool interactions into a compressed working memory entry
  • Replace individual tool result messages with the summary
  • This keeps the context lean during long agent sessions

Files to Modify

  • src/hooks/useGglibRuntime/agentLoop.ts — Replace MAX_CONTEXT_CHARS with token-based budget; improve summarizeToolResult
  • src/hooks/useGglibRuntime/agentLoop.ts — Add estimateTokens() utility
  • src/hooks/useGglibRuntime/agentLoop.ts — Port round-summary compression from deep research
  • src/hooks/useGglibRuntime/runAgenticLoop.ts — Use configurable budget based on model context window
  • src/config/ — Add agent loop configuration (max tokens, summarization interval)

Acceptance Criteria

  • Context budget uses token approximation instead of raw character count
  • Budget is configurable and adapts to model's context window when known
  • Tool result truncation preserves JSON structure (no mid-object cuts)
  • Array results show first/last elements with count of omitted items
  • Periodic summarization compresses old tool interactions
  • Long agent sessions (20+ iterations) don't overflow context window
  • estimateTokens() has unit tests with known inputs
  • No regression for short conversations that fit within budget
