sub(#249) Phase 2: Smart truncation strategies and round-summary compression #272

@mmogr

Parent: #249 — Token-based context budgeting

Goal

Implement smart truncation strategies and reuse the round-summary pattern from deep research to compress context as it approaches the token limit.

Background

Phase 1 (#271) adds token estimation and a basic budget. This phase adds strategies for staying within that budget while preserving the most important context.

Implementation

1. Smart truncation strategies

Instead of simply dropping old messages, apply priority-based truncation:

interface TruncationStrategy {
  /** Messages to always keep (system, last user query) */
  pinned: Set<number>;
  
  /** Compress tool results beyond length limit */
  maxToolResultTokens: number;
  
  /** Summarize older conversation turns instead of dropping */
  summarizeAfterTurns: number;
}

function smartPrune(
  messages: any[],
  budget: ContextBudget,
  strategy: TruncationStrategy
): any[] {
  let pruned = [...messages];
  
  // Step 1: Truncate long tool results
  pruned = pruned.map(m => {
    if (m.role === 'tool' && estimateTokens(m.content) > strategy.maxToolResultTokens) {
      return { ...m, content: truncateToolResult(m.content, strategy.maxToolResultTokens) };
    }
    return m;
  });
  
  // Step 2: If still over budget, compress old conversation turns
  if (estimateMessagesTokens(pruned) > budget.availableForHistory) {
    pruned = compressOldTurns(pruned, strategy.summarizeAfterTurns);
  }
  
  // Step 3: If still over, drop tool results (keep tool calls for context)
  // Step 4: If still over, drop oldest turns entirely
  
  return pruned;
}
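
Steps 3 and 4 are left as comments above. A minimal sketch of how they might look follows. The `ChatMessage` shape, the 4-characters-per-token estimator, and the name `dropUntilWithinBudget` are assumptions made for illustration and are not taken from the codebase. As a simplification, step 4 here drops individual messages rather than whole turns.

```typescript
// Hypothetical message shape; the real type in the codebase may differ.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Stand-in for the Phase 1 estimator: roughly 4 characters per token (assumed).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function estimateMessagesTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}

// Steps 3 and 4: drop tool results first, then the oldest unpinned messages.
// `pinned` holds indices into the original `messages` array.
function dropUntilWithinBudget(
  messages: ChatMessage[],
  maxTokens: number,
  pinned: Set<number>
): ChatMessage[] {
  // Track original indices so pinned lookups stay valid after removals.
  let entries = messages.map((message, index) => ({ message, index }));
  const total = () => estimateMessagesTokens(entries.map(e => e.message));

  // Step 3: remove tool results oldest-first; assistant tool calls stay.
  for (const candidate of [...entries]) {
    if (total() <= maxTokens) break;
    if (candidate.message.role === 'tool' && !pinned.has(candidate.index)) {
      entries = entries.filter(e => e !== candidate);
    }
  }

  // Step 4: remove the oldest remaining unpinned messages.
  for (const candidate of [...entries]) {
    if (total() <= maxTokens) break;
    if (!pinned.has(candidate.index)) {
      entries = entries.filter(e => e !== candidate);
    }
  }

  return entries.map(e => e.message);
}
```

Some chat-completion backends expect every tool call to be followed by a matching result, so a real implementation may prefer replacing a dropped result with a short placeholder instead of removing it outright. That case is not covered by this sketch.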

2. Lift round-summary pattern from deep research

runResearchLoop.ts already generates per-round summaries to compress long tool interactions:

// From runResearchLoop.ts — adapt for main agentic loop
function summarizeToolInteraction(
  toolCalls: ToolCall[],
  toolResults: ToolResult[]
): string {
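  return toolCalls.map((tc, i) => {
    const result = toolResults[i];
    // Guard against a tool call with no matching result (e.g. an interrupted round)
    const outcome = !result
      ? 'no result'
      : result.success
        ? truncate(result.content, 200)
        : `ERROR: ${result.error}`;
    return `- ${tc.name}(${summarizeArgs(tc.arguments)}): ${outcome}`;
  }).join('\n');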
}

// Replace old tool messages with compressed summary
function compressOldTurns(messages: any[], keepLast: number): any[] {
  // Find assistant+tool message groups older than keepLast turns
  // Replace each group with a single system message: "Previous interaction summary: ..."
}
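
The body of `compressOldTurns` is only outlined in comments. One possible way to fill it in is sketched below, assuming that a turn begins at each user message and that messages have a plain `role`/`content` shape. The `ChatMessage` type and the local `truncate` helper are hypothetical stand-ins, and the summary line format is illustrative only.

```typescript
// Hypothetical message shape; the real type in the codebase may differ.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Hypothetical helper: shorten text to `max` characters with a marker.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max) + '...';
}

function compressOldTurns(messages: ChatMessage[], keepLast: number): ChatMessage[] {
  // Assumption: a turn begins at each user message.
  const userIndices = messages
    .map((m, i) => (m.role === 'user' ? i : -1))
    .filter(i => i >= 0);

  // Nothing is old enough to compress.
  if (userIndices.length <= keepLast) return [...messages];

  // Messages at or after `cutoff` belong to the last `keepLast` turns.
  const cutoff =
    keepLast > 0 ? userIndices[userIndices.length - keepLast] : messages.length;

  const result: ChatMessage[] = [];
  let group: ChatMessage[] = [];

  // Emit the pending assistant+tool group as one summary message.
  const flush = () => {
    if (group.length === 0) return;
    const lines = group.map(m => `- ${m.role}: ${truncate(m.content, 200)}`);
    result.push({
      role: 'system',
      content: `Previous interaction summary:\n${lines.join('\n')}`,
    });
    group = [];
  };

  messages.forEach((m, i) => {
    const isOldGroupMember =
      i < cutoff && (m.role === 'assistant' || m.role === 'tool');
    if (isOldGroupMember) {
      group.push(m);
    } else {
      flush();
      result.push(m);
    }
  });
  flush();

  return result;
}
```

In this sketch, old user and system messages are kept verbatim and only runs of assistant and tool messages are collapsed. Note that an old assistant reply with no tool calls is summarized too, which may or may not be the desired behavior.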

3. Add token budget UI indicator

Optional: show a small progress bar in the chat UI indicating context usage:

Context: ████████░░ 6,240 / 8,192 tokens
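
The indicator text above could be produced by a small pure function that the component then renders. The sketch below is one way to do it; `formatContextUsage` and its `width` parameter are hypothetical names, not part of the existing code.

```typescript
// Render a text progress bar such as "Context: ████████░░ 6,240 / 8,192 tokens".
function formatContextUsage(used: number, limit: number, width = 10): string {
  // Clamp so an over-budget count still renders as a full bar.
  const ratio = limit > 0 ? Math.min(Math.max(used / limit, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  const bar = '█'.repeat(filled) + '░'.repeat(width - filled);
  // Group digits with commas for readability.
  const fmt = (n: number) => n.toLocaleString('en-US');
  return `Context: ${bar} ${fmt(used)} / ${fmt(limit)} tokens`;
}

console.log(formatContextUsage(6240, 8192));
```

Keeping the formatting separate from the React component makes it easy to unit test without rendering anything.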

Files to Create/Modify

| File | Change |
| --- | --- |
| src/hooks/useGglibRuntime/agentLoop.ts | Implement smartPrune() with priority-based truncation |
| src/utils/tokenEstimator.ts | Add truncateToTokens() helper |
| src/hooks/useGglibRuntime/runAgenticLoop.ts | Wire smart truncation into main loop |
| src/components/ContextBudgetIndicator.tsx | Create (optional): token usage progress bar |
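
The `truncateToTokens()` helper is named above but not specified. A minimal sketch is shown below, assuming a 4-characters-per-token heuristic in place of the Phase 1 estimator. The signature, the truncation marker, and the preference for cutting at a line boundary are all illustrative choices.

```typescript
const CHARS_PER_TOKEN = 4; // Assumed heuristic, not the Phase 1 estimator.

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Shorten `text` so its estimated size does not exceed `maxTokens`.
function truncateToTokens(text: string, maxTokens: number): string {
  if (estimateTokens(text) <= maxTokens) return text;

  const marker = '\n[...truncated]';
  const maxChars = maxTokens * CHARS_PER_TOKEN;

  // If the marker itself does not fit, fall back to a plain cut.
  if (maxChars <= marker.length) return text.slice(0, Math.max(maxChars, 0));

  const budgetChars = maxChars - marker.length;
  const head = text.slice(0, budgetChars);

  // Prefer cutting at a line boundary, unless that would discard
  // more than half of the allowed text.
  const lastNewline = head.lastIndexOf('\n');
  const cut = lastNewline > budgetChars / 2 ? head.slice(0, lastNewline) : head;

  return cut + marker;
}
```

Appending an explicit marker lets the model see that the tool output was cut, which a silent character cut-off would hide.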

Acceptance Criteria

  • Tool results are truncated before older messages are dropped
  • Conversation summaries replace detailed tool interactions when compressing
  • System messages and last user query are never pruned
  • Smart truncation preserves more useful context than simple character cut-off
  • Round-summary pattern from deep research adapted for main agentic loop
  • Agentic loop completes successfully even with very long conversations
  • Optional: token budget indicator visible in chat UI
