Open
Labels: component: frontend (React/TypeScript UI), llm: inference (Model loading/inference), priority: low (Nice to have), size: m (4-8 hours, half to full day), type: feature (New functionality or enhancement)
Description
Parent: #249 — Token-based context budgeting
Goal
Implement smart truncation strategies and lift the round-summary pattern from deep research to compress context when nearing limits.
Background
Phase 1 (#271) adds token estimation and basic budget. This phase adds intelligent strategies for staying within budget while preserving important context.
Implementation
1. Smart truncation strategies
Instead of simply dropping old messages, implement a priority-based truncation:
```typescript
interface TruncationStrategy {
  /** Indices of messages to always keep (system prompt, last user query) */
  pinned: Set<number>;
  /** Compress tool results beyond this token limit */
  maxToolResultTokens: number;
  /** Summarize conversation turns older than this many turns instead of dropping them */
  summarizeAfterTurns: number;
}
```
```typescript
function smartPrune(
  messages: any[],
  budget: ContextBudget,
  strategy: TruncationStrategy
): any[] {
  let pruned = [...messages];

  // Step 1: Truncate long tool results
  pruned = pruned.map(m => {
    if (m.role === 'tool' && estimateTokens(m.content) > strategy.maxToolResultTokens) {
      return { ...m, content: truncateToolResult(m.content, strategy.maxToolResultTokens) };
    }
    return m;
  });

  // Step 2: If still over budget, compress old conversation turns
  if (estimateMessagesTokens(pruned) > budget.availableForHistory) {
    pruned = compressOldTurns(pruned, strategy.summarizeAfterTurns);
  }

  // Step 3: If still over, drop tool results (keep the tool calls for context)
  // Step 4: If still over, drop the oldest unpinned turns entirely
  return pruned;
}
```

2. Lift round-summary pattern from deep research
runResearchLoop.ts already generates per-round summaries to compress long tool interactions:
```typescript
// From runResearchLoop.ts — adapt for the main agentic loop
function summarizeToolInteraction(
  toolCalls: ToolCall[],
  toolResults: ToolResult[]
): string {
  return toolCalls.map((tc, i) => {
    const result = toolResults[i];
    return `- ${tc.name}(${summarizeArgs(tc.arguments)}): ${
      result.success ? truncate(result.content, 200) : `ERROR: ${result.error}`
    }`;
  }).join('\n');
}
```
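The `summarizeArgs` and `truncate` helpers are not shown in the excerpt above. Hypothetical one-liners (assumed shapes, not the actual runResearchLoop.ts code) could look like:

```typescript
// Hypothetical helpers assumed by summarizeToolInteraction
// (not the real runResearchLoop.ts implementations).

// Clip a string to `max` characters, marking the cut with an ellipsis.
function truncate(s: string, max: number): string {
  return s.length <= max ? s : s.slice(0, max) + '…';
}

// Render tool-call arguments as a compact `key=value` list.
function summarizeArgs(args: Record<string, unknown>): string {
  return Object.entries(args)
    .map(([k, v]) => `${k}=${JSON.stringify(v)}`)
    .join(', ');
}
```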
```typescript
// Replace old tool messages with a compressed summary
function compressOldTurns(messages: any[], keepLast: number): any[] {
  // 1. Find assistant+tool message groups older than the last `keepLast` turns
  // 2. Replace each group with a single system message:
  //    "Previous interaction summary: ..." (built via summarizeToolInteraction)
  // TODO: implement; returning the input unchanged keeps the stub type-correct
  return messages;
}
```

3. Add token budget UI indicator
Optional: show a small progress bar in the chat UI indicating context usage:
Context: ████████░░ 6,240 / 8,192 tokens
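The bar can be driven by a pure formatting helper that the (optional, hypothetical) `ContextBudgetIndicator.tsx` component would render. A minimal sketch, assuming a 10-cell bar:

```typescript
// Sketch of the formatting logic behind the optional indicator; the React
// component would render the same data as a styled progress bar instead.
function formatContextUsage(usedTokens: number, maxTokens: number, cells = 10): string {
  const filled = Math.min(cells, Math.round((usedTokens / maxTokens) * cells));
  const bar = '█'.repeat(filled) + '░'.repeat(cells - filled);
  const fmt = (n: number) => n.toLocaleString('en-US');
  return `Context: ${bar} ${fmt(usedTokens)} / ${fmt(maxTokens)} tokens`;
}
```

For example, `formatContextUsage(6240, 8192)` produces the sample line shown above. Keeping the formatting pure makes it trivial to unit-test apart from the React layer.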
Files to Create/Modify
| File | Change |
|---|---|
| `src/hooks/useGglibRuntime/agentLoop.ts` | Implement `smartPrune()` with priority-based truncation |
| `src/utils/tokenEstimator.ts` | Add `truncateToTokens()` helper |
| `src/hooks/useGglibRuntime/runAgenticLoop.ts` | Wire smart truncation into the main loop |
| `src/components/ContextBudgetIndicator.tsx` | Create (optional) — token usage progress bar |
Acceptance Criteria
- Tool results are truncated before older messages are dropped
- Conversation summaries replace detailed tool interactions when compressing
- System messages and last user query are never pruned
- Smart truncation preserves more useful context than simple character cut-off
- Round-summary pattern from deep research adapted for main agentic loop
- Agentic loop completes successfully even with very long conversations
- Optional: token budget indicator visible in chat UI