Skip to content

context token improvements #2968

Description

@arii

Looking at this diff, the main token-cost hotspots I see are:

  1. calculateReviewHash / semantic cache — already added, good
  2. getMemoizedBatchSummary — memoizes diff fetches within a run, but doesn't persist across runs
  3. batchFiles — fixed 10-file batches regardless of file size/churn
  4. pruneCache — capped at 15 entries, LRU-naive (insertion order)

Here are the highest-leverage optimizations:


1. Progressive / tiered review (biggest win)

Instead of sending all roles (SECURITY, PERFORMANCE, STYLE, ARCHITECTURE) to the LLM unconditionally, gate expensive roles behind a cheap pre-filter:

// codeReviewOrchestrator.ts

// Tier 0: cheap heuristic pre-filter (no LLM call)
const HIGH_RISK_PATTERNS = /\.(env|key|pem|secret)|auth|password|token|crypto|jwt/i;

function shouldSkipRole(role: CodeReviewRole, changedFiles: string[]): boolean {
  if (role === 'SECURITY') {
    // Only run security review if files touch auth/secrets surface
    return !changedFiles.some(f => HIGH_RISK_PATTERNS.test(f));
  }
  if (role === 'PERFORMANCE') {
    // Skip perf review for doc/config-only changes
    return changedFiles.every(f => /\.(md|json|yaml|yml|txt)$/.test(f));
  }
  return false;
}

// In the task queue loop:
for (const role of roles) {
  if (shouldSkipRole(role, batch)) {
    console.log(`⏭️  Skipping ${role} review — no matching surface in batch`);
    continue;
  }
  taskQueue.push(/* ... */);
}

This avoids 1–2 LLM calls per batch on pure config/doc PRs.


2. Smarter cache key + LRU eviction

The current pruneCache keeps the last N by insertion order, which is fine but doesn't account for result value — a fail result is more useful to keep than a pass. Add frequency-weighted eviction:

// codeReviewUtils.ts

interface CacheEntry {
  result: CodeReviewResult;
  hits: number;
  ts: number; // unix ms
}

export function pruneCache(
  cache: Record<string, CacheEntry>,
  maxEntries = 15
): Record<string, CacheEntry> {
  const keys = Object.keys(cache);
  if (keys.length <= maxEntries) return cache;

  // Score: fail/warn entries are 3x more valuable than pass
  const scored = keys.map(k => ({
    k,
    score: (cache[k].hits + 1) *
           (cache[k].result.llmVerdict === 'pass' ? 1 : 3) *
           (1 / (Date.now() - cache[k].ts + 1)), // recency
  }));
  scored.sort((a, b) => b.score - a.score);

  return Object.fromEntries(
    scored.slice(0, maxEntries).map(({ k }) => [k, cache[k]])
  );
}

Also increment hits on cache hit in the orchestrator so the scoring stays accurate.


3. Diff budget per batch (token-aware batching)

The current batchFiles(changedFiles, 10) splits by file count, but a 10-file batch of large TypeScript files can be 10× the tokens of a 10-file batch of config files. Add a token budget:

// codeReviewUtils.ts

const TOKENS_PER_CHAR = 0.25; // rough estimate
const MAX_TOKENS_PER_BATCH = 8_000;

export function batchFilesByTokenBudget(
  files: string[],
  diffSizes: Record<string, number>, // filename → char count
  maxTokens = MAX_TOKENS_PER_BATCH,
  maxFiles = 10
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const file of files) {
    const fileTokens = Math.ceil((diffSizes[file] ?? 500) * TOKENS_PER_CHAR);
    if (current.length >= maxFiles || (current.length > 0 && currentTokens + fileTokens > maxTokens)) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(file);
    currentTokens += fileTokens;
  }
  if (current.length) batches.push(current);
  return batches;
}

You'd need to pass diff char counts from getCodeDiffSummary — it already fetches the diff, so you can measure string lengths before batching.


4. Cross-run diff cache (persist hash → skip re-fetch)

Right now getMemoizedBatchSummary only caches within a single run. If the same files appear in consecutive PR pushes with no diff change, you re-fetch anyway. Persist a lightweight diff hash alongside the review cache:

// In CodeReviewState:
export interface CodeReviewState {
  findings: ReviewFinding[];
  cache?: Record<string, CacheEntry>;
  diffHashes?: Record<string, string>; // filename → sha256(diff)
}

// Before fetching diff for a batch:
const batchDiffHash = sha256(batch.join(',') + prNumber);
if (prevState?.diffHashes?.[batchDiffHash]) {
  // files haven't changed between pushes — use cached review result
}

This is most valuable for PRs with many force-push cycles (common with Jules-generated PRs).


5. Early exit on role consensus

If SECURITY and ARCHITECTURE both return fail, don't bother running STYLE and PERFORMANCE — the PR is already blocked:

// After each worker resolves:
const failCount = allResults.filter(r => r.llmVerdict === 'fail').length;
if (failCount >= 2) {
  console.log('⚡ Early exit: 2+ fail verdicts — skipping remaining roles');
  taskQueue.length = 0; // drain queue
}

Priority ranking

Optimization Est. token savings Complexity
Progressive role gating (#1) 25–50% on config PRs Low
Token-aware batching (#3) 10–30% on large PRs Medium
Early exit on consensus (#5) 15–25% on bad PRs Low
Cross-run diff cache (#4) 20–40% on iterative PRs Medium
LRU eviction improvement (#2) Marginal (cache quality) Low

I'd ship #1 and #5 first — they're both a few lines and have zero risk of breaking the review pipeline.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions