context token improvements

Looking at this diff, the main token-cost hotspots I see are:

1. **`calculateReviewHash` / semantic cache** — already added, good
2. **`getMemoizedBatchSummary`** — memoizes diff fetches within a run, but doesn't persist across runs
3. **`batchFiles`** — fixed 10-file batches regardless of file size/churn
4. **`pruneCache`** — capped at 15 entries, LRU-naive (insertion order)

Here are the highest-leverage optimizations:

---

## 1. Progressive / tiered review (biggest win)

Instead of sending all roles (SECURITY, PERFORMANCE, STYLE, ARCHITECTURE) to the LLM unconditionally, gate expensive roles behind a cheap pre-filter:

```typescript
// codeReviewOrchestrator.ts

// Tier 0: cheap heuristic pre-filter (no LLM call)
const HIGH_RISK_PATTERNS = /\.(env|key|pem|secret)|auth|password|token|crypto|jwt/i;

function shouldSkipRole(role: CodeReviewRole, changedFiles: string[]): boolean {
  if (role === 'SECURITY') {
    // Only run security review if files touch auth/secrets surface
    return !changedFiles.some(f => HIGH_RISK_PATTERNS.test(f));
  }
  if (role === 'PERFORMANCE') {
    // Skip perf review for doc/config-only changes
    return changedFiles.every(f => /\.(md|json|yaml|yml|txt)$/.test(f));
  }
  return false;
}

// In the task queue loop:
for (const role of roles) {
  if (shouldSkipRole(role, batch)) {
    console.log(`⏭️  Skipping ${role} review — no matching surface in batch`);
    continue;
  }
  taskQueue.push(/* ... */);
}
```

This avoids 1–2 LLM calls per batch on pure config/doc PRs.

---

## 2. Smarter cache key + LRU eviction

The current `pruneCache` keeps the last N by insertion order, which is fine but doesn't account for result value — a `fail` result is more useful to keep than a `pass`. Add frequency-weighted eviction:

```typescript
// codeReviewUtils.ts

interface CacheEntry {
  result: CodeReviewResult;
  hits: number;
  ts: number; // unix ms
}

export function pruneCache(
  cache: Record<string, CacheEntry>,
  maxEntries = 15
): Record<string, CacheEntry> {
  const keys = Object.keys(cache);
  if (keys.length <= maxEntries) return cache;

  // Score: fail/warn entries are 3x more valuable than pass
  const scored = keys.map(k => ({
    k,
    score: (cache[k].hits + 1) *
           (cache[k].result.llmVerdict === 'pass' ? 1 : 3) *
           (1 / (Date.now() - cache[k].ts + 1)), // recency
  }));
  scored.sort((a, b) => b.score - a.score);

  return Object.fromEntries(
    scored.slice(0, maxEntries).map(({ k }) => [k, cache[k]])
  );
}
```

Also increment `hits` on cache hit in the orchestrator so the scoring stays accurate.

---

## 3. Diff budget per batch (token-aware batching)

The current `batchFiles(changedFiles, 10)` splits by file count, but a 10-file batch of large TypeScript files can be 10× the tokens of a 10-file batch of config files. Add a token budget:

```typescript
// codeReviewUtils.ts

const TOKENS_PER_CHAR = 0.25; // rough estimate
const MAX_TOKENS_PER_BATCH = 8_000;

export function batchFilesByTokenBudget(
  files: string[],
  diffSizes: Record<string, number>, // filename → char count
  maxTokens = MAX_TOKENS_PER_BATCH,
  maxFiles = 10
): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const file of files) {
    const fileTokens = Math.ceil((diffSizes[file] ?? 500) * TOKENS_PER_CHAR);
    if (current.length >= maxFiles || (current.length > 0 && currentTokens + fileTokens > maxTokens)) {
      batches.push(current);
      current = [];
      currentTokens = 0;
    }
    current.push(file);
    currentTokens += fileTokens;
  }
  if (current.length) batches.push(current);
  return batches;
}
```

You'd need to pass diff char counts from `getCodeDiffSummary` — it already fetches the diff, so you can measure string lengths before batching.

---

## 4. Cross-run diff cache (persist hash → skip re-fetch)

Right now `getMemoizedBatchSummary` only caches within a single run. If the same files appear in consecutive PR pushes with no diff change, you re-fetch anyway. Persist a lightweight diff hash alongside the review cache:

```typescript
// In CodeReviewState:
export interface CodeReviewState {
  findings: ReviewFinding[];
  cache?: Record<string, CacheEntry>;
  diffHashes?: Record<string, string>; // filename → sha256(diff)
}

// Before fetching diff for a batch:
const batchDiffHash = sha256(batch.join(',') + prNumber);
if (prevState?.diffHashes?.[batchDiffHash]) {
  // files haven't changed between pushes — use cached review result
}
```

This is most valuable for PRs with many force-push cycles (common with Jules-generated PRs).

---

## 5. Early exit on role consensus

If SECURITY and ARCHITECTURE both return `fail`, don't bother running STYLE and PERFORMANCE — the PR is already blocked:

```typescript
// After each worker resolves:
const failCount = allResults.filter(r => r.llmVerdict === 'fail').length;
if (failCount >= 2) {
  console.log('⚡ Early exit: 2+ fail verdicts — skipping remaining roles');
  taskQueue.length = 0; // drain queue
}
```

---

## Priority ranking

| Optimization | Est. token savings | Complexity |
|---|---|---|
| Progressive role gating (#1) | 25–50% on config PRs | Low |
| Token-aware batching (#3) | 10–30% on large PRs | Medium |
| Early exit on consensus (#5) | 15–25% on bad PRs | Low |
| Cross-run diff cache (#4) | 20–40% on iterative PRs | Medium |
| LRU eviction improvement (#2) | Marginal (cache quality) | Low |

I'd ship #1 and #5 first — they're both a few lines and have zero risk of breaking the review pipeline.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

context token improvements #2968

1. Progressive / tiered review (biggest win)

2. Smarter cache key + LRU eviction

3. Diff budget per batch (token-aware batching)

4. Cross-run diff cache (persist hash → skip re-fetch)

5. Early exit on role consensus

Priority ranking

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Optimization	Est. token savings	Complexity
Progressive role gating (#1)	25–50% on config PRs	Low
Token-aware batching (#3)	10–30% on large PRs	Medium
Early exit on consensus (#5)	15–25% on bad PRs	Low
Cross-run diff cache (#4)	20–40% on iterative PRs	Medium
LRU eviction improvement (#2)	Marginal (cache quality)	Low

context token improvements #2968

Description

1. Progressive / tiered review (biggest win)

2. Smarter cache key + LRU eviction

3. Diff budget per batch (token-aware batching)

4. Cross-run diff cache (persist hash → skip re-fetch)

5. Early exit on role consensus

Priority ranking

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions