Looking at this diff, the main token-cost hotspots I see are:
calculateReviewHash / semantic cache — already added, good
getMemoizedBatchSummary — memoizes diff fetches within a run, but doesn't persist across runs
batchFiles — fixed 10-file batches regardless of file size/churn
pruneCache — capped at 15 entries, LRU-naive (insertion order)
Here are the highest-leverage optimizations:
1. Progressive / tiered review (biggest win)
Instead of sending all roles (SECURITY, PERFORMANCE, STYLE, ARCHITECTURE) to the LLM unconditionally, gate expensive roles behind a cheap pre-filter:
// codeReviewOrchestrator.ts
// Tier 0: cheap heuristic pre-filter (no LLM call)
const HIGH_RISK_PATTERNS = /\.(env|key|pem|secret)|auth|password|token|crypto|jwt/i;
function shouldSkipRole(role: CodeReviewRole, changedFiles: string[]): boolean {
if (role === 'SECURITY') {
// Only run security review if files touch auth/secrets surface
return !changedFiles.some(f => HIGH_RISK_PATTERNS.test(f));
}
if (role === 'PERFORMANCE') {
// Skip perf review for doc/config-only changes
return changedFiles.every(f => /\.(md|json|yaml|yml|txt)$/.test(f));
}
return false;
}
// In the task queue loop:
for (const role of roles) {
if (shouldSkipRole(role, batch)) {
console.log(`⏭️ Skipping ${role} review — no matching surface in batch`);
continue;
}
taskQueue.push(/* ... */);
}
This avoids 1–2 LLM calls per batch on pure config/doc PRs.
2. Smarter cache key + LRU eviction
The current pruneCache keeps the last N by insertion order, which is fine but doesn't account for result value — a fail result is more useful to keep than a pass. Add frequency-weighted eviction:
// codeReviewUtils.ts
interface CacheEntry {
result: CodeReviewResult;
hits: number;
ts: number; // unix ms
}
export function pruneCache(
cache: Record<string, CacheEntry>,
maxEntries = 15
): Record<string, CacheEntry> {
const keys = Object.keys(cache);
if (keys.length <= maxEntries) return cache;
// Score: fail/warn entries are 3x more valuable than pass
const scored = keys.map(k => ({
k,
score: (cache[k].hits + 1) *
(cache[k].result.llmVerdict === 'pass' ? 1 : 3) *
(1 / (Date.now() - cache[k].ts + 1)), // recency
}));
scored.sort((a, b) => b.score - a.score);
return Object.fromEntries(
scored.slice(0, maxEntries).map(({ k }) => [k, cache[k]])
);
}
Also increment hits on cache hit in the orchestrator so the scoring stays accurate.
3. Diff budget per batch (token-aware batching)
The current batchFiles(changedFiles, 10) splits by file count, but a 10-file batch of large TypeScript files can be 10× the tokens of a 10-file batch of config files. Add a token budget:
// codeReviewUtils.ts
const TOKENS_PER_CHAR = 0.25; // rough estimate
const MAX_TOKENS_PER_BATCH = 8_000;
export function batchFilesByTokenBudget(
files: string[],
diffSizes: Record<string, number>, // filename → char count
maxTokens = MAX_TOKENS_PER_BATCH,
maxFiles = 10
): string[][] {
const batches: string[][] = [];
let current: string[] = [];
let currentTokens = 0;
for (const file of files) {
const fileTokens = Math.ceil((diffSizes[file] ?? 500) * TOKENS_PER_CHAR);
if (current.length >= maxFiles || (current.length > 0 && currentTokens + fileTokens > maxTokens)) {
batches.push(current);
current = [];
currentTokens = 0;
}
current.push(file);
currentTokens += fileTokens;
}
if (current.length) batches.push(current);
return batches;
}
You'd need to pass diff char counts from getCodeDiffSummary — it already fetches the diff, so you can measure string lengths before batching.
4. Cross-run diff cache (persist hash → skip re-fetch)
Right now getMemoizedBatchSummary only caches within a single run. If the same files appear in consecutive PR pushes with no diff change, you re-fetch anyway. Persist a lightweight diff hash alongside the review cache:
// In CodeReviewState:
export interface CodeReviewState {
findings: ReviewFinding[];
cache?: Record<string, CacheEntry>;
diffHashes?: Record<string, string>; // filename → sha256(diff)
}
// Before fetching diff for a batch:
const batchDiffHash = sha256(batch.join(',') + prNumber);
if (prevState?.diffHashes?.[batchDiffHash]) {
// files haven't changed between pushes — use cached review result
}
This is most valuable for PRs with many force-push cycles (common with Jules-generated PRs).
5. Early exit on role consensus
If SECURITY and ARCHITECTURE both return fail, don't bother running STYLE and PERFORMANCE — the PR is already blocked:
// After each worker resolves:
const failCount = allResults.filter(r => r.llmVerdict === 'fail').length;
if (failCount >= 2) {
console.log('⚡ Early exit: 2+ fail verdicts — skipping remaining roles');
taskQueue.length = 0; // drain queue
}
Priority ranking
| Optimization |
Est. token savings |
Complexity |
| Progressive role gating (#1) |
25–50% on config PRs |
Low |
| Token-aware batching (#3) |
10–30% on large PRs |
Medium |
| Early exit on consensus (#5) |
15–25% on bad PRs |
Low |
| Cross-run diff cache (#4) |
20–40% on iterative PRs |
Medium |
| LRU eviction improvement (#2) |
Marginal (cache quality) |
Low |
I'd ship #1 and #5 first — they're both a few lines and have zero risk of breaking the review pipeline.
Looking at this diff, the main token-cost hotspots I see are:
calculateReviewHash/ semantic cache — already added, goodgetMemoizedBatchSummary— memoizes diff fetches within a run, but doesn't persist across runsbatchFiles— fixed 10-file batches regardless of file size/churnpruneCache— capped at 15 entries, LRU-naive (insertion order)Here are the highest-leverage optimizations:
1. Progressive / tiered review (biggest win)
Instead of sending all roles (SECURITY, PERFORMANCE, STYLE, ARCHITECTURE) to the LLM unconditionally, gate expensive roles behind a cheap pre-filter:
This avoids 1–2 LLM calls per batch on pure config/doc PRs.
2. Smarter cache key + LRU eviction
The current
pruneCachekeeps the last N by insertion order, which is fine but doesn't account for result value — afailresult is more useful to keep than apass. Add frequency-weighted eviction:Also increment
hitson cache hit in the orchestrator so the scoring stays accurate.3. Diff budget per batch (token-aware batching)
The current
batchFiles(changedFiles, 10)splits by file count, but a 10-file batch of large TypeScript files can be 10× the tokens of a 10-file batch of config files. Add a token budget:You'd need to pass diff char counts from
getCodeDiffSummary— it already fetches the diff, so you can measure string lengths before batching.4. Cross-run diff cache (persist hash → skip re-fetch)
Right now
getMemoizedBatchSummaryonly caches within a single run. If the same files appear in consecutive PR pushes with no diff change, you re-fetch anyway. Persist a lightweight diff hash alongside the review cache:This is most valuable for PRs with many force-push cycles (common with Jules-generated PRs).
5. Early exit on role consensus
If SECURITY and ARCHITECTURE both return
fail, don't bother running STYLE and PERFORMANCE — the PR is already blocked:Priority ranking
I'd ship #1 and #5 first — they're both a few lines and have zero risk of breaking the review pipeline.