Skip to content

Fetch full file context during review, not just diffs#700

Closed
iamnbutler wants to merge 1 commit intomainfrom
tasks/gh-iamnbutler-tasks-issue-668--92055ba7
Closed

Fetch full file context during review, not just diffs#700
iamnbutler wants to merge 1 commit intomainfrom
tasks/gh-iamnbutler-tasks-issue-668--92055ba7

Conversation

@iamnbutler
Copy link
Copy Markdown
Owner

Summary

  • Adds proactive context fetching to the orchestrator's PR evaluation flow: before pass 1, the diff is parsed to identify the most-changed files (up to 3), and surrounding code (±50 lines around each changed line) is fetched from the PR branch and included in the evaluation prompt
  • Adds a new diff module with a unified diff parser (parse_diff_files) and context windowing (extract_context_window) with configurable limits (max 3 files, ~1000 lines total, ±50 line window)
  • Updates the system prompt to instruct the reviewer to use the proactive file context for checking callers/dependents

Test plan

  • All 65 orchestrator tests pass (including 7 new tests for diff parsing, context extraction, and prompt integration)
  • Full workspace compiles cleanly
  • Verify proactive context appears in evaluation prompts during live review
  • Confirm token budget stays reasonable with context included

Closes #668

🤖 Generated with Claude Code

Add proactive context fetching to the orchestrator's PR evaluation flow.
Before pass 1, the diff is parsed to identify the most-changed files
(up to 3), and surrounding code (±50 lines around each change) is
fetched from the PR branch and included in the evaluation prompt.

- Add `diff` module with unified diff parser and context windowing
- Extend `build_evaluation_prompt` with `file_contexts` parameter
- Update system prompt to instruct reviewer to use file context
- Fetch proactive context in `evaluate()` before pass 1 LLM call
- Add comprehensive tests for diff parsing and context extraction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds proactive context fetching to the Pass 1 evaluation flow. The approach is well-scoped: parse the diff, rank files by change count, fetch ±50-line windows around changed lines for the top 3 files, and include them in the prompt. The new diff module is cleanly isolated with good unit tests. All 65 tests pass, clippy is clean.

Two suggestions inline — no blockers.

Full Review

Summary

What changed: New diff module (parse_diff_files, extract_context_window) + proactive context fetching loop in ClaudeOrchestrator::evaluate() before Pass 1 + build_evaluation_prompt extended with file_contexts parameter.

Correctness

  • Error handling in the fetch loop is correct: Ok(None) (new file, not yet on branch) is silently skipped; Err is warned and skipped. Neither case aborts evaluation — appropriate for a best-effort enhancement.
  • build_deep_review_prompt correctly passes &[] for file contexts — deep review already includes full file contents, so no duplication.
  • std::mem::take correctly clears changed_lines on each file flush.
  • Deleted files (+++ /dev/null) are correctly excluded: current_path = None prevents accumulation, and the empty changed_lines guard would catch it even if it didn't.
  • The parse_hunk_new_start parser handles @@ -0,0 +1,5 @@ (new file) and single-number hunks (@@ -1 +1 @@) correctly, as verified by the tests.

Spec Compliance

No spec impact — this is an internal orchestrator enhancement to the evaluation prompt.

Architecture

Crate boundaries are correct. The diff module lives in orchestrator where it's used; the public visibility is consistent with other orchestrator modules. get_file_content_at_ref is an existing GitHub client method — no new API surface needed.

Code Quality

  • Constants (MAX_CONTEXT_FILES, MAX_CONTEXT_LINES, CONTEXT_WINDOW) are well-named and centralized.
  • The two suggestions (budget enforcement + removal-only files) are worth addressing but not blockers.
  • Test coverage is solid for the pure functions; the fetch loop in claude.rs is untested but that's consistent with how the rest of ClaudeOrchestrator::evaluate() is structured (it makes real network calls).

Reviewed by PR / Review

context_lines = line_count,
"Fetched proactive context"
);
total_lines += line_count;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] Priority: Correctness

MAX_CONTEXT_LINES is checked before fetching the next file, but never enforced after extracting context for a single file. A file with many scattered changes can blow well past 1000 lines — e.g., 50 changes each 110 lines apart yields ~50 × 101 = ~5,000 output lines from extract_context_window.

The const comment says "Maximum total lines of context to include (across all files)" but the current loop treats it as a "stop fetching more files after exceeding the budget" cap, not a hard cap.

If token budget matters here (and it will on large refactoring PRs), consider one of:

// Option A: enforce a per-file cap inside extract_context_window (simplest)
let context_window = extract_context_window(
    &content,
    &file.changed_lines,
    CONTEXT_WINDOW,
);
let context_window_capped: String = context_window
    .lines()
    .take(MAX_CONTEXT_LINES.saturating_sub(total_lines))
    .collect::(Vec<_)>()
    .join("\n");

Or truncate before pushing. Either way, updating the doc comment on MAX_CONTEXT_LINES to match actual behavior would clarify intent.

} else if current_path.is_some() && new_line_num > 0 {
if line.starts_with('+') && !line.starts_with("+++") {
// Added line
changed_lines.push(new_line_num);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[SUGGESTION] Priority: Code Quality

Only added lines are pushed to changed_lines; removed lines only increment change_count. As a result, a file whose diff is entirely removals (e.g., deleting a function) ends up with empty changed_lines and gets filtered out of context fetching at the flush step (if !changed_lines.is_empty()).

This is arguably correct — there's no "new" line position to anchor a context window to — but it creates a blind spot for the exact case the system prompt calls out: "For removals: is anything left that depends on the removed code?" That check needs context around the removal site in the post-PR file, which is currently not fetched.

Might be worth a comment explaining the trade-off, or tracking the nearest surviving context line for pure-deletion hunks.

@iamnbutler
Copy link
Copy Markdown
Owner Author

Closing duplicate - newer PR #701 exists

@iamnbutler iamnbutler closed this Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

orchestrator: Fetch full file context during review, not just diffs

1 participant