Skip to content

GTST-18 fix: handle untracked files in gitsift stage#15

Merged
MonteYin merged 4 commits intomainfrom
GTST-18.fix/untracked-files-staging
Mar 25, 2026
Merged

GTST-18 fix: handle untracked files in gitsift stage#15
MonteYin merged 4 commits intomainfrom
GTST-18.fix/untracked-files-staging

Conversation

@MonteYin
Copy link
Copy Markdown
Owner

Summary

Fix: gitsift stage now handles untracked/new files correctly instead of failing with "index does not contain".

Problem

Previously, if the working tree contained any untracked files, gitsift stage would fail for ALL hunks — even hunks belonging to tracked files. The error was:

Error: failed to apply selected hunks to index
Caused by: index does not contain 'new_file.py'

Fix

  • scan_hunk_metadata() — identifies which hunks belong to untracked vs tracked files
  • Untracked file hunks are staged via index.add_path() directly (bypasses apply())
  • Remaining untracked files are added to the index before apply() to prevent failures
  • Tracked file hunks are staged via repo.apply() as before

Tests (2 new, 81 total)

  • stage_with_untracked_files_present — staging tracked hunks works when untracked files exist
  • stage_untracked_file_hunk_directly — staging an untracked file's hunk works and file content is correct in index

Manually verified

  • Built release binary, tested in real repo with mixed tracked + untracked changes
  • Both hunks staged successfully, status shows 0 unstaged

- Scan diff for untracked files before staging
- Untracked file hunks staged via index.add_path() directly
- Tracked file hunks staged via repo.apply() as before
- Remaining untracked files added to index before apply() to prevent
  "index does not contain" failures
- 2 new tests: stage with untracked present, stage untracked directly
@MonteYin
Copy link
Copy Markdown
Owner Author

Review -- Claude Opus 4.6

Review Result

Overall Verdict: HAS_ISSUES


1. Correctness

Verdict: HAS_ISSUES

Issues

Critical
  • Unintended staging of ALL untracked files when staging only tracked hunks (src/git/stage.rs:303-312): When the user requests to stage only tracked file hunks (i.e., tracked_ids is non-empty but untracked_paths is empty), the code enters the block at line 303 and collects ALL untracked file paths from hunk_metadata at lines 305-311, then calls stage_untracked_files(&repo, &all_untracked) at line 312. This silently adds every untracked file in the working tree to the index, even though the user never requested them. This is a data-integrity issue -- the user's staging area is corrupted with files they did not ask to stage. The test stage_with_untracked_files_present (line 928) does not catch this because it never asserts that new_file.txt is absent from the index after the operation.
Important
  • Staged count semantics for untracked hunks (src/git/stage.rs:298-299): stage_untracked_files returns the number of unique file paths, and this is added to staged. But the user submits hunk IDs, not file paths. If an untracked file theoretically had multiple hunks (or if two hunk IDs mapped to the same file through hash collision), the staged count would undercount. In practice, untracked files always have exactly one hunk, so this is functionally correct today, but the semantics are fragile. Consider counting the number of hunk IDs that mapped to untracked files rather than unique paths.

  • Line-level staging does not handle untracked files (src/git/stage.rs:401-453): The line-level path calls collect_hunks() which includes untracked file content (via include_untracked(true) + show_untracked_content(true)), so an untracked file's hunk will be found and reconstruct_patch will be called. However, reconstruct_patch generates --- a/{path} (line 147), which is incorrect for new files (should be --- /dev/null). The subsequent repo.apply() will likely fail because the file doesn't exist in the index. This is a pre-existing issue not introduced by this PR, but since this PR addresses untracked file handling in the hunk-level path, it would be consistent to either (a) handle untracked files in the line-level path too, or (b) explicitly reject untracked file line-selections with a clear error message using the new hunk_metadata map.

Minor
  • Missing test for the "only untracked" edge case: There is no test where ALL requested hunk IDs are untracked (ensuring the tracked_ids.is_empty() branch is taken and apply() is correctly skipped). The stage_untracked_file_hunk_directly test includes tracked file changes in the working tree but only requests the untracked hunk -- this partially covers it, but the tracked file changes still exist and tracked_ids will be empty so the apply block is skipped. This path works correctly by inspection, but an explicit test would be good.

  • Missing test for mixed tracked + untracked hunk IDs in a single request: There's no test that stages both a tracked hunk and an untracked hunk in the same request. This would exercise the full flow: untracked staged first, then the apply path for tracked hunks.

Fix Instructions

Critical fix -- The "add remaining untracked to index" step at lines 303-312 must not stage files the user didn't request. Instead of adding ALL untracked files, it should add only the untracked files whose content is needed to make apply() succeed. The correct approach: after staging the explicitly-requested untracked files (which already happened at line 298), add the remaining untracked files to the index temporarily, run apply(), then unstage the unintentionally added files. Alternatively, and more simply: since untracked files that aren't being staged should not affect apply() on tracked files, the re-generated diff at line 318 should exclude untracked files entirely (don't set include_untracked(true) on the new diff options). This is already the case -- lines 315-316 create DiffOptions without include_untracked, so the diff will only contain tracked files. Therefore, the all_untracked staging at lines 305-312 is unnecessary for apply() to succeed AND it causes unintended side effects. The fix is to remove lines 304-312 entirely. The original bug was that include_untracked(true) in the old code caused untracked files to appear in the diff passed to apply(), which failed. The new code already solves this by regenerating the diff WITHOUT include_untracked. The extra stage_untracked_files call is both unnecessary and harmful.

Important fix (staged count) -- Replace the staged += untracked_staged at line 299 with a count of how many hunk IDs from unique_requested mapped to untracked files. For example, count the number of iterations through the Some((path, true)) branch:

let mut untracked_hunk_count = 0;
// In the Some((path, true)) branch:
untracked_hunk_count += 1;
// ...
staged += untracked_hunk_count; // instead of staged += untracked_staged

Important fix (line-level untracked) -- Add a check at the start of the line-level loop: if hunk_metadata.get(&sel.hunk_id) returns Some((_, true)), push an error like "line-level staging not supported for untracked files; use hunk-level staging" and continue. This prevents a confusing apply failure and is a small addition.

Test fix -- Add an assertion to stage_with_untracked_files_present that verifies new_file.txt is NOT in the index after staging:

let index = repo.index().unwrap();
assert!(index.get_path(Path::new("new_file.txt"), 0).is_none(),
    "untracked file should NOT be staged as side effect");

2. Architecture

Verdict: PASS

The design is sound. Separating scan_hunk_metadata() as a pre-pass to classify hunks is a clean approach. The separation of stage_untracked_files() into its own function with a clear contract is good. The flow -- classify, handle untracked, handle tracked -- is logical and easy to follow.

One minor observation: there is now significant duplication of the "iterate diff, extract path, compute hunk ID" pattern across scan_hunk_metadata, the available-ID scan inside the hunk-level block, and collect_hunks. This is a pre-existing pattern (noted in prior reviews) and not something this PR should fix, but it's worth tracking.


3. Security

Verdict: PASS

No security issues. File paths come from git2's diff output (the repository's own state). index.add_path() is bounded to the working tree by libgit2. No user-controlled strings are used in shell commands or similar injection vectors.


4. Performance

Verdict: PASS

Minor observations

  • scan_hunk_metadata adds one extra full diff iteration at the start of every stage_selection call, even if the request contains no untracked files. This is a minor cost -- the diff is computed against the working tree, which is the same operation already done later. For typical repository sizes this is negligible, and the code clarity benefit outweighs the cost.

  • The untracked_paths.contains(path) check at line 287 is O(n) per iteration, but the number of untracked files in a single staging request will always be small (single digits typically), so this is fine. If it ever mattered, switching to a HashSet would fix it.


5. Simplicity

Verdict: PASS

The approach is straightforward: scan once to classify, then branch. The new functions are focused and well-named. The code reads clearly.

Minor note: the RefCell-based pattern in scan_hunk_metadata with a current cell and a metadata cell mirrors the existing patterns in the file, maintaining consistency.


Summary

This PR correctly identifies the root cause (libgit2's apply() fails when untracked files appear in the diff) and introduces a clean separation between untracked and tracked file staging. However, it has one critical bug: when staging tracked hunks in the presence of untracked files, ALL untracked files are silently added to the index as a side effect of the "add remaining untracked to index" step. The irony is that this step is unnecessary -- the re-generated diff at line 318 already excludes untracked files by not setting include_untracked(true), so apply() will succeed without the extra staging. The fix is to remove lines 304-312. Additionally, the line-level staging path should explicitly reject untracked files with a clear error rather than failing with a confusing apply error.

- Remove silent staging of all untracked files when only tracked hunks
  requested (was corrupting staging area)
- Count untracked hunks instead of file paths for staged count
- Reject line-level staging for untracked files with clear error message
- Add assertion that untracked file is NOT in index after tracked-only staging
Cover the case where both tracked and untracked hunk IDs are in
the same request — verifies both are staged correctly.
@MonteYin MonteYin merged commit fb433bc into main Mar 25, 2026
4 checks passed
@MonteYin MonteYin deleted the GTST-18.fix/untracked-files-staging branch March 25, 2026 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant