Skip to content

GTST-21 feat: add hunk-level checkout and diff --staged support#18

Merged
MonteYin merged 2 commits intomainfrom
GTST-21.feat/checkout
Mar 30, 2026
Merged

GTST-21 feat: add hunk-level checkout and diff --staged support#18
MonteYin merged 2 commits intomainfrom
GTST-21.feat/checkout

Conversation

@MonteYin
Copy link
Copy Markdown
Owner

Summary

  • Add gitsift checkout subcommand for selective hunk-level discard of changes
    • --hunk-ids to discard unstaged hunks (workdir → index)
    • --staged --hunk-ids to discard staged hunks (index → HEAD)
    • --from-stdin for JSON input
  • Add gitsift diff --staged to view staged changes (HEAD vs index)
  • Add checkout method to JSON-lines protocol with staged param
  • Add staged param to protocol diff method
  • Update README, skill, and documentation

Implementation

  • New src/git/checkout.rs — reverse-patch reconstruction (swap +/- lines), apply via Diff::from_buffer() + repo.apply()
  • New diff_staged() in src/git/diff.rs — uses diff_tree_to_index()
  • New model types: CheckoutRequest, CheckoutResult, CheckoutParams
  • Handles edge cases: untracked files (deletion), staged new files (index removal), invalid IDs

Test plan

  • Unit tests in src/git/checkout.rs (8 tests: unstaged/staged, single/all hunks, untracked, invalid ID, empty request, content restoration)
  • Unit tests in src/git/diff.rs (4 tests: staged diff show/empty/filter/new-file)
  • Model serde tests in src/models.rs (5 tests: checkout request/result roundtrips, protocol parse)
  • CLI integration tests in tests/cli.rs (7 tests: checkout unstaged/staged, toon output, diff --staged, full E2E workflow)
  • Edge case tests in tests/edge_cases.rs (3 tests: untracked file delete, from-stdin, protocol checkout)
  • All 125 existing + new tests pass
  • cargo clippy --all-targets — zero warnings
  • cargo fmt --check — clean

Add selective discard of changes at hunk granularity:
- `gitsift checkout --hunk-ids` discards unstaged hunks (workdir → index)
- `gitsift checkout --staged --hunk-ids` discards staged hunks (index → HEAD)
- `gitsift diff --staged` shows staged changes (HEAD vs index)
- Protocol support: checkout method with staged param, diff staged param
- Handles untracked files (deletion) and staged new files (index removal)

New file: src/git/checkout.rs (reverse-patch reconstruction + apply)
Updated: cli, models, diff, output, toon, protocol, main, docs, skill, tests
@MonteYin
Copy link
Copy Markdown
Owner Author

Review -- Opus 4.6

Overall Verdict: HAS_ISSUES


1. Correctness

Verdict: HAS_ISSUES

Issues

Important
  • Multi-hunk reverse-patch application may fail on second hunk when hunks are applied sequentially (src/git/checkout.rs:285-309): In checkout_unstaged, when multiple tracked hunks are requested, the code collects all hunks from a single diff snapshot, then applies reverse patches one at a time in a for hunk_id in &tracked_ids loop. After the first repo.apply() modifies the working tree, subsequent reverse patches may no longer apply cleanly because the working directory has changed but the patches were reconstructed from the original diff state. This is the same issue that stage.rs handles for line-level staging by re-collecting hunks before each apply (see stage.rs:348-405). The hunk-level staging in stage.rs avoids this by using ApplyOptions with hunk_callback to apply all hunks atomically in one repo.apply() call -- but checkout.rs cannot use that approach since it needs reverse patches. The same issue exists in checkout_staged at lines 381-409.

    The tests pass because: (1) The checkout_all_unstaged_hunks_restores_file test works because git2's ApplyLocation::WorkDir is generous with context matching when hunks are far apart (20-line file, changes at lines 2 and 19). (2) There is no test with hunks that are close together where one reverse patch would shift the context for the next.

Minor
  • checkout_unstaged uses diff_opts_with_untracked() at line 279 when generating the diff for tracked hunks, but only tracked hunks remain at that point. This is functionally harmless (untracked hunks just won't match any tracked_ids), but diff_opts_tracked_only() would be more precise, matching the pattern in stage.rs:274.

Fix Instructions

For the Important issue: Either (a) apply all reverse patches in a single concatenated patch string (one Diff::from_buffer + one repo.apply), or (b) re-generate the diff and re-collect hunks after each individual apply, similar to how stage.rs handles line-level staging. Option (a) is more efficient and avoids the re-diff cost:

// Concatenate all reverse patches into one unified diff string
let mut combined_patch = String::new();
let mut patch_count = 0usize;
for hunk_id in &tracked_ids {
    match hunks.get(hunk_id) {
        None => { /* error handling */ }
        Some(raw) => {
            combined_patch.push_str(&reconstruct_reverse_patch(raw));
            patch_count += 1;
        }
    }
}
if !combined_patch.is_empty() {
    let patch_diff = Diff::from_buffer(combined_patch.as_bytes())
        .context("failed to parse combined reverse patch")?;
    repo.apply(&patch_diff, ApplyLocation::WorkDir, None)
        .context("failed to apply combined reverse patch")?;
    discarded += patch_count;
}

For the Minor issue: Change line 279 from super::diff_opts_with_untracked() to super::diff_opts_tracked_only().


2. Architecture

Verdict: PASS

The PR follows established patterns well:

  • checkout.rs mirrors stage.rs structure (scan metadata, separate special cases, apply patches)
  • Shared helpers from git/mod.rs are properly reused (delta_path, hunk_header, is_binary_delta, diff_opts_*)
  • diff_staged in diff.rs reuses the DiffState pattern from diff_unstaged
  • Models (CheckoutRequest, CheckoutResult, CheckoutParams) follow the same serde patterns as their staging counterparts
  • CLI subcommand mirrors the Stage pattern with --hunk-ids, --from-stdin, and adds --staged
  • Protocol integration cleanly extends the existing dispatch pattern
Minor
  • There is significant code duplication between checkout.rs::CollectState/collect_hunks_from_diff and stage.rs::CollectState/collect_hunks. The only difference is that checkout.rs::RawHunk includes new_start. A shared implementation could be extracted to git/mod.rs, but this is a reasonable tradeoff for now given the minor structural difference -- flagging for awareness, not as a blocker.
  • Similarly, diff_staged is a near-copy of diff_unstaged (different only in the diff source). A shared helper taking a git2::Diff could reduce duplication, but again this is a style preference.

3. Security

Verdict: HAS_ISSUES

Issues

Important
  • Path traversal via untracked file deletion (src/git/checkout.rs:268-274): The checkout_unstaged function joins a user-influenced file_path (from git diff metadata) with repo_path and calls std::fs::remove_file. While the path comes from git2's diff output rather than direct user input, a maliciously crafted git repository with symlinks or path components like ../ in filenames could cause deletion of files outside the repository. The stage.rs equivalent (stage_untracked_files) uses index.add_path() which is bounded by libgit2's index operations. Direct filesystem remove_file does not have that safety boundary.

Fix Instructions

Canonicalize the path and verify it is within the repository before deletion:

for path in &untracked_paths {
    let full_path = repo_path.join(path);
    let canonical = full_path.canonicalize()
        .with_context(|| format!("failed to resolve path: {path}"))?;
    let repo_canonical = repo_path.canonicalize()
        .context("failed to resolve repo path")?;
    if !canonical.starts_with(&repo_canonical) {
        errors.push(format!("path escapes repository: {path}"));
        failed += 1;
        continue;
    }
    if canonical.exists() {
        std::fs::remove_file(&canonical)
            .with_context(|| format!("failed to delete untracked file: {path}"))?;
    }
}

Note: For extra defense-in-depth, also check that the path does not contain .. components before joining, since canonicalize may fail on non-existent files.


4. Performance

Verdict: PASS

Minor
  • checkout_unstaged generates the diff twice: once in scan_unstaged_hunk_metadata and once at line 279-281 for collect_hunks_from_diff. These could be combined into a single diff pass. However, this matches the two-pass pattern in stage.rs (metadata scan then data collection), and the diff operation is fast for the typical use case (subprocess-invoked, not hot path). Not a blocker.
  • untracked_paths.contains() at lines 256 and 347 is O(n) per check. Using a HashSet would be technically better, but untracked file counts are tiny in practice. Not worth changing.

5. Simplicity

Verdict: PASS

The implementation is appropriately scoped:

  • No over-engineering: checkout is hunk-level only (no line-level checkout), which is the right call since line-level discard is a rare use case
  • Naming is clear and consistent (checkout_unstaged/checkout_staged, discarded/failed)
  • The CheckoutRequest type is intentionally simpler than StageRequest (no line_selections) -- YAGNI applied correctly
  • Protocol integration reuses CheckoutParams with a staged boolean rather than inventing a new dispatch mechanism
  • Documentation (SKILL.md, README) is updated appropriately
Minor
  • In toon.rs:76, format_checkout_result creates a Response::success(result) just to read resp.version, same pattern as format_stage_result. This is fine and consistent, just noting the resp is otherwise unused.

Summary

This is a well-structured PR that cleanly mirrors the existing stage.rs patterns for a new destructive operation (checkout/discard). The reverse-patch logic is correct in its line-swapping approach, and edge cases (untracked files, newly-added staged files, invalid IDs, empty requests) are all handled. Test coverage is thorough with 27 new tests across unit and integration levels.

Two issues warrant attention before merge: (1) Sequential application of multiple reverse patches may fail when hunks are close together, because the working tree changes between applies but patches are from the pre-apply state -- this should be fixed by concatenating patches and applying atomically. (2) The untracked file deletion path should validate that the resolved path stays within the repository boundary, since this is a destructive filesystem operation that stage.rs doesn't need to worry about (it uses index operations instead).

- Concatenate all reverse patches into a single string and apply
  atomically via one repo.apply() call, preventing failures when
  hunks are close together (both unstaged and staged paths).
- Add canonicalize + starts_with guard before deleting untracked
  files to prevent path traversal outside the repository boundary.
@MonteYin MonteYin merged commit da0f6ba into main Mar 30, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant