fix(staleness): read changed files with -z so non-ASCII/quoted paths are detected#452
fix(staleness): read changed files with -z so non-ASCII/quoted paths are detected#452tirth8205 wants to merge 2 commits into
Conversation
…are detected git diff --name-only C-quotes paths with non-ASCII/control bytes (e.g. uni-café.txt becomes "uni-caf\303\251.txt"), so getChangedFiles returned the escaped string and the file's stale nodes were never matched/refreshed. Pass -z (NUL-terminated, unquoted) and split on NUL instead of newline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
thejesh23
left a comment
There was a problem hiding this comment.
1. line.trim() defeats part of the -z fix.
The whole point of -z is to preserve raw bytes (including whitespace) in path tokens; git diff --name-only -z will never emit surrounding whitespace itself, so the only thing trim() can do here is silently corrupt legitimate filenames that contain leading/trailing spaces or tabs. Drop the .trim() and rely on the length > 0 filter to drop the trailing empty token after the final NUL.
2. Paths with embedded newlines aren't exercised in tests.
The PR description calls out that -z "tolerates paths that contain literal newlines," but neither new test case actually contains a \n inside a path — the only failure mode currently covered is non-ASCII and a space. Add a "weird\nname.txt\0" fixture so the regression guard matches the stated motivation, otherwise a future refactor reintroducing split("\n") would still pass.
3. Rename/copy handling silently regresses on the first commit hash with a rename.
git diff --name-only emits a single post-rename path, but if anything downstream ever swaps in --name-status or -M (common when extending staleness to track rename edges), each rename entry under -z is three NUL-separated tokens (R100\0old\0new\0), and the current split("\0").filter(length>0) would treat R100 and old as changed file paths. Worth a comment on getChangedFiles pinning the --name-only assumption, or a parser that consumes status prefixes properly.
Nit: the inline comment example uni-café.txt uses the post-decoding form; consider showing the actual bytes (\"uni-caf\\303\\251.txt\") git emits, which is what made this hard to spot.
git diff --name-only -z emits raw NUL-terminated path bytes and never adds surrounding whitespace, so trim() could only corrupt legitimate filenames with leading/trailing spaces or tabs. Drop it and rely on the length filter to discard the trailing empty token. Add regression tests for embedded-newline paths (the stated -z motivation) and for whitespace-preserving tokens. Document the --name-only single-token invariant so a future switch to --name-status/-M does not silently break the parser. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
1. 2. No regression test for embedded newlines — Added. New case mocks 3. Rename/copy Nit — The branch comment already shows the actual git-emitted bytes ( Full core suite green (695 passed); |
Problem
git diff <hash>..HEAD --name-onlyC-quotes any path containing non-ASCII bytes (or quotes/backslashes/control chars): it wraps the path in double quotes and octal-escapes the bytes. For a fileuni-café.txt, git emits the literal line"uni-caf\303\251.txt". getChangedFiles splits on\n, trims, and returns the raw lines verbatim with NO unquoting, so it returns the string"uni-caf\303\251.txt"instead ofuni-café.txt. Verified against a real git repo:git diff BASE..HEAD --name-onlyon a repo containinguni-café.txtprints"uni-caf\303\251.txt". Downstream, mergeGraphUpdate doeschangedSet.has(node.filePath)against the real stored pathuni-café.txt, which never equals the quoted/escaped string — so the changed file's stale nodes are silently kept and never refreshed (and isStale's changedFiles list contains a bogus path). Repos with CJK/accented/emoji filenames are…Fix
-zflag (NUL-terminated, no quoting) and parse accordingly, or pass-c core.quotePath=false. With-z:execFileSync('git', ['diff',${lastCommitHash}..HEAD, '--name-only', '-z'], {cwd: projectDir, encoding: 'utf-8'})thenoutput.split('\0').map(l => l.trim()).filter(l => l.length > 0). The-zform also tolerates paths that contain literal newlines. (~3 LOC change.)Testing
Adds unit test(s) that fail before the change and pass after. The full core test suite,
eslint, andtsc --noEmitall pass locally on this branch.Found via a static correctness audit of the incremental staleness detection.
🤖 Generated with Claude Code