Skip to content

git_commit_in_cache_with_args reports empty error messages for the most common failure mode #601

@vringar

Description

@vringar

Summary

When git commit exits non-zero, SharedWriter::git_commit_in_cache_with_args
captures only output.stderr into the bail message. But for the single most
common cause of a non-zero exit from git commit — "nothing to commit, working
tree clean" — git writes its status message to stdout, leaving stderr
empty.

The user-visible result is:

Error: git commit ["-m", "<agent-id>: block issue #24 on #23 at …"] in cache failed:
[stack trace, no actual git stderr captured]

Note the empty space after failed: . The user has no diagnostic to determine
why git failed.

Worse, the existing no-op guards in the push paths (core.rs:316-322,
core.rs:786-805) — which were clearly written to treat "nothing to commit"
as success — rely on substring-matching that exact phrase against the error
string. Because the error string is built from empty stderr, those guards
never fire. They are dead code. See "Latent secondary bug" below.

Reproducer

Trigger from the public CLI (this is also the surface bug reported in #600):

crosslink issue create "a" -q     # → L1
crosslink issue create "b" -q     # → L2
crosslink issue block L2 L1       # commits an event
crosslink issue block L2 L1       # second time = nothing to commit
# → Error: git commit [...] in cache failed:    ← empty after the colon
echo $?                            # → 1

Direct git-level reproducer in the hub cache:

cd .crosslink/.hub-cache
git commit -m "test" 2>/tmp/stderr 1>/tmp/stdout
cat /tmp/stderr  # → (empty)
cat /tmp/stdout  # → "On branch crosslink/hub\nnothing to commit, working tree clean"
echo $?          # → 1

git commit's status message ("nothing to commit, working tree clean") is on
stdout, not stderr — by design, since git treats it as a status report
rather than a diagnostic. Any caller that only captures stderr will see an
empty failure.

Offending code

git_commit_in_cache_with_args — captures only output.stderr:

/// Run a git commit with arbitrary args in the cache worktree,
/// disabling signing when the agent has no SSH key.
pub(super) fn git_commit_in_cache_with_args(
&self,
args: &[&str],
) -> Result<std::process::Output> {
let has_key = self.agent.ssh_key_path.is_some();
let mut cmd = std::process::Command::new("git");
cmd.current_dir(&self.cache_dir);
if !has_key {
cmd.args(["-c", "commit.gpgsign=false"]);
}
cmd.arg("commit").args(args);
let output = cmd
.output()
.with_context(|| format!("Failed to run git commit {args:?} in cache"))?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
bail!("git commit {args:?} in cache failed: {stderr}");
}
Ok(output)
}

git_in_cache has the same shape, so any non-commit git invocation whose
diagnostic goes to stdout would surface as an empty failure too:

/// Run a git command in the cache worktree.
pub(super) fn git_in_cache(&self, args: &[&str]) -> Result<std::process::Output> {
let output = std::process::Command::new("git")
.current_dir(&self.cache_dir)
.args(args)
.output()
.with_context(|| format!("Failed to run git {args:?} in cache"))?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
bail!("git {args:?} in cache failed: {stderr}");
}
Ok(output)
}

Latent secondary bug: the existing "nothing to commit" guards are dead code

Two locations in the push paths attempt to handle the no-op commit case by
substring-matching the error string of a failed commit:

let commit_result = self.git_commit_in_cache(&commit_msg);
if let Err(ref e) = commit_result {
    let err_str = e.to_string();
    if err_str.contains("nothing to commit") || err_str.contains("no changes added") {
        return Ok(PushOutcome::Pushed);
    }
}
commit_result?;

Guard A:

let commit_result = self.git_commit_in_cache(&commit_msg);
if let Err(ref e) = commit_result {
let err_str = e.to_string();
if err_str.contains("nothing to commit") || err_str.contains("no changes added") {
return Ok(PushOutcome::Pushed);
}
}
commit_result?;

Guard B:

let commit_result = self.git_commit_in_cache(&commit_msg);
if let Err(e) = &commit_result {
let err_str = e.to_string();
if err_str.contains("nothing to commit") || err_str.contains("no changes added") {
return Ok(PushOutcome::Pushed);
}
// Commit failed — if we were deleting files (git rm), restore
// Commit failed — reset index and working directory to HEAD
// to prevent split state (#427, #468). This is safe because
// the commit didn't succeed, so HEAD is the correct state.
if write_set.use_git_rm {
if let Err(reset_err) = self.git_in_cache(&["reset", "--hard", "HEAD"]) {
tracing::error!(
"hub cache may be corrupt: commit failed and reset failed: {}",
reset_err
);
}
}
commit_result?;
}

These checks never fire on main, because the err_str they inspect is built
from output.stderr only — which is empty in the "nothing to commit" case.
So err_str is literally "git commit [\"-m\", \"…\"] in cache failed: ",
which contains neither "nothing to commit" nor "no changes added".

Both guards fall through to commit_result?, the error propagates to the
caller, and the user-facing behavior is the noisy empty-error described above
instead of the intended silent success.

The fix below (capture stdout) closes this as a side effect — the existing
substring checks start matching against output.stdout content and the
guards do what they were written to do. An optional belt-and-suspenders
improvement is to also detect the no-op before invoking git commit at all
(git diff --cached --quiet after git add); see "Optional defense in depth"
below.

Why it matters

This bug is a force multiplier on every other git-cache failure mode:

  • The "nothing to commit" case is invisible. Users (and downstream
    automation) cannot tell idempotent re-application from a real failure
    without strace-ing crosslink. I spent meaningful time chasing a
    gpg-signing red herring before isolating the no-op path.
  • The dead guards above mean the surface bug in crosslink issue block exits non-zero when the blocker is already set #600 reports as a hard
    failure.
    Fixing this one bug closes the noisy-error surface on crosslink issue block exits non-zero when the blocker is already set #600
    immediately, even without touching add_blocker's own idempotency.
  • Future debugging is hostile by default. Any future failure mode that
    reports through stdout (pre-commit hook output, refusal to commit on
    detached HEAD without --allow-detached, server-side rejections that
    bubble back through git push's stdout) will surface the same empty
    string, masking the root cause.

Suggested fix

Include both streams in the error, prefixed so the reader knows which is
which:

if !output.status.success() {
    let stdout = String::from_utf8_lossy(&output.stdout);
    let stderr = String::from_utf8_lossy(&output.stderr);
    bail!(
        "git commit {args:?} in cache failed (exit {}):\n\
         stdout: {}\n\
         stderr: {}",
        output.status,
        stdout.trim(),
        stderr.trim(),
    );
}

Same treatment for git_in_cache at core.rs:862-874.

This single change:

  • Closes the empty-error surface.
  • Revives the two existing "nothing to commit" guards in the push paths
    (they start matching err_str against stdout content too).
  • Adds diagnostic value to every other git-cache failure mode, present
    and future.

Optional defense in depth

Even with the fix above, the no-op detection still relies on substring-matching
git's human-readable wording. A more robust alternative — or addition — is to
detect the no-op explicitly, before the commit attempt:

// After `git add` in write_commit_push, before git_commit_in_cache:
let staged = self.git_in_cache(&["diff", "--cached", "--quiet"]);
if staged.is_ok() {
    // diff --cached --quiet exits 0 when nothing is staged
    return Ok(PushOutcome::Pushed);
}

git diff --cached --quiet uses git's exit code, not its output, so it is
independent of git's wording across versions and locales. This makes the
no-op path explicit and structurally robust; the existing substring guards
become a safety net rather than the primary mechanism.

Proposed test that would have caught this

The existing tests for the bail-message path assert string shape against
specific failed: strings, which lock in the bug (stderr-only) rather than
detecting it. The minimum fix is a test that actually invokes git commit
in a "nothing to commit" state and asserts the error message carries enough
information to diagnose the failure:

#[test]
fn test_git_commit_in_cache_includes_stdout_on_failure() {
    use std::process::Command;

    let tmp = tempfile::tempdir().unwrap();
    let cache = tmp.path();

    // Set up an empty git repo with no staged changes.
    Command::new("git").current_dir(cache).args(["init", "-q"]).status().unwrap();
    Command::new("git").current_dir(cache).args(["config", "user.email", "t@e.st"]).status().unwrap();
    Command::new("git").current_dir(cache).args(["config", "user.name",  "t"]).status().unwrap();
    Command::new("git").current_dir(cache).args(["commit", "--allow-empty", "-m", "init", "-q"]).status().unwrap();

    let writer = make_test_writer(cache); // test helper that yields a SharedWriter
                                          // bound to `cache` with no SSH key
    let err = writer
        .git_commit_in_cache_with_args(&["-m", "no-op"])
        .expect_err("commit should fail when nothing is staged");
    let msg = err.to_string();

    assert!(
        msg.contains("nothing to commit"),
        "error must surface git's 'nothing to commit' status to the caller, \
         got: {msg}"
    );
}

#[test]
fn test_push_path_treats_noop_commit_as_success() {
    // Set up a writer in a state where the prepare closure produces
    // an unchanged WriteSet (e.g. add_blocker for an already-blocked pair).
    // Assert that write_commit_push returns Ok(PushOutcome::Pushed) and
    // does NOT propagate an error to the caller.
    ...
}

Today (current main), the first test fails with msg = "git commit [\"-m\", \"no-op\"] in cache failed: "
(no "nothing to commit" substring). After the fix above, it passes. The second
test fails today too, because the guards never fire; after the fix it passes
because the revived guards do what they were written to do.

The same shape of test should apply to git_in_cache failures whose
diagnostic lives on stdout (e.g. git push server-side hook rejections that
print to stdout instead of stderr).

Related

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions