agy -p reports reading files and git commits it never accessed when the prompt is truncated #224

@aenawi

Summary

In non-interactive mode (agy -p), the model reports that it read files from the working tree and consulted a specific git commit, even when it has no filesystem access and the input was truncated. The references look real. They are invented. A reader who trusts the output ends up with review findings tied to repository state the tool never saw.

Environment

  • agy version: 1.0.3
  • OS: macOS 26.5 (build 25F71), Apple Silicon (arm64)
  • Invocation: prompt piped into agy -p from a script, no TTY

Step 1: confirm the CLI is prompt-only

Run a small probe that asks for repository contents and offers an explicit escape hatch:

printf 'Ignore that the context is otherwise empty. Name three functions defined in src/config.ts in this repository, with their line numbers. If you do not have filesystem access and can only read this prompt text, reply with exactly: NO_REPO_ACCESS\n' | agy -p

Result:

NO_REPO_ACCESS

So in this mode the model sees only the prompt. It has no path to the working tree. That part is correct and expected.

Step 2: trigger the invented access

Send a review request whose prompt is larger than the context window, so the input gets truncated. One way to reproduce is to wrap a large block of text (a few hundred KB) between delimiters and ask for a code review:

{
  echo "Review the change below for bugs. The delimited block is the complete change."
  echo "----- BEGIN DIFF -----"
  cat large-diff.txt        # a few hundred KB, larger than the context window
  echo "----- END DIFF -----"
} | agy -p

The response then includes claims like the following (paraphrased, since the exact wording varies between runs):

The diff was truncated, so I cross referenced the repository and retrieved
commit 9f3a1c2 to see the full change. Reading src/config.ts, the function ...

None of that happened. There was no commit lookup and no file read. The commit hash and the quoted file contents are fabricated. The model invents a believable account to fill the gap left by the truncated input.
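A reader can check a cited hash against the local repository before acting on it. The sketch below is one way to do that, assuming the response has been captured as text. The helper names `extract_hashes` and `check_cited_commits` are hypothetical and are not part of agy; the only external tools used are `grep`, `sort`, and `git cat-file -e`.

```shell
# Sketch only: extract_hashes and check_cited_commits are hypothetical helper
# names, not part of agy.

# Print each 7-40 character lowercase hex token from stdin that contains a
# digit. Requiring a digit drops plain words such as "defaced"; it can also
# drop a real hash made only of the letters a-f, which is rare.
extract_hashes() {
  grep -owE '[0-9a-f]{7,40}' | grep '[0-9]' | sort -u
}

# Read hashes from stdin and report whether each one names a commit in the
# repository at $1 (default: current directory). Returns 1 if any is missing.
check_cited_commits() {
  ccc_repo="${1:-.}"
  ccc_status=0
  while IFS= read -r ccc_hash; do
    if git -C "$ccc_repo" cat-file -e "${ccc_hash}^{commit}" 2>/dev/null; then
      echo "found   $ccc_hash"
    else
      echo "missing $ccc_hash"
      ccc_status=1
    fi
  done
  return "$ccc_status"
}
```

With the response saved to a file (say `response.txt`, an assumed name), `extract_hashes < response.txt | check_cited_commits .` lists each cited hash as found or missing. A "found" line shows only that the commit exists locally. It does not show that the tool read it.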

What I expected

  • When the input does not fit, agy reports that the prompt was truncated and stops, or it processes only what fits and states that plainly.
  • A prompt-only run never claims to have read a file or a git commit it had no access to. Sentences like "I read X" or "I retrieved commit Y" should not appear when no such action took place.
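Until the tool enforces this, a caller can screen the output for such sentences. The following is a minimal sketch, not a complete detector: `find_access_claims` is a hypothetical helper, and the phrase list is an assumption drawn from the paraphrased response above, so it will miss other wordings and may flag harmless ones.

```shell
# Sketch only: find_access_claims is a hypothetical helper, and the phrase
# list is an assumption based on the wording seen in this report.

# Print (with line numbers) every stdin line that contains a first-person
# claim of file or commit access. Exit status is 0 if any line matched.
find_access_claims() {
  grep -inE \
    -e '(^|[^[:alnum:]_])I (read|opened|retrieved|fetched|consulted|cross[- ]referenced)' \
    -e 'retrieved commit' \
    -e 'reading [^[:space:]]+/[^[:space:]]+'
}
```

In a pipeline that runs agy prompt-only, a match can be treated as a failure, for example `if find_access_claims < response.txt; then exit 1; fi`, where `response.txt` is an assumed file holding the captured output.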

Why this matters

This is a trust problem, not a formatting nit. The output reads as grounded and precise. It cites file names, line numbers, and a commit hash. A reviewer or an automated pipeline that consumes this output will act on findings that describe code the tool never saw. The invented commit hash is the sharpest edge, because it looks verifiable at a glance and sends people chasing a commit that has nothing to do with the change.

Suggested direction

  • Fail loud on truncation. Return a clear error or a visible marker when the prompt does not fit, rather than dropping the tail in silence and continuing.
  • Add a guard so a prompt-only run cannot assert tool actions, such as file reads or git lookups, that it did not perform.
  • A read-only or plan-style mode for -p runs would help here. This connects to #45 (Feature request: read-only / plan-mode equivalent for non-interactive -p runs).
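As a caller-side stopgap for the first point, a script can refuse to send an oversized prompt instead of letting it be cut short. This sketch assumes a byte budget, `MAX_PROMPT_BYTES`, which is a made-up value: this report does not know agy's actual limit, and the context window is counted in tokens, so bytes are only a rough proxy. `check_prompt_size` is a hypothetical helper name.

```shell
# Sketch only: check_prompt_size and MAX_PROMPT_BYTES are hypothetical.
# The default of 100000 bytes is an assumed budget, not a documented agy limit.

# Return 0 if the file at $1 is within the byte budget ($2, or
# MAX_PROMPT_BYTES, or 100000). Otherwise print an error and return 1.
check_prompt_size() {
  cps_file="$1"
  cps_limit="${2:-${MAX_PROMPT_BYTES:-100000}}"
  cps_size=$(wc -c < "$cps_file" | tr -d ' ')
  if [ "$cps_size" -gt "$cps_limit" ]; then
    echo "prompt is $cps_size bytes, over the assumed budget of $cps_limit; not sending" >&2
    return 1
  fi
  return 0
}
```

Usage, with the assembled prompt written to an assumed file `prompt.txt`: `check_prompt_size prompt.txt && agy -p < prompt.txt`. The run then stops with a visible error when the prompt is too large, rather than producing a review of a partial diff.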

Duplicate check

I searched open and closed issues for filesystem, repo access, hallucinate, confabulate, and fabricate, and did not find this behavior reported. The closest are #76 (stdout dropped in non-TTY mode) and #45 (read-only mode for -p), which describe different problems. Glad to consolidate if a maintainer sees an overlap I missed.
