Fix JSON parse failure when LLM returns unescaped control chars in sampling responses by m-nash · Pull Request #50 · m-nash/pr-copilot

m-nash · 2026-04-14T20:17:16Z

Problem

During PR #133 monitoring (Azure/azure-cdk), the \�xplain_comment\ sampling step repeatedly failed with:

\
Internal error: Sampling failed to explain comment — LLM returned response that could not be parsed
\\

The debug log revealed the root cause: the LLM returns JSON with literal newline characters (0x0A) inside string values (the
ecommendation\ field), which \System.Text.Json\ correctly rejects per RFC 8259.

Fix

1. Root cause fix: \SanitizeJsonControlChars\ in \SampleStructuredAsync\

Added a new \SanitizeJsonControlChars()\ method that walks JSON text, tracks string boundaries (respecting escape sequences), and replaces bare control characters (U+0000–U+001F) with their proper JSON escape sequences (\\n, \\t, \\r, etc.).

This runs in \SampleStructuredAsync\ after \StripCodeFences, so it fixes the issue for all structured sampling calls, not just \�xplain_comment.

2. Added retry to \�xplain_comment\

The \compose_reply\ path already retried once on parse failure. The \�xplain_comment\ path did not — it threw immediately. Now both paths have consistent retry behavior.

Testing

6 new unit tests covering the sanitization logic and the end-to-end sampling pipeline with literal newlines
All 345 tests pass

When a user asks for clarification about a CI failure recommendation and the agent arrives at a different conclusion, the updated recommendation was being lost. ProcessTaskComplete re-called ProcessInvestigationComplete which always read the stale SuggestedFix from state. Now: propagate LastRecommendation to SuggestedFix before re-presenting choices, and instruct the agent to include data.recommendation when the analysis changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Include the PR title and description in the ExplainCommentAsync sampling prompt so the LLM can assess whether review suggestions fall within the PR's scope before defaulting to implement. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Mark PR title/body, review comments, and diffs as untrusted data in the sampling system prompt. Wrap untrusted sections with BEGIN/END delimiters to prevent embedded instructions from being followed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Verify PR title, body, truncation, empty body handling, and untrusted content guardrail in ExplainCommentAsync prompts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replace three duplicate private Truncate methods and inline truncation with a single extension method. Unifies suffix to use '...' consistently. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…mpling responses The LLM frequently emits literal newline characters (0x0A) inside JSON string values, which System.Text.Json correctly rejects. Add a SanitizeJsonControlChars step in SampleStructuredAsync that escapes bare control characters within JSON strings before parsing. Also add a retry to explain_comment to match the existing compose_reply pattern, as a fallback for any remaining transient parse failures. Fixes repeated 'Sampling failed to explain comment' errors observed during PR #133 monitoring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Fixes structured sampling failures when the LLM returns invalid JSON containing unescaped control characters inside string values, and improves robustness/consistency of sampling-driven flows.

Changes:

Add JSON sanitization to escape bare control characters inside JSON string values before System.Text.Json deserialization.
Add a one-time retry for explain_comment sampling (to match existing compose_reply retry behavior).
Extend prompt/context used for explain_comment (include PR title/body) and propagate updated recommendations in CI failure flow; add/expand unit tests for these behaviors.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
PrCopilot/src/PrCopilot/Tools/SamplingHelper.cs	Adds `SanitizeJsonControlChars` and uses it in structured sampling; expands explain-comment prompt and includes PR context (title/body).
PrCopilot/src/PrCopilot/Tools/MonitorFlowTools.cs	Retries `explain_comment` once on null/parse failure; updates truncation usage; records PR body into monitor state.
PrCopilot/src/PrCopilot/StringExtensions.cs	Introduces shared `Truncate()` extension used across multiple components for consistent logging/prompt building.
PrCopilot/src/PrCopilot/StateMachine/PrStatusFetcher.cs	Fetches PR body via `gh api` and populates `PrInfo.Body`.
PrCopilot/src/PrCopilot/StateMachine/PrInfo.cs	Adds `Body` field to PR info model.
PrCopilot/src/PrCopilot/StateMachine/MonitorState.cs	Adds `PrBody` to persisted monitor state for downstream prompt/context usage.
PrCopilot/src/PrCopilot/StateMachine/MonitorTransitions.cs	Propagates updated `LastRecommendation` into `SuggestedFix` in CI failure flow; switches truncation helper to extension.
PrCopilot/src/PrCopilot/StateMachine/GitHubCliExecutor.cs	Switches local truncation helper to shared `Truncate()` extension.
PrCopilot/tests/PrCopilot.Tests/SamplingHelperTests.cs	Adds tests for PR context injection, sanitization behavior, and end-to-end structured sampling parse with literal newlines.
PrCopilot/tests/PrCopilot.Tests/StateMachineTests.cs	Adds CI failure flow tests ensuring updated recommendations propagate correctly and affect available choices.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Scan for any char < 0x20 before allocating StringBuilder and doing the full walk. Most JSON responses have no control chars, so this avoids unnecessary allocation in the common case. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-14T21:05:40Z

                    state.PendingReplyText = reply.ReplyText;
                    state.SamplingCompletionEvent = completionEvent;
-                    DebugLogger.Log("Sampling", $"Reply composed: {Truncate(reply.ReplyText, 100)}");
+                    DebugLogger.Log("Sampling", $"Reply composed: {reply.ReplyText.Truncate(100)}");
                    return true;


reply.ReplyText.Truncate(100) relies on the PrCopilot.StringExtensions extension method, but this file is in namespace PrCopilot.Tools; and doesn’t import using PrCopilot; (and there are no global usings). This will not compile unless the extension namespace is in scope.

Copilot · 2026-04-14T21:05:40Z


    private static string Truncate(string text, int maxLength) =>
-        text.Length <= maxLength ? text : text[..maxLength] + "...";
+        text.Truncate(maxLength);


text.Truncate(maxLength) relies on the PrCopilot.StringExtensions extension method, but this file is in namespace PrCopilot.StateMachine; and doesn’t import using PrCopilot; (and there are no global usings). This will not compile unless the extension namespace is in scope.

Suggested change

text.Truncate(maxLength);

text.Length <= maxLength ? text : text[..maxLength];

Copilot · 2026-04-14T21:05:40Z

+            return text ?? "";
+        return text[..maxLength] + "...";
+    }
+}


StringExtensions.Truncate is declared in the PrCopilot namespace, but it’s used as an extension method from PrCopilot.Tools and PrCopilot.StateMachine without a using PrCopilot; (and there are no global usings in this project). As-is, calls like text.Truncate(...) won’t compile. Consider either adding using PrCopilot; where it’s used, or introducing a GlobalUsings.cs with global using PrCopilot; so the extension is in scope project-wide.

Suggested change

}

}

namespace PrCopilot.Tools;

internal static class StringExtensions

{

/// <summary>

/// Truncates a string to the specified maximum length, appending "..." if truncated.

/// Returns empty string for null input.

/// </summary>

internal static string Truncate(this string? text, int maxLength)

{

if (string.IsNullOrEmpty(text) || text.Length <= maxLength)

return text ?? "";

return text[..maxLength] + "...";

}

}

namespace PrCopilot.StateMachine;

internal static class StringExtensions

{

/// <summary>

/// Truncates a string to the specified maximum length, appending "..." if truncated.

/// Returns empty string for null input.

/// </summary>

internal static string Truncate(this string? text, int maxLength)

{

if (string.IsNullOrEmpty(text) || text.Length <= maxLength)

return text ?? "";

return text[..maxLength] + "...";

}

}

Copilot · 2026-04-14T21:05:41Z

+        // Quick scan: if no control chars exist, return as-is (common fast path).
+        // Structural whitespace (\n, \r, \t) outside strings is valid JSON,
+        // but inside strings they must be escaped — so only inputs containing
+        // control chars need the full walk.
+        for (int i = 0; i < json.Length; i++)
+        {
+            if (json[i] < 0x20)
+                goto NeedsSanitization;
+        }
+
+        return json;
+
+    NeedsSanitization:
+        var sb = new StringBuilder(json.Length);


The current fast-path in SanitizeJsonControlChars triggers on any control char anywhere in the JSON, including structural newlines/tabs outside strings (which are valid JSON). That means pretty-printed JSON (very common for LLM output) will always allocate a StringBuilder and copy the entire payload even when there are no invalid characters inside string values. Consider delaying allocation until you actually encounter a control character while inString == true (e.g., build only when needed, otherwise return the original input).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

m-nash and others added 9 commits April 2, 2026 10:26

Add tests for PR context in comment analysis sampling

2c66aec

Verify PR title, body, truncation, empty body handling, and untrusted content guardrail in ExplainCommentAsync prompts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Centralize Truncate into shared StringExtensions

021aca2

Replace three duplicate private Truncate methods and inline truncation with a single extension method. Unifies suffix to use '...' consistently. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add missing StringExtensions.cs file

e896063

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Update FetchPrInfoAsync doc comment to list all fields

3e9de49

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Include file content in untrusted content guardrail

e8ffd68

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

m-nash requested a review from Copilot April 14, 2026 20:30

Copilot AI reviewed Apr 14, 2026

View reviewed changes

Comment thread PrCopilot/src/PrCopilot/Tools/SamplingHelper.cs Outdated

Comment thread PrCopilot/src/PrCopilot/Tools/SamplingHelper.cs

m-nash requested a review from Copilot April 14, 2026 21:01

Copilot started reviewing on behalf of m-nash April 14, 2026 21:02 View session

Copilot AI reviewed Apr 14, 2026

View reviewed changes

Merge main and resolve conflict in SamplingHelperTests

9996da9

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix JSON parse failure when LLM returns unescaped control chars in sampling responses#50

Fix JSON parse failure when LLM returns unescaped control chars in sampling responses#50
m-nash wants to merge 11 commits into
mainfrom
fix/ci-freeform-recommendation-update

m-nash commented Apr 14, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Copilot AI Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	text.Truncate(maxLength);
	text.Length <= maxLength ? text : text[..maxLength];

-}
+}
+namespace PrCopilot.Tools;
+internal static class StringExtensions
+{
+    /// <summary>
+    /// Truncates a string to the specified maximum length, appending "..." if truncated.
+    /// Returns empty string for null input.
+    /// </summary>
+    internal static string Truncate(this string? text, int maxLength)
+    {
+        if (string.IsNullOrEmpty(text) || text.Length <= maxLength)
+            return text ?? "";
+        return text[..maxLength] + "...";
+    }
+}
+namespace PrCopilot.StateMachine;
+internal static class StringExtensions
+{
+    /// <summary>
+    /// Truncates a string to the specified maximum length, appending "..." if truncated.
+    /// Returns empty string for null input.
+    /// </summary>
+    internal static string Truncate(this string? text, int maxLength)
+    {
+        if (string.IsNullOrEmpty(text) || text.Length <= maxLength)
+            return text ?? "";
+        return text[..maxLength] + "...";
+    }
+}

Conversation

m-nash commented Apr 14, 2026

Problem

Fix

1. Root cause fix: \SanitizeJsonControlChars\ in \SampleStructuredAsync\

2. Added retry to \�xplain_comment\

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants