Commits
15 commits
8fb9e4e
feat: persistent inline comments to prevent cross-run duplicates
IsmaelMartinez Jun 4, 2026
b8b6383
fix: harden inline-comment dedup fingerprint and add provider-level t…
IsmaelMartinez Jun 4, 2026
f699e7e
fix: address review - record-after-success, stable GitHub anchor, laz…
IsmaelMartinez Jun 4, 2026
f30ccba
fix: scan GitLab notes so fallback-comment dedup markers persist acro…
IsmaelMartinez Jun 4, 2026
424387a
fix: round 2 review - GitLab deletion anchor, clip-safe markers, accu…
IsmaelMartinez Jun 4, 2026
885ec20
test: dry up repeated settings stub into a helper to fix overlong lines
IsmaelMartinez Jun 4, 2026
194244f
test: wrap remaining long lines in dedup tests
IsmaelMartinez Jun 4, 2026
a0869e0
fix: GitHub fallback re-publish keeps a dedup marker; wrap log; drop …
IsmaelMartinez Jun 4, 2026
b487346
fix: re-raise with bare raise in inline-comment fallback to preserve …
IsmaelMartinez Jun 4, 2026
3dc8454
fix: record GitHub dedup fingerprints only on bulk publish success, n…
IsmaelMartinez Jun 4, 2026
a2bac5f
fix: keep code fingerprint case-sensitive (code identity must not low…
IsmaelMartinez Jun 4, 2026
a749828
fix: make dedup fingerprints marker-invariant (strip markers before h…
IsmaelMartinez Jun 4, 2026
c9d7a17
fix: only treat a comment as marked when it carries a well-formed ded…
IsmaelMartinez Jun 4, 2026
ef9a46b
Merge remote-tracking branch 'upstream/main' into sync/2424
IsmaelMartinez Jun 9, 2026
f6070fb
fix: preserve code fingerprint through hunk validation
avidspartan1 Jun 10, 2026
13 changes: 13 additions & 0 deletions docs/docs/tools/improve.md
@@ -212,6 +212,19 @@ dual_publishing_score_threshold = x

Where x represents the minimum score threshold (>=) for suggestions to be presented as committable PR comments in addition to the table. Default is -1 (disabled).

### Persistent inline comments

By default, PR-Agent re-posts identical inline code comments on every run, which clutters the discussion, particularly on GitLab. The persistent inline comments feature prevents this by skipping the re-posting of comments that are already present from an earlier run. This is achieved by embedding a hidden HTML-comment marker with a short fingerprint in each posted comment, allowing PR-Agent to scan existing comment bodies on later runs to identify and skip duplicates.
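
The marker round trip described above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the marker pattern is copied from the PR, while `short_fingerprint`, `mark`, and `scan` are hypothetical helper names, and the fingerprint here hashes the whole body rather than the normalised key the PR uses.

```python
from __future__ import annotations

import hashlib
import re

# Same marker shape as the PR's BODY_MARKER_RE: a 12-hex fingerprint in an HTML comment.
MARKER_RE = re.compile(r"<!-- pr-agent-dedup: ([a-f0-9]{12}) -->")


def short_fingerprint(text: str) -> str:
    """Hypothetical helper: first 12 hex characters of SHA-256 over the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def mark(body: str) -> str:
    """Append a hidden marker carrying the body's fingerprint."""
    return f"{body}\n\n<!-- pr-agent-dedup: {short_fingerprint(body)} -->"


def scan(existing_bodies: list[str]) -> set[str]:
    """Collect every fingerprint found in already-posted comment bodies."""
    return {m.group(1) for b in existing_bodies for m in MARKER_RE.finditer(b)}


# Run 1 posts a marked comment; run 2 scans it and recognises the same suggestion.
posted = [mark("Use a context manager to close the file.")]
seen = scan(posted)
is_duplicate = short_fingerprint("Use a context manager to close the file.") in seen
is_new = short_fingerprint("Validate the input length.") not in seen
```

Because the marker is an HTML comment, it does not render in the posted comment, so the state travels with the PR/MR itself and needs no external storage.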

Two fingerprints are used and matched with OR logic: one over the comment text (file, anchor line, and the normalised first 80 characters of the body) and one over the proposed code block, when present. A re-emitted finding is therefore caught when the model rephrases the prose but keeps the code, or changes the code but keeps the prose. On GitHub the anchor line is left out of the fingerprint, because diff positions shift as the PR gains commits. The feature is opt-in (off by default) and is implemented for the GitHub and GitLab providers; other providers are unaffected.
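
The OR match can be illustrated with a simplified sketch. The names `prose_fp`, `code_fp`, and `is_duplicate` are hypothetical, and the normalisation is reduced to whitespace collapsing and lower-casing; the PR's module additionally strips the `**Suggestion:**` lead, the importance tag, and any embedded markers before hashing.

```python
from __future__ import annotations

import hashlib
import re
from typing import Optional

FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block
CODE_BLOCK_RE = re.compile(FENCE + r"suggestion[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def _digest(key: str) -> str:
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def prose_fp(path: str, line: Optional[int], body: str) -> str:
    """Simplified body fingerprint: whitespace-collapsed, lower-cased first 80 chars."""
    text = re.sub(r"\s+", " ", body).strip()[:80].lower()
    return _digest(f"{path}|{line}|{text}")


def code_fp(path: str, line: Optional[int], body: str) -> Optional[str]:
    """Simplified code fingerprint; None when there is no suggestion block."""
    m = CODE_BLOCK_RE.search(body)
    if not m:
        return None
    code = re.sub(r"\s+", " ", m.group(1)).strip()  # case is preserved for code
    return _digest(f"{path}|{line}|code|{code}") if code else None


def is_duplicate(seen: set, path: str, line: Optional[int], body: str) -> bool:
    """OR match: either fingerprint already being present marks a duplicate."""
    c = code_fp(path, line, body)
    return prose_fp(path, line, body) in seen or (c is not None and c in seen)


PROSE = ("The file handle opened here is never closed, which leaks a descriptor "
         "on every call to this function.")
CODE_A = f"{FENCE}suggestion\nwith open(p) as f:\n    data = f.read()\n{FENCE}"
CODE_B = f"{FENCE}suggestion\nf = open(p)\nf.close()\n{FENCE}"

first = f"{PROSE}\n{CODE_A}"
seen = {prose_fp("app.py", 10, first), code_fp("app.py", 10, first)}

# Same code, reworded prose: caught by the code fingerprint.
reworded = f"Wrap the open call in a context manager so the descriptor is released.\n{CODE_A}"
# Same prose, different code: caught by the prose fingerprint.
recoded = f"{PROSE}\n{CODE_B}"
# Prose and code both changed: neither fingerprint matches.
both = f"Wrap the open call in a context manager so the descriptor is released.\n{CODE_B}"
```

Note that the prose in this example is deliberately longer than 80 characters. Since the body fingerprint covers the first 80 characters of the whole body, a short lead sentence lets the start of the code block leak into the prose fingerprint, and a code change would then alter both fingerprints.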

To enable it, use the following setting:

```toml
[config]
persistent_inline_comments = true
```

### Self-review

`Platforms supported: GitHub, GitLab`
183 changes: 183 additions & 0 deletions pr_agent/algo/inline_comment_dedup.py
@@ -0,0 +1,183 @@
"""Cross-run deduplication of inline (line-anchored) comments.

Implements the feature requested in issue #2037: when the agent runs more
than once on the same PR/MR, it re-posts identical inline suggestions on
every run, cluttering the discussion (observed in particular on GitLab).
This module fingerprints each inline comment and embeds the fingerprint as
an HTML-comment marker in the posted body. On later runs the existing
comment bodies are scanned for those markers to rebuild the set of
already-posted fingerprints, and any suggestion whose fingerprint is already
present is skipped.

Two fingerprints are computed per comment and matched with OR semantics:

- Body fingerprint: SHA-256 over (relevant_file, anchor line, normalised
first 80 characters of the body). The category/importance tag and the
``**Suggestion:**`` lead are stripped and whitespace is collapsed first.
- Code fingerprint: SHA-256 over (relevant_file, anchor line, normalised
contents of the first ```suggestion fenced block). Returns None when the
body has no suggestion block, in which case matching falls back to the
body fingerprint alone.

The OR-match catches both "same prose, different code" and "same code,
different prose" re-emissions of the same defect, which are the two ways an
LLM tends to restate a finding across runs.

The feature is opt-in via ``config.persistent_inline_comments`` (default
false) and is wired into the GitHub and GitLab providers. The marker-scan
store needs no external infrastructure; a different backend (database,
cache) could populate the same load/seen/add interface.
"""

from __future__ import annotations

import hashlib
import re
from typing import Iterator, Optional

BODY_MARKER_RE = re.compile(r"<!-- pr-agent-dedup: ([a-f0-9]{12}) -->")
CODE_MARKER_RE = re.compile(r"<!-- pr-agent-dedup-code: ([a-f0-9]{12}) -->")

_LEAD_RE = re.compile(r"^\*\*Suggestion:\*\*\s*", re.IGNORECASE)
_TAG_RE = re.compile(r"\[[^\]]+?,\s*importance:\s*\d+\]", re.IGNORECASE)
_WS_RE = re.compile(r"\s+")
_CODE_BLOCK_RE = re.compile(r"```suggestion[^\n]*\n(.*?)```", re.DOTALL)


def has_marker(body: str) -> bool:
    """True only if the body carries a well-formed dedup marker (12-hex),
    so incidental text mentioning the marker syntax is not mistaken for one."""
    return bool(BODY_MARKER_RE.search(body or "") or CODE_MARKER_RE.search(body or ""))


def _strip_markers(body: str) -> str:
    """Remove embedded dedup markers so a pre-marked body fingerprints the
    same as its original (markers are appended after marking)."""
    body = BODY_MARKER_RE.sub("", body or "")
    body = CODE_MARKER_RE.sub("", body)
    return body


def body_fingerprint(relevant_file: str, target_line_no, body: str) -> str:
    normalised = _LEAD_RE.sub("", _strip_markers(body))
    normalised = _TAG_RE.sub("", normalised)
    normalised = _WS_RE.sub(" ", normalised).strip()[:80].lower()
    key = f"{relevant_file}|{target_line_no}|{normalised}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def code_fingerprint(relevant_file: str, target_line_no, body: str) -> Optional[str]:
    m = _CODE_BLOCK_RE.search(_strip_markers(body))
    if not m:
        return None
    # Do not lower-case: code is case-sensitive, so case-only differences
    # must produce distinct fingerprints.
    code = _WS_RE.sub(" ", m.group(1)).strip()
    if not code:
        return None
    key = f"{relevant_file}|{target_line_no}|code|{code}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def build_markers(body_fp: str, code_fp: Optional[str]) -> str:
    markers = [f"<!-- pr-agent-dedup: {body_fp} -->"]
    if code_fp is not None:
        markers.append(f"<!-- pr-agent-dedup-code: {code_fp} -->")
    return "\n".join(markers)


def body_with_markers(body: str, body_fp: str, code_fp: "Optional[str]",
                      max_chars: "Optional[int]" = None) -> str:
    """Append the dedup marker(s) to a comment body. If max_chars is given and
    body + markers would exceed it, the body is clipped (never the markers) so
    the fingerprint marker always survives for the next run's scan."""
    suffix = f"\n\n{build_markers(body_fp, code_fp)}"
    if max_chars and len(body) + len(suffix) > max_chars:
        body = body[: max(0, max_chars - len(suffix))]
    return f"{body}{suffix}"


def inline_comment_line(comment: dict):
    """Best-effort anchor line for a GitHub inline-comment dict."""
    for key in ("line", "position", "start_line"):
        if comment.get(key) is not None:
            return comment[key]
    return None


def iter_existing_inline_comment_bodies(git_provider) -> Iterator[str]:
    """Yield the body of every existing comment on the current PR/MR.

    Dispatch is by provider class name so this module needs no provider
    import. Unsupported providers raise NotImplementedError, which the store
    treats as "cannot dedup here" and degrades to within-run dedup only.
    """
    provider_name = type(git_provider).__name__
    if provider_name == "GithubProvider":
        for comment in git_provider.pr.get_comments():
            yield getattr(comment, "body", "") or ""
    elif provider_name == "GitLabProvider":
        for discussion in git_provider.mr.discussions.list(get_all=True):
            attrs = getattr(discussion, "attributes", None) or {}
            for note in attrs.get("notes", []) or []:
                if isinstance(note, dict):
                    yield note.get("body", "") or ""
        # The committable-suggestion fallback posts via mr.notes.create, which
        # may not surface as a discussion; scan plain notes too so their markers
        # are seen on later runs.
        for note in git_provider.mr.notes.list(get_all=True):
            yield getattr(note, "body", "") or ""
    else:
        raise NotImplementedError(
            f"inline-comment dedup not implemented for {provider_name}"
        )


class InlineCommentStore:
    """Set of already-posted inline-comment fingerprints for one PR/MR.

    The existing comment bodies are scanned lazily on first lookup and the
    seen-set is held in memory for the rest of the run. A failure to list
    existing comments degrades to within-run dedup only and never raises
    into the publish path.
    """

    def __init__(self, git_provider):
        self._git_provider = git_provider
        self._keys: set = set()
        self._loaded = False

    def load(self) -> set:
        if self._loaded:
            return self._keys
        try:
            for body in iter_existing_inline_comment_bodies(self._git_provider):
                for marker_re in (BODY_MARKER_RE, CODE_MARKER_RE):
                    for match in marker_re.finditer(body or ""):
                        self._keys.add(match.group(1))
        except Exception as e:
            from pr_agent.log import get_logger
            get_logger().info(
                f"Persistent inline comments: could not load existing comments, "
                f"within-run dedup only. error={e}"
            )
        self._loaded = True
        return self._keys

    def seen(self, fingerprint: Optional[str]) -> bool:
        if fingerprint is None:
            return False
        return fingerprint in self.load()

    def add(self, fingerprint: Optional[str]) -> None:
        if fingerprint is not None:
            self._keys.add(fingerprint)


def get_inline_comment_store(git_provider) -> InlineCommentStore:
    """Return the per-provider store, creating and caching it on first use."""
    store = getattr(git_provider, "_inline_comment_store", None)
    if store is None:
        store = InlineCommentStore(git_provider)
        git_provider._inline_comment_store = store
    return store
76 changes: 73 additions & 3 deletions pr_agent/git_providers/github_provider.py
@@ -17,6 +17,9 @@

 from ..algo.file_filter import filter_ignored
 from ..algo.git_patch_processing import extract_hunk_headers
+from ..algo.inline_comment_dedup import (body_fingerprint, body_with_markers,
+                                         code_fingerprint,
+                                         get_inline_comment_store, has_marker)
 from ..algo.language_handler import is_valid_file
 from ..algo.types import EDIT_TYPE
 from ..algo.utils import (PRReviewHeader, Range, clip_tokens,
@@ -416,9 +419,70 @@ def create_inline_comment(self, body: str, relevant_file: str, relevant_line_in_
         return dict(body=body, path=path, position=position) if subject_type == "LINE" else {}

     def publish_inline_comments(self, comments: list[dict], disable_fallback: bool = False):
+        store = None
+        pending_fingerprints = []
+        dedup_code_fp_key = "_dedup_code_fp"
+        if get_settings().get("config.persistent_inline_comments", False):
+            store = get_inline_comment_store(self)
+            local_seen = set()
+            deduped = []
+            skipped = 0
+            for comment in comments:
+                if not comment:
+                    deduped.append(comment)
+                    continue
+                path = comment.get("path", "")
+                body = comment.get("body", "")
+                # GitHub committable comments are anchored by diff position, which
+                # shifts as the PR gains commits; anchor the fingerprint on the file
+                # path and comment content instead so it stays stable across runs.
+                body_fp = body_fingerprint(path, None, body)
+                pre_transform_code_fp = comment.get(dedup_code_fp_key)
+                code_fp = pre_transform_code_fp or code_fingerprint(path, None, body)
+                # A fallback re-publish (disable_fallback=True) is for a comment
+                # that has not been posted yet, so do not filter it; only the
+                # top-level call drops duplicates. The fallback still gets marked
+                # and recorded below so it dedups on later runs.
+                if not disable_fallback and (
+                        store.seen(body_fp) or store.seen(code_fp)
+                        or body_fp in local_seen or (code_fp and code_fp in local_seen)):
+                    skipped += 1
+                    continue
+                marked = dict(comment)
+                marked.pop(dedup_code_fp_key, None)
+                if has_marker(body):
+                    pass  # already carries a marker from the first pass
+                else:
+                    marked["body"] = body_with_markers(
+                        body, body_fp, code_fp, getattr(self, "max_comment_chars", None))
+                deduped.append(marked)
+                local_seen.add(body_fp)
+                if code_fp:
+                    local_seen.add(code_fp)
+                pending_fingerprints.append((body_fp, code_fp))
+            if skipped and not any(deduped):
+                get_logger().info(
+                    f"Persistent inline comments: all {skipped} suggestion(s) "
+                    f"already posted; nothing to publish")
+                return
+            comments = deduped
+        else:
+            comments = [
+                {key: value for key, value in comment.items() if key != dedup_code_fp_key}
+                if comment else comment
+                for comment in comments
+            ]
         try:
             # publish all comments in a single message
             self.pr.create_review(commit=self.last_commit_id, comments=comments)
+            # The whole batch posted; record its fingerprints so the rest of this
+            # run dedups against them. Cross-run dedup relies on the markers in the
+            # posted bodies, so comments the fallback below drops stay unrecorded
+            # and can be retried on a later run.
+            if store is not None:
+                for body_fp, code_fp in pending_fingerprints:
+                    store.add(body_fp)
+                    store.add(code_fp)
         except Exception as e:
             get_logger().info(f"Initially failed to publish inline comments as committable")

@@ -431,8 +495,8 @@ def publish_inline_comments(self, comments: list[dict], disable_fallback: bool =
                 self._publish_inline_comments_fallback_with_verification(comments)
             except Exception as e:
                 get_logger().error(f"Failed to publish inline code comments fallback, error: {e}")
-                raise e
+                raise

     def get_review_thread_comments(self, comment_id: int) -> list[dict]:
         """
         Retrieves all comments in the same thread as the given comment.
@@ -558,7 +622,11 @@ def publish_code_suggestions(self, code_suggestions: list) -> bool:
         """
         post_parameters_list = []

-        code_suggestions_validated = self.validate_comments_inside_hunks(code_suggestions)
+        code_suggestions_with_fingerprints = copy.deepcopy(code_suggestions)
+        for suggestion in code_suggestions_with_fingerprints:
+            suggestion["_dedup_code_fp"] = code_fingerprint(
+                suggestion.get("relevant_file", ""), None, suggestion.get("body", ""))
+        code_suggestions_validated = self.validate_comments_inside_hunks(code_suggestions_with_fingerprints)

         for suggestion in code_suggestions_validated:
             body = suggestion['body']
@@ -584,13 +652,15 @@ def publish_code_suggestions(self, code_suggestions: list) -> bool:
                     "line": relevant_lines_end,
                     "start_line": relevant_lines_start,
                     "start_side": "RIGHT",
+                    "_dedup_code_fp": suggestion.get("_dedup_code_fp"),
                 }
             else:  # API is different for single line comments
                 post_parameters = {
                     "body": body,
                     "path": relevant_file,
                     "line": relevant_lines_start,
                     "side": "RIGHT",
+                    "_dedup_code_fp": suggestion.get("_dedup_code_fp"),
                 }
             post_parameters_list.append(post_parameters)

29 changes: 29 additions & 0 deletions pr_agent/git_providers/gitlab_provider.py
@@ -14,6 +14,9 @@

 from ..algo.file_filter import filter_ignored
 from ..algo.git_patch_processing import decode_if_bytes
+from ..algo.inline_comment_dedup import (body_fingerprint, body_with_markers,
+                                         code_fingerprint,
+                                         get_inline_comment_store)
 from ..algo.language_handler import is_valid_file
 from ..algo.utils import (clip_tokens,
                           find_line_number_of_relevant_line_in_file,
@@ -559,6 +562,23 @@ def send_inline_comment(self, body: str, edit_type: str, found: bool, relevant_f
         if not found:
             get_logger().info(f"Could not find position for {relevant_file} {relevant_line_in_file}")
         else:
+            store = None
+            body_fp = code_fp = None
+            if get_settings().get("config.persistent_inline_comments", False):
+                store = get_inline_comment_store(self)
+                # Anchor the fingerprint on the line the comment is actually
+                # attached to: deletions anchor on the old line (source), all
+                # other edits on the new line (target).
+                anchor_line = source_line_no if edit_type == "deletion" else target_line_no
+                body_fp = body_fingerprint(relevant_file, anchor_line, body)
+                code_fp = code_fingerprint(relevant_file, anchor_line, body)
+                if store.seen(body_fp) or store.seen(code_fp):
+                    get_logger().info(
+                        f"Persistent inline comments: skipping duplicate inline "
+                        f"comment on {relevant_file}:{anchor_line}")
+                    return
+                body = body_with_markers(
+                    body, body_fp, code_fp, getattr(self, "max_comment_chars", None))
             # in order to have exact sha's we have to find correct diff for this change
             diff = self.get_relevant_diff(relevant_file, relevant_line_in_file)
             if diff is None:
@@ -578,6 +598,9 @@ def send_inline_comment(self, body: str, edit_type: str, found: bool, relevant_f
             get_logger().debug(f"Creating comment in MR {self.id_mr} with body {body} and position {pos_obj}")
             try:
                 self.mr.discussions.create({'body': body, 'position': pos_obj})
+                if store is not None:
+                    store.add(body_fp)
+                    store.add(code_fp)
             except Exception as e:
                 try:
                     # fallback - create a general note on the file in the MR
@@ -617,6 +640,9 @@ def send_inline_comment(self, body: str, edit_type: str, found: bool, relevant_f
                     diff_code = f"\n\n```diff\n{patch.rstrip()}\n```"
                     body_fallback += diff_code

+                    if store is not None:
+                        body_fallback = body_with_markers(
+                            body_fallback, body_fp, code_fp, getattr(self, "max_comment_chars", None))
                     # Create a general note on the file in the MR
                     self.mr.notes.create({
                         'body': body_fallback,
@@ -629,6 +655,9 @@ def send_inline_comment(self, body: str, edit_type: str, found: bool, relevant_f
                         }
                     })
                     get_logger().debug(f"Created fallback comment in MR {self.id_mr} with position {pos_obj}")
+                    if store is not None:
+                        store.add(body_fp)
+                        store.add(code_fp)

                     # get_logger().debug(
                     #     f"Failed to create comment in MR {self.id_mr} with position {pos_obj} (probably not a '+' line)")
4 changes: 4 additions & 0 deletions pr_agent/settings/configuration.toml
@@ -45,6 +45,10 @@ cli_mode=false
 output_relevant_configurations=false
 large_patch_policy = "clip" # "clip", "skip"
 duplicate_prompt_examples = false
+# Persistent inline comments (issue #2037): when true, the GitHub and GitLab providers
+# fingerprint each inline comment, embed the fingerprint as an HTML marker, and skip
+# re-posting suggestions already present on the PR/MR across runs.
+persistent_inline_comments = false
 # seed
 seed=-1 # set positive value to fix the seed (and ensure temperature=0)
 temperature=0.2