
feat: persistent inline comments to prevent cross-run duplicates #2424

Open
IsmaelMartinez wants to merge 15 commits into The-PR-Agent:main from IsmaelMartinez:feature/persistent-inline-comments

IsmaelMartinez commented Jun 4, 2026

What

Implements the feature requested in #2037: optional persistent inline comments, so the agent stops re-posting identical inline comments each time it runs on the same PR/MR (reported in particular on GitLab). Opt-in via a new config.persistent_inline_comments flag (default false); behaviour is unchanged when off.

How it works

When enabled, each inline comment carries a hidden HTML marker with a short fingerprint, e.g. <!-- pr-agent-dedup: a1b2c3d4e5f6 -->. On a later run the provider scans the existing comment bodies for those markers to rebuild the set of already-posted fingerprints, and skips any suggestion whose fingerprint is already present.
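A minimal Python sketch of that round trip, assuming only the marker shape from the example above (12 lowercase hex characters). `MARKER_RE`, `build_marker` and `scan_fingerprints` are illustrative names; the real module keeps separate body and code marker patterns that are not reproduced here.

```python
import re
from typing import Iterable, Optional, Set

# Marker shape taken from the example above: 12 lowercase hex characters.
MARKER_RE = re.compile(r"<!-- pr-agent-dedup: ([0-9a-f]{12}) -->")


def build_marker(fingerprint: str) -> str:
    """Render the hidden HTML comment appended to a posted inline comment."""
    return f"<!-- pr-agent-dedup: {fingerprint} -->"


def scan_fingerprints(existing_bodies: Iterable[Optional[str]]) -> Set[str]:
    """Rebuild the set of already-posted fingerprints from existing comment bodies."""
    seen: Set[str] = set()
    for body in existing_bodies:
        # findall returns the captured fingerprint of every well-formed marker.
        seen.update(MARKER_RE.findall(body or ""))
    return seen


posted = "Use a context manager here.\n\n" + build_marker("a1b2c3d4e5f6")
print(sorted(scan_fingerprints([posted, "A reply with no marker", None])))
# ['a1b2c3d4e5f6']
```

Human replies and malformed markers contribute nothing, so only bodies the agent marked itself feed the dedup set.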

Two fingerprints, each a SHA-256 digest truncated to 12 hex characters, are matched with OR semantics:

  • body: SHA-256 over (file, line, normalised first 80 chars); the **Suggestion:** lead and [category, importance: N] tag are stripped, whitespace is collapsed and the text is lower-cased
  • code: SHA-256 over (file, line, normalised first ```suggestion block), or None when there is no code block
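The two fingerprints can be sketched as below. The normalisation steps follow the bullets above, but the regular expressions for the **Suggestion:** lead and the `[category, importance: N]` tag are assumptions, because the module's own patterns are not shown in this description. The code fingerprint here collapses all whitespace, as described; a later review comment flags that this can merge distinct multi-line blocks.

```python
import hashlib
import re
from typing import Optional

# Assumed patterns for the lead and the tag; the module's own regexes are not shown.
LEAD_RE = re.compile(r"^\s*\*\*Suggestion:\*\*\s*")
TAG_RE = re.compile(r"\[[^\]]*importance:\s*\d+\]")
WS_RE = re.compile(r"\s+")
CODE_BLOCK_RE = re.compile(r"```suggestion[^\n]*\n(.*?)```", re.DOTALL)


def _short_sha256(key: str) -> str:
    # SHA-256 truncated to 12 hex chars, the length carried in the marker.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def body_fingerprint(path: str, line: Optional[int], body: str) -> str:
    text = TAG_RE.sub("", LEAD_RE.sub("", body))
    text = WS_RE.sub(" ", text).strip()[:80].lower()
    return _short_sha256(f"{path}|{line}|{text}")


def code_fingerprint(path: str, line: Optional[int], body: str) -> Optional[str]:
    match = CODE_BLOCK_RE.search(body)
    if not match:
        return None  # no suggestion block, so no code fingerprint
    # Case is kept because code is case-sensitive.
    code = WS_RE.sub(" ", match.group(1)).strip()
    return _short_sha256(f"{path}|{line}|code|{code}") if code else None
```

With this normalisation, a rephrasing that only changes case, spacing or the category tag maps to the same body fingerprint.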

The OR-match catches a re-emitted finding whether the model rephrases the prose or slightly changes the proposed code. The marker-scan store needs no external infrastructure; a database/cache backend could implement the same load/seen/add interface.
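One way to express the load/seen/add contract and the OR-match, as a sketch: `FingerprintStore` and `InMemoryStore` are hypothetical names, and the in-memory class simply takes the fingerprints a marker scan would have found. Both `seen` and `add` accept `None`, because a suggestion without a code block has no code fingerprint.

```python
from typing import Iterable, Optional, Protocol, Set


class FingerprintStore(Protocol):
    """Hypothetical name for the load/seen/add contract."""

    def load(self) -> None: ...

    def seen(self, fingerprint: Optional[str]) -> bool: ...

    def add(self, fingerprint: Optional[str]) -> None: ...


class InMemoryStore:
    """Illustrative backend; a database or cache could offer the same methods."""

    def __init__(self, existing: Iterable[str] = ()) -> None:
        self._existing = set(existing)  # what a marker scan would have found
        self._keys: Set[str] = set()

    def load(self) -> None:
        self._keys |= self._existing

    def seen(self, fingerprint: Optional[str]) -> bool:
        return fingerprint is not None and fingerprint in self._keys

    def add(self, fingerprint: Optional[str]) -> None:
        if fingerprint is not None:
            self._keys.add(fingerprint)


def is_duplicate(store: FingerprintStore, body_fp: str, code_fp: Optional[str]) -> bool:
    # OR semantics: a match on either fingerprint is enough to skip the suggestion.
    return store.seen(body_fp) or store.seen(code_fp)


store = InMemoryStore(existing={"aaaaaaaaaaaa"})
store.load()
print(is_duplicate(store, "bbbbbbbbbbbb", "aaaaaaaaaaaa"))  # True: prose changed, code did not
```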

Scope

  • GitHub (publish_inline_comments) and GitLab (send_inline_comment, including the general-note fallback). Other providers are unaffected: the listing adapter raises NotImplementedError, which degrades to no dedup.
  • New module pr_agent/algo/inline_comment_dedup.py; config flag in configuration.toml; docs in docs/docs/tools/improve.md.

Tests

tests/unittest/test_inline_comment_dedup.py (15 tests): fingerprint normalisation/stability, code-block extraction, marker building, the store (including load-failure degradation and unsupported provider), and GitHub/GitLab integration (filters already-seen, marks new, within-batch dedup, all-duplicates skips publish, flag-off leaves bodies unmarked). All pass; existing provider tests unaffected.

Notes

This is a native port of an approach we have been running in production on self-hosted GitLab (AWS Bedrock backend) since Monday this week (a few days, not a long soak yet). The GitLab integration here mirrors what is running there. The GitHub publish_inline_comments path is implemented and unit-tested but has not yet been exercised in production, so testing it, including on GHES, is especially valuable. @avidspartan1 — that GitHub path is the one to try.

Credits

The GitHub code-fingerprint fix is from @avidspartan1, who tested the publish_inline_comments path on GHES, diagnosed that validate_comments_inside_hunks rewrites a suggestion body when it has no valid hunk (so the code fingerprint recomputed afterwards no longer matched), and contributed the fix that captures the fingerprint before validation and threads it through. Folded in here as commit aefce47 with original authorship preserved, adding 3 GitHub-path tests (suite now 30).
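The ordering problem can be illustrated with a simplified sketch. `strip_suggestion_if_outside_hunk` is a hypothetical stand-in for `validate_comments_inside_hunks` and assumes only what is stated above, namely that a comment without a valid hunk loses its suggestion block. The line anchor is omitted for brevity.

```python
import hashlib
import re
from typing import Optional, Tuple

CODE_BLOCK_RE = re.compile(r"```suggestion[^\n]*\n(.*?)```", re.DOTALL)


def code_fingerprint(path: str, body: str) -> Optional[str]:
    match = CODE_BLOCK_RE.search(body)
    if not match or not match.group(1).strip():
        return None
    code = " ".join(match.group(1).split())
    return hashlib.sha256(f"{path}|code|{code}".encode("utf-8")).hexdigest()[:12]


def strip_suggestion_if_outside_hunk(body: str, in_hunk: bool) -> str:
    """Hypothetical validator: drop the committable block when there is no valid hunk."""
    return body if in_hunk else CODE_BLOCK_RE.sub("", body).rstrip()


def prepare(path: str, body: str, in_hunk: bool) -> Tuple[str, Optional[str]]:
    # Capture the fingerprint from the body as the model emitted it, before
    # validation can rewrite it, and carry it alongside the validated body.
    code_fp = code_fingerprint(path, body)
    return strip_suggestion_if_outside_hunk(body, in_hunk), code_fp
```

Recomputing the fingerprint from the validated body would yield `None` here, so the next run's re-emitted suggestion, which still has its block, would never match.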

Adds an opt-in config flag, persistent_inline_comments (default false). When
enabled, the GitHub and GitLab providers fingerprint each inline comment and
embed a hidden HTML marker in the posted body. On later runs the existing
comment bodies are scanned for those markers, so a suggestion already posted
is skipped instead of re-posted. This removes the duplicate inline comments
that accumulate when the agent runs repeatedly on the same PR/MR (reported
in particular on GitLab).

Two fingerprints are matched with OR semantics: a body fingerprint over
(file, line, normalised text) and a code fingerprint over the first
` ```suggestion ` block. The OR-match catches a re-emitted finding whether the
model rephrases the prose or slightly changes the proposed code. The
marker-scan store needs no external infrastructure; a database/cache backend
could implement the same load/seen/add interface.

Default behaviour is unchanged when the flag is off. Implements the feature
requested in The-PR-Agent#2037.
github-actions (bot) added the feature 💡 label Jun 4, 2026
@qodo-free-for-open-source-projects


Review Summary by Qodo

Persistent inline comments to prevent cross-run duplicates

✨ Enhancement


Walkthroughs

Description
• Implements persistent inline comments to prevent duplicate suggestions across runs
• Fingerprints comments with SHA-256 hashes (body and code) embedded as HTML markers
• Scans existing PR/MR comments to skip already-posted suggestions using OR-match logic
• Integrated into GitHub and GitLab providers with opt-in config flag (default false)
• Includes comprehensive unit tests and documentation
Diagram
```mermaid
flowchart LR
  A["Inline Comment"] --> B["Compute Fingerprints"]
  B --> C["Body FP: file+line+text"]
  B --> D["Code FP: file+line+code block"]
  C --> E["Build Markers"]
  D --> E
  E --> F["Embed in Comment Body"]
  F --> G["Post to PR/MR"]
  H["Later Run"] --> I["Scan Existing Comments"]
  I --> J["Extract Markers"]
  J --> K["Check Store"]
  K --> L{Already Posted?}
  L -->|Yes| M["Skip Suggestion"]
  L -->|No| N["Post with Markers"]
```



File Changes

1. pr_agent/algo/inline_comment_dedup.py ✨ Enhancement +152/-0

Core deduplication logic and fingerprinting implementation

• New module implementing cross-run deduplication logic for inline comments
• Computes body and code fingerprints using SHA-256 hashing with normalization
• Provides InlineCommentStore class to track seen fingerprints across runs
• Implements provider-agnostic marker scanning via iter_existing_inline_comment_bodies()
• Supports GitHub and GitLab providers with graceful degradation for unsupported providers



2. pr_agent/git_providers/github_provider.py ✨ Enhancement +26/-0

GitHub provider persistent inline comments integration

• Imports deduplication utilities and integrates into publish_inline_comments() method
• Filters comments through store to skip already-posted suggestions
• Appends fingerprint markers to new comments before publishing
• Tracks posted fingerprints in store for within-run and cross-run dedup
• Skips publish entirely if all suggestions are duplicates



3. pr_agent/git_providers/gitlab_provider.py ✨ Enhancement +20/-0

GitLab provider persistent inline comments integration

• Integrates deduplication into send_inline_comment() method
• Checks store before posting and skips duplicates with logging
• Appends markers to both primary discussion comments and fallback general notes
• Tracks fingerprints after successful comment creation
• Handles both inline discussion and fallback note posting paths



4. tests/unittest/test_inline_comment_dedup.py 🧪 Tests +226/-0

Unit tests for inline comment deduplication feature

• Comprehensive unit test suite with 15+ test cases covering all dedup functionality
• Tests fingerprint normalization, code block extraction, and marker building
• Validates store behavior including load failures and unsupported providers
• Tests GitHub and GitLab provider integration with mocked API calls
• Verifies within-batch dedup, cross-run dedup, and flag-off behavior



5. docs/docs/tools/improve.md 📝 Documentation +13/-0

Documentation for persistent inline comments feature

• Documents persistent inline comments feature and its purpose
• Explains fingerprinting approach with OR-match logic for body and code
• Provides configuration example with TOML syntax
• Notes provider support (GitHub, GitLab) and opt-in nature



6. pr_agent/settings/configuration.toml ⚙️ Configuration changes +4/-0

Configuration flag for persistent inline comments

• Adds new persistent_inline_comments configuration flag (default false)
• Includes explanatory comment referencing issue #2037
• Placed in config section alongside other publishing-related settings





qodo-free-for-open-source-projects Bot commented Jun 4, 2026


Code Review by Qodo

🐞 Bugs (6) 📘 Rule violations (2) 📎 Requirement gaps (0)




Action required

1. Non-isort import inline_comment_dedup ✓ Resolved 📘 Rule violation ⚙ Maintainability
Description
The new multiline import from ..algo.inline_comment_dedup is not formatted in the repository’s
isort/Ruff style, which can cause lint failures and inconsistent import formatting across the
codebase.
Code

pr_agent/git_providers/github_provider.py[R20-22]

```diff
+from ..algo.inline_comment_dedup import (body_fingerprint, body_with_markers,
+                                         code_fingerprint,
+                                         get_inline_comment_store, has_marker)
```
Evidence
PR Compliance ID 10 requires imports to be grouped/ordered and formatted as isort/Ruff expects. The
added from ..algo.inline_comment_dedup import (...) block is split/indented in a non-isort format,
which can trigger Ruff isort violations (e.g., I001/I002) and thus fail CI/lint.

AGENTS.md: Python Code Must Conform to Ruff Style (Line Length 120, isort Import Ordering, Double Quotes)
pr_agent/git_providers/github_provider.py[20-22]
pr_agent/git_providers/gitlab_provider.py[17-19]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The newly added multiline import is not formatted in the repository’s isort/Ruff style.
## Issue Context
Ruff is configured to enforce isort checks (`I001/I002`). Import formatting that deviates from isort’s standard output may fail CI/lint.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[20-22]
- pr_agent/git_providers/gitlab_provider.py[17-19]



2. Marker-included fingerprints ✓ Resolved 🐞 Bug ≡ Correctness
Description
GithubProvider.publish_inline_comments computes fingerprints from the full comment body before
checking whether it already contains dedup markers, so a pre-marked body can hash differently than
the embedded marker. This can cause later unmarked re-emissions (in the same run or later) not to be
recognized as duplicates, undermining dedup in fallback/repair paths.
Code

pr_agent/git_providers/github_provider.py[R430-455]

```diff
+                body = comment.get("body", "")
+                # GitHub committable comments are anchored by diff position, which
+                # shifts as the PR gains commits; anchor the fingerprint on the file
+                # path and comment content instead so it stays stable across runs.
+                body_fp = body_fingerprint(path, None, body)
+                code_fp = code_fingerprint(path, None, body)
+                # A fallback re-publish (disable_fallback=True) is for a comment
+                # that has not been posted yet, so do not filter it; only the
+                # top-level call drops duplicates. The fallback still gets marked
+                # and recorded below so it dedups on later runs.
+                if not disable_fallback and (
+                        store.seen(body_fp) or store.seen(code_fp)
+                        or body_fp in local_seen or (code_fp and code_fp in local_seen)):
+                    skipped += 1
+                    continue
+                if "<!-- pr-agent-dedup:" in body:
+                    marked = comment  # already carries a marker from the first pass
+                else:
+                    marked = dict(comment)
+                    marked["body"] = body_with_markers(
+                        body, body_fp, code_fp, getattr(self, "max_comment_chars", None))
+                deduped.append(marked)
+                local_seen.add(body_fp)
+                if code_fp:
+                    local_seen.add(code_fp)
+                pending_fingerprints.append((body_fp, code_fp))
```
Evidence
Fingerprints are computed from body before the code checks if the body already contains a dedup
marker, yet later logic explicitly allows passing already-marked bodies through unchanged. The
GitHub fallback repair path re-invokes publish_inline_comments(..., disable_fallback=True) with
bodies that can carry markers, so the pre-check hashing can include marker text and diverge from the
embedded marker fingerprint.

pr_agent/git_providers/github_provider.py[417-456]
pr_agent/git_providers/github_provider.py[521-546]
pr_agent/git_providers/github_provider.py[581-604]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`GithubProvider.publish_inline_comments()` computes `body_fp`/`code_fp` from `body` before it checks whether `body` already contains a `pr-agent-dedup` marker. When `publish_inline_comments()` is called with a body that already includes markers (possible in the invalid-comment repair / fallback republish flow), the computed fingerprint can include marker text (especially for short bodies), diverging from the fingerprint embedded in the marker. That breaks dedup because future unmarked re-emissions will compute the unmarked fingerprint and won’t match what was recorded.
## Issue Context
This is most likely to occur on the GitHub fallback path that re-publishes fixed invalid inline comments via `publish_inline_comments(..., disable_fallback=True)`, where `_try_fix_invalid_inline_comments()` only strips the suggestion block when present—so a comment without a suggestion block can retain markers.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[417-456]
- pr_agent/git_providers/github_provider.py[521-604]
- pr_agent/algo/inline_comment_dedup.py[68-84]
### Implementation guidance
- Add a helper in `inline_comment_dedup.py` to remove existing dedup markers from a body before hashing (e.g., remove both BODY/CODE marker patterns).
- In `publish_inline_comments()`, compute fingerprints from the *stripped* body, not the raw body.
- Alternatively (or additionally), if markers are present, parse and reuse the marker fingerprints rather than re-hashing.
- Add a unit test covering the scenario: input comment already contains markers + no suggestion block; ensure dedup remains stable and store keys correspond to embedded markers.
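The first guidance bullet can be sketched as follows, assuming the 12-hex marker shape from the PR description. `fingerprint` is a simplified hash (no lead or tag stripping) used only to show why markers must be removed before hashing.

```python
import hashlib
import re

# Assumed marker shape (12 lowercase hex characters), as in the PR description.
MARKER_RE = re.compile(r"\s*<!-- pr-agent-dedup: [0-9a-f]{12} -->")


def fingerprint(path: str, body: str) -> str:
    """Simplified body hash: collapse whitespace, take 80 chars, lower-case."""
    text = " ".join(body.split())[:80].lower()
    return hashlib.sha256(f"{path}|{text}".encode("utf-8")).hexdigest()[:12]


def stable_fingerprint(path: str, body: str) -> str:
    # Remove embedded markers first, so a body that was already marked
    # hashes the same as the original unmarked text.
    return fingerprint(path, MARKER_RE.sub("", body))


original = "Handle the None case."
fp = stable_fingerprint("app.py", original)
marked = f"{original}\n\n<!-- pr-agent-dedup: {fp} -->"
print(fingerprint("app.py", marked) == fp)         # False: the marker text leaks into the hash
print(stable_fingerprint("app.py", marked) == fp)  # True
```

Short bodies are where this bites: the marker falls inside the first 80 characters and changes the hash.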



3. InlineCommentStore.load() swallows exceptions 📘 Rule violation ☼ Reliability
Description
InlineCommentStore.load() uses a broad except Exception and then continues with an empty store,
which can silently hide real provider/API failures and reduces debuggability. The log also omits
traceback context, making operational triage harder when comment listing breaks.
Code

pr_agent/algo/inline_comment_dedup.py[R137-148]

```diff
+        try:
+            for body in iter_existing_inline_comment_bodies(self._git_provider):
+                for marker_re in (BODY_MARKER_RE, CODE_MARKER_RE):
+                    for match in marker_re.finditer(body or ""):
+                        self._keys.add(match.group(1))
+        except Exception as e:
+            from pr_agent.log import get_logger
+            get_logger().info(
+                f"Persistent inline comments: could not load existing comments, "
+                f"within-run dedup only. error={e}"
+            )
+        self._loaded = True
```
Evidence
PR Compliance ID 18 requires narrow/explicit exception handling and logging with captured exception
context. The new code in InlineCommentStore.load() catches Exception broadly and proceeds
without re-raising, only logging a formatted message, which can swallow unexpected failures and
loses traceback context.

pr_agent/algo/inline_comment_dedup.py[137-148]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`InlineCommentStore.load()` currently catches `Exception` broadly and suppresses it, only emitting an `info` log with the stringified exception. This violates the requirement to use narrow, explicit exception handling and to capture exception objects in a way that preserves context for logging/debugging.
## Issue Context
This code runs in provider/API listing paths. When listing fails for reasons other than an expected/handled case (e.g., unsupported provider), suppressing the exception can mask real integration problems and make troubleshooting difficult.
## Fix Focus Areas
- pr_agent/algo/inline_comment_dedup.py[137-148]
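A sketch of narrower handling, under the assumption that `NotImplementedError` (unsupported provider) is the only expected failure and that `ConnectionError` stands in for whatever transient API error the team chooses to tolerate. `MarkerStore` is an illustrative class, not the module's `InlineCommentStore`.

```python
import logging
from typing import Callable, Iterable, Optional, Set

logger = logging.getLogger(__name__)


class MarkerStore:
    """Illustrative store whose load() only absorbs the failures it expects."""

    def __init__(self, list_bodies: Callable[[], Iterable[Optional[str]]],
                 extract: Callable[[str], Set[str]]) -> None:
        self._list_bodies = list_bodies
        self._extract = extract
        self.keys: Set[str] = set()
        self.loaded = False

    def load(self) -> None:
        if self.loaded:
            return
        try:
            for body in self._list_bodies():
                self.keys |= self._extract(body or "")
        except NotImplementedError:
            # Expected: no listing adapter for this provider, so within-run dedup only.
            logger.info("Persistent inline comments: provider cannot list comments")
        except ConnectionError:
            # Assumed transient failure; exc_info=True keeps the traceback in the log.
            logger.warning("Persistent inline comments: could not load comments", exc_info=True)
        # Any other exception propagates instead of silently disabling cross-run dedup.
        self.loaded = True
```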



4. GitHub dedup misses line 🐞 Bug ≡ Correctness
Description
In GithubProvider.publish_inline_comments the dedup fingerprints are computed with
target_line_no=None, so different inline comments on different lines in the same file can collide
and be incorrectly skipped. This can silently drop valid review suggestions when
persistent_inline_comments is enabled.
Code

pr_agent/git_providers/github_provider.py[R429-435]

```diff
+                path = comment.get("path", "")
+                body = comment.get("body", "")
+                # GitHub committable comments are anchored by diff position, which
+                # shifts as the PR gains commits; anchor the fingerprint on the file
+                # path and comment content instead so it stays stable across runs.
+                body_fp = body_fingerprint(path, None, body)
+                code_fp = code_fingerprint(path, None, body)
```
Evidence
The GitHub provider fingerprints comments with a None anchor line even though it already computes an
absolute line number during inline comment creation; this makes same-text comments in the same file
indistinguishable to the dedup layer.

pr_agent/git_providers/github_provider.py[402-415]
pr_agent/git_providers/github_provider.py[417-455]
pr_agent/algo/utils.py[1125-1197]
pr_agent/algo/inline_comment_dedup.py[84-89]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
GitHub inline-comment dedup currently fingerprints comments with `target_line_no=None`, which can make distinct inline comments in the same file dedupe incorrectly and be dropped.
## Issue Context
`create_inline_comment()` already computes an `absolute_position` (stable file line number derived from the patch) but the returned comment dict drops it. The dedup code also has an `inline_comment_line()` helper that can extract line-like anchors from comment dicts.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[402-455]
- pr_agent/algo/utils.py[1125-1197]
- pr_agent/algo/inline_comment_dedup.py[84-89]
## Suggested implementation notes
- Include `absolute_position` in the dict returned by `create_inline_comment`.
- In `publish_inline_comments`, compute an anchor (prefer `absolute_position`, else fall back to `inline_comment_line(comment)`), and pass it to `body_fingerprint` / `code_fingerprint`.
- Add/adjust unit tests to ensure two comments with the same body in the same file but different `absolute_position` do not collide/dedupe.
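A sketch of the suggested anchor, assuming the comment dict carries `absolute_position` as proposed above. The fallback to `line` and `start_line` mirrors GitHub's review-comment fields and is an assumption, as is the simplified `body_fingerprint`.

```python
import hashlib
from typing import Optional


def body_fingerprint(path: str, line: Optional[int], text: str) -> str:
    """Simplified body hash keyed on (path, line, normalised text)."""
    key = f"{path}|{line}|{' '.join(text.split())[:80].lower()}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def anchor_for(comment: dict) -> Optional[int]:
    """Prefer an absolute file line; fall back to other line-like keys (assumed)."""
    for key in ("absolute_position", "line", "start_line"):
        value = comment.get(key)
        if isinstance(value, int):
            return value
    return None


a = {"path": "app.py", "absolute_position": 12, "body": "Check for None."}
b = {"path": "app.py", "absolute_position": 40, "body": "Check for None."}
print(body_fingerprint(a["path"], None, a["body"]) == body_fingerprint(b["path"], None, b["body"]))  # True
print(body_fingerprint(a["path"], anchor_for(a), a["body"])
      == body_fingerprint(b["path"], anchor_for(b), b["body"]))  # False
```

An absolute line can still move when lines are inserted above it in a later commit, so this trades some cross-run stability for not merging distinct comments in the same file.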



5. Unposted comments marked seen ✓ Resolved 🐞 Bug ≡ Correctness
Description
In GithubProvider.publish_inline_comments, all pending fingerprints are added to the
InlineCommentStore after the publish flow returns without raising, even though the fallback
verification path can swallow failures and drop comments. This can prevent those comments from being
retried later in the same run, resulting in missing inline suggestions when
persistent_inline_comments is enabled.
Code

pr_agent/git_providers/github_provider.py[R479-484]

```diff
+        # Record fingerprints only after a publish path has run without raising,
+        # so a failed publish does not block a retry of the same comment this run.
+        if store is not None:
+            for body_fp, code_fp in pending_fingerprints:
+                store.add(body_fp)
+                store.add(code_fp)
```
Evidence
The store is updated for every pending fingerprint after the publish attempt, regardless of whether
the fallback actually posted those comments. The fallback path can swallow publish failures and
continue, so the outer method may record fingerprints for comments that were never published.

pr_agent/git_providers/github_provider.py[417-485]
pr_agent/git_providers/github_provider.py[520-545]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`GithubProvider.publish_inline_comments` unconditionally records all `pending_fingerprints` after the publish path completes without raising. When the method enters the fallback verification flow, publishing can partially/fully fail without raising (due to swallowed exceptions), yet fingerprints are still recorded as seen.
## Issue Context
The fallback method `_publish_inline_comments_fallback_with_verification` can:
- swallow exceptions when publishing `verified_comments` in a batch
- swallow exceptions when publishing fixed one-liners
This means the outer method cannot assume that all comments corresponding to `pending_fingerprints` were actually posted.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[417-485]
- pr_agent/git_providers/github_provider.py[520-545]
## Suggested fix approach
1. Only add fingerprints to the store for comments that are confirmed published.
2. Implement this with one or more of the following:
   - Moving fingerprint recording into the “success” path immediately after `self.pr.create_review(...)` succeeds.
   - Updating `_publish_inline_comments_fallback_with_verification` to return the subset of comments it actually posted (or their fingerprints), so the caller records only those.
   - Recording fingerprints inside the fallback method at the point of successful publish (batch publish: only after the create_review call returns without exception; one-by-one: only after the per-comment publish call returns without exception).
3. Avoid recording fingerprints for `verified_comments` if the batch create_review fails (currently swallowed).
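Steps 1 to 3 can be sketched as one helper that records only what was actually posted. `publish_and_record`, `publish_batch`, `publish_one` and `PublishError` are hypothetical; in the provider the batch call would be `self.pr.create_review(...)` and the per-comment path would be the fallback.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Fingerprints = Tuple[str, Optional[str]]


class PublishError(Exception):
    """Hypothetical stand-in for the provider's API error type."""


def publish_and_record(
    items: Sequence[Tuple[dict, Fingerprints]],
    publish_batch: Callable[[List[dict]], None],
    publish_one: Callable[[dict], None],
    seen: Set[str],
) -> List[dict]:
    """Record fingerprints only for comments whose publish call returned normally."""
    try:
        publish_batch([comment for comment, _ in items])
        posted = list(items)
    except PublishError:
        # Fallback: post one by one and keep only the comments that went through.
        posted = []
        for comment, fps in items:
            try:
                publish_one(comment)
            except PublishError:
                continue  # not posted, so not recorded; a later run may retry it
            posted.append((comment, fps))
    for _, (body_fp, code_fp) in posted:
        seen.add(body_fp)
        if code_fp:
            seen.add(code_fp)
    return [comment for comment, _ in posted]
```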



6. raise e instead of bare raise 📘 Rule violation ☼ Reliability
Description
In GithubProvider.publish_inline_comments, the fallback error handler re-raises with raise e.
In Python 3 this keeps the original traceback but adds the handler line as an extra frame; a bare
raise re-throws the active exception unchanged and is the idiomatic form.
Code

pr_agent/git_providers/github_provider.py[477]

```diff
+                raise e
```
Evidence
PR Compliance ID 16 requires exception handling that preserves context when re-raising; the modified
line raise e in the fallback handler should be a bare raise, which re-throws the original exception
without adding the handler line to its traceback.

pr_agent/git_providers/github_provider.py[474-477]
Best Practice: Learned patterns
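A self-contained check in plain Python 3 (independent of pr-agent) shows what differs between the two forms: both keep the frames of the original failure, and `raise e` adds one more entry for the re-raise line.

```python
import traceback
from typing import Callable, List


def fail() -> None:
    raise ValueError("boom")


def reraise_named() -> None:
    try:
        fail()
    except ValueError as e:
        raise e  # re-raise by name: this line becomes an extra traceback entry


def reraise_bare() -> None:
    try:
        fail()
    except ValueError:
        raise  # bare raise: re-throws the active exception as it is


def frame_names(func: Callable[[], None]) -> List[str]:
    """Return the function name of every frame in the caught traceback."""
    try:
        func()
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


print(frame_names(reraise_named))  # ['frame_names', 'reraise_named', 'reraise_named', 'fail']
print(frame_names(reraise_bare))   # ['frame_names', 'reraise_bare', 'fail']
```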

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The code uses `raise e` inside an `except Exception as e:` block; a bare `raise` re-throws the caught exception with its traceback unchanged.
## Issue Context
The compliance standard requires preserving context when re-raising, and avoiding patterns that reduce diagnosability.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[474-477]




Remediation recommended

7. Markers can exceed max_chars 🐞 Bug ☼ Reliability ⭐ New
Description
body_with_markers() clips only the body and always appends the full marker suffix, so if the suffix
itself is longer than max_chars the returned string still exceeds max_chars and can be rejected by
provider APIs enforcing length limits.
Code

pr_agent/algo/inline_comment_dedup.py[R89-97]

```diff
+def body_with_markers(body: str, body_fp: str, code_fp: "Optional[str]",
+                      max_chars: "Optional[int]" = None) -> str:
+    """Append the dedup marker(s) to a comment body. If max_chars is given and
+    body + markers would exceed it, the body is clipped (never the markers) so
+    the fingerprint marker always survives for the next run's scan."""
+    suffix = f"\n\n{build_markers(body_fp, code_fp)}"
+    if max_chars and len(body) + len(suffix) > max_chars:
+        body = body[: max(0, max_chars - len(suffix))]
+    return f"{body}{suffix}"
```
Evidence
The implementation truncates body to max_chars - len(suffix) but always appends suffix
afterwards; when len(suffix) > max_chars, the slice becomes empty yet the full suffix is still
appended, so the result length remains greater than max_chars.

pr_agent/algo/inline_comment_dedup.py[89-97]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

### Issue description
`body_with_markers()` is intended to keep dedup markers while respecting a provider comment-length limit (`max_chars`). Today it only truncates `body`, but never the marker suffix. If `len(suffix) > max_chars`, the function returns a string longer than `max_chars`, which can cause the downstream provider API call to fail (length-limit violation).

### Issue Context
This is an edge case because current GitHub/GitLab providers set `max_comment_chars = 65000`, but it becomes relevant if a provider changes its limit, a tighter limit is introduced, or a future provider reuses this helper with a smaller cap.

### Fix Focus Areas
- pr_agent/algo/inline_comment_dedup.py[89-97]

### Implementation notes
- Add an explicit guard for `max_chars is not None and len(suffix) > max_chars`.
- In that case, degrade gracefully, e.g.:
  - Try dropping the code marker first (only keep body marker) if that makes `suffix` fit.
  - If even the body marker cannot fit, return a clipped body without markers (and optionally log once) rather than returning an oversized string.
- Ensure the returned string length never exceeds `max_chars` when `max_chars` is provided.
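A sketch of the guard described in these notes. The code-marker text `pr-agent-dedup-code:` is hypothetical, since the module's exact marker syntax is not shown; unlike the original, a limit of `0` is treated as a real limit rather than as no limit.

```python
from typing import Optional


def build_markers(body_fp: str, code_fp: Optional[str]) -> str:
    # Hypothetical marker text; the module's exact code-marker syntax is not shown.
    markers = [f"<!-- pr-agent-dedup: {body_fp} -->"]
    if code_fp:
        markers.append(f"<!-- pr-agent-dedup-code: {code_fp} -->")
    return "\n".join(markers)


def body_with_markers(body: str, body_fp: str, code_fp: Optional[str],
                      max_chars: Optional[int] = None) -> str:
    """Append markers; when a limit is given, never return more than max_chars."""
    full = f"\n\n{build_markers(body_fp, code_fp)}"
    if max_chars is None:
        return f"{body}{full}"
    # Prefer both markers, then the body marker alone, clipping the body as needed.
    for suffix in (full, f"\n\n{build_markers(body_fp, None)}"):
        if len(suffix) <= max_chars:
            return body[: max_chars - len(suffix)] + suffix
    # Not even the body marker fits: return the clipped body without markers.
    return body[:max_chars]
```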



8. Code fingerprint flattens newlines 🐞 Bug ≡ Correctness
Description
code_fingerprint() collapses all whitespace (including newlines/indentation) into single spaces
before hashing, so distinct multi-line suggestion blocks can produce the same fingerprint and be
treated as duplicates. Because providers skip when either the body OR code fingerprint is already
seen, this can suppress legitimate new inline suggestions when config.persistent_inline_comments
is enabled.
Code

pr_agent/algo/inline_comment_dedup.py[R43-79]

```diff
+_WS_RE = re.compile(r"\s+")
+_CODE_BLOCK_RE = re.compile(r"```suggestion[^\n]*\n(.*?)```", re.DOTALL)
+
+
+def has_marker(body: str) -> bool:
+    """True only if the body carries a well-formed dedup marker (12-hex),
+    so incidental text mentioning the marker syntax is not mistaken for one."""
+    return bool(BODY_MARKER_RE.search(body or "") or CODE_MARKER_RE.search(body or ""))
+
+
+def _strip_markers(body: str) -> str:
+    """Remove embedded dedup markers so a pre-marked body fingerprints the
+    same as its original (markers are appended after marking)."""
+    body = BODY_MARKER_RE.sub("", body or "")
+    body = CODE_MARKER_RE.sub("", body)
+    return body
+
+
+def body_fingerprint(relevant_file: str, target_line_no, body: str) -> str:
+    normalised = _LEAD_RE.sub("", _strip_markers(body))
+    normalised = _TAG_RE.sub("", normalised)
+    normalised = _WS_RE.sub(" ", normalised).strip()[:80].lower()
+    key = f"{relevant_file}|{target_line_no}|{normalised}"
+    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
+
+
+def code_fingerprint(relevant_file: str, target_line_no, body: str) -> Optional[str]:
+    m = _CODE_BLOCK_RE.search(_strip_markers(body))
+    if not m:
+        return None
+    # Do not lower-case: code is case-sensitive, so case-only differences
+    # must produce distinct fingerprints.
+    code = _WS_RE.sub(" ", m.group(1)).strip()
+    if not code:
+        return None
+    key = f"{relevant_file}|{target_line_no}|code|{code}"
+    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
```
Evidence
The code fingerprint explicitly replaces all whitespace (\s+) with a single space before hashing,
erasing newlines/indentation. Both GitHub and GitLab integrations use store.seen(code_fp) as an OR
condition for skipping, so any unintended collision in code_fp can suppress a comment even if the
prose/body fingerprint differs.

pr_agent/algo/inline_comment_dedup.py[43-79]
pr_agent/git_providers/github_provider.py[433-447]
pr_agent/git_providers/gitlab_provider.py[565-579]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`code_fingerprint()` currently normalizes the extracted ```suggestion block by applying `re.compile(r"\s+")`, which removes newlines and indentation. This can cause different code suggestions (especially multi-line and whitespace-significant ones) to collide to the same fingerprint, and then be incorrectly skipped due to OR-based dedup.
## Issue Context
Providers treat the code fingerprint as an OR-match key (`store.seen(code_fp)`), so a collision in `code_fp` alone is sufficient to suppress a comment.
## Fix Focus Areas
- pr_agent/algo/inline_comment_dedup.py[43-79]
- pr_agent/git_providers/github_provider.py[433-447]
- pr_agent/git_providers/gitlab_provider.py[565-579]
## Implementation notes
- Change code normalization to preserve `\n` line breaks (and ideally leading indentation), while still being robust to inconsequential differences.
- Suggested approach:
  - Normalize line endings (`\r\n` -> `\n`).
  - Strip trailing whitespace per line.
  - Optionally collapse runs of spaces/tabs *within* a line, but do not collapse across newlines.
- Update/extend unit tests to cover that newline/indentation-only differences do (or do not) change the fingerprint according to the chosen policy, and that semantically different multi-line blocks don’t collide.
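A sketch of the suggested policy: line endings are normalised and trailing whitespace is dropped per line, while line breaks and leading indentation are kept. Collapsing spaces within a line, the optional step above, is deliberately left out. The function names mirror the module's, but the bodies are illustrative.

```python
import hashlib
import re
from typing import Optional

CODE_BLOCK_RE = re.compile(r"```suggestion[^\n]*\n(.*?)```", re.DOTALL)


def normalise_code(code: str) -> str:
    """Keep line breaks and indentation; drop trailing whitespace and edge blank lines."""
    lines = code.replace("\r\n", "\n").split("\n")
    return "\n".join(line.rstrip() for line in lines).strip("\n")


def code_fingerprint(path: str, line, body: str) -> Optional[str]:
    match = CODE_BLOCK_RE.search(body)
    if not match:
        return None
    code = normalise_code(match.group(1))
    if not code.strip():
        return None
    return hashlib.sha256(f"{path}|{line}|code|{code}".encode("utf-8")).hexdigest()[:12]
```

Two blocks that differ only in whether a statement sits inside an `if` body now hash differently, while CRLF line endings and trailing spaces still do not matter.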



9. Fallback not stored 🐞 Bug ≡ Correctness
Description
In GithubProvider.publish_inline_comments, when the initial create_review fails and the code falls
back to _publish_inline_comments_fallback_with_verification, fingerprints for the
successfully-posted fallback comments are not added to InlineCommentStore. This can allow duplicates
later in the same run because the store is already loaded and won’t rescan newly-posted markers
until a later run.
Code

pr_agent/git_providers/github_provider.py[R462-475]

```diff
         try:
             # publish all comments in a single message
             self.pr.create_review(commit=self.last_commit_id, comments=comments)
+            # The whole batch posted; record its fingerprints so the rest of this
+            # run dedups against them. Cross-run dedup relies on the markers in the
+            # posted bodies, so comments the fallback below drops stay unrecorded
+            # and can be retried on a later run.
+            if store is not None:
+                for body_fp, code_fp in pending_fingerprints:
+                    store.add(body_fp)
+                    store.add(code_fp)
         except Exception as e:
             get_logger().info(f"Initially failed to publish inline comments as committable")
```
Evidence
Fingerprints are added to the store only after the initial batch create_review succeeds, but the
fallback path calls _publish_inline_comments_fallback_with_verification without any corresponding
store.add(...), so later calls in the same run won’t see those fingerprints in the already-loaded
store.

pr_agent/git_providers/github_provider.py[417-486]
pr_agent/git_providers/github_provider.py[521-545]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
When GitHub inline comment publishing falls back to `_publish_inline_comments_fallback_with_verification`, successfully-published comments are not recorded in the `InlineCommentStore`. This breaks *within-run* dedup after a fallback because `InlineCommentStore.load()` is already cached for the run.
### Issue Context
- The dedup/marking pass runs before the initial `create_review` attempt and the marked comments are passed into the fallback.
- The store is only updated on the initial batch success path.
- `_publish_inline_comments_fallback_with_verification` can publish verified comments via `self.pr.create_review(...)` and can also republish fixed comments by calling `publish_inline_comments(..., disable_fallback=True)`, but the successful verified batch publish is not reflected in the store.
### Fix Focus Areas
- pr_agent/git_providers/github_provider.py[417-486]
- pr_agent/git_providers/github_provider.py[521-545]
### Suggested implementation approach
1. In `_publish_inline_comments_fallback_with_verification`, after successfully calling `self.pr.create_review(..., comments=verified_comments)`, if `config.persistent_inline_comments` is enabled:
   - obtain the store via `get_inline_comment_store(self)`
   - for each published `comment` in `verified_comments`, recompute `body_fp = body_fingerprint(comment.get('path',''), None, comment.get('body',''))` and `code_fp = code_fingerprint(...)` and call `store.add(...)` for each.
2. Only add fingerprints when the fallback publish call succeeds (i.e., no exception). If the fallback publish is wrapped in a broad `try/except: pass`, adjust it to detect success so comments that were never posted are not marked as seen.
3. Add/adjust a unit test to cover: initial batch fails -> fallback publishes verified comment -> subsequent `publish_inline_comments` in same provider instance skips duplicate.
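A sketch of steps 1 and 2 under stated assumptions: `publish_verified_fallback` and the `fingerprint` callable are hypothetical, and `InlineCommentStore` here is a minimal in-memory stand-in with the `seen`/`add` interface described in the PR. What it illustrates is the ordering: fingerprints are recorded only after the verified batch publish returns without raising.

```python
from typing import Callable, Optional, Set, Tuple


class InlineCommentStore:
    """Minimal in-memory stand-in with the PR's seen/add interface."""

    def __init__(self) -> None:
        self._keys: Set[str] = set()

    def seen(self, key: Optional[str]) -> bool:
        return key is not None and key in self._keys

    def add(self, key: Optional[str]) -> None:
        if key:  # code fingerprints are None when there is no suggestion block
            self._keys.add(key)


def publish_verified_fallback(
    verified_comments: list,
    create_review: Callable[[list], None],
    store: Optional[InlineCommentStore],
    fingerprint: Callable[[dict], Tuple[str, Optional[str]]],
) -> bool:
    """Publish the verified batch and record fingerprints only if it posted."""
    if not verified_comments:
        return False
    try:
        create_review(verified_comments)
    except Exception:
        # Nothing was posted, so nothing may be marked as seen.
        return False
    if store is not None:
        for comment in verified_comments:
            body_fp, code_fp = fingerprint(comment)
            store.add(body_fp)
            store.add(code_fp)
    return True
```

A later `publish_inline_comments` call on the same provider instance can then consult `store.seen(...)` and skip the comment the fallback already posted.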

10. Brittle provider dispatch 🐞 Bug ⚙ Maintainability
Description
iter_existing_inline_comment_bodies dispatches by exact provider class name string, so
subclasses/wrappers of GithubProvider/GitLabProvider won’t be recognized and cross-run dedup will be
disabled for them. This makes the feature fragile to refactors and to provider
instrumentation/proxying patterns.
Code

pr_agent/algo/inline_comment_dedup.py[R108-133]

+def iter_existing_inline_comment_bodies(git_provider) -> Iterator[str]:
+    """Yield the body of every existing comment on the current PR/MR.
+
+    Dispatch is by provider class name so this module needs no provider
+    import. Unsupported providers raise NotImplementedError, which the store
+    treats as "cannot dedup here" and degrades to within-run dedup only.
+    """
+    provider_name = type(git_provider).__name__
+    if provider_name == "GithubProvider":
+        for comment in git_provider.pr.get_comments():
+            yield getattr(comment, "body", "") or ""
+    elif provider_name == "GitLabProvider":
+        for discussion in git_provider.mr.discussions.list(get_all=True):
+            attrs = getattr(discussion, "attributes", None) or {}
+            for note in attrs.get("notes", []) or []:
+                if isinstance(note, dict):
+                    yield note.get("body", "") or ""
+        # The committable-suggestion fallback posts via mr.notes.create, which
+        # may not surface as a discussion; scan plain notes too so their markers
+        # are seen on later runs.
+        for note in git_provider.mr.notes.list(get_all=True):
+            yield getattr(note, "body", "") or ""
+    else:
+        raise NotImplementedError(
+            f"inline-comment dedup not implemented for {provider_name}"
+        )
Evidence
Provider support is determined solely by type(git_provider).__name__ matching hard-coded strings,
which fails for subclasses/wrappers and routes them to the unsupported-provider path.

pr_agent/algo/inline_comment_dedup.py[108-133]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`iter_existing_inline_comment_bodies` uses `type(git_provider).__name__` equality checks to detect providers. Any subclass (e.g., `InstrumentedGithubProvider(GithubProvider)`) or wrapper/proxy type will not match and will be treated as unsupported, disabling cross-run dedup unexpectedly.
### Issue Context
This module avoided provider imports, but class-name checks are overly strict and create hidden coupling between the dedup module and provider class names.
### Fix Focus Areas
- pr_agent/algo/inline_comment_dedup.py[108-133]
### Suggested implementation approach
Replace class-name string comparisons with a more robust approach, e.g.:
- **Duck-typing:**
  - GitHub: if `hasattr(git_provider, 'pr')` and `hasattr(git_provider.pr, 'get_comments')`
  - GitLab: if `hasattr(git_provider, 'mr')` and `hasattr(git_provider.mr, 'discussions')` and `hasattr(git_provider.mr, 'notes')`
- Or, if acceptable, use `isinstance` with optional imports guarded to avoid hard dependency cycles.
Add a unit test for a subclassed provider to ensure dedup still works (i.e., markers are discovered instead of raising NotImplementedError).
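One way the duck-typing option could look, assuming the GitHub provider exposes a PyGithub-style `pr.get_comments()` and the GitLab provider a python-gitlab-style `mr` with `discussions` and `notes` managers (as in the excerpt above), and that a GitLab provider does not also carry a `pr` with `get_comments`. The fake provider classes at the bottom are illustrative only.

```python
from types import SimpleNamespace
from typing import Iterator


def iter_existing_inline_comment_bodies(git_provider) -> Iterator[str]:
    """Yield existing comment bodies, dispatching on capabilities, not class names."""
    pr = getattr(git_provider, "pr", None)
    mr = getattr(git_provider, "mr", None)
    if pr is not None and hasattr(pr, "get_comments"):
        # GitHub-like: a PyGithub-style pull request object.
        for comment in pr.get_comments():
            yield getattr(comment, "body", "") or ""
    elif mr is not None and hasattr(mr, "discussions") and hasattr(mr, "notes"):
        # GitLab-like: discussion notes first, then plain MR notes.
        for discussion in mr.discussions.list(get_all=True):
            attrs = getattr(discussion, "attributes", None) or {}
            for note in attrs.get("notes", []) or []:
                if isinstance(note, dict):
                    yield note.get("body", "") or ""
        for note in mr.notes.list(get_all=True):
            yield getattr(note, "body", "") or ""
    else:
        raise NotImplementedError(
            f"inline-comment dedup not implemented for {type(git_provider).__name__}"
        )


# Usage: a subclass (e.g. an instrumented wrapper) is still recognised.
class FakeGithubProvider:
    def __init__(self, bodies):
        self.pr = SimpleNamespace(
            get_comments=lambda: [SimpleNamespace(body=b) for b in bodies]
        )


class InstrumentedGithubProvider(FakeGithubProvider):
    pass
```

Because dispatch no longer depends on `type(...).__name__`, `InstrumentedGithubProvider` takes the GitHub branch instead of raising `NotImplementedError`.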


11. False marker detection skip ✓ Resolved 🐞 Bug ≡ Correctness
Description
In GithubProvider.publish_inline_comments, a comment is treated as “already marked” if the body
contains the substring "<!-- pr-agent-dedup:" anywhere, so a body that merely references that text
will skip marker insertion and won’t persistently dedup on later runs.
Code

pr_agent/git_providers/github_provider.py[R445-451]

+                if "<!-- pr-agent-dedup:" in body:
+                    marked = comment  # already carries a marker from the first pass
+                else:
+                    marked = dict(comment)
+                    marked["body"] = body_with_markers(
+                        body, body_fp, code_fp, getattr(self, "max_comment_chars", None))
+                deduped.append(marked)
Evidence
The GitHub provider only checks for a raw substring before deciding to skip marker insertion, while
the actual marker format is a stricter regex-defined HTML comment; this mismatch enables false
positives that leave comments unmarked.

pr_agent/git_providers/github_provider.py[429-451]
pr_agent/algo/inline_comment_dedup.py[38-45]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`GithubProvider.publish_inline_comments` uses a substring check to decide whether a comment body is already marked. If the suggestion body contains `<!-- pr-agent-dedup:` for unrelated reasons (example text, HTML snippet, etc.), the code skips adding the *actual* dedup markers for that suggestion, undermining cross-run dedup.
### Issue Context
Markers have a specific format (including a 12-hex fingerprint and closing `-->`). The decision should be based on those actual patterns (or on whether the computed fingerprint markers are present), not just the prefix substring.
### Fix Focus Areas
- pr_agent/git_providers/github_provider.py[445-451]
- pr_agent/algo/inline_comment_dedup.py[38-45]
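A sketch of a stricter check, assuming the single marker shape shown in the PR description (`<!-- pr-agent-dedup: <12 hex> -->`). The real module defines separate `BODY_MARKER_RE`/`CODE_MARKER_RE` patterns whose exact format may differ, and this `has_marker` signature is hypothetical. A body counts as marked only when the computed fingerprints themselves are embedded, not when the prefix text merely appears.

```python
import re
from typing import Optional, Set

# Assumed marker shape, taken from the example in the PR description.
MARKER_RE = re.compile(r"<!--\s*pr-agent-dedup:\s*([0-9a-f]{12})\s*-->")


def marker_fingerprints(body: str) -> Set[str]:
    """Return every well-formed fingerprint embedded in a comment body."""
    return set(MARKER_RE.findall(body or ""))


def has_marker(body: str, body_fp: str, code_fp: Optional[str] = None) -> bool:
    """True only if the markers for *these* fingerprints are already present."""
    found = marker_fingerprints(body)
    return body_fp in found and (code_fp is None or code_fp in found)
```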


12. Untrusted marker suppression 🐞 Bug ⛨ Security
Description
InlineCommentStore.load() accepts dedup markers from all existing comment bodies without verifying
the author, so any user who can comment can inject a marker and cause PR-Agent to skip future inline
suggestions matching that fingerprint.
Code

pr_agent/algo/inline_comment_dedup.py[R102-123]

+def iter_existing_inline_comment_bodies(git_provider) -> Iterator[str]:
+    """Yield the body of every existing comment on the current PR/MR.
+
+    Dispatch is by provider class name so this module needs no provider
+    import. Unsupported providers raise NotImplementedError, which the store
+    treats as "cannot dedup here" and degrades to within-run dedup only.
+    """
+    provider_name = type(git_provider).__name__
+    if provider_name == "GithubProvider":
+        for comment in git_provider.pr.get_comments():
+            yield getattr(comment, "body", "") or ""
+    elif provider_name == "GitLabProvider":
+        for discussion in git_provider.mr.discussions.list(get_all=True):
+            attrs = getattr(discussion, "attributes", None) or {}
+            for note in attrs.get("notes", []) or []:
+                if isinstance(note, dict):
+                    yield note.get("body", "") or ""
+        # The committable-suggestion fallback posts via mr.notes.create, which
+        # may not surface as a discussion; scan plain notes too so their markers
+        # are seen on later runs.
+        for note in git_provider.mr.notes.list(get_all=True):
+            yield getattr(note, "body", "") or ""
Evidence
The implementation scans every existing comment/note body and unconditionally adds any matching
marker fingerprints to the seen-set, with no author/bot validation, enabling marker injection from
non-agent users.

pr_agent/algo/inline_comment_dedup.py[102-127]
pr_agent/algo/inline_comment_dedup.py[144-158]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The persistent-inline-comment store rebuilds the “seen fingerprints” set by scanning *all* existing comment bodies for marker patterns. Because it doesn’t filter to PR-Agent-authored comments (or otherwise authenticate markers), an untrusted commenter can inject markers and suppress future agent inline suggestions.
### Issue Context
`iter_existing_inline_comment_bodies` yields bodies from all GitHub PR comments / GitLab discussion notes and MR notes. `InlineCommentStore.load` trusts any marker it finds.
### Fix Focus Areas
- pr_agent/algo/inline_comment_dedup.py[102-127]
- pr_agent/algo/inline_comment_dedup.py[144-158]
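A minimal sketch of author filtering, assuming each existing comment can be reduced to an author login plus a body (for example PyGithub's `comment.user.login`, or the `author["username"]` field on a GitLab note) and that the agent's own login is known. `ExistingComment` and `load_trusted_fingerprints` are hypothetical names, and the marker regex assumes the shape from the PR description. This narrows the injection surface to the bot account; it does not authenticate markers cryptographically.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set

MARKER_RE = re.compile(r"<!--\s*pr-agent-dedup:\s*([0-9a-f]{12})\s*-->")


@dataclass
class ExistingComment:
    author: Optional[str]  # login of the comment author, if the API exposes it
    body: str


def load_trusted_fingerprints(comments: Iterable[ExistingComment], bot_login: str) -> Set[str]:
    """Collect marker fingerprints, ignoring comments not written by the bot."""
    keys: Set[str] = set()
    for comment in comments:
        if comment.author != bot_login:
            continue  # a marker pasted by another user must not suppress suggestions
        keys.update(MARKER_RE.findall(comment.body or ""))
    return keys
```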


Previous review results

Review updated until commit f6070fb

Results up to commit ef9a46b


🐞 Bugs (5) 📘 Rule violations (3) 📎 Requirement gaps (0)

Action required
1. Non-isort import inline_comment_dedup 📘 Rule violation ⚙ Maintainability ⭐ New
Description
The new multiline import from ..algo.inline_comment_dedup is not formatted in the repository’s
isort/Ruff style, which can cause lint failures and inconsistent import formatting across the
codebase.
Code

pr_agent/git_providers/github_provider.py[R20-22]

+from ..algo.inline_comment_dedup import (body_fingerprint, body_with_markers,
+                                         code_fingerprint,
+                                         get_inline_comment_store, has_marker)
Evidence
PR Compliance ID 10 requires imports to be grouped/ordered and formatted as isort/Ruff expects. The
added from ..algo.inline_comment_dedup import (...) block is split/indented in a non-isort format,
which can trigger Ruff isort violations (e.g., I001/I002) and thus fail CI/lint.

AGENTS.md: Python Code Must Conform to Ruff Style (Line Length 120, isort Import Ordering, Double Quotes)
pr_agent/git_providers/github_provider.py[20-22]
pr_agent/git_providers/gitlab_provider.py[17-19]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
The newly added multiline import is not formatted in the repository’s isort/Ruff style.

## Issue Context
Ruff is configured to enforce isort checks (`I001/I002`). Import formatting that deviates from isort’s standard output may fail CI/lint.

## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[20-22]
- pr_agent/git_providers/gitlab_provider.py[17-19]


2. Marker-included fingerprints ✓ Resolved 🐞 Bug ≡ Correctness
Description
GithubProvider.publish_inline_comments computes fingerprints from the full comment body before
checking whether it already contains dedup markers, so a pre-marked body can hash differently than
the embedded marker. This can cause later unmarked re-emissions (in the same run or later) not to be
recognized as duplicates, undermining dedup in fallback/repair paths.
Code

pr_agent/git_providers/github_provider.py[R430-455]

+                body = comment.get("body", "")
+                # GitHub committable comments are anchored by diff position, which
+                # shifts as the PR gains commits; anchor the fingerprint on the file
+                # path and comment content instead so it stays stable across runs.
+                body_fp = body_fingerprint(path, None, body)
+                code_fp = code_fingerprint(path, None, body)
+                # A fallback re-publish (disable_fallback=True) is for a comment
+                # that has not been posted yet, so do not filter it; only the
+                # top-level call drops duplicates. The fallback still gets marked
+                # and recorded below so it dedups on later runs.
+                if not disable_fallback and (
+                        store.seen(body_fp) or store.seen(code_fp)
+                        or body_fp in local_seen or (code_fp and code_fp in local_seen)):
+                    skipped += 1
+                    continue
+                if "<!-- pr-agent-dedup:" in body:
+                    marked = comment  # already carries a marker from the first pass
+                else:
+                    marked = dict(comment)
+                    marked["body"] = body_with_markers(
+                        body, body_fp, code_fp, getattr(self, "max_comment_chars", None))
+                deduped.append(marked)
+                local_seen.add(body_fp)
+                if code_fp:
+                    local_seen.add(code_fp)
+                pending_fingerprints.append((body_fp, code_fp))
Evidence
Fingerprints are computed from body before the code checks if the body already contains a dedup
marker, yet later logic explicitly allows passing already-marked bodies through unchanged. The
GitHub fallback repair path re-invokes publish_inline_comments(..., disable_fallback=True) with
bodies that can carry markers, so the pre-check hashing can include marker text and diverge from the
embedded marker fingerprint.

pr_agent/git_providers/github_provider.py[417-456]
pr_agent/git_providers/github_provider.py[521-546]
pr_agent/git_providers/github_provider.py[581-604]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`GithubProvider.publish_inline_comments()` computes `body_fp`/`code_fp` from `body` before it checks whether `body` already contains a `pr-agent-dedup` marker. When `publish_inline_comments()` is called with a body that already includes markers (possible in the invalid-comment repair / fallback republish flow), the computed fingerprint can include marker text (especially for short bodies), diverging from the fingerprint embedded in the marker. That breaks dedup because future unmarked re-emissions will compute the unmarked fingerprint and won’t match what was recorded.
## Issue Context
This is most likely to occur on the GitHub fallback path that re-publishes fixed invalid inline comments via `publish_inline_comments(..., disable_fallback=True)`, where `_try_fix_invalid_inline_comments()` only strips the suggestion block when present—so a comment without a suggestion block can retain markers.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[417-456]
- pr_agent/git_providers/github_provider.py[521-604]
- pr_agent/algo/inline_comment_dedup.py[68-84]
### Implementation guidance
- Add a helper in `inline_comment_dedup.py` to remove existing dedup markers from a body before hashing (e.g., remove both BODY/CODE marker patterns).
- In `publish_inline_comments()`, compute fingerprints from the *stripped* body, not the raw body.
- Alternatively (or additionally), if markers are present, parse and reuse the marker fingerprints rather than re-hashing.
- Add a unit test covering the scenario: input comment already contains markers + no suggestion block; ensure dedup remains stable and store keys correspond to embedded markers.
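A sketch of the first implementation option, assuming the marker shape from the PR description and a simplified `body_fingerprint` (whitespace collapsed, first 80 chars, 12-hex SHA-256 prefix); `strip_markers` is a hypothetical helper name. Hashing the stripped body makes an already-marked body produce the same fingerprint as its unmarked original.

```python
import hashlib
import re
from typing import Optional

# Assumed marker shape; the real module's BODY/CODE marker regexes may differ.
MARKER_RE = re.compile(r"\n*<!--\s*pr-agent-dedup:\s*[0-9a-f]{12}\s*-->")
WS_RE = re.compile(r"\s+")


def strip_markers(body: str) -> str:
    """Remove any dedup markers so hashing sees only the suggestion text."""
    return MARKER_RE.sub("", body or "").rstrip()


def body_fingerprint(path: str, line: Optional[int], body: str) -> str:
    """Simplified body fingerprint over (path, line, first 80 normalized chars)."""
    text = WS_RE.sub(" ", strip_markers(body)).strip()[:80]
    key = f"{path}|{line}|{text}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
```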


3. InlineCommentStore.load() swallows exceptions 📘 Rule violation ☼ Reliability
Description
InlineCommentStore.load() uses a broad except Exception and then continues with an empty store,
which can silently hide real provider/API failures and reduces debuggability. The log also omits
traceback context, making operational triage harder when comment listing breaks.
Code

pr_agent/algo/inline_comment_dedup.py[R137-148]

+        try:
+            for body in iter_existing_inline_comment_bodies(self._git_provider):
+                for marker_re in (BODY_MARKER_RE, CODE_MARKER_RE):
+                    for match in marker_re.finditer(body or ""):
+                        self._keys.add(match.group(1))
+        except Exception as e:
+            from pr_agent.log import get_logger
+            get_logger().info(
+                f"Persistent inline comments: could not load existing comments, "
+                f"within-run dedup only. error={e}"
+            )
+        self._loaded = True
Evidence
PR Compliance ID 18 requires narrow/explicit exception handling and logging with captured exception
context. The new code in InlineCommentStore.load() catches Exception broadly and proceeds
without re-raising, only logging a formatted message, which can swallow unexpected failures and
loses traceback context.

pr_agent/algo/inline_comment_dedup.py[137-148]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
`InlineCommentStore.load()` currently catches `Exception` broadly and suppresses it, only emitting an `info` log with the stringified exception. This violates the requirement to use narrow, explicit exception handling and to capture exception objects in a way that preserves context for logging/debugging.
## Issue Context
This code runs in provider/API listing paths. When listing fails for reasons other than an expected/handled case (e.g., unsupported provider), suppressing the exception can mask real integration problems and make troubleshooting difficult.
## Fix Focus Areas
- pr_agent/algo/inline_comment_dedup.py[137-148]
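A sketch of a middle ground that stays fail-open, assuming nothing beyond the standard library: an unsupported provider (`NotImplementedError`) is logged as an expected case, while any other failure still fails open but is logged with its traceback via `exc_info=True`. `load_fingerprints` and its two callables are hypothetical, and stdlib `logging` stands in for `get_logger()`.

```python
import logging
from typing import Callable, Iterable, Set

# Stand-in for pr_agent's get_logger(); illustrative only.
logger = logging.getLogger("inline_comment_dedup")


def load_fingerprints(
    list_bodies: Callable[[], Iterable[str]],
    extract: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Fail-open load that separates 'unsupported' from unexpected failures."""
    keys: Set[str] = set()
    try:
        for body in list_bodies():
            keys.update(extract(body))
    except NotImplementedError:
        # Expected for providers without a listing adapter.
        logger.info("Persistent inline comments: provider unsupported, within-run dedup only")
    except Exception:
        # Still fail open so publishing never breaks, but keep the traceback.
        # Any keys gathered before the failure are kept.
        logger.warning(
            "Persistent inline comments: could not load existing comments", exc_info=True
        )
    return keys
```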

4. GitHub dedup misses line 🐞 Bug ≡ Correctness
Description
In GithubProvider.publish_inline_comments the dedup fingerprints are computed with
target_line_no=None, so different inline comments on different lines in the same file can collide
and be incorrectly skipped. This can silently drop valid review suggestions when
persistent_inline_comments is enabled.
Code

pr_agent/git_providers/github_provider.py[R429-435]

+                path = comment.get("path", "")
+                body = comment.get("body", "")
+                # GitHub committable comments are anchored by diff position, which
+                # shifts as the PR gains commits; anchor the fingerprint on the file
+                # path and comment content instead so it stays stable across runs.
+                body_fp = body_fingerprint(path, None, body)
+                code_fp = code_fingerprint(path, None, body)
Evidence
The GitHub provider fingerprints comments with a None anchor line even though it already computes an
absolute line number during inline comment creation; this makes same-text comments in the same file
indistinguishable to the dedup layer.

pr_agent/git_providers/github_provider.py[402-415]
pr_agent/git_providers/github_provider.py[417-455]
pr_agent/algo/utils.py[1125-1197]
pr_agent/algo/inline_comment_dedup.py[84-89]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution.

## Issue description
GitHub inline-comment dedup currently fingerprints comments with `target_line_no=None`, which can make distinct inline comments in the same file dedupe incorrectly and be dropped.
## Issue Context
`create_inline_comment()` already computes an `absolute_position` (stable file line number derived from the patch) but the returned comment dict drops it. The dedup code also has an `inline_comment_line()` helper that can extract line-like anchors from comment dicts.
## Fix Focus Areas
- pr_agent/git_providers/github_provider.py[402-455]
- pr_agent/algo/utils.py[1125-1197]
- pr_agent/algo/inline_comment_dedup.py[84-89]
## Suggested implementation notes
- Include `absolute_position` in the dict returned by `create_inline_comment`.
- In `publish_inline_comments`, compute an anchor (prefer `absolute_position`, else fall back to `inline_comment_line(comment)`), and pass it to `body_fingerprint` / `code_fingerprint`.
- Add/adjust unit tests to ensure two comments with the same body in the same file but different `absolute_position` do not collide/dedupe.
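A sketch of the suggested anchoring, assuming `create_inline_comment` is changed to return `absolute_position` in its dict (as the review proposes) and using a simplified `body_fingerprint`. The `line` fallback key and the `anchor_line` helper are hypothetical stand-ins for `inline_comment_line()`. The trade-off: a file-line anchor moves when lines are added above it, so the fingerprint becomes less stable across runs.

```python
import hashlib
import re
from typing import Optional

WS_RE = re.compile(r"\s+")


def anchor_line(comment: dict) -> Optional[int]:
    """Prefer the stable file line; fall back to another line-like key."""
    for key in ("absolute_position", "line"):
        value = comment.get(key)
        if isinstance(value, int):
            return value
    return None


def body_fingerprint(path: str, line: Optional[int], body: str) -> str:
    """Simplified body fingerprint over (path, line, first 80 normalized chars)."""
    text = WS_RE.sub(" ", body or "").strip()[:80]
    key = f"{path}|{line}|{text}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
```

Two comments with identical text in the same file then hash differently as long as their anchors differ.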


5. Unposted comments marked seen ✓ Resolved 🐞 Bug ≡ Correctness
Description
In GithubProvider.publish_inline_comments, all pending fingerprints are added to the
InlineCommentStore after the publish flow returns without raising, even though the fallback
verification path can swallow failures and drop comments. This can prevent those comments from being
retried later in the same run, resulting in missing inline suggestions when
persistent_inline_comments is enabled.
Code

pr_agent/git_providers/github_provider.py[R479-484]

+        # Record fingerprints only after a publish path has run without raising,
+        # so a failed publish does not block a retry of the same comment this run.
+        if store is not None:
+            for body_fp, code_fp in pending_fingerprints:
+                store.add(body_fp)
+                store.add(code_fp)
Evidence
The store is updated for every pending fingerprint after the publish attempt, regardless of whether
the fallback actually posted those comments. The fallback path can swallow publish failures and
continue, so the outer method may record fingerprints for comments that were never published.

pr_agent/git_providers/github_provider.py[417-485]
pr_agent/git_providers/github_provider.py[520-545]

Agent prompt
The issue below was found during a code review. Follow the provided context and guid...

…ests

Addresses self-review findings on the persistent-inline-comments change:

- Broaden the importance-tag regex so non-standard labels (hyphens, digits,
  capitals) are stripped from the body fingerprint, keeping it stable across
  runs instead of only for the lowercase-letter label set.
- Move the inline_comment_dedup import into the sorted ..algo group in the
  GitLab provider, matching the GitHub provider.
- Add provider-level tests: the cross-run code-fingerprint OR-match through
  GitHub (different prose, same suggestion block), the GitLab general-note
  fallback (marker append + fingerprint record), cross-run dedup via an
  existing-discussion marker scan, unsupported-provider degradation, and a
  non-standard-label fingerprint regression.
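A sketch of the broadened normalization, assuming a leading `**Suggestion:**` and a `[category, importance: N]` tag as the PR description states; the exact patterns in `inline_comment_dedup.py` may differ, and `normalize_body` is a hypothetical name. The label class accepts letters of either case, digits, spaces, hyphens and underscores, so `[possible issue, importance: 7]` and `[Possible-Issue2, importance: 8]` both strip to the same text.

```python
import re

# Assumed shapes, based on the PR description.
LEAD_RE = re.compile(r"^\s*\*\*Suggestion:\*\*\s*", re.IGNORECASE)
# Any label made of letters, digits, spaces, hyphens or underscores.
TAG_RE = re.compile(r"\[[A-Za-z0-9 _-]+,\s*importance:\s*\d+\]", re.IGNORECASE)
WS_RE = re.compile(r"\s+")


def normalize_body(body: str) -> str:
    """Strip the lead and tag, collapse whitespace, keep the first 80 chars."""
    text = LEAD_RE.sub("", body or "")
    text = TAG_RE.sub("", text)
    return WS_RE.sub(" ", text).strip()[:80]
```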
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit b8b6383

Comment thread pr_agent/algo/inline_comment_dedup.py
Comment thread pr_agent/git_providers/github_provider.py Outdated
Comment thread pr_agent/settings/configuration.toml
Comment thread pr_agent/algo/inline_comment_dedup.py
…y logger import

Addresses the Qodo review on PR The-PR-Agent#2424:

- GitHub publish_inline_comments now records fingerprints only after a publish
  path runs without raising, tracks within-batch duplicates in a local set, and
  skips dedup on the disable_fallback re-publish. Previously the store was
  populated before create_review succeeded, so a 422 fallback retry could be
  wrongly skipped and silently drop a comment.
- GitHub fingerprints are anchored on (path, content) instead of the diff
  position, which shifts as the PR gains commits; this keeps the fingerprint
  stable across runs (the persistent behaviour the feature is about).
- inline_comment_dedup imports get_logger lazily inside the failure path, so the
  module no longer imports pr_agent.log at import time and can be imported
  standalone without the pre-existing log/config circular-import fragility. The
  test no longer needs an import-order workaround.
- isort-format the new multi-line imports in both providers.

The broad except in InlineCommentStore.load() is kept deliberately (fail-open:
dedup must never break comment publishing), matching existing provider patterns.
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit f699e7e

@IsmaelMartinez

Author

Thanks for the review. Addressed in f699e7ea:

  • GitHub anchor uses position: GitHub fingerprints are now anchored on (path, content) rather than the diff position, which shifts as the PR gains commits. This keeps them stable across runs, which is the cross-run behaviour the feature depends on.
  • Import-order circular risk: inline_comment_dedup now imports get_logger lazily inside the failure path, so it no longer imports pr_agent.log at module load. It imports standalone and the test dropped its import-order workaround.
  • Non-isort import formatting: the new multi-line imports are isort-formatted.

The two inline comments are answered in their threads. All 41 unit tests pass locally.

…ss runs

The committable-suggestion fallback posts via mr.notes.create, which may not
surface as a discussion. InlineCommentStore now also scans mr.notes.list, so a
fallback comment's marker is found on later runs and not re-posted. Adds a
regression test.
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit f30ccba

Comment thread pr_agent/git_providers/gitlab_provider.py Outdated
Comment thread tests/unittest/test_inline_comment_dedup.py
Comment thread pr_agent/git_providers/github_provider.py
Comment thread pr_agent/git_providers/gitlab_provider.py
…rate log

- GitLab fingerprints anchor on the line the comment is attached to (source
  line for deletions, target line otherwise), so deletion comments dedup
  against their real anchor instead of always the target line.
- Dedup markers are appended via a clip-aware helper, so the marker survives
  when max_comment_chars would otherwise truncate the body and drop it.
- The all-duplicates log only fires when a comment was actually skipped as a
  duplicate, not when the deduped batch was merely empty.
- Wrap the long log f-string and overlong test lines.
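A sketch of a clip-aware helper, following the argument order seen at the GitHub call site (`body, body_fp, code_fp, max_comment_chars`) but otherwise hypothetical, including the exact marker text. The body is clipped to leave room for the markers, so whenever the markers alone fit within `max_chars` the result fits too, and the markers are never the part that gets cut.

```python
from typing import Optional


def body_with_markers(
    body: str, body_fp: str, code_fp: Optional[str], max_chars: Optional[int] = None
) -> str:
    """Append dedup markers, clipping the body first so the markers survive."""
    suffix = f"\n\n<!-- pr-agent-dedup: {body_fp} -->"
    if code_fp:
        suffix += f"\n<!-- pr-agent-dedup: {code_fp} -->"
    if max_chars is not None and len(body) + len(suffix) > max_chars:
        # Markers take priority over the tail of the body.
        body = body[: max(max_chars - len(suffix), 0)]
    return body + suffix
```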
@IsmaelMartinez

Author

Addressed the latest review batch in 424387a5:

  • GitLab deletion line mismatch: fingerprints now anchor on the line the comment is actually attached to (source line for deletions, target line otherwise).
  • Markers added after clipping: markers are appended via a clip-aware helper, so the fingerprint marker survives even when max_comment_chars would truncate the body.
  • Misleading all-duplicates log: that log now only fires when a comment was actually skipped as a duplicate, not when the batch was empty.
  • Non-isort / overlong lines: wrapped the flagged log f-string and test lines.

On "GitHub anchor ignored": this is a deliberate trade-off and I have left it as is. The previous round flagged anchoring on the diff position as a cross-run-stability bug (position shifts as the PR gains commits), and the absolute file line is not available in the committable-comment payload without riskier API changes. Committable /improve suggestions almost always carry a ```suggestion block, so the code fingerprint disambiguates them; a collision needs two distinct suggestions on the same file with no code block and an identical normalised 80-char body prefix, which is rare. Happy to revisit if you would prefer position-based anchoring with the stability caveat.

44 unit tests pass locally.

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code Review by Qodo

New Review Started

This review has been superseded by a new analysis

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit 194244f

Comment thread pr_agent/git_providers/github_provider.py Outdated
Comment thread pr_agent/git_providers/github_provider.py Outdated
Comment thread pr_agent/git_providers/github_provider.py
…trailing ws

- The invalid-comment fixer strips the ```suggestion block, which also dropped
  the dedup markers appended after it; the fallback then re-published via
  disable_fallback=True with dedup skipped, so those comments duplicated on
  later runs. Dedup now separates filtering from marking: the fallback path is
  not filtered (a not-yet-posted comment must not be skipped) but is still
  marked and recorded, and a body that already carries a marker is left as is.
- Wrap the all-duplicates info log to stay within the line-length limit.
- Remove trailing whitespace on the blank lines around the record block.
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit a0869e0

Comment thread pr_agent/git_providers/github_provider.py Outdated
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit b487346

Comment thread pr_agent/git_providers/github_provider.py Outdated
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit 3dc8454

Comment thread pr_agent/algo/inline_comment_dedup.py
Comment thread pr_agent/git_providers/github_provider.py Outdated
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit a2bac5f

Comment thread pr_agent/git_providers/github_provider.py
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit a749828

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 4, 2026

Contributor

Code review by qodo was updated up to the latest commit c9d7a17

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 9, 2026

Contributor

Code review by qodo was updated up to the latest commit ef9a46b

Comment thread pr_agent/git_providers/github_provider.py
@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 10, 2026

Contributor

Code review by qodo was updated up to the latest commit f6070fb

@IsmaelMartinez

Author

@naorpeled, when you get a chance to look at this, I have one scope question. I have a follow-up in progress that adds a second, optional dedup tier for GitLab only: on top of the content fingerprinting in this PR, it also suppresses reworded duplicate inline comments by anchoring on the file and line position, so a finding that comes back slightly rephrased on a later push is still recognised. It does not touch the GitHub path at all; that side keeps the line-independent behaviour from this PR unchanged.

It is a fair chunk of code, though (mostly GitLab provider logic and its tests), so folding it into this PR would make it noticeably bigger and harder to review. Would you rather I keep it as a separate follow-up PR once this one lands, or include it here? I am happy either way, so whichever is easier for you to review works for me.
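The GitLab position tier is only a proposal at this point, so what follows is a hypothetical sketch of how such a second check could sit on top of the content fingerprints. It remembers the file and line of every existing note that carries a dedup marker and skips a new finding at the same position even when its wording has changed. `PositionTier`, `should_skip`, and the note dictionary shape are illustrative assumptions, not the follow-up's actual code.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical sketch of a GitLab-only position tier; not code from this PR.
MARKER_RE = re.compile(r"<!-- pr-agent-dedup: [0-9a-f]+ -->")
Position = Tuple[str, int]


class PositionTier:
    """Remembers (file, line) positions that already carry an agent comment."""

    def __init__(self) -> None:
        self._positions: Set[Position] = set()

    def load(self, notes: Iterable[Dict]) -> None:
        for note in notes:
            # Only agent comments (identified by the dedup marker) count.
            if note.get("line") is not None and MARKER_RE.search(note.get("body", "")):
                self._positions.add((note["path"], note["line"]))

    def should_skip(self, path: str, line: Optional[int], content_seen: bool) -> bool:
        # Content fingerprints still apply first; the position check only adds
        # suppression for reworded findings at the same file and line.
        if content_seen:
            return True
        return line is not None and (path, line) in self._positions
```

In this sketch a position match alone suppresses the comment, so a genuinely new finding on an already-commented line would also be skipped; that is the trade-off for catching rephrased findings, and one reason to keep such a tier opt-in.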
