-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat: persistent inline comments to prevent cross-run duplicates #2424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
IsmaelMartinez
wants to merge
15
commits into
The-PR-Agent:main
Choose a base branch
from
IsmaelMartinez:feature/persistent-inline-comments
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
8fb9e4e
feat: persistent inline comments to prevent cross-run duplicates
IsmaelMartinez b8b6383
fix: harden inline-comment dedup fingerprint and add provider-level t…
IsmaelMartinez f699e7e
fix: address review - record-after-success, stable GitHub anchor, laz…
IsmaelMartinez f30ccba
fix: scan GitLab notes so fallback-comment dedup markers persist acro…
IsmaelMartinez 424387a
fix: round 2 review - GitLab deletion anchor, clip-safe markers, accu…
IsmaelMartinez 885ec20
test: dry up repeated settings stub into a helper to fix overlong lines
IsmaelMartinez 194244f
test: wrap remaining long lines in dedup tests
IsmaelMartinez a0869e0
fix: GitHub fallback re-publish keeps a dedup marker; wrap log; drop …
IsmaelMartinez b487346
fix: re-raise with bare raise in inline-comment fallback to preserve …
IsmaelMartinez 3dc8454
fix: record GitHub dedup fingerprints only on bulk publish success, n…
IsmaelMartinez a2bac5f
fix: keep code fingerprint case-sensitive (code identity must not low…
IsmaelMartinez a749828
fix: make dedup fingerprints marker-invariant (strip markers before h…
IsmaelMartinez c9d7a17
fix: only treat a comment as marked when it carries a well-formed ded…
IsmaelMartinez ef9a46b
Merge remote-tracking branch 'upstream/main' into sync/2424
IsmaelMartinez f6070fb
fix: preserve code fingerprint through hunk validation
avidspartan1 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,183 @@ | ||
| """Cross-run deduplication of inline (line-anchored) comments. | ||
|
|
||
| Implements the feature requested in issue #2037: when the agent runs more | ||
| than once on the same PR/MR, it re-posts identical inline suggestions on | ||
| every run, cluttering the discussion (observed in particular on GitLab). | ||
| This module fingerprints each inline comment and embeds the fingerprint as | ||
| an HTML-comment marker in the posted body. On later runs the existing | ||
| comment bodies are scanned for those markers to rebuild the set of | ||
| already-posted fingerprints, and any suggestion whose fingerprint is already | ||
| present is skipped. | ||
|
|
||
| Two fingerprints are computed per comment and matched with OR semantics: | ||
|
|
||
| - Body fingerprint: SHA-256 over (relevant_file, anchor line, normalised | ||
| first 80 characters of the body). The category/importance tag and the | ||
| ``**Suggestion:**`` lead are stripped and whitespace is collapsed first. | ||
| - Code fingerprint: SHA-256 over (relevant_file, anchor line, normalised | ||
| contents of the first ```suggestion fenced block). Returns None when the | ||
| body has no suggestion block, in which case matching falls back to the | ||
| body fingerprint alone. | ||
|
|
||
| The OR-match catches both "same prose, different code" and "same code, | ||
| different prose" re-emissions of the same defect, which are the two ways an | ||
| LLM tends to restate a finding across runs. | ||
|
|
||
| The feature is opt-in via ``config.persistent_inline_comments`` (default | ||
| false) and is wired into the GitHub and GitLab providers. The marker-scan | ||
| store needs no external infrastructure; a different backend (database, | ||
| cache) could populate the same load/seen/add interface. | ||
| """ | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| import hashlib | ||
| import re | ||
| from typing import Iterator, Optional | ||
|
|
||
| BODY_MARKER_RE = re.compile(r"<!-- pr-agent-dedup: ([a-f0-9]{12}) -->") | ||
| CODE_MARKER_RE = re.compile(r"<!-- pr-agent-dedup-code: ([a-f0-9]{12}) -->") | ||
|
|
||
| _LEAD_RE = re.compile(r"^\*\*Suggestion:\*\*\s*", re.IGNORECASE) | ||
| _TAG_RE = re.compile(r"\[[^\]]+?,\s*importance:\s*\d+\]", re.IGNORECASE) | ||
| _WS_RE = re.compile(r"\s+") | ||
| _CODE_BLOCK_RE = re.compile(r"```suggestion[^\n]*\n(.*?)```", re.DOTALL) | ||
|
|
||
|
|
||
| def has_marker(body: str) -> bool: | ||
| """True only if the body carries a well-formed dedup marker (12-hex), | ||
| so incidental text mentioning the marker syntax is not mistaken for one.""" | ||
| return bool(BODY_MARKER_RE.search(body or "") or CODE_MARKER_RE.search(body or "")) | ||
|
|
||
|
|
||
| def _strip_markers(body: str) -> str: | ||
| """Remove embedded dedup markers so a pre-marked body fingerprints the | ||
| same as its original (markers are appended after marking).""" | ||
| body = BODY_MARKER_RE.sub("", body or "") | ||
| body = CODE_MARKER_RE.sub("", body) | ||
| return body | ||
|
|
||
|
|
||
| def body_fingerprint(relevant_file: str, target_line_no, body: str) -> str: | ||
| normalised = _LEAD_RE.sub("", _strip_markers(body)) | ||
| normalised = _TAG_RE.sub("", normalised) | ||
| normalised = _WS_RE.sub(" ", normalised).strip()[:80].lower() | ||
| key = f"{relevant_file}|{target_line_no}|{normalised}" | ||
| return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12] | ||
|
|
||
|
|
||
| def code_fingerprint(relevant_file: str, target_line_no, body: str) -> Optional[str]: | ||
| m = _CODE_BLOCK_RE.search(_strip_markers(body)) | ||
| if not m: | ||
| return None | ||
| # Do not lower-case: code is case-sensitive, so case-only differences | ||
| # must produce distinct fingerprints. | ||
| code = _WS_RE.sub(" ", m.group(1)).strip() | ||
| if not code: | ||
| return None | ||
| key = f"{relevant_file}|{target_line_no}|code|{code}" | ||
| return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12] | ||
|
|
||
|
|
||
| def build_markers(body_fp: str, code_fp: Optional[str]) -> str: | ||
| markers = [f"<!-- pr-agent-dedup: {body_fp} -->"] | ||
| if code_fp is not None: | ||
| markers.append(f"<!-- pr-agent-dedup-code: {code_fp} -->") | ||
| return "\n".join(markers) | ||
|
|
||
|
|
||
| def body_with_markers(body: str, body_fp: str, code_fp: "Optional[str]", | ||
| max_chars: "Optional[int]" = None) -> str: | ||
| """Append the dedup marker(s) to a comment body. If max_chars is given and | ||
| body + markers would exceed it, the body is clipped (never the markers) so | ||
| the fingerprint marker always survives for the next run's scan.""" | ||
| suffix = f"\n\n{build_markers(body_fp, code_fp)}" | ||
| if max_chars and len(body) + len(suffix) > max_chars: | ||
| body = body[: max(0, max_chars - len(suffix))] | ||
| return f"{body}{suffix}" | ||
|
|
||
|
|
||
| def inline_comment_line(comment: dict): | ||
| """Best-effort anchor line for a GitHub inline-comment dict.""" | ||
| for key in ("line", "position", "start_line"): | ||
| if comment.get(key) is not None: | ||
| return comment[key] | ||
| return None | ||
|
|
||
|
|
||
| def iter_existing_inline_comment_bodies(git_provider) -> Iterator[str]: | ||
| """Yield the body of every existing comment on the current PR/MR. | ||
|
|
||
| Dispatch is by provider class name so this module needs no provider | ||
| import. Unsupported providers raise NotImplementedError, which the store | ||
| treats as "cannot dedup here" and degrades to within-run dedup only. | ||
| """ | ||
| provider_name = type(git_provider).__name__ | ||
| if provider_name == "GithubProvider": | ||
| for comment in git_provider.pr.get_comments(): | ||
| yield getattr(comment, "body", "") or "" | ||
| elif provider_name == "GitLabProvider": | ||
| for discussion in git_provider.mr.discussions.list(get_all=True): | ||
| attrs = getattr(discussion, "attributes", None) or {} | ||
| for note in attrs.get("notes", []) or []: | ||
| if isinstance(note, dict): | ||
| yield note.get("body", "") or "" | ||
| # The committable-suggestion fallback posts via mr.notes.create, which | ||
| # may not surface as a discussion; scan plain notes too so their markers | ||
| # are seen on later runs. | ||
| for note in git_provider.mr.notes.list(get_all=True): | ||
| yield getattr(note, "body", "") or "" | ||
| else: | ||
| raise NotImplementedError( | ||
| f"inline-comment dedup not implemented for {provider_name}" | ||
| ) | ||
|
|
||
|
|
||
| class InlineCommentStore: | ||
| """Set of already-posted inline-comment fingerprints for one PR/MR. | ||
|
|
||
| The existing comment bodies are scanned lazily on first lookup and the | ||
| seen-set is held in memory for the rest of the run. A failure to list | ||
| existing comments degrades to within-run dedup only and never raises | ||
| into the publish path. | ||
| """ | ||
|
|
||
| def __init__(self, git_provider): | ||
| self._git_provider = git_provider | ||
| self._keys: set = set() | ||
| self._loaded = False | ||
|
|
||
| def load(self) -> set: | ||
| if self._loaded: | ||
| return self._keys | ||
| try: | ||
| for body in iter_existing_inline_comment_bodies(self._git_provider): | ||
| for marker_re in (BODY_MARKER_RE, CODE_MARKER_RE): | ||
| for match in marker_re.finditer(body or ""): | ||
| self._keys.add(match.group(1)) | ||
| except Exception as e: | ||
| from pr_agent.log import get_logger | ||
| get_logger().info( | ||
| f"Persistent inline comments: could not load existing comments, " | ||
| f"within-run dedup only. error={e}" | ||
| ) | ||
| self._loaded = True | ||
|
IsmaelMartinez marked this conversation as resolved.
IsmaelMartinez marked this conversation as resolved.
|
||
| return self._keys | ||
|
|
||
| def seen(self, fingerprint: Optional[str]) -> bool: | ||
| if fingerprint is None: | ||
| return False | ||
| return fingerprint in self.load() | ||
|
|
||
| def add(self, fingerprint: Optional[str]) -> None: | ||
| if fingerprint is not None: | ||
| self._keys.add(fingerprint) | ||
|
|
||
|
|
||
| def get_inline_comment_store(git_provider) -> InlineCommentStore: | ||
| """Return the per-provider store, creating and caching it on first use.""" | ||
| store = getattr(git_provider, "_inline_comment_store", None) | ||
| if store is None: | ||
| store = InlineCommentStore(git_provider) | ||
| git_provider._inline_comment_store = store | ||
| return store | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.