Skip to content

Fix duplicate attachment sync upserts#108

Open
agent-eli wants to merge 1 commit into
openclaw:mainfrom
garibong-labs:fix/attachment-upsert-sync
Open

Fix duplicate attachment sync upserts#108
agent-eli wants to merge 1 commit into
openclaw:mainfrom
garibong-labs:fix/attachment-upsert-sync

Conversation

@agent-eli

Copy link
Copy Markdown

Summary

  • upsert message_attachments on duplicate attachment_id instead of failing sync
  • preserve previously fetched media cache fields when a duplicate attachment row moves to another message
  • add a regression test for duplicate attachment IDs across messages

Verification

  • GOWORK=off go test ./...
  • locally rebuilt discrawl and verified discrawl sync --source discord no longer fails on the duplicate attachment constraint

@clawsweeper

clawsweeper Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codex review: needs real behavior proof before merge. Reviewed June 22, 2026, 10:05 AM ET / 14:05 UTC.

Summary
The PR changes attachment inserts to upsert duplicate attachment IDs, preserve media cache fields during that update, and add a duplicate-ID regression test.

Reproducibility: yes. from source inspection: current main declares message_attachments.attachment_id as a primary key and uses a plain insert, so a second row with the same attachment ID will fail the insert path.

Review metrics: 2 noteworthy metrics.

  • Stored Write Surface: 1 SQL insert changed. The PR changes durable attachment write semantics, so duplicate-row behavior needs maintainer review before merge.
  • Regression Coverage: 1 test added. The test proves the new last-writer-wins behavior, including removal of the earlier message association.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🧂 unranked krab
Result: blocked until real behavior proof is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add real after-fix proof such as redacted terminal output or logs from discrawl sync --source discord showing the duplicate attachment case no longer fails.
  • Resolve the duplicate-row behavior so each affected message keeps its attachment association, or get explicit maintainer approval for last-writer-wins storage.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body lists local verification, but it does not include inspectable after-fix proof; add redacted terminal output or logs to the PR body so a fresh ClawSweeper review can verify the real sync path. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Risk before merge

  • [P1] Merging this PR changes duplicate attachment handling from fail-fast to last-writer-wins, which can silently remove the earlier message's attachment association from the local archive.
  • [P1] The PR body reports local verification but provides no inspectable after-fix terminal output, logs, screenshot, recording, or artifact.

Maintainer options:

  1. Preserve Per-Message Attachment Rows
    Change the storage model or row identity so duplicate attachment IDs can be associated with every message that contains them while still reusing cached media metadata when appropriate.
  2. Approve Last-Writer-Wins Explicitly
    A maintainer can accept the current behavior only if Discrawl intentionally treats attachment_id as the canonical singleton and documents that earlier duplicate associations are dropped.
  3. Fold Into Failure-Ledger Design
    Pause or close this PR if maintainers want duplicate attachment insert errors handled first by the broader row-level failure diagnostics work.

Next step before merge

  • [P1] A maintainer should decide whether duplicate attachment IDs mean last-writer-wins rows are acceptable or whether the schema needs a compatibility-sensitive row identity change.

Security
Cleared: The diff only changes local SQLite store SQL and a regression test; no concrete security or supply-chain concern was found.

Review findings

  • [P1] Preserve both message attachment associations — internal/store/sqlc/queries.sql:387-389
Review details

Best possible solution:

Preserve per-message attachment associations while reusing media cache by attachment ID, or land an explicit maintainer-approved last-writer-wins policy with focused upgrade and regression coverage.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: current main declares message_attachments.attachment_id as a primary key and uses a plain insert, so a second row with the same attachment ID will fail the insert path.

Is this the best way to solve the issue?

No. The upsert is narrow, but it avoids the constraint failure by moving the only attachment row to the later message, which is not the safest archive behavior without an explicit maintainer decision.

Full review comments:

  • [P1] Preserve both message attachment associations — internal/store/sqlc/queries.sql:387-389
    On the duplicate IDs this PR targets, the conflict update rewrites message_id and channel_id on the only row, and the new test asserts the first message has no attachment afterward. That turns a sync failure into silent archive metadata loss for the earlier message; store each message association, or get explicit maintainer approval for last-writer-wins semantics, instead of moving the row implicitly.
    Confidence: 0.87

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: not found in the target repository.

Codex review notes: model internal, reasoning high; reviewed against 4c5630d48a1b.

Label changes

Label changes:

  • add P2: This is a normal-priority storage bug fix for a duplicate attachment sync failure with limited blast radius.
  • add merge-risk: 🚨 compatibility: The PR changes persisted attachment-row semantics for existing archives by reassigning duplicate attachment rows instead of preserving every message association.
  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists local verification, but it does not include inspectable after-fix proof; add redacted terminal output or logs to the PR body so a fresh ClawSweeper review can verify the real sync path. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.

Label justifications:

  • P2: This is a normal-priority storage bug fix for a duplicate attachment sync failure with limited blast radius.
  • merge-risk: 🚨 compatibility: The PR changes persisted attachment-row semantics for existing archives by reassigning duplicate attachment rows instead of preserving every message association.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🧂 unranked krab.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists local verification, but it does not include inspectable after-fix proof; add redacted terminal output or logs to the PR body so a fresh ClawSweeper review can verify the real sync path. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, the PR author or someone with repository write access can comment @clawsweeper re-review.
Evidence reviewed

What I checked:

  • Current schema uses attachment_id as the row identity: message_attachments.attachment_id is the primary key on current main, so two rows with the same attachment ID cannot both be represented. (internal/store/sqlc/schema.sql:73, 4c5630d48a1b)
  • Current main inserts attachments without conflict handling: InsertMessageAttachment is a plain insert on current main, which makes a duplicate attachment_id fail during sync/import writes. (internal/store/sqlc/queries.sql:382, 4c5630d48a1b)
  • Attachment replacement path returns insert errors directly: replaceAttachmentsTx deletes rows for the current message and then calls InsertMessageAttachment; a duplicate ID already stored on another message is not removed first. (internal/store/write.go:399, 4c5630d48a1b)
  • PR upsert rewrites message ownership on conflict: The PR's conflict update assigns message_id, guild_id, and channel_id from the later row, so the single stored attachment row moves to the later message. (internal/store/sqlc/queries.sql:386, 80cf620a186c)
  • Added test confirms earlier message loses its attachment: The new regression test explicitly expects ListAttachments for the first message to be empty after syncing a duplicate ID on a second message. (internal/store/attachments_test.go:110, 80cf620a186c)
  • Related broader diagnostic issue is only partial overlap: The related issue discusses duplicate attachment insert errors as one example, but its request is a durable row-level failure ledger before broad behavior changes, not this upsert behavior.

Likely related people:

  • steipete: History and GitHub commit metadata connect this handle to the original attachment indexing, attachment media cache support, sqlc store write refactor, and current release snapshot for the affected files. (role: recent area contributor; confidence: high; commits: 408fb67b15b2, 724adb0088ae, bcd2929b8ac6; files: internal/store/sqlc/queries.sql, internal/store/write.go, internal/store/sqlc/schema.sql)
  • vincentkoc: Recent commits touched store write/query paths and attachment media behavior adjacent to this duplicate attachment storage path. (role: recent adjacent contributor; confidence: medium; commits: 0a247939e841, 87df3f45023b, a5526d2e7cd9; files: internal/discorddesktop/import.go, internal/media/cache.go, internal/store/write.go)
  • hannesrudolph: Recent syncer checkpoint work touched message sync and store write paths adjacent to how sync retries and progress state interact with write failures. (role: adjacent sync contributor; confidence: medium; commits: 265a76cb90e3; files: internal/syncer/message_sync.go, internal/store/write.go)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. labels Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 compatibility 🚨 Merging this PR could break existing users, config, migrations, defaults, or upgrades. P2 Normal priority bug or improvement with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant