Skip to content

fix(project): map Project.encounters as a Set to stop PROJECT_ENCOUNTERS IDX race#1597

Open
JasonWildMe wants to merge 1 commit into
mainfrom
fix/project-encounters-set
Open

fix(project): map Project.encounters as a Set to stop PROJECT_ENCOUNTERS IDX race#1597
JasonWildMe wants to merge 1 commit into
mainfrom
fix/project-encounters-set

Conversation

@JasonWildMe

Copy link
Copy Markdown
Collaborator

Problem

Bulk imports on zebra failed at row 0 with the generic "Your task failed to process due to an error." The real cause (from the container log):

org.ecocean.Project.addEncounter (Project.java)
  → INSERT INTO PROJECT_ENCOUNTERS (ID_OID, CATALOGNUMBER_EID, IDX)
  → duplicate key value violates unique constraint "PROJECT_ENCOUNTERS_pkey"
    Detail: Key (ID_OID, IDX)=(d5d951af-…, 5890) already exists.

Project.encounters is a JDO ordered List, so PROJECT_ENCOUNTERS carries an IDX ordering column with PK (ID_OID, IDX). DataNucleus derives the next IDX from the project's collection snapshot, which is eagerly cached (jdoconfig: cache.collections.lazy = false). When two transactions write membership to the same project with overlapping lifetimes, both compute the same index and the second INSERT collides. The loser rolls back, leaving the table gap-free — which is exactly what production showed (COUNT=6075, MIN=0, MAX=6074, contiguous → race, not corruption).

This does not require two simultaneous bulk imports: any membership writers (another import, the detection/IA results path, a manual "add to project", StandardImport) whose transactions overlap will collide, because the index is computed from a stale cached snapshot rather than at flush time.

Fix

Project membership has no meaningful order, so map it as an unordered Set, removing the IDX column and the entire class of collision.

  • Project.encounters: List<Encounter>Set<Encounter> (LinkedHashSet). JDO then omits IDX and keys the join on (ID_OID, CATALOGNUMBER_EID). No package.jdo change needed — the Java field type drives List-vs-Set in DataNucleus 5.2.7.
  • addEncounter rejects null-id encounters (Encounter.hashCode() is random for a null catalogNumber) and relies on Set.add() dedup. Encounter inherits id-based equals() from Base, consistent with its id-based hashCode(), so dedup is correct and O(1).
  • getEncounters() still returns List<Encounter> (a detached copy), so callers that assign to List or index it are unaffected; no caller mutates the result.
  • New Project.containsEncounter() (O(1) membership); hot membership-check callers (ProjectUpdate, ProjectGet, iaResultsAnnotFeed.jsp) switched to it instead of copying the whole collection to call .contains().
  • ProjectTest: id-based dedup, null-id rejection, remove-by-id, copy semantics, batch dedup.

⚠️ Required DB migration (per instance, BEFORE deploying)

The old (List) code needs IDX; the new (Set) code never writes it — they must not run against the table at once. Stop the old WAR, migrate, then start the new WAR. Full SQL + prechecks + rollback in docs/plans/2026-06-01-project-encounters-list-to-set.md:

-- after stopping old code; backup first
ALTER TABLE "PROJECT_ENCOUNTERS" DROP CONSTRAINT "PROJECT_ENCOUNTERS_pkey";
ALTER TABLE "PROJECT_ENCOUNTERS" DROP COLUMN "IDX";
ALTER TABLE "PROJECT_ENCOUNTERS" ADD CONSTRAINT "PROJECT_ENCOUNTERS_pkey"
    PRIMARY KEY ("ID_OID", "CATALOGNUMBER_EID");
CREATE INDEX IF NOT EXISTS "PROJECT_ENCOUNTERS_N1" ON "PROJECT_ENCOUNTERS" ("CATALOGNUMBER_EID");

(Prechecks: no NULL CATALOGNUMBER_EID, no duplicate (ID_OID, CATALOGNUMBER_EID).)

Testing

  • mvn test -Dtest=ProjectTest6/6 pass (compiles + JDO enhancement).
  • Pre-merge on staging: run two concurrent bulk imports into one project; both complete with no PROJECT_ENCOUNTERS_pkey error.

Review

Design and implementation both reviewed by Codex (separate passes). Design review surfaced 5 findings (deploy-stop-first, migration prechecks/element-index/0-based rollback, null-id hashCode guard, copy-vs-allocation, ordering claim) — all incorporated. Implementation review: Go, only two LOW non-blocking notes (unit test can't prove schema/concurrency without staging; a pre-existing null-assumption that isn't a regression).

Follow-ups (out of scope)

  • Project.users has the identical List+join pattern and the same latent race.
  • Separate PR persists background bulk-import failure causes to ImportTask.ERRORS so this kind of failure is visible without log-diving.

🤖 Generated with Claude Code

…ERS IDX race

Project.encounters was a JDO ordered List, so the PROJECT_ENCOUNTERS join table
carried an IDX ordering column with PK (ID_OID, IDX). DataNucleus derives the next
IDX from the (eagerly cached — jdoconfig cache.collections.lazy=false) collection
snapshot, so when two transactions write membership for the same project with
overlapping lifetimes, both compute the same index and the second INSERT dies with
"duplicate key violates PROJECT_ENCOUNTERS_pkey". On zebra this failed bulk imports
at row 0 (Project.addEncounter) with the generic "task failed to process" message.

Project membership has no meaningful order, so this maps it as an unordered Set:

- Project.encounters: List<Encounter> -> Set<Encounter> (LinkedHashSet); JDO then
  omits the IDX column and keys the join on (ID_OID, CATALOGNUMBER_EID). No
  package.jdo change needed — the Java field type drives List-vs-Set in DataNucleus.
- addEncounter rejects null-id encounters (Encounter.hashCode() is random for a null
  catalogNumber) and relies on Set.add() dedup; Encounter inherits id-based equals()
  from Base, consistent with its id-based hashCode(), so dedup is correct and O(1).
- getEncounters() keeps returning List<Encounter> (a detached copy) so callers/JSPs
  that assign to List or index it are unaffected; no caller mutates the result.
- Add Project.containsEncounter() (O(1) Set membership) and switch the hot
  membership-check callers (ProjectUpdate, ProjectGet, iaResultsAnnotFeed.jsp) to it
  instead of allocating a full copy via getEncounters().contains().
- ProjectTest: id-based dedup, null-id rejection, remove-by-id, copy semantics, batch.

Requires a one-time DB migration per instance BEFORE deploying (the old List code
needs IDX; the new Set code never writes it) — see
docs/plans/2026-06-01-project-encounters-list-to-set.md: stop old WAR, backup, drop
PROJECT_ENCOUNTERS_pkey, drop IDX, add PK (ID_OID, CATALOGNUMBER_EID), add reverse
index on CATALOGNUMBER_EID, then start the new WAR.

Project.users has the identical List+join pattern and the same latent race; left as a
follow-up to keep this change focused.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.17%. Comparing base (7085838) to head (80dff64).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1597   +/-   ##
=======================================
  Coverage   51.17%   51.17%           
=======================================
  Files         308      308           
  Lines       12073    12073           
  Branches     3919     3910    -9     
=======================================
  Hits         6178     6178           
- Misses       5586     5593    +7     
+ Partials      309      302    -7     
Flag Coverage Δ
backend 51.17% <ø> (ø)
frontend 51.17% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants