WildMeOrg · JasonWildMe · Jun 1, 2026
diff --git a/docs/plans/2026-06-01-project-encounters-list-to-set.md b/docs/plans/2026-06-01-project-encounters-list-to-set.md
@@ -0,0 +1,207 @@
+# Design: Project.encounters List → Set (eliminate PROJECT_ENCOUNTERS IDX race)
+
+**Date:** 2026-06-01
+**Branch:** `fix/project-encounters-set` (off `main`)
+**Status:** design — pending Codex review
+
+## Problem
+
+Bulk imports on zebra.wildme.org fail at row 0 with the generic "Your task failed
+to process due to an error." The real cause, from the container log:
+
+```
+org.ecocean.Project.addEncounter (Project.java:234)
+  → INSERT INTO PROJECT_ENCOUNTERS (ID_OID, CATALOGNUMBER_EID, IDX)
+  → ERROR: duplicate key value violates unique constraint "PROJECT_ENCOUNTERS_pkey"
+    Detail: Key (ID_OID, IDX)=(d5d951af-…, 5890) already exists.
+```
+
+`Project.encounters` is a Java `List` (`ArrayList`) mapped via JDO `<collection><join/>`.
+Because the field type is `List`, DataNucleus makes it an **indexed list**: the join
+table `PROJECT_ENCOUNTERS` gets an `IDX` ordering column and a primary key of
+`(ID_OID, IDX)`. When two threads add encounters to the same project concurrently
+(e.g. two bulk-import background threads, which we had in flight), each computes the
+same next index `N = current size`, both `INSERT … IDX = N`; one wins, the other dies
+on the PK. The loser's transaction rolls back, so the table stays gap-free — which is
+exactly what we observe in production: `SELECT COUNT(*), MIN(IDX), MAX(IDX)` returned
+`6075, 0, 6074` (contiguous → race, not corruption).
+
+Project membership has **no meaningful order**, so the `IDX` column buys nothing and
+creates this whole class of collision.
+
+## Goal / non-goals
+
+**Goal:** Remove the ordering column so concurrent adds of *distinct* encounters can no
+longer collide. Map `Project.encounters` as an unordered `Set`, with join-table PK
+`(ID_OID, CATALOGNUMBER_EID)`.
+
+**Non-goals:**
+- Not changing `Encounter.equals()`/`hashCode()` globally (high blast radius; see Risks).
+- Not changing the public `Project.getEncounters()` return type (keep caller churn ~0).
+- `Project.users` has the identical List+join mapping and the same latent race. **Out of
+  scope** for this PR to keep it small and reviewable; flagged as a recommended identical
+  follow-up.
+
+## Design
+
+### 1. Java: `Project.java`
+
+- Field: `private List<Encounter> encounters` → `private Set<Encounter> encounters`,
+  using `LinkedHashSet`. (Insertion order is deterministic within a session; note that
+  after the `IDX` column is dropped the *cross-load* DB order is unspecified — that is
+  correct set semantics, and no caller or test asserts project-encounter order. We do not
+  claim stable JSON ordering.)
+- `removeEncounter()`, `clearAllEncounters()`, `numEncounters()` and the iterating methods
+  (`asJSONObject`, `getAllACMIdsJSON`, `getAllAnnotIdsJSON`, `getPercentWithIncrementalIds`)
+  use `Collection` operations (`remove`, `size`, for-each) — unchanged on a `Set`. The
+  `null`-init sites change `new ArrayList<>()` → `new LinkedHashSet<>()`.
+- **Equality is sound for a hash `Set`.** `Encounter extends Base`; `Base.equals()`
+  (Base.java:465) compares by `getId()` (= `catalogNumber`), consistent with
+  `Encounter.hashCode()` (also `catalogNumber`-based). So `Set.add/contains/remove` dedup
+  correctly by id in O(1) — no explicit id-loop needed. The **one** hazard is a null
+  `catalogNumber`, which `Encounter.hashCode()` maps to a *random* value (defeating dedup),
+  so `addEncounter` rejects null-id encounters:
+  ```java
+  public void addEncounter(final Encounter enc) {
+      if (enc == null) return;
+      if (enc.getCatalogNumber() == null) { /* log WARN */ return; }
+      setTimeLastModified();
+      if (encounters == null) encounters = new LinkedHashSet<Encounter>();
+      if (!encounters.add(enc)) { /* log INFO: already a member */ }
+  }
+  ```
+  `removeEncounter` stays `encounters.remove(enc)` (id-based via Base.equals).
+- **Keep `getEncounters()` returning `List<Encounter>`**, returning a copy:
+  `return encounters == null ? null : new ArrayList<>(encounters);`
+  This preserves every caller (some assign to `List<Encounter>`; `projectManagement.jsp`
+  indexes with `.get(j)`), so no caller or JSP needs editing for compilation. Callers only
+  read; none mutate the returned collection (mutation goes through `addEncounter`/
+  `removeEncounter`), so a detached copy is safe.
+- **Add `boolean containsEncounter(Encounter)`** (Codex finding #4) so hot callers that
+  currently do `getEncounters().contains(enc)` don't allocate a ~6000-element copy first:
+  ```java
+  public boolean containsEncounter(final Encounter enc) {
+      if (enc == null || encounters == null) return false;
+      final String encId = enc.getCatalogNumber();
+      if (encId == null) return false;
+      for (final Encounter e : encounters) if (encId.equals(e.getCatalogNumber())) return true;
+      return false;
+  }
+  ```
+  Update the membership-check callers to use it: `ProjectUpdate.java:242`,
+  `ProjectGet.java:104`, `iaResultsAnnotFeed.jsp:207`. (Other callers iterate or index and
+  keep using `getEncounters()`.)
+
+### 2. JDO: `package.jdo`
+
+No XML change. DataNucleus chooses indexed-list vs. set purely from the Java field type:
+`List` ⇒ adds `IDX`; `Set` ⇒ no order column, join-table PK `(owner, element)`. The
+existing block already has no `<order>` element:
+
+```xml
+<field name="encounters" persistence-modifier="persistent" default-fetch-group="true">
+  <collection element-type="org.ecocean.Encounter" dependent-element="false" />
+  <join/>
+</field>
+```
+
+A `mvn clean install` re-runs DataNucleus enhancement against the new field type.
+
+### 3. Database migration (REQUIRED, run before deploying the new WAR)
+
+DataNucleus schema auto-update only **adds**; it will not drop `IDX` or alter the PK. The
+existing `IDX` column is `NOT NULL` (it is part of the current PK), so a `Set`-style
+`INSERT (ID_OID, CATALOGNUMBER_EID)` would violate `NOT NULL` until `IDX` is dropped.
+Therefore each instance must run, before deploying:
+
+```sql
+-- Backup first (core table):
+CREATE TABLE "PROJECT_ENCOUNTERS_bak_20260601" AS SELECT * FROM "PROJECT_ENCOUNTERS";
+
+BEGIN;
+-- Precheck 1: no NULL CATALOGNUMBER_EID (ADD PRIMARY KEY fails on nulls). Expect 0.
+SELECT COUNT(*) FROM "PROJECT_ENCOUNTERS" WHERE "CATALOGNUMBER_EID" IS NULL;
+
+-- Precheck 2: no duplicate (ID_OID, CATALOGNUMBER_EID) pairs. Expect 0 rows.
+SELECT "ID_OID", "CATALOGNUMBER_EID", COUNT(*)
+FROM "PROJECT_ENCOUNTERS"
+GROUP BY "ID_OID", "CATALOGNUMBER_EID"
+HAVING COUNT(*) > 1;
+-- If either precheck is non-empty, STOP and investigate (do not force the PK).
+
+-- Drop the index-based PK, drop the ordering column, add the membership PK.
+ALTER TABLE "PROJECT_ENCOUNTERS" DROP CONSTRAINT "PROJECT_ENCOUNTERS_pkey";
+ALTER TABLE "PROJECT_ENCOUNTERS" DROP COLUMN "IDX";
+ALTER TABLE "PROJECT_ENCOUNTERS" ADD CONSTRAINT "PROJECT_ENCOUNTERS_pkey"
+    PRIMARY KEY ("ID_OID", "CATALOGNUMBER_EID");
+
+-- Preserve an index on the element column for element-keyed lookups (the old PK led on
+-- ID_OID; queries that filter by encounter — e.g. Encounter/Shepherd project lookups —
+-- want CATALOGNUMBER_EID indexed). The new PK leads on ID_OID, so add the reverse index.
+CREATE INDEX IF NOT EXISTS "PROJECT_ENCOUNTERS_N1" ON "PROJECT_ENCOUNTERS" ("CATALOGNUMBER_EID");
+COMMIT;
+```
+
+(Confirm the exact constraint name per instance with `\d "PROJECT_ENCOUNTERS"`; the
+production error quoted it as `PROJECT_ENCOUNTERS_pkey`.)
+
+**Rollback** (only needed to revert to the old List code): restore from
+`PROJECT_ENCOUNTERS_bak_20260601`, which still has the original 0-based `IDX`. If
+rebuilding `IDX` from a backup that lacks it, use
+`row_number() OVER (PARTITION BY "ID_OID" ORDER BY "CATALOGNUMBER_EID") - 1` (old `IDX`
+is 0-based).
+
+### 4. Deploy sequence (per instance) — order matters
+
+The old (List) code requires `IDX`; the new (Set) code never writes it. The two must not
+run against the table at the same time. Therefore:
+
+1. **Stop** the old Wildbook WAR / Tomcat (no old code may touch `PROJECT_ENCOUNTERS`).
+2. Backup `PROJECT_ENCOUNTERS`.
+3. Run the migration SQL (both prechecks must return 0).
+4. Deploy and **start** the new WAR.
+5. Smoke test: a bulk import that assigns to a project completes; concurrent imports into
+   one project no longer raise `PROJECT_ENCOUNTERS_pkey`.
+
+## Caller impact (audited — all safe with Option A)
+
+| Caller | Use | Safe? |
+|---|---|---|
+| `ProjectUpdate.java:242` | `getEncounters().contains(enc)` | yes (List copy) |
+| `ProjectGet.java:101,104` | `List<Encounter> = getEncounters(); .contains` | yes |
+| `ProjectIA.java:72` | iterate | yes |
+| `ProjectDelete.java:61` | iterate | yes |
+| `projectList.jsp:96` | `.size()` | yes |
+| `projectManagement.jsp:187` | `.get(j)` index loop | yes (still a `List`) |
+| `project.jsp:63` | assign | yes |
+| `iaResultsAnnotFeed.jsp:207` | `.contains` → switch to `containsEncounter` | yes |
+
+No caller mutates the returned collection; none break.
+
+**Membership-mutation paths** (call `addEncounter`/`removeEncounter`/`clearAllEncounters`,
+which operate on the live `Set` — all continue to work; none depend on list index/order):
+`BulkImporter.java:333`, `StandardImport.java:1228`, `Encounter.java:3733`,
+`EncounterDelete.java:120`, `ImportTask.java:608`, `DeleteImportTask.java:97`. These all
+benefit from the stricter id-based `addEncounter` dedup. Verify during implementation that
+none read back an order-dependent result.
+
+## Risks & mitigations
+
+- **Equality:** `Encounter` inherits id-based `equals()` from `Base` (Base.java:465),
+  consistent with its id-based `hashCode()`, so a hash `Set` dedups correctly by id — there
+  is no identity-vs-value mismatch. The only hazard is a null `catalogNumber` (random
+  hashCode), which `addEncounter` now rejects up front. No change to `Encounter.equals()`
+  is needed or made.
+- **Residual concurrency:** two threads adding the *same* encounter to the *same* project at
+  the same instant could still collide on the new PK. This is far rarer than the IDX race
+  (which collided on *different* encounters) and is a true "added twice" no-op; acceptable,
+  and now surfaced via the [error-visibility PR] instead of being invisible.
+- **Multi-instance rollout:** the migration must run on every Wildbook before its WAR is
+  upgraded. Documented in Deploy sequence.
+
+## Test plan
+
+- Unit: `Project` add/remove/contains/size/JSON round-trip on the `Set` field.
+- Build: `mvn clean install` (DataNucleus enhancement) compiles.
+- Integration (staging): run two concurrent bulk imports into the same project; both
+  complete with no `PROJECT_ENCOUNTERS_pkey` error. Verify membership correctness.
diff --git a/src/main/java/org/ecocean/Project.java b/src/main/java/org/ecocean/Project.java
@@ -1,7 +1,9 @@
 package org.ecocean;
 
 import java.util.ArrayList;
+import java.util.LinkedHashSet;
 import java.util.List;
+import java.util.Set;
 
 import javax.servlet.http.HttpServletRequest;
 
@@ -15,7 +17,11 @@ public class Project implements java.io.Serializable {
 
     private String id;
 
-    private List<Encounter> encounters = null;
+    // Unordered Set (not List): an ordered List makes JDO add an IDX ordering column to
+    // PROJECT_ENCOUNTERS, whose index is computed from a (possibly stale, eagerly-cached)
+    // collection snapshot and collides under concurrent adds. Project membership has no
+    // meaningful order, so a Set keyed on (project, encounter) avoids that entirely.
+    private Set<Encounter> encounters = null;
 
     private String researchProjectName;
     private String projectIdPrefix;
@@ -47,7 +53,7 @@ public Project(final String projectIdPrefix, final String researchProjectName) {
 
     public Project(final String projectIdPrefix, final String researchProjectName,
         final List<Encounter> encs) {
-        this.encounters = new ArrayList<>();
+        this.encounters = new LinkedHashSet<>();
         this.id = Util.generateUUID();
         this.projectIdPrefix = projectIdPrefix;
         this.researchProjectName = researchProjectName;
@@ -221,20 +227,39 @@ public String getProjectIdPrefix() {
         return projectIdPrefix;
     }
 
+    // returns a detached copy as a List to preserve the historical API (callers assign to
+    // List and some index it); membership mutation goes through add/removeEncounter, so a
+    // copy is safe. Callers that only need a membership test should use containsEncounter().
     public List<Encounter> getEncounters() {
-        return encounters;
+        return encounters == null ? null : new ArrayList<>(encounters);
+    }
+
+    // membership test without allocating a copy of the whole collection. Encounter inherits
+    // an id-based equals() from Base, consistent with its id-based hashCode(), so Set
+    // contains() is correct and O(1).
+    public boolean containsEncounter(final Encounter enc) {
+        if (enc == null || encounters == null || enc.getCatalogNumber() == null) return false;
+        return encounters.contains(enc);
     }
 
     public void addEncounter(final Encounter enc) {
+        if (enc == null) return;
+        // Encounter.hashCode() is catalogNumber-based; a null catalogNumber yields a random
+        // hashCode, which would defeat Set dedup. Reject null-id encounters. For non-null
+        // ids, Base.equals()/Encounter.hashCode() are both id-based, so Set.add() dedups.
+        if (enc.getCatalogNumber() == null) {
+            System.out.println(
+                "[WARN] Project.addEncounter(): refusing encounter with null id, project id=" +
+                id);
+            return;
+        }
         setTimeLastModified();
         if (encounters == null) {
-            encounters = new ArrayList<Encounter>();
+            encounters = new LinkedHashSet<Encounter>();
         }
-        if (enc != null && encounters != null && !encounters.contains(enc)) {
-            encounters.add(enc);
-        } else {
+        if (!encounters.add(enc)) {
             System.out.println("[INFO]: Project.addEncounter(): The selected Project id=" + id +
-                " already contains encounter id=" + enc.getID() + ", skipping.");
+                " already contains encounter id=" + enc.getCatalogNumber() + ", skipping.");
         }
     }
 
@@ -245,13 +270,14 @@ public void addEncounters(final List<Encounter> encs) {
     }
 
     public void removeEncounter(final Encounter enc) {
+        if (enc == null || encounters == null) return;
         setTimeLastModified();
         encounters.remove(enc);
     }
 
     public void clearAllEncounters() {
         setTimeLastModified();
-        this.encounters = new ArrayList<>();
+        this.encounters = new LinkedHashSet<>();
     }
 
     public int numEncounters() {

diff --git a/src/main/java/org/ecocean/servlet/ProjectGet.java b/src/main/java/org/ecocean/servlet/ProjectGet.java
@@ -98,10 +98,9 @@ public void doPost(HttpServletRequest request, HttpServletResponse response)
                         projectIdPrefix.trim());
                     if (project != null) {
                         ArrayList<Annotation> anns = myShepherd.getAnnotationsWithACMId(acmId);
-                        List<Encounter> encs = project.getEncounters();
                         for (Annotation ann : anns) {
                             Encounter enc = ann.findEncounter(myShepherd);
-                            if (encs.contains(enc)) {
+                            if (project.containsEncounter(enc)) {
                                 res.put("inProject", "true");
                                 break;
                             }

diff --git a/src/main/java/org/ecocean/servlet/ProjectUpdate.java b/src/main/java/org/ecocean/servlet/ProjectUpdate.java
@@ -239,9 +239,9 @@ private void addOrRemoveEncountersFromProject(Project project, Shepherd myShephe
             try {
                 Encounter enc = myShepherd.getEncounter(encId);
                 if (enc != null) {
-                    if ("add".equals(action) && !project.getEncounters().contains(enc)) { // need project.getEncounters().contains(enc) check to
-                                                                                          // ensure additionCounter returns the correct number of
-                                                                                          // added encounters
+                    if ("add".equals(action) && !project.containsEncounter(enc)) { // membership check so additionCounter
+                                                                                  // returns the correct number of
+                                                                                  // added encounters
                         String comment = "<p><em>" + myShepherd.getUsername(request) + " on " +
                             (new java.util.Date()).toString() + "</em><br>" +
                             "added this encounter to Project " + project.getResearchProjectName() +

diff --git a/src/main/webapp/iaResultsAnnotFeed.jsp b/src/main/webapp/iaResultsAnnotFeed.jsp
@@ -204,7 +204,7 @@ if (request.getParameter("acmId") != null) {
 								if (project!=null) {
 									try {
 
-										if (project.getEncounters()!=null&&project.getEncounters().contains(enc)) {
+										if (project.containsEncounter(enc)) {
 											//System.out.println("num encounters in project: "+project.getEncounters().size());
 											//MarkedIndividual individual = enc.getIndividual();
 											if (individual!=null) {