Skip to content

Fix Shepherd/JDO connection leaks causing stuck dbconnections entries#1553

Open
JasonWildMe wants to merge 10 commits into
mainfrom
more-shepherd-leak-fixes
Open

Fix Shepherd/JDO connection leaks causing stuck dbconnections entries#1553
JasonWildMe wants to merge 10 commits into
mainfrom
more-shepherd-leak-fixes

Conversation

@JasonWildMe

@JasonWildMe JasonWildMe commented Apr 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes a set of PersistenceManager / connection leaks observed on production Sharkbook and Flukebook via dbconnections.jsp (stuck entries at IBEISIA.processCallback:rollback, RestServlet:new, plus in-flight Encounter.opensearchDocumentSerializer:begin and api.Bulk.doGet:begin).

The single commit (993a81e) touches five files with +184 / -150 lines (functional changes only — ignoring CRLF→LF noise in the rest of the tree):

  • Shepherd.closeDBTransaction / rollbackDBTransaction — move the ShepherdState writes into finally blocks so state always advances past "begin" and stuck entries stop accumulating. If pm.close() throws, the entry is left as "close-failed" so the dashboard preserves evidence of the leak — the important case, a connection that won't return to the pool. (rollbackDBTransaction also sets "rollback-failed" on a failed rollback, but because callers use rollbackAndClose(), a subsequent successful close() clears the entry by design; so "rollback-failed" is only visible on the dashboard if the close also fails.)
  • IBEISIA.processCallback — wrap both Shepherds in try { ... } finally { rollbackAndClose(); }. The bare return rtn; on the no-log path (the source of the IBEISIA.processCallback:rollback stuck entries) now runs the finally. Setup moved inside the try for consistency.
  • RestServlet.doPost / doDelete / doHead — outer try/finally around each method body removes the RestServlet.class_<id> state entry on every return path, including doPost's empty-body 400 early return. Fixes the accumulating RestServlet:new entries.
  • BulkImport.doGet / doPost — Shepherd construction / setAction / beginDBTransaction moved inside the try with a null-safe finally. Background-thread inner class captures bgContext as final.
  • OpenSearch.setPermissionsNeeded / setActive / unsetActive / updatePermissionsIndex / updateEncounterIndexes — same try-with-null-safe-finally hardening.

Known follow-up (out of scope for this PR)

  • OpenSearch background reindex still holds a DB connection across HTTP. updatePermissionsIndex and updateEncounterIndexes keep their Shepherd transaction open for the full duration of the per-object OpenSearch HTTP loop (Encounter.opensearchIndexPermissions phase 2 os.indexUpdate(...), and Base.opensearchSyncIndex per-object os.index(...)). The Encounter phase 1/phase 2 refactor here releases the inner Shepherd, but the caller-held connection is not yet addressed. Tracking as a follow-up; this PR does not fully close the Encounter.opensearchDocumentSerializer:begin long-hold.

Test plan

  • mvn compile clean (verified locally)
  • IBEISIA.processCallback: exercise the no-log early-return path, the successful detect path, and the "no annotations suitable for identification" path; confirm no stuck IBEISIA.processCallback:* entries accumulate on dbconnections.jsp
  • RestServlet: POST (empty body + normal), DELETE, HEAD requests while watching dbconnections.jsp — confirm RestServlet:new no longer accumulates
  • BulkImport list + detail doGet and doPost/doDelete flows
  • Any reproducible pm.close() failure — confirm the dashboard shows close-failed rather than vanishing
  • Monitor dbconnections.jsp on staging for 24h post-deploy; confirm entry count stabilizes

@codecov-commenter

codecov-commenter commented Apr 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 51.17%. Comparing base (268f445) to head (7430517).
⚠️ Report is 89 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1553      +/-   ##
==========================================
- Coverage   51.61%   51.17%   -0.45%     
==========================================
  Files         308      308              
  Lines       11976    12073      +97     
  Branches     3824     3798      -26     
==========================================
- Hits         6182     6178       -4     
- Misses       5507     5593      +86     
- Partials      287      302      +15     
Flag Coverage Δ
backend 51.17% <ø> (-0.45%) ⬇️
frontend 51.17% <ø> (-0.45%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@JasonWildMe JasonWildMe requested a review from naknomum April 20, 2026 14:15
@JasonWildMe JasonWildMe self-assigned this Apr 20, 2026
Both Shepherds in scanEndApplet.jsp previously had their close calls
outside any try/finally. The `indShepherd` block (lines 603-786) was
the source of accumulating `scanEndApplet.jsp_displayNames:begin`
entries on Sharkbook dbconnections.jsp: any exception during
`xmlReader.read(file)` on a partially-written scan XML, or any
`getMarkedIndividual` failure, skipped the close, and the page
auto-refreshes every 15s during active scans.

Wraps both Shepherds with try { ... } finally { rollbackAndClose(); }
matching the pattern used elsewhere in this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JasonWildMe JasonWildMe marked this pull request as draft April 28, 2026 20:28
JasonWildMe and others added 3 commits April 28, 2026 14:16
Most of the stuck "begin" entries on Sharkbook's dbconnections.jsp are
not individual Shepherd leaks but symptoms of one upstream cause:
threads waiting for a Postgres connection that is pinned by a
long-running operation. Two fixes:

1. Encounter.opensearchIndexPermissions() — the periodic permissions
   sweep was holding a Postgres tx open for the entire duration of
   per-encounter OpenSearch HTTP updates. On installs with hundreds
   of thousands of encounters this pinned a connection for tens of
   minutes per run, starving every concurrent request behind it.

   Refactored into two phases: phase 1 loads users/collab maps and
   all eligible encounter rows into in-memory structures under a
   short-lived Shepherd, which is then closed; phase 2 iterates the
   in-memory rows and issues OpenSearch updates with no DB connection
   held. Same OpenSearch updates, same filter logic.

2. jdoconfig.properties — datanucleus.connectionPool.maxWait was -1
   (wait forever). Threads blocked on the pool sat indefinitely and
   showed up as stuck "begin" entries. Set to 30000 ms so a request
   under contention fails fast with a clear error and pool slots
   free up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier commits on this branch saved Encounter.java, scanEndApplet.jsp
and jdoconfig.properties with CRLF line endings, which made the GitHub
PR diff display each file as entirely changed. main has these files as
LF, so this commit normalizes back so reviewers see only the actual
code changes from this branch.

No functional change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The method opened a Shepherd at line 241 and only closed it on three
specific success/early-return paths (lines 290, 339, 375). Anything
that threw between begin and one of those closes leaked the PM:

- IBEISIAIdentificationMatchingState.allAsJSONArray() — a JDO query
- WildbookIAM.getIASpecies() and other lazy-loading getters in the
  qanns/tanns iteration loops
- qanns.get(0).getMatchingSet() — runs an OpenSearch HTTP call while
  the Postgres tx is still open; fails on slow/hung OpenSearch
- annotGetIndiv() — a JDO query
- Any unchecked Throwable

Wraps the body in try { ... } finally { rollbackAndClose() }, removes
the three explicit close pairs, and hoists the HashMap declaration
out of the try so the post-try RestClient.post() (which intentionally
runs after the DB connection is released) can still see it.

Same JSONObject results on every path; same "close DB before HTTP"
ordering preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JasonWildMe JasonWildMe force-pushed the more-shepherd-leak-fixes branch from eae68f9 to a6d9942 Compare April 29, 2026 16:30
@naknomum naknomum added this to the 10.10.5 milestone May 5, 2026
@JasonWildMe JasonWildMe marked this pull request as ready for review May 26, 2026 14:17
JasonWildMe and others added 2 commits May 28, 2026 18:00
…endIdentify

Addresses code-review findings on PR #1553.

processCallback (the stated headline fix, previously missing): wrap both
Shepherds in try/finally so the no-log early `return rtn` (after
beginDBTransaction) and any throw from the detect/identify or IA.intake
branches release the connection. These were the source of the
IBEISIA.processCallback:* stuck entries on dbconnections.jsp. `newAnns` is
hoisted above the try so the post-commit identification block still reads it.

sendIdentify: the earlier try/finally wrap held the DB connection across the
RestClient.post(...) call to IA. Move the POST to after the finally so the
pooled connection is released before the network round-trip (matching the
pre-PR behavior, which closed before posting). `map` is hoisted above the try.
Early-return paths stay inside the try, so they still close on the way out.

Compiles clean; reviewed by Codex.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants