Fix Shepherd/JDO connection leaks causing stuck dbconnections entries #1553
Open
JasonWildMe wants to merge 10 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1553 +/- ##
==========================================
- Coverage 51.61% 51.17% -0.45%
==========================================
Files 308 308
Lines 11976 12073 +97
Branches 3824 3798 -26
==========================================
- Hits 6182 6178 -4
- Misses 5507 5593 +86
- Partials 287 302 +15
Both Shepherds in scanEndApplet.jsp previously had their close calls
outside any try/finally. The `indShepherd` block (lines 603-786) was
the source of accumulating `scanEndApplet.jsp_displayNames:begin`
entries on Sharkbook dbconnections.jsp: any exception during
`xmlReader.read(file)` on a partially-written scan XML, or any
`getMarkedIndividual` failure, skipped the close, and the page
auto-refreshes every 15s during active scans.
Wraps both Shepherds with try { ... } finally { rollbackAndClose(); }
matching the pattern used elsewhere in this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
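The difference between the old and new shape of the Shepherd blocks can be sketched as follows. This is a minimal illustration, not the code from `scanEndApplet.jsp`: `FakeShepherd`, `ScanPageSketch`, and the `openActions` list (standing in for the entries shown on `dbconnections.jsp`) are hypothetical names, and only `beginDBTransaction()` and `rollbackAndClose()` are modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Shepherd; it only tracks which actions are still "open".
class FakeShepherd {
    // Plays the role of the entries listed on dbconnections.jsp (assumption for illustration).
    static final List<String> openActions = new ArrayList<>();
    private final String action;

    FakeShepherd(String action) { this.action = action; }

    void beginDBTransaction() { openActions.add(action + ":begin"); }

    void rollbackAndClose() { openActions.remove(action + ":begin"); }
}

class ScanPageSketch {
    // Old shape: the close call sits after the work, so an exception skips it.
    static void leaky(Runnable work) {
        FakeShepherd shepherd = new FakeShepherd("leaky");
        shepherd.beginDBTransaction();
        work.run();
        shepherd.rollbackAndClose();
    }

    // New shape: the close call runs on every exit path, including exceptions.
    static void guarded(Runnable work) {
        FakeShepherd shepherd = new FakeShepherd("guarded");
        shepherd.beginDBTransaction();
        try {
            work.run();
        } finally {
            shepherd.rollbackAndClose();
        }
    }
}
```

With work that throws (for example, a failed read of a partially written file), `leaky` leaves its `begin` entry behind on each call, while `guarded` does not.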
Most of the stuck "begin" entries on Sharkbook's dbconnections.jsp are not individual Shepherd leaks but symptoms of one upstream cause: threads waiting for a Postgres connection that is pinned by a long-running operation. Two fixes:
1. Encounter.opensearchIndexPermissions(): the periodic permissions sweep was holding a Postgres tx open for the entire duration of per-encounter OpenSearch HTTP updates. On installs with hundreds of thousands of encounters this pinned a connection for tens of minutes per run, starving every concurrent request behind it. Refactored into two phases: phase 1 loads users/collab maps and all eligible encounter rows into in-memory structures under a short-lived Shepherd, which is then closed; phase 2 iterates the in-memory rows and issues OpenSearch updates with no DB connection held. Same OpenSearch updates, same filter logic.
2. jdoconfig.properties: datanucleus.connectionPool.maxWait was -1 (wait forever). Threads blocked on the pool sat indefinitely and showed up as stuck "begin" entries. Set to 30000 ms so a request under contention fails fast with a clear error and pool slots free up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
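The two-phase split described in this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the code in `Encounter.java`: `FakeConnection`, `loadEncounterIds()`, and `sweep(...)` are hypothetical names, and the `Consumer` stands in for the per-encounter OpenSearch HTTP update.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class TwoPhaseSweepSketch {
    // Hypothetical stand-in for a pooled DB connection; counts how many are currently held.
    static class FakeConnection implements AutoCloseable {
        static int open = 0;

        FakeConnection() { open++; }

        List<String> loadEncounterIds() { return Arrays.asList("enc-1", "enc-2", "enc-3"); }

        @Override
        public void close() { open--; }
    }

    // Returns the number of held connections observed after each update in phase 2.
    static List<Integer> sweep(Consumer<String> slowHttpUpdate) {
        List<String> rows;
        // Phase 1: copy everything needed into memory while the connection is held briefly.
        try (FakeConnection connection = new FakeConnection()) {
            rows = new ArrayList<>(connection.loadEncounterIds());
        }
        // Phase 2: issue the slow per-row updates with no connection held.
        List<Integer> openDuringUpdates = new ArrayList<>();
        for (String id : rows) {
            slowHttpUpdate.accept(id);
            openDuringUpdates.add(FakeConnection.open);
        }
        return openDuringUpdates;
    }
}
```

The point of the split is that the duration of phase 2 no longer determines how long a pool slot is occupied; the cost is holding the loaded rows in memory for the length of the sweep.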
Earlier commits on this branch saved Encounter.java, scanEndApplet.jsp and jdoconfig.properties with CRLF line endings, which made the GitHub PR diff display each file as entirely changed. main has these files as LF, so this commit normalizes them back so reviewers see only the actual code changes from this branch. No functional change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The method opened a Shepherd at line 241 and only closed it on three
specific success/early-return paths (lines 290, 339, 375). Anything
that threw between begin and one of those closes leaked the PM:
- IBEISIAIdentificationMatchingState.allAsJSONArray() — a JDO query
- WildbookIAM.getIASpecies() and other lazy-loading getters in the
qanns/tanns iteration loops
- qanns.get(0).getMatchingSet() — runs an OpenSearch HTTP call while
the Postgres tx is still open; fails on slow/hung OpenSearch
- annotGetIndiv() — a JDO query
- Any unchecked Throwable
Wraps the body in try { ... } finally { rollbackAndClose() }, removes
the three explicit close pairs, and hoists the HashMap declaration
out of the try so the post-try RestClient.post() (which intentionally
runs after the DB connection is released) can still see it.
Same JSONObject results on every path; same "close DB before HTTP"
ordering preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
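The "hoist the map, close in finally, POST afterwards" ordering from this commit message can be sketched as follows. The names are hypothetical (`SendIdentifySketch`, the `events` log, the `annotationCount` key); the real method builds a different payload and calls `RestClient.post()`, which is only simulated here by a log entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SendIdentifySketch {
    // Records the order of operations so the ordering can be inspected (illustration only).
    static final List<String> events = new ArrayList<>();

    static String sendIdentify(boolean nothingToSend) {
        // Declared before the try so the code after the finally can still read it.
        Map<String, Object> map = new HashMap<>();
        events.add("begin");
        try {
            if (nothingToSend) {
                return "skipped"; // early return: the finally below still runs
            }
            map.put("annotationCount", 2); // stand-in for building the request from DB objects
        } finally {
            events.add("rollbackAndClose"); // DB connection released here on every path
        }
        // The network call happens only after the connection has been released.
        events.add("post:" + map);
        return "sent";
    }
}
```

Calling `sendIdentify(false)` records `begin`, `rollbackAndClose`, then the simulated POST; calling `sendIdentify(true)` records only `begin` and `rollbackAndClose`.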
eae68f9 to a6d9942
…endIdentify
Addresses code-review findings on PR #1553.
processCallback (the stated headline fix, previously missing): wrap both Shepherds in try/finally so the no-log early `return rtn` (after beginDBTransaction) and any throw from the detect/identify or IA.intake branches release the connection. These were the source of the IBEISIA.processCallback:* stuck entries on dbconnections.jsp. `newAnns` is hoisted above the try so the post-commit identification block still reads it.
sendIdentify: the earlier try/finally wrap held the DB connection across the RestClient.post(...) call to IA. Move the POST to after the finally so the pooled connection is released before the network round-trip (matching the pre-PR behavior, which closed before posting). `map` is hoisted above the try. Early-return paths stay inside the try, so they still close on the way out.
Compiles clean; reviewed by Codex.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Fixes a set of PersistenceManager / connection leaks observed on production Sharkbook and Flukebook via `dbconnections.jsp` (stuck entries at `IBEISIA.processCallback:rollback`, `RestServlet:new`, plus in-flight `Encounter.opensearchDocumentSerializer:begin` and `api.Bulk.doGet:begin`).

The single commit (993a81e) touches five files with +184 / -150 lines (functional changes only, ignoring CRLF→LF noise in the rest of the tree):

- `Shepherd.closeDBTransaction` / `rollbackDBTransaction`: move the `ShepherdState` writes into `finally` blocks so state always advances past `"begin"` and stuck entries stop accumulating. If `pm.close()` throws, the entry is left as `"close-failed"` so the dashboard preserves evidence of the leak (the important case: a connection that won't return to the pool). `rollbackDBTransaction` also sets `"rollback-failed"` on a failed rollback, but because callers use `rollbackAndClose()`, a subsequent successful `close()` clears the entry by design, so `"rollback-failed"` is only visible on the dashboard if the close also fails.
- `IBEISIA.processCallback`: wrap both Shepherds in `try { ... } finally { rollbackAndClose(); }`. The bare `return rtn;` on the no-log path (the source of the `IBEISIA.processCallback:rollback` stuck entries) now runs the finally. Setup moved inside the try for consistency.
- `RestServlet.doPost` / `doDelete` / `doHead`: outer `try`/`finally` around each method body removes the `RestServlet.class_<id>` state entry on every return path, including `doPost`'s empty-body 400 early return. Fixes the accumulating `RestServlet:new` entries.
- `BulkImport.doGet` / `doPost`: Shepherd construction / `setAction` / `beginDBTransaction` moved inside the try with a null-safe finally. Background-thread inner class captures `bgContext` as final.
- `OpenSearch.setPermissionsNeeded` / `setActive` / `unsetActive` / `updatePermissionsIndex` / `updateEncounterIndexes`: same try-with-null-safe-finally hardening.

Known follow-up (out of scope for this PR)

`updatePermissionsIndex` and `updateEncounterIndexes` keep their `Shepherd` transaction open for the full duration of the per-object OpenSearch HTTP loop (`Encounter.opensearchIndexPermissions` phase 2 `os.indexUpdate(...)`, and `Base.opensearchSyncIndex` per-object `os.index(...)`). The `Encounter` phase 1/phase 2 refactor here releases the inner Shepherd, but the caller-held connection is not yet addressed. Tracking as a follow-up; this PR does not fully close the `Encounter.opensearchDocumentSerializer:begin` long-hold.

Test plan

- `mvn compile` clean (verified locally)
- `IBEISIA.processCallback`: exercise the no-log early-return path, the successful detect path, and the "no annotations suitable for identification" path; confirm no stuck `IBEISIA.processCallback:*` entries accumulate on `dbconnections.jsp`
- `RestServlet`: POST (empty body + normal), DELETE, HEAD requests while watching `dbconnections.jsp`; confirm `RestServlet:new` no longer accumulates
- `BulkImport` list + detail `doGet` and `doPost` / `doDelete` flows
- `pm.close()` failure: confirm the dashboard shows `close-failed` rather than vanishing
- `dbconnections.jsp` on staging for 24h post-deploy; confirm entry count stabilizes
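
The `"close-failed"` behavior described in the summary for `Shepherd.closeDBTransaction` could look roughly like the sketch below. It rests on assumptions: `CloseStateSketch`, the `states` map (standing in for `ShepherdState`), and the `Closer` interface (standing in for `pm.close()`) are hypothetical, and whether the real method rethrows the close exception is not stated in this PR; here it simply propagates.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CloseStateSketch {
    // Hypothetical stand-in for ShepherdState: action name -> last recorded state.
    static final Map<String, String> states = new ConcurrentHashMap<>();

    // Stand-in for the PersistenceManager close call (assumption for illustration).
    interface Closer {
        void close();
    }

    static void closeDBTransaction(String action, Closer pm) {
        boolean closed = false;
        try {
            pm.close();
            closed = true;
        } finally {
            // The state write happens on every path, so the entry never stays at "begin".
            if (closed) {
                states.remove(action);               // successful close clears the entry
            } else {
                states.put(action, "close-failed");  // keep evidence of the failed close
            }
        }
    }
}
```

Recording the outcome in the `finally` rather than after the close call is what keeps a throwing `pm.close()` from leaving the entry indistinguishable from an in-flight `"begin"`.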