Summary
During bulk import, encounter documents are written to the OpenSearch encounter index before their _master derivative MediaAsset has been generated. The encounter serializer builds the thumbnail/media URL from the _master child via MediaAsset.safeURL(..., "master"). At index time that child does not yet exist, so safeURL returns null (or throws, and the surrounding try { ... } catch (Exception ex) {} silently swallows the exception), and the url field is never written. The resulting OpenSearch document has no mediaAssets[].url. Because there is no guaranteed re-index after the derivatives are generated, a substantial fraction of encounters (81 of 235, about 34%, in one observed import) are left with URL-less documents and render as broken thumbnails on /react/encounter-search (and other OpenSearch-backed result pages).
This is a pre-existing defect on main (see "Pre-existing" below), surfaced while testing migrate-ml-service-v2.
Symptom
- Broken image icons for some imported encounters on /react/encounter-search?individualIDExact=… and other search-result galleries.
- "Mixed" results: within the same individual, some thumbnails load and some are broken.
- The broken ones frequently appear doubled (the same broken image twice). This is because content-hash deduplication attaches a single MediaAsset to two encounters; when that asset's document is missing the URL, it renders broken in both encounters.
Reproduction
- Run a bulk import.
- Open /react/encounter-search filtered to an imported individual.
- Observe that a subset of encounters show broken thumbnails.
Root cause
The encounter → OpenSearch serializer:
    // Encounter.java (main: ~line 4223; migrate-ml-service-v2: ~line 4308)
    try {
        // historic data might throw IllegalArgumentException: Path not under given root
        URL url = ma.safeURL(myShepherd, null, "master");
        if (url != null) jgen.writeStringField("url", url.toString());
    } catch (Exception ex) {}
MediaAsset.safeURL(..., "master") → bestSafeAsset(...) resolves the URL by locating the child labeled _master (findChildrenByLabel(myShepherd, "_master")). During import the encounter is indexed at creation time, while the _master/_thumb/_mid/_watermark children are generated slightly later by MediaAsset.updateStandardChildren (background). So at index time:
- The _master child does not exist yet → safeURL returns null (or throws, caught and discarded) → the document is written with no url.
Most encounters get re-indexed later (e.g. by a subsequent occurrence/individual indexing pass) after the child exists, which backfills the URL. Encounters that never receive that second pass stay stuck with a URL-less document.
The swallowing catch (Exception ex) {} also means the failure is completely silent — no log, no metric.
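The ordering problem can be reproduced in miniature. The sketch below is a toy model, not Wildbook code: ToyAsset, safeUrl, serialize, and the example.invalid URL are hypothetical stand-ins that only mimic the behaviour described above (a missing _master child yields a null URL, and the serializer then omits the url field).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the index-before-derivative race; none of these types are Wildbook classes. */
final class IndexRaceSketch {
    /** Hypothetical stand-in for a parent asset; children are keyed by label, e.g. "_master". */
    static final class ToyAsset {
        final Map<String, String> childUrlByLabel = new HashMap<>();

        /** Mimics the described lookup: returns null when the labeled child is missing. */
        String safeUrl(String label) {
            return childUrlByLabel.get("_" + label);
        }
    }

    /** Mimics the serializer: the "url" field is written only when a URL resolves. */
    static Map<String, String> serialize(String assetId, ToyAsset asset) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", assetId);
        String url = asset.safeUrl("master");
        if (url != null) doc.put("url", url);
        return doc;
    }

    public static void main(String[] args) {
        ToyAsset asset = new ToyAsset();
        // First pass: the encounter is indexed at creation time, before derivatives exist.
        Map<String, String> first = serialize("ma-1", asset);
        // The background job finishes and the _master child appears.
        asset.childUrlByLabel.put("_master", "https://example.invalid/ma-1-master.jpg");
        // Second pass: only happens if something triggers a re-index.
        Map<String, String> second = serialize("ma-1", asset);
        System.out.println("first pass has url:  " + first.containsKey("url"));
        System.out.println("second pass has url: " + second.containsKey("url"));
    }
}
```

Running main reports false for the first pass and true for the second, which is the stale-versus-backfilled split seen in the index.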
Evidence
Comparing the OpenSearch document's indexTimestamp against the _master child's creation revision for one import:
| Encounter | doc indexTimestamp | _master child created | result |
|---|---|---|---|
| working | …2426990 | …2055897 | indexed after child → URL present |
| broken | …1914855 | …1969934 | indexed ~55s before child → no URL |
| broken | …1903522 | …1941115 | indexed ~38s before child → no URL |
The DB rows and on-disk derivative files (-master.jpg, -thumb.jpg, …) are all present and serve HTTP 200; the parent assets are labeled _original and the children _master. The data is intact — only the OpenSearch document is stale.
Scope (one observed import): 81 of 235 indexed encounters (~34%) were missing mediaAssets[].url.
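The comparison in the table can be written down as a small check. This is a minimal sketch under two assumptions: both values are epoch milliseconds (consistent with the ~55s and ~38s figures), and the elided leading digits are identical for the two values in a row, so only the visible trailing digits are used. StaleDocCheck and its method names are hypothetical.

```java
/** Hypothetical helper: compares a document's index time with the _master child's creation time. */
final class StaleDocCheck {
    /** True when the document was indexed before the _master child existed. */
    static boolean indexedBeforeChild(long docIndexedMs, long childCreatedMs) {
        return docIndexedMs < childCreatedMs;
    }

    /** Milliseconds by which indexing preceded child creation (negative if indexed after). */
    static long leadMillis(long docIndexedMs, long childCreatedMs) {
        return childCreatedMs - docIndexedMs;
    }

    public static void main(String[] args) {
        // Trailing digits only, as shown in the table; the elided prefix is assumed equal per row.
        long[][] rows = {
            {2426990L, 2055897L},
            {1914855L, 1969934L},
            {1903522L, 1941115L},
        };
        for (long[] row : rows) {
            System.out.println("indexedBeforeChild=" + indexedBeforeChild(row[0], row[1])
                + " leadMillis=" + leadMillis(row[0], row[1]));
        }
    }
}
```

For the two broken rows the lead is 55079 ms and 37593 ms, matching the rounded figures in the table.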
Pre-existing
This is not introduced by migrate-ml-service-v2:
- The serializer URL logic is byte-identical on main (ma.safeURL(myShepherd, null, "master") + the swallowing catch (Exception ex) {}). The line was last modified by 0a571bd6e (2025-11-04), an ancestor of main that predates the branch merge-base (2025-12-28).
- BulkImporter.java (the create → index → background-derivative ordering) has zero diff from the merge-base.
- MediaAsset.safeURL / bestSafeAsset / updateStandardChildren / findChildrenByLabel are unchanged on the branch.
Caveat: while the defect is pre-existing, how often it manifests depends on indexing/background timing (whether an encounter is re-indexed after its derivative is generated). The migrate-ml-service-v2 detection/ML pipeline rework changes that cadence, so it may affect the frequency of stale documents even though it did not introduce the race.
Proposed fixes
1. Guarantee a re-index after derivatives exist. Ensure updateStandardChildren (or its completion callback) triggers an encounter re-index once _master (and friends) are available, so the URL is always backfilled.
2. Order import indexing after derivative generation. Defer the encounter's OpenSearch index until standard children have been generated, so the first index already includes the URL.
3. Stop silently swallowing the failure. At minimum, log when safeURL("master") returns null/throws at index time, so this is observable rather than invisible.
(1) or (2) fixes the defect; (3) should be done regardless.
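A rough shape for fixes (1) and (3), as a sketch only. ImportIndexFixSketch, resolveUrlOrLog, and generateThenReindex are hypothetical names; the two Runnable arguments stand in for the real derivative generation and the real encounter re-index, neither of which is modeled here, and java.util.logging stands in for whatever logging the project actually uses. In real code the lookup would wrap the existing ma.safeURL(myShepherd, null, "master") call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch of fixes (1) and (3); not the project's actual API. */
final class ImportIndexFixSketch {
    private static final Logger LOG = Logger.getLogger(ImportIndexFixSketch.class.getName());

    /** Fix (3): same null-on-failure result as today, but a null or an exception is logged. */
    static String resolveUrlOrLog(String assetId, Callable<String> urlLookup) {
        try {
            String url = urlLookup.call();
            if (url == null) {
                LOG.warning("no _master URL at index time for asset " + assetId);
            }
            return url;
        } catch (Exception ex) {
            LOG.log(Level.WARNING, "URL lookup failed at index time for asset " + assetId, ex);
            return null;
        }
    }

    /**
     * Fix (1): chain the re-index onto completion of derivative generation.
     * thenRun only fires if generation completed normally, so a failed
     * generation does not trigger a pointless re-index.
     */
    static CompletableFuture<Void> generateThenReindex(Runnable generateDerivatives,
                                                       Runnable reindexEncounter) {
        return CompletableFuture.runAsync(generateDerivatives).thenRun(reindexEncounter);
    }
}
```

The returned future completes exceptionally if derivative generation fails, which gives the caller one place to log or retry.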
Workaround for already-affected data
Re-index the affected encounters (the derivatives now exist, so re-serialization populates the URL). No regeneration or data repair is needed — this is purely an index-staleness fix.
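Picking out the affected encounters could look like the sketch below. It assumes each OpenSearch document has already been fetched as a Map with an id field and a mediaAssets list whose entries carry an optional url; the id field name and the StaleEncounterFinder class are assumptions, and fetching the documents and issuing the re-index are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: finds documents that need a re-index because a media asset has no url. */
final class StaleEncounterFinder {
    /** Returns the ids of documents with at least one mediaAssets entry lacking a "url". */
    static List<String> idsMissingUrl(List<Map<String, Object>> docs) {
        List<String> stale = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object assets = doc.get("mediaAssets");
            if (!(assets instanceof List)) {
                continue; // no media assets on this document, nothing to backfill
            }
            for (Object asset : (List<?>) assets) {
                if (asset instanceof Map && ((Map<?, ?>) asset).get("url") == null) {
                    stale.add(String.valueOf(doc.get("id")));
                    break; // one URL-less asset is enough to flag the encounter
                }
            }
        }
        return stale;
    }
}
```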