Bulk import: encounters indexed to OpenSearch before _master derivative exists → missing thumbnail URLs (broken thumbnails) #1590

@JasonWildMe

Summary

During bulk import, encounter documents are written to the OpenSearch encounter index before their _master derivative MediaAsset has been generated. The encounter serializer builds the thumbnail/media URL from the _master child via MediaAsset.safeURL(..., "master"); at index time that child does not yet exist, so safeURL returns null (or throws, and the surrounding try { ... } catch (Exception ex) {} silently discards the exception). Either way, the resulting OpenSearch document has no mediaAssets[].url. Because there is no guaranteed re-index after the derivatives are generated, a substantial fraction of encounters (81 of 235, about 34%, in one observed import) are left with URL-less documents and render as broken thumbnails on /react/encounter-search (and other OpenSearch-backed result pages).

This is a pre-existing defect on main (see "Pre-existing" below), surfaced while testing migrate-ml-service-v2.

Symptom

  • Broken image icons for some imported encounters on /react/encounter-search?individualIDExact=… and other search-result galleries.
  • "Mixed" results: within the same individual, some thumbnails load and some are broken.
  • The broken ones frequently appear doubled — the same broken image twice. This is because content-hash deduplication attaches a single MediaAsset to two encounters; when that asset's document is missing the URL, it renders broken in both encounters.

Reproduction

  1. Run a bulk import.
  2. Open /react/encounter-search filtered to an imported individual.
  3. Observe that a subset of encounters show broken thumbnails.

Root cause

The encounter → OpenSearch serializer:

// Encounter.java (main: ~line 4223; migrate-ml-service-v2: ~line 4308)
try {
    // historic data might throw IllegalArgumentException: Path not under given root
    URL url = ma.safeURL(myShepherd, null, "master");
    if (url != null) jgen.writeStringField("url", url.toString());
} catch (Exception ex) {}

MediaAsset.safeURL(..., "master") → bestSafeAsset(...) resolves the URL by locating the child labeled _master (findChildrenByLabel(myShepherd, "_master")). During import the encounter is indexed at creation time, while the _master/_thumb/_mid/_watermark children are generated slightly later, in the background, by MediaAsset.updateStandardChildren. So at index time:

  • the _master child does not exist yet → safeURL returns null (or throws, caught and discarded) → the document is written with no url.

Most encounters get re-indexed later (e.g. by a subsequent occurrence/individual indexing pass) after the child exists, which backfills the URL. Encounters that never receive that second pass stay stuck with a URL-less document.

The swallowing catch (Exception ex) {} also means the failure is completely silent — no log, no metric.
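The ordering problem can be illustrated with a toy model. This is a minimal sketch, not the real code path: the Asset class, its childUrlByLabel map, safeUrl(), and serialize() are all hypothetical stand-ins for MediaAsset, its labeled children, safeURL, and the encounter serializer, and the example URL is made up.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the race; none of these types are the real Wildbook classes.
final class IndexRaceSketch {

    /** Stand-in for a MediaAsset: children are keyed by label, e.g. "_master". */
    static final class Asset {
        final Map<String, String> childUrlByLabel = new HashMap<>();

        /** Mirrors the described behaviour: no _master child means no URL (null). */
        String safeUrl() {
            return childUrlByLabel.get("_master");
        }
    }

    /** Stand-in for the serializer step: "url" is written only when it is non-null. */
    static Map<String, String> serialize(String encounterId, Asset asset) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", encounterId);
        String url = asset.safeUrl();
        if (url != null) {
            doc.put("url", url);
        }
        return doc;
    }

    public static void main(String[] args) {
        Asset asset = new Asset();

        // Indexed at creation time: the _master child does not exist yet.
        Map<String, String> early = serialize("enc-1", asset);

        // The background derivative job finishes later (hypothetical URL).
        asset.childUrlByLabel.put("_master", "https://example.org/enc-1-master.jpg");

        // A later re-index, if one ever happens, picks up the URL.
        Map<String, String> late = serialize("enc-1", asset);

        System.out.println("early has url: " + early.containsKey("url"));
        System.out.println("late has url: " + late.containsKey("url"));
    }
}
```

The first document stays URL-less unless something triggers the second serialize call, which is the situation the stuck encounters are in.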

Evidence

Comparing the OpenSearch document's indexTimestamp against the _master child's creation revision for one import:

Encounter doc   indexTimestamp   _master child created   result
working         …2426990         …2055897                indexed after child → URL present
broken          …1914855         …1969934                indexed ~55s before child → no URL
broken          …1903522         …1941115                indexed ~38s before child → no URL

The DB rows and on-disk derivative files (-master.jpg, -thumb.jpg, …) are all present and serve HTTP 200; the parent assets are labeled _original and the children _master. The data is intact — only the OpenSearch document is stale.
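The ordering in the timestamps above can be checked with simple arithmetic. This is a small sketch under two assumptions: that the elided leading digits are the same for both values in a row, and that the values are in milliseconds (which matches the ~55s and ~38s figures quoted). The class and method names are illustrative only.

```java
// Illustrative arithmetic on the truncated timestamp tails quoted in the evidence.
final class IndexLagSketch {

    /**
     * Positive: the document was indexed after the _master child was created.
     * Negative: the document was indexed before the child existed.
     */
    static long lagMillis(long indexTimestampTail, long masterCreatedTail) {
        return indexTimestampTail - masterCreatedTail;
    }

    static boolean indexedBeforeChild(long indexTimestampTail, long masterCreatedTail) {
        return lagMillis(indexTimestampTail, masterCreatedTail) < 0;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {2426990L, 2055897L}, // working
            {1914855L, 1969934L}, // broken
            {1903522L, 1941115L}, // broken
        };
        for (long[] row : rows) {
            System.out.println(lagMillis(row[0], row[1]) + " ms, indexed before child: "
                + indexedBeforeChild(row[0], row[1]));
        }
    }
}
```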

Scope (one observed import): 81 of 235 indexed encounters (~34%) were missing mediaAssets[].url.

Pre-existing

This is not introduced by migrate-ml-service-v2:

  • The serializer URL logic is byte-identical on main (ma.safeURL(myShepherd, null, "master") + the swallowing catch (Exception ex) {}). The line was last modified by 0a571bd6e (2025-11-04), an ancestor of main that predates the branch merge-base (2025-12-28).
  • BulkImporter.java (the create → index → background-derivative ordering) has zero diff from the merge-base.
  • MediaAsset.safeURL / bestSafeAsset / updateStandardChildren / findChildrenByLabel are unchanged on the branch.

Caveat: while the defect is pre-existing, how often it manifests depends on indexing/background timing (whether an encounter is re-indexed after its derivative is generated). The migrate-ml-service-v2 detection/ML pipeline rework changes that cadence, so it may affect the frequency of stale documents even though it did not introduce the race.

Proposed fixes

  1. Guarantee a re-index after derivatives exist. Ensure updateStandardChildren (or its completion callback) triggers an encounter re-index once _master (and friends) are available — so the URL is always backfilled.
  2. Order import indexing after derivative generation. Defer the encounter's OpenSearch index until standard children have been generated, so the first index already includes the URL.
  3. Stop silently swallowing the failure. At minimum, log when safeURL("master") returns null/throws at index time, so this is observable rather than invisible.

(1) or (2) fixes the defect; (3) should be done regardless.
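One possible shape for (1) combined with (3) is sketched below. Everything here is hypothetical: urlOrLog, DerivativeCompletionHook, and onStandardChildrenReady are illustrative names, not existing methods, and the real hook point inside updateStandardChildren would need to be confirmed against the code. The URL lookup is passed in as a Callable so the sketch does not depend on the real safeURL signature.

```java
import java.util.Collection;
import java.util.concurrent.Callable;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; these are not existing Wildbook classes or methods.
final class ReindexAfterDerivativesSketch {
    private static final Logger LOG =
        Logger.getLogger(ReindexAfterDerivativesSketch.class.getName());

    /** Fix (3): resolve the URL, logging a null result or an exception instead of hiding it. */
    static String urlOrLog(String encounterId, Callable<String> urlLookup) {
        try {
            String url = urlLookup.call();
            if (url == null) {
                LOG.warning("No _master URL at index time for encounter " + encounterId);
            }
            return url;
        } catch (Exception ex) {
            LOG.log(Level.WARNING, "URL lookup failed for encounter " + encounterId, ex);
            return null;
        }
    }

    /** Fix (1): invoked once an asset's standard children exist. */
    static final class DerivativeCompletionHook {
        private final Consumer<String> reindexEncounter;

        DerivativeCompletionHook(Consumer<String> reindexEncounter) {
            this.reindexEncounter = reindexEncounter;
        }

        /** Takes a collection because deduplication can attach one asset to several encounters. */
        void onStandardChildrenReady(Collection<String> encounterIds) {
            encounterIds.forEach(reindexEncounter);
        }
    }
}
```

The re-index action is injected as a Consumer so the hook stays independent of how indexing is actually queued; a production version would likely also want to coalesce repeated requests for the same encounter.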

Workaround for already-affected data

Re-index the affected encounters (the derivatives now exist, so re-serialization populates the URL). No regeneration or data repair is needed — this is purely an index-staleness fix.
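Selecting which encounters to re-index could look roughly like the following. It is a sketch that assumes the indexed documents have already been fetched into memory as a map from encounter ID to that document's mediaAssets entries; the class name, method name, and in-memory shape are all assumptions, and how the documents are fetched and re-indexed is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: picks out encounters whose indexed document lacks a media URL.
final class StaleDocFinderSketch {

    /**
     * Returns the IDs of encounters with at least one mediaAssets entry that has no "url" key.
     * Encounters with no media assets at all are not reported.
     */
    static List<String> encountersMissingUrl(
            Map<String, List<Map<String, String>>> mediaAssetsByEncounter) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, List<Map<String, String>>> entry
                : mediaAssetsByEncounter.entrySet()) {
            boolean missing = entry.getValue().stream()
                .anyMatch(mediaAsset -> !mediaAsset.containsKey("url"));
            if (missing) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

The returned IDs would then be handed to whatever re-index entry point is appropriate.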
