Skip to content

Bulk import: reject corrupt images instead of stalling detection at 99%#1609

Open
JasonWildMe wants to merge 2 commits into
mainfrom
fix/bulk-import-corrupt-image
Open

Bulk import: reject corrupt images instead of stalling detection at 99%#1609
JasonWildMe wants to merge 2 commits into
mainfrom
fix/bulk-import-corrupt-image

Conversation

@JasonWildMe

Copy link
Copy Markdown
Collaborator

Summary

Fixes a bulk import stalling at 99% detection when a single corrupt JPEG is uploaded.

Root cause: AssetStore.isValidImage() declared a corrupt-but-not-truncated JPEG (e.g. IIOException: Unsupported marker type 0xNN) valid — its catch (IIOException) returned false only for EOFException-caused truncation and every other decode error fell through to return true. So the corrupt image became a MediaAsset, was sent to ml-service detection, hung the decoder past the client timeout, and left the asset stuck at processing-mlservice — which the import's detection-complete signal counts as never-finished, pinning the import at 99% and blocking re-ID.

Two minimal changes:

  • AssetStore.isValidImage() — reject any decode IIOException, not just EOFException-caused truncation. A corrupt image now fails validation at MediaAsset creation (UploadedFiles.makeMediaAsset), so it is never created and never reaches detection — it can't stall the pipeline.
  • BulkImporter.processRow() — when a row references an image that has no MediaAsset (because it failed the stricter validation), skip that image instead of throwing RuntimeException, so one bad file doesn't abort the whole import. The skip advances the positional offset so the corrupt column's keyword/quality slot is consumed and later valid images keep their own metadata.

Behavior by the existing failImportOnError tolerance flag:

  • true (UI default): the import fails fast with a clear per-image error (HTTP 400) — visible, not stuck.
  • false: the import completes with the decodable images; the corrupt one is skipped.

Test Plan

  • AssetStoreIsValidImageTest — a clean JPEG validates; a corrupt-marker JPEG (real ImageIO IIOException) is now rejected.
  • BulkImporterMissingAssetTest — a row referencing a missing/corrupt asset does not throw and imports the remaining valid image; a corrupt first image does not misalign the surviving image's keyword/quality.
  • mvn test -Dtest=AssetStoreIsValidImageTest,BulkImporterMissingAssetTest → 4/4 pass, BUILD SUCCESS.
  • Manual: bulk-import a set with one known-corrupt JPEG; confirm detection reaches 100% / the import doesn't stall, and the corrupt image is reported rather than silently lost.

Notes / known minor

  • The stricter isValidImage() also applies to single/API uploads (UploadedFiles.makeMediaAsset), which now cleanly reject a corrupt image with ApiException — intended hardening.
  • Edge case (matches existing no-media semantics, not a regression): if every image in an import is corrupt and failImportOnError=false, the encounter imports with zero annotations and the task ends imported rather than complete. No stall.

🤖 Generated with Claude Code

…detection

A bulk import could stall at 99% detection when a single corrupt JPEG was
uploaded: AssetStore.isValidImage() declared it valid, so it became a
MediaAsset, was sent to ml-service detection, hung the decoder past the
client timeout, and left the asset stuck at processing-mlservice — which
the import's detection-complete signal counts as never-finished.

Two minimal changes:

- AssetStore.isValidImage(): reject ANY decode IIOException, not only
  EOFException-caused truncation. Previously a corrupt-but-not-truncated
  JPEG (e.g. "Unsupported marker type 0xNN") fell through to return true.
  Now such an image fails validation at MediaAsset creation
  (UploadedFiles.makeMediaAsset), so it is never created and never reaches
  detection — it cannot stall the pipeline.

- BulkImporter.processRow(): when a row references an image that has no
  MediaAsset (because it failed the stricter validation above), skip just
  that image instead of throwing RuntimeException, so one bad file does not
  abort the whole import. The skip advances the positional offset so the
  corrupt column's keyword/quality slot is consumed and later valid images
  keep their own metadata.

Tests:
- AssetStoreIsValidImageTest: a clean JPEG validates; a corrupt-marker JPEG
  (real ImageIO IIOException) is now rejected.
- BulkImporterMissingAssetTest: a row referencing a missing/corrupt asset
  does not throw and imports the remaining valid image; a corrupt FIRST
  image does not misalign the surviving image's keyword/quality.

Behavior by existing tolerance flag: failImportOnError=true (UI default) ->
import fails fast with a clear per-image error (not stuck); =false -> import
completes with the decodable images, the corrupt one skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant