Bulk import: reject corrupt images instead of stalling detection at 99% #1609
Open
JasonWildMe wants to merge 2 commits into
…detection

A bulk import could stall at 99% detection when a single corrupt JPEG was uploaded: AssetStore.isValidImage() declared it valid, so it became a MediaAsset, was sent to ml-service detection, hung the decoder past the client timeout, and left the asset stuck at processing-mlservice, which the import's detection-complete signal counts as never-finished.

Two minimal changes:

- AssetStore.isValidImage(): reject ANY decode IIOException, not only EOFException-caused truncation. Previously a corrupt-but-not-truncated JPEG (e.g. "Unsupported marker type 0xNN") fell through to return true. Now such an image fails validation at MediaAsset creation (UploadedFiles.makeMediaAsset), so it is never created and never reaches detection; it cannot stall the pipeline.
- BulkImporter.processRow(): when a row references an image that has no MediaAsset (because it failed the stricter validation above), skip just that image instead of throwing RuntimeException, so one bad file does not abort the whole import. The skip advances the positional offset so the corrupt column's keyword/quality slot is consumed and later valid images keep their own metadata.

Tests:

- AssetStoreIsValidImageTest: a clean JPEG validates; a corrupt-marker JPEG (real ImageIO IIOException) is now rejected.
- BulkImporterMissingAssetTest: a row referencing a missing/corrupt asset does not throw and imports the remaining valid image; a corrupt FIRST image does not misalign the surviving image's keyword/quality.

Behavior by existing tolerance flag: failImportOnError=true (UI default) -> import fails fast with a clear per-image error (not stuck); =false -> import completes with the decodable images, the corrupt one skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
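For readers who want to see the stricter validation in isolation, here is a minimal sketch of the idea, not the project's actual code. The class name `ImageValidationSketch`, the `byte[]` signature, and the helper `encodeSolidJpeg` are hypothetical; treating a `null` result from `ImageIO.read` and any other `IOException` as invalid is an assumption of this sketch rather than something the commit states.

```java
import javax.imageio.IIOException;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for AssetStore.isValidImage(); names and signature are assumptions.
final class ImageValidationSketch {

    /** Returns true only if ImageIO can fully decode the bytes into an image. */
    static boolean isValidImage(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            BufferedImage img = ImageIO.read(in);
            // ImageIO.read returns null when no registered reader claims the input.
            // Rejecting that case is an assumption of this sketch.
            return img != null;
        } catch (IIOException e) {
            // The change described above: ANY decode failure rejects the image,
            // not only truncation caused by an EOFException.
            return false;
        } catch (IOException e) {
            // Other I/O failures are also treated as invalid here (assumption).
            return false;
        }
    }

    /** Test helper: encodes a small solid-color JPEG in memory. */
    static byte[] encodeSolidJpeg(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(img, "jpg", out)) {
                throw new IllegalStateException("no JPEG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With this shape, `isValidImage(encodeSolidJpeg(8, 8))` accepts a cleanly encoded JPEG, while bytes that no reader recognizes, or that make the decoder throw `IIOException`, are rejected before any asset would be created.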
Summary
Fixes a bulk import stalling at 99% detection when a single corrupt JPEG is uploaded.
Root cause: `AssetStore.isValidImage()` declared a corrupt-but-not-truncated JPEG (e.g. `IIOException: Unsupported marker type 0xNN`) valid: its `catch (IIOException)` returned `false` only for `EOFException`-caused truncation, and every other decode error fell through to `return true`. So the corrupt image became a `MediaAsset`, was sent to ml-service detection, hung the decoder past the client timeout, and left the asset stuck at `processing-mlservice`, which the import's detection-complete signal counts as never-finished, pinning the import at 99% and blocking re-ID.

Two minimal changes:

- `AssetStore.isValidImage()`: reject any decode `IIOException`, not just `EOFException`-caused truncation. A corrupt image now fails validation at `MediaAsset` creation (`UploadedFiles.makeMediaAsset`), so it is never created and never reaches detection; it can't stall the pipeline.
- `BulkImporter.processRow()`: when a row references an image that has no `MediaAsset` (because it failed the stricter validation), skip that image instead of throwing `RuntimeException`, so one bad file doesn't abort the whole import. The skip advances the positional `offset` so the corrupt column's keyword/quality slot is consumed and later valid images keep their own metadata.

Behavior by the existing `failImportOnError` tolerance flag:

- `true` (UI default): the import fails fast with a clear per-image error (HTTP 400); visible, not stuck.
- `false`: the import completes with the decodable images; the corrupt one is skipped.

Test Plan

- `AssetStoreIsValidImageTest`: a clean JPEG validates; a corrupt-marker JPEG (real `ImageIO` `IIOException`) is now rejected.
- `BulkImporterMissingAssetTest`: a row referencing a missing/corrupt asset does not throw and imports the remaining valid image; a corrupt first image does not misalign the surviving image's keyword/quality.
- `mvn test -Dtest=AssetStoreIsValidImageTest,BulkImporterMissingAssetTest` → 4/4 pass, BUILD SUCCESS.

Notes / known minor

- `isValidImage()` also applies to single/API uploads (`UploadedFiles.makeMediaAsset`), which now cleanly reject a corrupt image with `ApiException`; intended hardening.
- `failImportOnError=false`, the encounter imports with zero annotations and the task ends `imported` rather than `complete`. No stall.

🤖 Generated with Claude Code
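The positional-offset point in the `processRow()` change may be easier to follow with a small illustration. The sketch below is an assumption-laden model, not the code in this PR: `RowSkipSketch`, `attachKeywords`, and the parallel image/keyword lists are hypothetical stand-ins for however `BulkImporter` actually stores a row's columns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a row with parallel image and keyword columns.
final class RowSkipSketch {

    /**
     * Pairs each image that has an asset with the keyword from its own column.
     * Images without an asset (e.g. rejected as corrupt) are skipped.
     */
    static Map<String, String> attachKeywords(List<String> imageColumns,
                                              List<String> keywordColumns,
                                              Set<String> createdAssets) {
        Map<String, String> result = new LinkedHashMap<>();
        int offset = 0; // position of the current image's metadata slot
        for (String filename : imageColumns) {
            if (!createdAssets.contains(filename)) {
                // Skip the image but still consume its slot. Without this
                // increment, the next valid image would inherit the skipped
                // image's keyword.
                offset++;
                continue;
            }
            String keyword = offset < keywordColumns.size() ? keywordColumns.get(offset) : null;
            result.put(filename, keyword);
            offset++;
        }
        return result;
    }
}
```

For a row with images `bad.jpg, good.jpg`, keywords `kw-bad, kw-good`, and only `good.jpg` present as an asset, the result maps `good.jpg` to `kw-good`. This is the alignment that the corrupt-first-image test case is described as protecting.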