Return 4xx for undecodable/corrupt images instead of 500 #27
Open
JasonWildMe wants to merge 1 commit into
A corrupt image (e.g. a JPEG with a broken entropy-coded scan stream: "broken data stream when reading image file" / "Unsupported marker type 0xNN") fails during PIL decode inside model.predict/extract, which the inference routers catch with their generic `except Exception` and return as HTTP 500. Consumers (Wildbook) classify 5xx as retryable and re-queue the job, so a *permanent* bad-image failure is retried as if transient: the work never reaches a terminal state and bulk imports can stall.

Reject undecodable images as a 4xx (client error) so callers treat them as permanent and non-retryable:

- app/utils/image_uri.py: add ImageDecodeError(ValueError) and validate_decodable(), which fully Image.open(...).load()s the bytes (a header-only verify() would miss broken scan streams) and raises ImageDecodeError on UnidentifiedImageError / OSError / DecompressionBombError.
- predict, pipeline, extract, classify routers: call validate_decodable() right after resolve_image_uri(), inside the existing `except ValueError -> HTTP 400` block. ImageDecodeError subclasses ValueError, so an undecodable image now returns 400 instead of escaping to the generic 500.
- tests: pure-PIL unit tests for validate_decodable (valid passes; corrupt, non-image, empty, truncated, and decompression-bomb inputs raise).

Trade-off: the validator decodes the image once and the model decodes again (~2x decode time, negligible vs GPU inference), accepted for the robustness gain.

Router-level 400 contract verified by inspection; endpoint tests require the model/GPU environment (CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
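As a rough illustration of the validator this commit describes, here is a minimal sketch; it is not the PR's actual code. The names `ImageDecodeError` and `validate_decodable` come from the commit message, while the signature, the error message, and the use of `io.BytesIO` are assumptions.

```python
import io

from PIL import Image, UnidentifiedImageError


class ImageDecodeError(ValueError):
    """Raised when image bytes cannot be fully decoded (name taken from the PR)."""


def validate_decodable(image_bytes: bytes) -> None:
    """Decode the image once in full; raise ImageDecodeError if that fails.

    Assumed signature: the PR only states that the function exists.
    """
    try:
        with Image.open(io.BytesIO(image_bytes)) as img:
            # load() forces a full pixel decode, so damage in the scan data
            # surfaces here; Image.open() alone only reads the header.
            img.load()
    except (UnidentifiedImageError, OSError, Image.DecompressionBombError) as exc:
        # Re-raise as a ValueError subclass so an existing
        # `except ValueError` handler can map it to a client error.
        raise ImageDecodeError(f"image could not be decoded: {exc}") from exc
```

Under these assumptions, `validate_decodable(b"not an image")` raises `ImageDecodeError`, while bytes of a well-formed PNG or JPEG pass silently.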
Summary
Inference endpoints returned HTTP 500 for a corrupt/undecodable input image, which made a permanent failure look transient to clients and could stall downstream pipelines.
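To make the "permanent failure looks transient" point concrete, here is a minimal sketch of a caller-side retry policy of the kind this description attributes to Wildbook. Wildbook's real logic is not shown in this PR, so `Outcome` and `classify_response` are hypothetical names, and the policy is deliberately simplified (it ignores retryable 4xx codes such as 429).

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    RETRY = "retry"          # treated as transient: the job is re-queued
    PERMANENT = "permanent"  # terminal: the job is marked failed, not retried


def classify_response(status_code: int) -> Outcome:
    """Map an HTTP status code to a job outcome (hypothetical caller policy)."""
    if 200 <= status_code < 300:
        return Outcome.SUCCESS
    if 500 <= status_code < 600:
        # Any server error is assumed to be worth retrying.
        return Outcome.RETRY
    # Everything else, including 4xx client errors, is terminal in this sketch.
    return Outcome.PERMANENT
```

With a policy like this, a corrupt image answered with 500 is re-queued on every attempt, whereas the same image answered with 400 reaches a terminal state on the first attempt.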
Root cause: a corrupt image (e.g. a JPEG with a valid header but a broken entropy-coded scan stream: `broken data stream when reading image file` / `Unsupported marker type 0xNN`) fails during PIL decode inside `model.predict`/`extract`. The routers catch that with their generic `except Exception` → 500. Wildbook (the caller) classifies any 5xx as retryable and re-queues the job, so a permanently bad image is retried indefinitely and the asset never reaches a terminal state (observed as a bulk import stuck at 99% detection).

Fix: reject an undecodable image as a 4xx (client error) so callers treat it as permanent / non-retryable.
- `app/utils/image_uri.py`: add `ImageDecodeError(ValueError)` and `validate_decodable()`, which fully `Image.open(...).load()`s the bytes (a header-only `verify()` would miss broken scan streams) and raises `ImageDecodeError` on `UnidentifiedImageError` / `OSError` / `DecompressionBombError`.
- `predict`, `pipeline`, `extract`, `classify` routers: call `validate_decodable(image_bytes)` right after `resolve_image_uri(...)`, inside the existing `except ValueError → HTTP 400` block. Since `ImageDecodeError` subclasses `ValueError`, an undecodable image now returns 400 instead of escaping to the generic 500.

Test Plan
- `tests/test_image_decode_validation.py` (pure PIL, no GPU): valid image passes; corrupt-marker, non-image, empty, truncated, and decompression-bomb inputs all raise `ImageDecodeError`; `ImageDecodeError` is a `ValueError`. 7/7 pass locally.
- Endpoint check: confirm that an undecodable image returns 400 from `/predict/`, `/pipeline/`, `/extract/`, `/classify/`, and that inference is not invoked. (Router 400 contract verified by inspection; endpoint tests need the model/GPU environment.)

Notes
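The router-side behaviour described under the fix relies only on exception subclassing, which a stand-alone sketch can show without the web framework or the model. Everything below is a hypothetical stand-in: `handle_predict`, `run_inference`, `inference_calls`, and the stub bodies of `resolve_image_uri` and `validate_decodable` are illustrative assumptions, not the repository's code.

```python
class ImageDecodeError(ValueError):
    """Stand-in for the exception the PR adds in app/utils/image_uri.py."""


inference_calls: list[bytes] = []  # records each call to the fake model


def resolve_image_uri(uri: str) -> bytes:
    # Hypothetical stub: the real resolver fetches the image bytes for a URI.
    if not uri:
        raise ValueError("empty image URI")
    return uri.encode()


def validate_decodable(image_bytes: bytes) -> None:
    # Hypothetical stub: bytes starting with b"corrupt" count as undecodable.
    if image_bytes.startswith(b"corrupt"):
        raise ImageDecodeError("image could not be decoded")


def run_inference(image_bytes: bytes) -> dict:
    # Hypothetical stub standing in for model.predict.
    inference_calls.append(image_bytes)
    return {"detections": []}


def handle_predict(uri: str) -> tuple[int, dict]:
    """Return (status_code, body), mirroring the router flow the PR describes."""
    try:
        image_bytes = resolve_image_uri(uri)
        validate_decodable(image_bytes)
    except ValueError as exc:
        # ImageDecodeError is a ValueError, so it lands here: client error.
        return 400, {"detail": str(exc)}
    try:
        return 200, run_inference(image_bytes)
    except Exception as exc:
        # Anything unexpected during inference still maps to a server error.
        return 500, {"detail": str(exc)}
```

In this sketch, `handle_predict("corrupt.jpg")` returns a 400 without ever appending to `inference_calls`, which is the "inference is not invoked" property the test plan asks to confirm.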
🤖 Generated with Claude Code