Impact
A token-authenticated upload that contains a single line of deeply-nested JSON (e.g. [[[[…]]]] a few thousand levels deep) raises a production 500 and loses the entire raw upload instead of being quarantined as an invalid AuditFile.
parse_jsonl() parses each line with json.loads() inside a try/except that catches only json.JSONDecodeError.
But json.loads() on deeply-nested input does not raise JSONDecodeError — it recurses until it hits Python's recursion limit and raises RecursionError (a RuntimeError subclass). That exception is not caught here, and it is not caught by the except IntegrityError handler in ingest_audit_log_bytes() either, so it propagates out of the view to a 500. Because it fires before any AuditFile is created, the raw evidence is never persisted — exactly the failure mode the quarantine path is meant to prevent.
This is the same "uncaught exception on the ingest path → 500 → evidence lost" class as #7, #14, and #15, but a distinct root cause: it is triggered purely in the JSON parser, before the database is touched, so unlike #7/#14 it reproduces on both SQLite and Postgres (and locally).
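The exception hierarchy behind this can be checked with the standard library alone. The snippet below is only an illustration: classify() is a hypothetical helper, not project code, and whether a given nesting depth actually trips the recursion limit depends on the interpreter build.

```python
import json

# JSONDecodeError is a ValueError, so `except ValueError` would also catch it.
assert issubclass(json.JSONDecodeError, ValueError)

# RecursionError is a RuntimeError and unrelated to ValueError, so a handler
# for json.JSONDecodeError alone lets it propagate.
assert issubclass(RecursionError, RuntimeError)
assert not issubclass(RecursionError, ValueError)


def classify(line: str) -> str:
    """Return the name of the exception json.loads() raises, or 'ok'."""
    try:
        json.loads(line)
    except (ValueError, RecursionError) as exc:
        return type(exc).__name__
    return "ok"


print(classify('{"a": 1}'))  # ok
print(classify('{"a": '))    # JSONDecodeError
# The deeply nested case from this report (result depends on the recursion limit):
print(classify("[" * 6000 + "]" * 6000))
```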
Code pointers
forensics/ingest.py:215-222 — parse_jsonl() only catches json.JSONDecodeError around json.loads().
Reproduction (analysis)
Verified that json.loads() raises RecursionError, not JSONDecodeError, for a single deeply-nested line:
import json

line = "[" * 6000 + "]" * 6000  # one JSONL line, well under the 50 MiB limit
try:
    json.loads(line)
except json.JSONDecodeError:
    print("caught by the parser")
except RecursionError:
    print("RecursionError, NOT caught by the JSONDecodeError handler")
# -> "RecursionError, NOT caught by the JSONDecodeError handler"
So uploading a JSONL file whose first line is [×6000 + ]×6000 makes parse_jsonl() raise RecursionError, which escapes ingest_audit_log_bytes() and the view: the client gets a 500 and no AuditFile is saved.
Expected behavior
An over-nested / malformed JSON line should be treated like any other malformed JSON: produce a 400 and a saved quarantined AuditFile with a clear validation error (invalid JSON: …), preserving the raw upload text and the offending raw line — matching the app's behavior for ordinary bad JSON.
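The outcome described above can be sketched as a thin guard around the parser. This is a minimal illustration under stated assumptions, not the app's actual code: QuarantinedUpload stands in for a saved AuditFile, ingest_or_quarantine and exploding_parser are hypothetical names, and the 201 success status is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class QuarantinedUpload:
    """Hypothetical stand-in for a saved AuditFile with status 'invalid'."""
    raw_text: str
    status: str = "invalid"
    errors: List[str] = field(default_factory=list)


def ingest_or_quarantine(raw_text: str, parse: Callable[[str], list]) -> Tuple[int, object]:
    """Return (http_status, result) and never let a parse-time error escape."""
    try:
        records = parse(raw_text)
    except Exception as exc:  # deliberately broad: preserving evidence comes first
        saved = QuarantinedUpload(raw_text=raw_text)
        saved.errors.append(f"invalid JSON: {type(exc).__name__}: {exc}")
        return 400, saved
    return 201, records  # assumed success status


def exploding_parser(_raw_text: str) -> list:
    """Simulates a parser that hits the recursion limit."""
    raise RecursionError("maximum recursion depth exceeded")


status, saved = ingest_or_quarantine("[[[[", exploding_parser)
print(status, saved.status, saved.errors)
# 400 invalid ['invalid JSON: RecursionError: maximum recursion depth exceeded']
```

The raw upload text is stored before anything else happens in the error branch, so the evidence survives even when the error message itself is unhelpful.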
Suggested fix
Broaden the except in parse_jsonl() to also catch RecursionError (and ideally ValueError, which already covers JSONDecodeError) and record it as a per-line validation error instead of letting it escape.
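Optionally bound nesting depth explicitly, e.g. by rejecting lines whose bracket nesting exceeds a sane limit before json.loads() is called, so the recursion limit is never approached. (An object_pairs_hook on its own would not help: it fires only for objects, and only after their contents have already been parsed.)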
As defense-in-depth, wrap the body of ingest_audit_log_bytes() so any unexpected parse-time error still results in a quarantined AuditFile rather than evidence loss.
Add a regression test that uploads a deeply-nested JSON line and asserts the upload is quarantined (status invalid, raw evidence preserved, 400) rather than 500ing.
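The first two suggestions could look roughly like the sketch below. It is written under assumptions: parse_jsonl_sketch, max_nesting, and MAX_DEPTH are hypothetical names, the (line number, message, raw line) error shape is invented for illustration, and skipping blank lines is assumed rather than taken from the real parse_jsonl().

```python
import json

MAX_DEPTH = 64  # assumed limit; choose whatever the audit-log schema really needs


def max_nesting(line: str) -> int:
    """Deepest [ / { nesting in the line, ignoring brackets inside JSON strings."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in line:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def parse_jsonl_sketch(raw_text: str):
    """Return (records, errors); each error is (line_number, message, raw_line)."""
    records, errors = [], []
    for number, line in enumerate(raw_text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped
        if max_nesting(line) > MAX_DEPTH:
            errors.append((number, f"invalid JSON: nesting deeper than {MAX_DEPTH}", line))
            continue
        try:
            records.append(json.loads(line))
        except (ValueError, RecursionError) as exc:  # ValueError covers JSONDecodeError
            errors.append((number, f"invalid JSON: {exc}", line))
    return records, errors


upload = '{"a": 1}\n' + "[" * 6000 + "]" * 6000 + '\n{"b": '
records, errors = parse_jsonl_sketch(upload)
print(records)                       # [{'a': 1}]
print([number for number, _, _ in errors])  # [2, 3]
```

The depth scan is iterative, so it cannot itself hit the recursion limit, and the broadened except remains as a backstop. A regression test along the lines of the last suggestion would upload the same three-line payload and assert a 400 with the raw text preserved.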
Further code pointers on the same ingest path:
forensics/ingest.py:100 - ingest_audit_log_bytes() calls parse_jsonl(raw_text).
forensics/ingest.py:145 - except IntegrityError does not catch RecursionError.
forensics/views.py:126 - api_audit_log_upload() calls ingest_audit_log_bytes() with no guard, so the exception 500s the request.
forensics/ingest.py:46 - first_group_ref_from_audit_log_bytes() has the same narrow except json.JSONDecodeError gap (currently dead code, see #23, "Dead code: first_group_ref_from_audit_log_bytes is defined but never used").