Deeply-nested JSON line raises uncaught RecursionError, 500ing the upload and losing the raw evidence #24

@erskingardner

Impact

A token-authenticated upload containing a single line of deeply nested JSON (e.g. [[[[…]]]] a few thousand levels deep) causes a 500 in production and loses the entire raw upload instead of being quarantined as an invalid AuditFile.

parse_jsonl() parses each line with json.loads() inside a try/except that only catches json.JSONDecodeError:

try:
    loaded = json.loads(raw_line)
    ...
except json.JSONDecodeError as exc:
    errors.append(f"invalid JSON: {exc.msg}")

But json.loads() on deeply-nested input does not raise JSONDecodeError — it recurses until it hits Python's recursion limit and raises RecursionError (a RuntimeError subclass). That exception is not caught here, and it is not caught by the except IntegrityError handler in ingest_audit_log_bytes() either, so it propagates out of the view to a 500. Because it fires before any AuditFile is created, the raw evidence is never persisted — exactly the failure mode the quarantine path is meant to prevent.
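The handler mismatch comes down to where the two exceptions sit in the built-in hierarchy. A quick standard-library check (an illustration only, not project code) makes that concrete:

```python
import json

# JSONDecodeError derives from ValueError, so `except ValueError` would cover it.
print(issubclass(json.JSONDecodeError, ValueError))       # True

# RecursionError derives from RuntimeError, a separate branch of the hierarchy.
print(issubclass(RecursionError, RuntimeError))           # True

# Neither `except json.JSONDecodeError` nor `except ValueError` matches it.
print(issubclass(RecursionError, ValueError))             # False
print(issubclass(RecursionError, json.JSONDecodeError))   # False
```

Any handler meant to cover over-nested input therefore has to name RecursionError explicitly.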

This is the same "uncaught exception on the ingest path → 500 → evidence lost" class as #7, #14, and #15, but a distinct root cause: it is triggered purely in the JSON parser, before the database is touched, so unlike #7/#14 it reproduces on both SQLite and Postgres (and locally).

Code pointers

  • forensics/ingest.py:215-222: parse_jsonl() only catches json.JSONDecodeError around json.loads().
  • forensics/ingest.py:100: ingest_audit_log_bytes() calls parse_jsonl(raw_text).
  • forensics/ingest.py:145: except IntegrityError does not catch RecursionError.
  • forensics/views.py:126: api_audit_log_upload() calls ingest_audit_log_bytes() with no guard, so the exception 500s the request.
  • forensics/ingest.py:46: first_group_ref_from_audit_log_bytes() has the same narrow except json.JSONDecodeError gap (currently dead code; see #23, "Dead code: first_group_ref_from_audit_log_bytes is defined but never used").

Reproduction (analysis)

Verified that json.loads() raises RecursionError, not JSONDecodeError, for a single deeply-nested line:

import json
line = "[" * 6000 + "]" * 6000          # one JSONL line, well under the 50 MiB limit
try:
    json.loads(line)
except json.JSONDecodeError:
    print("caught by the parser")
except RecursionError:
    print("RecursionError — NOT caught by the JSONDecodeError handler")
# -> "RecursionError — NOT caught by the JSONDecodeError handler"

So uploading a JSONL file whose first line is [×6000 + ]×6000 makes parse_jsonl() raise RecursionError, which escapes ingest_audit_log_bytes() and the view: the client gets a 500 and no AuditFile is saved.

Expected behavior

An over-nested / malformed JSON line should be treated like any other malformed JSON: produce a 400 and a saved quarantined AuditFile with a clear validation error (invalid JSON: …), preserving the raw upload text and the offending raw line — matching the app's behavior for ordinary bad JSON.
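The intended outcome can be sketched in isolation. The snippet below is a minimal illustration under stated assumptions, not the app's code: AuditFileSketch is a hypothetical in-memory stand-in for the real AuditFile model, ingest_sketch is a made-up name, and the 201 success code is assumed. The point is only that the record is always built from the raw text, whichever exception the parser raises.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AuditFileSketch:
    """Hypothetical stand-in for the real AuditFile model."""
    raw_text: str
    status: str
    validation_errors: list = field(default_factory=list)


def ingest_sketch(raw_text):
    """Return (http_status, record); the record always keeps the raw upload."""
    errors = []
    for line_no, raw_line in enumerate(raw_text.splitlines(), start=1):
        if not raw_line.strip():
            continue  # skip blank lines
        try:
            json.loads(raw_line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_no}: invalid JSON: {exc.msg}")
        except RecursionError:
            # Over-nested input is recorded like any other bad line.
            errors.append(f"line {line_no}: invalid JSON: nesting too deep")
    status = "invalid" if errors else "valid"
    record = AuditFileSketch(raw_text, status, errors)
    # 201 on success is an assumption; 400 matches the expected behavior above.
    return (400 if errors else 201), record
```

With this shape, an upload mixing a valid line, an over-nested line, and an ordinary malformed line yields a 400, a record with status invalid, and two per-line errors, while raw_text still holds the full upload.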

Suggested fix

  • Broaden the except in parse_jsonl() to also catch RecursionError (and ideally ValueError, which already covers JSONDecodeError) and record it as a per-line validation error instead of letting it escape.
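  • Optionally bound nesting depth explicitly (e.g. a cheap pre-scan that rejects lines whose nesting exceeds a sane limit) so the recursion limit is never approached. Note that an object_pairs_hook alone cannot do this: it only runs after a nested object has been fully decoded, and it never sees arrays.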
  • As defense-in-depth, wrap the body of ingest_audit_log_bytes() so any unexpected parse-time error still results in a quarantined AuditFile rather than evidence loss.
  • Add a regression test that uploads a deeply-nested JSON line and asserts the upload is quarantined (status invalid, raw evidence preserved, 400) rather than 500ing.
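The first two suggestions could be combined roughly as follows. This is a sketch under assumptions, not a patch against forensics/ingest.py: MAX_DEPTH, nesting_depth, and parse_line are hypothetical names, and the limit of 64 is arbitrary. The pre-scan counts bracket nesting outside string literals, and the broadened except stays in place as a backstop.

```python
import json

MAX_DEPTH = 64  # assumed limit; choose whatever the audit-log schema needs


def nesting_depth(line):
    """Maximum bracket nesting of a JSON text, ignoring brackets in strings."""
    depth = deepest = 0
    in_string = escaped = False
    for ch in line:
        if in_string:
            if escaped:
                escaped = False       # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in "]}":
            depth -= 1
    return deepest


def parse_line(raw_line):
    """Return (value, error); error is None on success."""
    if nesting_depth(raw_line) > MAX_DEPTH:
        return None, f"invalid JSON: nesting deeper than {MAX_DEPTH} levels"
    try:
        return json.loads(raw_line), None
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    except RecursionError:
        # Backstop in case the pre-scan is bypassed or the limit is raised.
        return None, "invalid JSON: nesting too deep"
```

The pre-scan is linear in the line length and never recurses, so the [×6000 + ]×6000 line from the reproduction is rejected before json.loads() is called. A regression test along the lines of the last bullet could assert on the returned error string.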

Metadata

Labels

HIGH (Severity: serious correctness, availability, or data-integrity issue), bug (Something isn't working)