name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Add Schema Validation and Error Recovery for LLM Output"
labels: enhancement
assignees: ''
📝 Description
Currently, LLM extraction output is passed directly into the PDF filler without validation.
There is no verification that extracted fields:
- Match expected data types
- Exist in the schema
- Were actually found in the transcript
If the LLM hallucinates, returns partial data, or outputs mismatched types, the PDF may be silently corrupted or left incomplete.
💡 Rationale
FireForm is designed for real-world emergency response environments where accuracy and reliability are critical.
Without schema validation:
- Incorrect values silently propagate
- Missing fields are not surfaced to operators
- Type mismatches go unnoticed
- There is no structured error recovery
A validation layer would catch these failures before the PDF is filled and report them to operators, improving reliability, transparency, and operator trust.
🛠️ Proposed Solution
Introduce a `SchemaValidator` class that will:
- Validate extracted data against the template schema
- Attempt type coercion where possible (`"2"` → `int`)
- Classify field confidence: `HIGH`, `LOW`, `MISSING`
- Produce a structured `ValidationReport`
- Surface warnings without crashing the pipeline
- Return validated clean data for PDF filling

- Logic change in `src/`
- New validation module (`src/validator.py`)
- Integrate into `file_manipulator.py`
- Unit tests for validation edge cases
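To make the proposal concrete, here is a minimal sketch of what such a validator could look like. It is not FireForm's actual code. The schema shape (field name mapped to a Python type), the `Confidence` enum, the field names in the usage lines, and the choices to keep a successfully coerced value as `HIGH` and to leave `LOW` values out of `validated_data` are all assumptions made for illustration. Coercion here is only meaningful for simple types such as `int`, `float`, and `str`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Confidence(Enum):
    HIGH = "HIGH"
    LOW = "LOW"
    MISSING = "MISSING"


# Values treated as "not found", taken from the acceptance criteria.
MISSING_SENTINELS = (None, "", "-1")


@dataclass
class ValidationReport:
    validated_data: dict[str, Any] = field(default_factory=dict)
    confidence: dict[str, Confidence] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)


class SchemaValidator:
    def __init__(self, schema: dict[str, type]):
        # Assumed schema shape: field name -> expected Python type.
        self.schema = schema

    def validate(self, extracted: dict[str, Any]) -> ValidationReport:
        report = ValidationReport()

        # Fields the LLM produced that the template does not define.
        for name in extracted:
            if name not in self.schema:
                report.warnings.append(f"{name}: not in schema, dropped")

        for name, expected in self.schema.items():
            value = extracted.get(name)
            if value in MISSING_SENTINELS:
                report.confidence[name] = Confidence.MISSING
                report.warnings.append(f"{name}: missing")
                continue
            if not isinstance(value, expected):
                # Attempt coercion before downgrading the field to LOW.
                try:
                    value = expected(value)
                    report.warnings.append(f"{name}: coerced to {expected.__name__}")
                except (ValueError, TypeError):
                    report.confidence[name] = Confidence.LOW
                    report.warnings.append(
                        f"{name}: expected {expected.__name__}, got {value!r}"
                    )
                    continue
            report.confidence[name] = Confidence.HIGH
            report.validated_data[name] = value
        return report


# Hypothetical field names, for illustration only.
validator = SchemaValidator({"unit_count": int, "location": str})
report = validator.validate({"unit_count": "2", "location": "", "weather": "clear"})
```

The validator never raises on bad input: every problem becomes an entry in `report.warnings`, and only values that passed (or were coerced) reach `report.validated_data`.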
✅ Acceptance Criteria
- All extracted fields validated against schema
- `null`, `""`, and `"-1"` treated as `MISSING`
- Type coercion attempted before marking `LOW`
- ValidationReport exposes `validated_data`
- Missing fields surfaced as warnings
- Pipeline continues gracefully
- Full unit test coverage
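The "warnings surfaced, pipeline continues" criteria could be met with a thin wrapper around the fill step, sketched below. The function name `fill_with_validation`, the logger name, and the two callables passed in are hypothetical stand-ins. The real integration point in `file_manipulator.py` may look different.

```python
import logging
from typing import Any, Callable

# Hypothetical logger name; any project logger would do.
logger = logging.getLogger("fireform.validation")

# Assumed validator contract: returns (clean_data, warnings) and does not raise.
Validator = Callable[[dict[str, Any]], tuple[dict[str, Any], list[str]]]


def fill_with_validation(
    extracted: dict[str, Any],
    validate: Validator,
    fill_pdf: Callable[[dict[str, Any]], None],
) -> list[str]:
    """Validate LLM output, log each warning, then fill the PDF with clean data."""
    clean, warnings = validate(extracted)
    for message in warnings:
        # Surface the problem to operators instead of aborting the run.
        logger.warning("validation: %s", message)
    fill_pdf(clean)
    return warnings
```

Returning the warnings as well as logging them lets a caller show them in a UI or attach them to the generated form, without changing the control flow of the pipeline.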
📌 Additional Context
This improves robustness of the core AI → JSON → PDF pipeline and aligns with FireForm’s goal of production-grade reliability.