[FEAT]: Schema Validation + Error Recovery #114

@Acuspeedster

name: 🚀 Feature Request
about: Suggest an idea or a new capability for FireForm.
title: "[FEAT]: Add Schema Validation and Error Recovery for LLM Output"
labels: enhancement
assignees: ''

📝 Description

Currently, LLM extraction output is passed directly into the PDF filler without validation.
There is no verification that extracted fields:

  • Match expected data types
  • Exist in the schema
  • Were actually found in the transcript

If the LLM hallucinates, returns partial data, or outputs mismatched types, the PDF may be silently corrupted or left incomplete.
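
Of the three checks, "found in the transcript" is the least obvious to implement. One possible approximation is a normalized whole-token substring match. This is only a sketch under assumptions: `is_grounded` is a hypothetical helper, not existing FireForm code, and it assumes the transcript is available as plain text.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse every run of non-[a-z0-9] characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_grounded(value: object, transcript: str) -> bool:
    """Return True if the value appears in the transcript as whole tokens.

    Hypothetical helper: paraphrases ("two" vs "2") and non-ASCII text
    are not handled and would need a smarter matcher.
    """
    needle = _normalize(str(value))
    if not needle:
        return False
    # Pad with spaces so "7" does not match inside "17".
    haystack = f" {_normalize(transcript)} "
    return f" {needle} " in haystack
```

A value that fails this check is not necessarily wrong, so it would more naturally lower the field's confidence than reject the field outright.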


💡 Rationale

FireForm is designed for real-world emergency response environments where accuracy and reliability are critical.

Without schema validation:

  • Incorrect values silently propagate
  • Missing fields are not surfaced to operators
  • Type mismatches go unnoticed
  • There is no structured error recovery

A validation layer improves reliability, transparency, and operator trust.


🛠️ Proposed Solution

Introduce a SchemaValidator class:

  • Validate extracted data against template schema

  • Attempt type coercion where possible ("2" → int)

  • Classify field confidence:

    • HIGH
    • LOW
    • MISSING
  • Produce a structured ValidationReport

  • Surface warnings without crashing the pipeline

  • Return validated clean data for PDF filling

  • Logic change in src/

  • New validation module (src/validator.py)

  • Integrate into file_manipulator.py

  • Unit tests for validation edge cases
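
The proposal above could be sketched roughly as follows. The class names `SchemaValidator` and `ValidationReport` and the attribute `validated_data` come from this issue. Everything else is an assumption for illustration: the schema is modelled as a plain dict of field name to Python type, successfully coerced values are rated HIGH, and only simple types such as `int`, `float`, and `str` are considered.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class Confidence(Enum):
    HIGH = "high"
    LOW = "low"
    MISSING = "missing"


@dataclass
class ValidationReport:
    validated_data: dict[str, Any] = field(default_factory=dict)
    confidence: dict[str, Confidence] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)


class SchemaValidator:
    def __init__(self, schema: dict[str, type]) -> None:
        # Assumed schema shape: {"field_name": expected_python_type}
        self.schema = schema

    def validate(self, extracted: dict[str, Any]) -> ValidationReport:
        report = ValidationReport()
        for name, expected in self.schema.items():
            value = extracted.get(name)
            if isinstance(value, str):
                value = value.strip()
            # None, "" and "-1" count as MISSING.
            if value is None or value in ("", "-1"):
                report.confidence[name] = Confidence.MISSING
                report.warnings.append(f"{name}: missing")
                continue
            if not isinstance(value, expected):
                # Try coercion before downgrading the field to LOW.
                try:
                    value = expected(value)
                except (TypeError, ValueError):
                    report.confidence[name] = Confidence.LOW
                    report.warnings.append(
                        f"{name}: cannot coerce {value!r} to {expected.__name__}"
                    )
                    continue
            report.validated_data[name] = value
            report.confidence[name] = Confidence.HIGH
        # Fields the LLM returned that the schema does not define are dropped.
        for name in sorted(extracted.keys() - self.schema.keys()):
            report.warnings.append(f"{name}: not in schema, ignored")
        return report
```

In this sketch LOW and MISSING fields are left out of `validated_data`, so the PDF filler only ever sees values that passed validation. Whether LOW values should instead be passed through with a flag is a design decision left open here.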


✅ Acceptance Criteria

  • All extracted fields validated against schema
  • null, "", and "-1" treated as MISSING
  • Type coercion attempted before marking LOW
  • ValidationReport exposes validated_data
  • Missing fields surfaced as warnings
  • Pipeline continues gracefully
  • Full unit test coverage
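
The "surfaced as warnings" and "continues gracefully" criteria could be met with a thin wrapper around the fill step. This is a hedged sketch, not the actual file_manipulator.py integration: `validate_then_fill`, the logger name, and the two callables are hypothetical stand-ins for the validator and the PDF filler.

```python
import logging
from typing import Any, Callable

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("fireform.validation")

Validate = Callable[[dict[str, Any]], tuple[dict[str, Any], list[str]]]
Fill = Callable[[dict[str, Any]], None]


def validate_then_fill(
    extracted: dict[str, Any], validate: Validate, fill_pdf: Fill
) -> list[str]:
    """Validate LLM output, log every warning, then fill the PDF regardless.

    `validate` is assumed to return (clean_data, warnings).
    """
    try:
        clean, warnings = validate(extracted)
    except Exception:
        # A validator bug must not take the pipeline down; fill nothing
        # rather than pass unvalidated data through.
        logger.exception("Validation crashed")
        clean, warnings = {}, ["validation failed; no fields filled"]
    for message in warnings:
        logger.warning(message)
    fill_pdf(clean)
    return warnings
```

Returning the warnings as well as logging them lets a caller show them to the operator instead of relying on log output alone.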

📌 Additional Context

This improves robustness of the core AI → JSON → PDF pipeline and aligns with FireForm’s goal of production-grade reliability.
