
[FEAT]: Batch LLM extraction to reduce N API calls to 1 #102

@Acuspeedster


labels: enhancement

📝 Description

Currently, LLM.main_loop() makes one Ollama API call per form field.

For a form with N fields, this results in:

  • N prompts
  • N network round-trips
  • N model inference cycles
  • N JSON parses
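For reference, the current pattern can be sketched as a loop that makes one call per field. This is a hypothetical reconstruction, not FireForm's actual LLM.main_loop(): the function name `main_loop_per_field`, the injected `call_llm` callable, and the prompt wording are assumptions. The `CountingClient` fake makes the cost visible without a running Ollama server.

```python
from typing import Callable


def main_loop_per_field(
    fields: list[str], transcript: str, call_llm: Callable[[str], str]
) -> dict[str, str]:
    """Hypothetical sketch of the current pattern: one LLM call per field."""
    results: dict[str, str] = {}
    for field in fields:
        prompt = (
            f"From the transcript below, extract the value of '{field}'.\n\n"
            f"Transcript:\n{transcript}"
        )
        # Each iteration costs one prompt, one round-trip, and one inference cycle.
        results[field] = call_llm(prompt).strip()
    return results


class CountingClient:
    """Fake LLM client that only records how many times it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return "value"
```

Running it for three forms of ten fields each with one shared `CountingClient` records 30 calls, matching the 3 × 10 example under Rationale.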

This creates a scalability bottleneck: latency and load grow linearly with the number of fields, which is especially costly for multi-form submissions.

We propose adding a batch extraction method that sends all target fields in a single structured prompt and parses a single JSON response.


💡 Rationale

At scale, sequential per-field API calls add up: each call pays its own network round-trip and inference cost, so both latency and infrastructure load grow with every additional field.

Example:

  • 1 form × 10 fields → 10 LLM calls
  • 3 forms × 10 fields → 30 LLM calls

Reducing this to a single API call per form would:

  • Improve performance
  • Reduce response time
  • Lower model inference overhead
  • Improve scalability for multi-agency submissions

🛠️ Proposed Solution

Introduce a new method, e.g., LLM.main_loop_batch(), that:

  • Builds one structured prompt containing:
    • All target fields
    • The transcript/input text
  • Instructs the model to return strict JSON
  • Parses the response once:
    • Strips markdown code fences (```json) if present
    • Handles null and list values
  • Falls back gracefully to main_loop() if JSON parsing fails

Planned changes:

  • Logic change in src/
  • New prompt for Mistral/Ollama
  • Unit tests in tests/test_llm.py
  • Update src/filler.py to use the batch method
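The steps above could look roughly like the following. This is a minimal, hypothetical sketch: the function names, the prompt wording, the injected `call_llm`/`fallback` callables, and the choice to join list values into a "; "-separated string are all assumptions, not FireForm's existing API. `build_ollama_payload` shows a request body for Ollama's /api/generate endpoint, whose "format": "json" option asks the model to emit JSON; the default "mistral" model name is likewise an assumption.

```python
import json
import re
from typing import Callable, Optional

_TICKS = "`" * 3  # three backticks, built so a literal fence never appears here
# Matches a fenced block (optional "json" tag) wrapped around the whole reply.
_FENCE_RE = re.compile(
    r"^\s*" + _TICKS + r"(?:json)?\s*\n?(.*?)\n?\s*" + _TICKS + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def build_batch_prompt(fields: list[str], transcript: str) -> str:
    """One prompt asking for every target field at once (hypothetical wording)."""
    field_list = "\n".join(f"- {name}" for name in fields)
    return (
        "Extract the following fields from the transcript.\n"
        f"Fields:\n{field_list}\n\n"
        "Return ONLY a JSON object whose keys are exactly these field names. "
        "Use null if a value is absent and a JSON list if a field has "
        "several values.\n\n"
        f"Transcript:\n{transcript}"
    )


def build_ollama_payload(prompt: str, model: str = "mistral") -> dict:
    """Request body for Ollama's /api/generate; "format": "json" requests JSON."""
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def strip_code_fences(raw: str) -> str:
    """Remove a surrounding markdown code fence if the model added one."""
    match = _FENCE_RE.match(raw)
    return match.group(1) if match else raw.strip()


def parse_batch_response(raw: str, fields: list[str]) -> Optional[dict]:
    """Parse once; return {field: value}, or None if the reply is unusable."""
    try:
        data = json.loads(strip_code_fences(raw))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    result = {}
    for name in fields:
        value = data.get(name)  # a missing key is treated like null
        if isinstance(value, list):
            # Assumption: join multi-valued fields into one string, dropping nulls.
            value = "; ".join(str(v) for v in value if v is not None)
        result[name] = value
    return result


def main_loop_batch(
    fields: list[str],
    transcript: str,
    call_llm: Callable[[str], str],
    fallback: Callable[[list[str], str], dict],
) -> dict:
    """Exactly one LLM call; fall back to per-field extraction on bad JSON."""
    raw = call_llm(build_batch_prompt(fields, transcript))
    parsed = parse_batch_response(raw, fields)
    if parsed is None:
        return fallback(fields, transcript)  # e.g. wraps the existing main_loop()
    return parsed
```

Injecting `call_llm` and `fallback` keeps the parsing logic testable without a running Ollama server; in practice `fallback` would wrap the existing sequential main_loop(), which preserves the old behavior whenever the batch reply cannot be parsed.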


✅ Acceptance Criteria

  • Exactly 1 API call per form regardless of field count
  • Batch method correctly populates all fields
  • Handles null and list values correctly
  • Gracefully falls back to sequential method if invalid JSON is returned
  • Feature works in the Docker container
  • Documentation updated in docs/
  • JSON output validates against the schema
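One way to make "validates against the schema" concrete is to derive a JSON Schema from the form's target fields. The sketch below assumes a particular shape (every field required, each value a string, null, or list of strings, no extra keys); `build_output_schema` and `conforms` are hypothetical helpers. `conforms` is a minimal stdlib check for this exact shape only; a full validator such as the `jsonschema` package's `jsonschema.validate(instance, schema)` would be the more general option, and recent Ollama versions can also take a JSON schema in a request's `format` field. The schema applies to the model's raw reply, before any list values are normalized.

```python
def build_output_schema(fields: list[str]) -> dict:
    """JSON Schema for the batch reply (assumed shape, see lead-in)."""
    value_schema = {
        "anyOf": [
            {"type": "string"},
            {"type": "null"},
            {"type": "array", "items": {"type": "string"}},
        ]
    }
    return {
        "type": "object",
        "properties": {name: value_schema for name in fields},
        "required": list(fields),
        "additionalProperties": False,
    }


def conforms(reply: object, schema: dict) -> bool:
    """Check a parsed reply against the schema shape above (not a general validator)."""
    if not isinstance(reply, dict):
        return False
    extra_keys = set(reply) - set(schema["properties"])
    if extra_keys and not schema.get("additionalProperties", True):
        return False
    if any(name not in reply for name in schema["required"]):
        return False
    for value in reply.values():
        if value is None or isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        return False
    return True
```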

📌 Additional Context

  • Before: N fields → N API calls
  • After: N fields → 1 API call

This change is expected to reduce latency substantially while maintaining backward compatibility, since main_loop() remains available as the fallback.
