feat: Optimize LLM extraction from O(N) sequential calls to O(1) batch JSON request#197

Open
saurabh12nxf wants to merge 1 commit into fireform-core:main from saurabh12nxf:fix-llm-bottleneck

Conversation

@saurabh12nxf

Description

This PR refactors the core LLM prompt generation in src/llm.py to dramatically improve performance.

Previously, the main_loop() function issued a separate network request to the Ollama endpoint for every single PDF field. This was an O(N) operation that forced the local LLM to re-evaluate the entire incident transcript from scratch for each field, pushing generation times past a minute for larger forms.

I rewrote build_prompt() and main_loop() to accept the entire list of fields at once. Using Ollama's format: "json" option, the LLM now extracts every field and returns the results as a structured JSON object in a single O(1) batch request.

This drops generation time down to a fraction of what it used to be, speeding up the core application significantly.
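For reviewers, here is a minimal sketch of the batched approach. The function names build_prompt() and main_loop() come from the PR; the prompt wording, the extract_fields() helper, and the transcript/field shapes are illustrative assumptions, not the actual implementation. It uses Ollama's standard /api/generate endpoint with format: "json" and stream: false:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_prompt(transcript: str, fields: list[str]) -> str:
    """Ask for every field in one shot and demand a JSON object keyed by field name."""
    field_list = "\n".join(f"- {f}" for f in fields)
    return (
        "Extract the following fields from the incident transcript below.\n"
        f"Fields:\n{field_list}\n\n"
        f"Transcript:\n{transcript}\n\n"
        "Respond with a single JSON object whose keys are exactly the field names."
    )

def extract_fields(transcript: str, fields: list[str], model: str = "llama3") -> dict:
    """One batch request for all N fields instead of N sequential requests."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(transcript, fields),
        "format": "json",   # constrains Ollama's output to valid JSON
        "stream": False,    # return the full response in one body
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Ollama puts the model's text in "response"; with format:"json" it parses cleanly.
    return json.loads(body["response"])
```

The key point is that the transcript is sent (and internally re-processed by the model) once, rather than once per field.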

Fixes #196

Type of change

  • New feature / Performance Enhancement (non-breaking change which adds functionality)

How Has This Been Tested?

I tested this locally using llama3 as the backend model against a standard 7-field test PDF.

  • Test A (Repro path): Generated a form with 7 fields. It successfully batched all 7 fields into a single JSON request. Output successfully mapped to the final answers_list array.
  • Test B (Performance): Using a simple time.time() intercept, confirmed that generation dropped from ~70 seconds (7 sequential calls) down to ~17 seconds total (1 batch call).
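The timing intercept from Test B can be sketched as a small wrapper; this is an illustrative stand-in, not the code used in the PR, and it uses time.perf_counter() (generally preferred over time.time() for measuring intervals):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn, report wall-clock duration, and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic clock, better for intervals than time.time()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__} took {elapsed:.1f}s")
    return result, elapsed
```

Wrapping the old per-field loop and the new batch call with the same helper gives a like-for-like comparison.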

Test Configuration:

  • Firmware version: N/A
  • Hardware: Windows Desktop / Local inference
  • Python SDK: 3.12+

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
