feat: Optimize LLM extraction from O(N) sequential calls to O(1) batch JSON request (#197)
Open
saurabh12nxf wants to merge 1 commit into fireform-core:main
Description
This PR refactors the core LLM prompt generation in src/llm.py to dramatically improve performance.
Previously, the main_loop() function sent a brand new network request to the Ollama endpoint for every single PDF field. This was an O(N) operation that forced the local LLM to re-evaluate the entire incident transcript from scratch for each field, causing generation times of over a minute for larger forms.
I rewrote build_prompt() and main_loop() to accept the entire list of fields at once. By using Ollama's `format: "json"` argument, we now force the LLM to extract all the data natively and return it as a structured JSON object in a single O(1) batch request. This drops generation time to a fraction of what it used to be, speeding up the core application significantly.
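To make the batching concrete, here is a minimal sketch of the approach described above. The function names build_prompt() and the llama3 model come from this PR; the endpoint URL, signatures, and prompt wording are my assumptions, not the actual code in src/llm.py:

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption; the PR does not state the URL).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_prompt(transcript: str, fields: list) -> str:
    """Build ONE prompt covering every field, instead of one prompt per field."""
    field_lines = "\n".join("- " + name for name in fields)
    return (
        "Extract the following fields from the incident transcript below.\n"
        "Respond only with a JSON object mapping each field name to its value.\n\n"
        "Fields:\n" + field_lines + "\n\nTranscript:\n" + transcript + "\n"
    )

def extract_fields(transcript: str, fields: list, model: str = "llama3") -> dict:
    """Single O(1) batch request; format='json' constrains Ollama to valid JSON."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(transcript, fields),
        "format": "json",   # Ollama's structured-output mode
        "stream": False,    # return the whole completion in one response body
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # Ollama returns the model's JSON answer as a string in the "response" field.
    return json.loads(body["response"])
```

The key point is that the transcript is sent (and re-evaluated by the model) exactly once, regardless of how many fields the form has.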
Fixes #196
Type of change
How Has This Been Tested?
I tested this locally using `llama3` as the backend model against a standard 7-field test PDF.

- Confirmed the batch JSON response mapped correctly back into the `answers_list` array.
- Using a `time.time()` intercept, confirmed that generation dropped from ~70 seconds (7 sequential calls) down to ~17 seconds total (1 batch call).

Test Configuration:
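The ~70 s to ~17 s comparison above came from a `time.time()` intercept; a minimal version of that measurement (the wrapper name and shape are mine, not the test code used here) could look like:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn, print its wall-clock duration, and pass its result through."""
    start = time.time()
    result = fn(*args, **kwargs)
    print(f"{fn.__name__} took {time.time() - start:.2f}s")
    return result
```

Wrapping the old per-field loop and the new batch call in the same intercept gives directly comparable end-to-end numbers.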
Checklist: