Problem
Both chat() calls in src/llm/llm_client.py (initial extraction at line 128, retry at line 187) have no timeout and no error handling around the network call itself. The current retry logic only handles null fields in a successful response — it does not handle connection errors, server errors, or hung requests.
This means:
- If the Ollama server becomes unresponsive mid-run, the pipeline hangs indefinitely with no output and no way to detect it without watching the terminal
- In multiprocessing mode (classify_extract.py --workers N), a single hung worker blocks its slot in the pool for the rest of the run
- A transient 503 or connection reset kills the entire job for that file rather than retrying
Tasks
- Wrap both chat() calls in a try/except that catches connection errors, timeouts, and malformed response errors
- Add a timeout to each chat() call (e.g. an --llm-timeout CLI flag, defaulting to a reasonable value like 120s)
Context
src/llm/llm_client.py:128 and :187 are the two call sites. The existing null-field retry at line 148 is unrelated — it re-prompts on a successful response that returned nulls, not on a failed call.
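The try/except-and-retry behavior the tasks describe can be sketched as a small wrapper around each call site. This is a sketch under assumptions: call_with_retries and its parameters do not exist in src/llm/llm_client.py, and the real version would take its timeout from the proposed --llm-timeout flag:

```python
import time

# Errors worth retrying: transient network failures, a timed-out call,
# and a response that failed to parse. The exact exception types depend
# on the HTTP client the repo actually uses (assumption).
RETRYABLE = (ConnectionError, TimeoutError, ValueError)

def call_with_retries(fn, *, retries=3, backoff=2.0, _sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    Returns fn()'s result, or None once every attempt has failed, so
    the caller can log and skip one file instead of killing the run.
    """
    delay = backoff
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except RETRYABLE:
            if attempt == retries:
                return None  # give up on this file; the run continues
            _sleep(delay)
            delay *= 2
    return None
```

At each call site this would look roughly like result = call_with_retries(lambda: chat(messages)), so a transient 503 or connection reset costs a retry rather than the whole job, and a hung request surfaces as a timeout the wrapper can absorb.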