Running LLaVA 7B locally is bottlenecked by hardware — the model gets heavily quantized to fit in limited RAM, which hurts both speed (~5–20 s per image pair) and accuracy.
PR #46 already merged a working Gemini 2.0 Flash client at app/services/gemini.py. Switching the VLM to use this would mean:
- No local Ollama dependency
- Faster inference (sub-second)
- Higher accuracy from a larger model
- Works on any machine with an API key
The existing build_prompt(), parse_damage_score(), and score_to_label() in generate_vlm.py are model-agnostic and can stay as-is. The main change is replacing the ollama.chat() call (line 119) with client.models.generate_content() using the Gemini client pattern already in app/services/gemini.py.
GEMINI_API_KEY goes in .env.