
VLM: Switch from local Ollama/llava to Gemini Vision API #49

@madhavcodez

Description


Running llava 7B locally is bottlenecked by hardware: on limited RAM the model must be heavily quantized, which hurts both speed (~5–20 s per pair) and accuracy.

PR #46 already merged a working Gemini 2.0 Flash client at app/services/gemini.py. Switching the VLM to use this would mean:

  • No local Ollama dependency
  • Faster inference (sub-second)
  • Higher accuracy from a larger model
  • Works on any machine with an API key

The existing build_prompt(), parse_damage_score(), and score_to_label() in generate_vlm.py are model-agnostic and can stay as-is. The main change is replacing the ollama.chat() call (line 119) with client.models.generate_content() using the Gemini client pattern already in app/services/gemini.py.
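A rough sketch of what the swap could look like, using the google-genai SDK pattern. The function name `score_pair` and the JPEG mime type are assumptions for illustration; the actual wiring should follow whatever interface `app/services/gemini.py` already exposes from PR #46.

```python
# Hypothetical replacement for the ollama.chat() call in generate_vlm.py,
# using the google-genai SDK (the client pattern merged in PR #46).
import os

MODEL = "gemini-2.0-flash"  # model already used by app/services/gemini.py

def score_pair(image_bytes: bytes, prompt: str) -> str:
    """Send one image + prompt to Gemini and return the raw text reply.

    The reply can be fed straight into the existing parse_damage_score().
    """
    # Imported lazily so the module still loads without the package installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=MODEL,
        contents=[
            prompt,  # output of the existing build_prompt()
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        ],
    )
    return response.text
```

Since `build_prompt()` and `parse_damage_score()` are model-agnostic, only this call site needs to change.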

GEMINI_API_KEY goes in .env.
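If the project doesn't already pull in python-dotenv (in which case `load_dotenv()` handles this), a minimal stdlib-only loader for the key could look like:

```python
# Minimal .env loader sketch; python-dotenv's load_dotenv() is the
# drop-in alternative if that dependency is acceptable.
import os

def load_env(path: str = ".env") -> None:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Skips blanks and # comments; existing environment variables win.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```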
