The preprocessing pipeline currently runs `parse_step → load_step`. The VLM classification should slot in between as `parse_step → vlm_step → load_step`.
What this looks like:
- Create `util/preprocessing/vlm_step.py` with `run_vlm_step(parsed_json_path)`
- Read `parsed_data.json` and iterate over image pairs and locations
- Call Gemini for each pair and write the result into the `"prediction"` field (already `null` in the JSON; `parse_step.py` line 187 sets this up)
- Save the updated JSON
- Add `"vlm"` to the `order` list in `preprocess-data.py` line 56 so `--start-at vlm` / `--stop-after vlm` work
The `chat.vlm_assessments` table from `readme_Chat.md` is the eventual DB destination, but getting predictions into the JSON first is the right first step: the results are easier to inspect before committing them to the database.
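When the predictions look good, the eventual load could be a plain bulk insert. The column names below are assumptions (the real schema is in `readme_Chat.md`), and in-memory SQLite stands in for the actual database:

```python
import sqlite3

# SQLite stand-in for the real chat schema; column names are assumed,
# not taken from readme_Chat.md.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vlm_assessments (pair_id INTEGER, prediction TEXT)")

# In practice these rows would come from the verified parsed_data.json.
predictions = [(1, "changed"), (2, "unchanged")]
conn.executemany(
    "INSERT INTO vlm_assessments (pair_id, prediction) VALUES (?, ?)",
    predictions,
)
conn.commit()
```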
546 image pairs and 11,548 locations are loaded and ready to test against.
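For the `--start-at` / `--stop-after` wiring, the registry in `preprocess-data.py` might look something like this; the names (`STEPS`, `run_pipeline`) and the stub step functions are illustrative, only the `order` list with `"vlm"` inserted reflects the plan above:

```python
# Illustrative stubs; the real steps live in util/preprocessing/.
def run_parse_step():
    return "parse"


def run_vlm_step():
    return "vlm"


def run_load_step():
    return "load"


STEPS = {"parse": run_parse_step, "vlm": run_vlm_step, "load": run_load_step}
order = ["parse", "vlm", "load"]  # "vlm" slotted between parse and load


def run_pipeline(start_at="parse", stop_after="load"):
    """Run the slice of the pipeline selected by --start-at / --stop-after."""
    start, stop = order.index(start_at), order.index(stop_after)
    return [STEPS[name]() for name in order[start : stop + 1]]
```

With this shape, `--start-at vlm --stop-after vlm` runs only the new step, which is handy for re-running classification without re-parsing.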