The preprocessing pipeline currently runs `parse_step → load_step`. The VLM classification should slot in between as `parse_step → vlm_step → load_step`.
What this looks like:
- Create `util/preprocessing/vlm_step.py` with `run_vlm_step(parsed_json_path)`
- Read `parsed_data.json` and iterate over image pairs and locations
- Call Gemini for each pair and write the result into the `"prediction"` field (already `null` in the JSON; `parse_step.py` line 187 sets this up)
- Save the updated JSON
- Add `"vlm"` to the `order` list in `preprocess-data.py` line 56 so `--start-at vlm` / `--stop-after vlm` work
The `chat.vlm_assessments` table from `readme_Chat.md` is the eventual DB destination, but getting predictions into the JSON first is the right first step: the results are easier to inspect before committing them to the database.
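When the predictions look good, the eventual load could be a plain bulk insert. The column names below are assumptions (the real schema is in `readme_Chat.md`), and in-memory SQLite stands in for the actual database:

```python
import sqlite3

# SQLite stand-in for the real chat schema; column names are assumed,
# not taken from readme_Chat.md.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vlm_assessments (pair_id INTEGER, prediction TEXT)")

# In practice these rows would come from the verified parsed_data.json.
predictions = [(1, "changed"), (2, "unchanged")]
conn.executemany(
    "INSERT INTO vlm_assessments (pair_id, prediction) VALUES (?, ?)",
    predictions,
)
conn.commit()
```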
546 image pairs and 11,548 locations are loaded and ready to test against.
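For the `--start-at` / `--stop-after` wiring, the registry in `preprocess-data.py` might look something like this; the names (`STEPS`, `run_pipeline`) and the stub step functions are illustrative, only the `order` list with `"vlm"` inserted reflects the plan above:

```python
# Illustrative stubs; the real steps live in util/preprocessing/.
def run_parse_step():
    return "parse"


def run_vlm_step():
    return "vlm"


def run_load_step():
    return "load"


STEPS = {"parse": run_parse_step, "vlm": run_vlm_step, "load": run_load_step}
order = ["parse", "vlm", "load"]  # "vlm" slotted between parse and load


def run_pipeline(start_at="parse", stop_after="load"):
    """Run the slice of the pipeline selected by --start-at / --stop-after."""
    start, stop = order.index(start_at), order.index(stop_after)
    return [STEPS[name]() for name in order[start : stop + 1]]
```

With this shape, `--start-at vlm --stop-after vlm` runs only the new step, which is handy for re-running classification without re-parsing.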