Two small changes that should improve classification accuracy:
1. Line 145 of `generate_vlm.py`: the check is currently `if score is not None and score != 0 and label:`, which means every "no-damage" prediction (score 0) gets retried. Since ~73% of Florence buildings are undamaged, this systematically over-predicts damage. Change it to `if score is not None and label:` to trust all scores equally.
2. Line 125: `temperature: 0.6` is good for creative text but adds unnecessary randomness for a 4-class classification task. Lowering it to 0.2-0.3 should make predictions more consistent.
Both are one-line changes. The prompt rubric and parsing logic are solid as-is; these tweaks just let them work better.
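To make the effect of the first change concrete, here is a minimal sketch of the two acceptance checks side by side. It is not the actual code from `generate_vlm.py`: the function names, the sample pairs, and the label strings are hypothetical, and it assumes that a score of 0 corresponds to "no-damage".

```python
def accepts_current(score, label):
    # Current check: a score of 0 fails `score != 0`, so the prediction
    # is treated as unusable (and, per the comment above, retried).
    return bool(score is not None and score != 0 and label)


def accepts_proposed(score, label):
    # Proposed check: only a missing score or an empty label is rejected.
    return bool(score is not None and label)


# Hypothetical (score, label) pairs, assuming 0 means "no-damage".
samples = [
    (0, "no-damage"),
    (2, "major-damage"),
    (None, ""),   # parse failure: no score, no label
    (1, ""),      # score parsed but label missing
]

current = [accepts_current(s, l) for s, l in samples]
proposed = [accepts_proposed(s, l) for s, l in samples]

print(current)
print(proposed)
```

The only pair the two checks disagree on is `(0, "no-damage")`: the current check rejects it and the proposed check accepts it, while genuine parse failures are still rejected by both.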