Pre-registered, cross-validated voter committees for honest evaluation on dental panoramic VQA: 75.36% accuracy on MMOral-OPG-Bench (370/491), McNemar p = 1.1e-7
cross-validation vision-language-models mcnemar-test deployment-engineering mmoral-opg-bench dental-vqa voter-committee-aggregation honest-evaluation dental-panoramic-radiograph
Updated Apr 27, 2026 - Python