Feature/refcoco/+/g benchmark support #201
zhongzhouTan-coder wants to merge 8 commits into AISBench:master
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly expands the benchmarking framework's capabilities by integrating the RefCOCO family of visual grounding datasets. It provides a complete pipeline from data loading and prompt generation to evaluation, enabling comprehensive assessment of models on multimodal object localization tasks. The changes introduce new dataset classes, configuration files, and a dedicated bounding box Intersection over Union (IoU) evaluator, along with necessary post-processing logic.
Code Review
This pull request introduces comprehensive support for the RefCOCO, RefCOCOg, and RefCOCOplus datasets, which is a great addition to the benchmark framework. The implementation, including the new dataset loaders, the BBoxIoUEvaluator, and the associated unit tests, is well-structured and robust.
My review has identified a couple of issues. There's a critical configuration mismatch in the refcoco_plus and refcocog dataset configurations where the input_columns do not align with the variables required by the prompt template, which would cause runtime errors. I've also pointed out a minor stylistic issue concerning the lack of a final newline character in several of the newly added files.
After addressing these points, the PR should be in excellent shape.
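The mismatch described in this review can be caught before runtime with a small check. The following is a minimal sketch, assuming the prompt template uses `str.format`-style `{name}` placeholders; the helper name `missing_template_fields` and the sample template and column names are hypothetical and not taken from the PR.

```python
from string import Formatter


def missing_template_fields(template: str, input_columns: list[str]) -> set[str]:
    """Return placeholder names used in `template` that are not in `input_columns`."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so those entries are skipped.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    return fields - set(input_columns)


# Hypothetical template and columns, for illustration only.
template = "Locate the object described as: {question}. Image: {image}"
print(missing_template_fields(template, ["image", "question"]))  # set()
print(missing_template_fields(template, ["image", "text"]))      # {'question'}
```

A non-empty result means the config would fail when the template is rendered, which is the class of error flagged above.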
ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py (outdated, resolved)
Pull request overview
Adds RefCOCO-family dataset support and a bounding-box IoU evaluator to enable referring-expression grounding benchmarks (RefCOCO / RefCOCO+ / RefCOCOg) within the OpenICL evaluation pipeline.
Changes:
- Introduces `BBoxIoUEvaluator` (registry-registered) for IoU-based accuracy scoring with optional coordinate scaling/clipping.
- Adds `RefCOCODataset` loader plus `RefCOCOPlusDataset` / `RefCOCOgDataset` variants, along with bbox prediction postprocessing.
- Adds dataset config entries and unit tests covering dataset loading, registry wiring, and evaluator behavior.
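To make the IoU scoring with scaling and clipping concrete, here is a minimal sketch. It assumes boxes are given as `[x1, y1, x2, y2]` corner coordinates; the function names `scale_box`, `clip_box`, and `bbox_iou` are hypothetical and do not reflect the actual `BBoxIoUEvaluator` internals.

```python
def scale_box(box, sx, sy):
    """Scale a box by per-axis factors, e.g. normalized coords to pixels."""
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def clip_box(box, width, height):
    """Clamp a box so that it lies inside a width x height image."""
    x1, y1, x2, y2 = box
    return (min(max(x1, 0.0), width), min(max(y1, 0.0), height),
            min(max(x2, 0.0), width), min(max(y2, 0.0), height))


def bbox_iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap width/height are zero when the boxes do not intersect.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = clip_box(scale_box((0.0, 0.0, 0.1, 0.1), 100, 100), 100, 100)
print(bbox_iou(pred, (5, 5, 15, 15)))  # intersection 25 over union 175
```

Returning 0.0 for degenerate boxes keeps the score well-defined when a prediction has zero area.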
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py | New IoU evaluator for bbox predictions (with scaling/clipping + detailed per-sample results). |
| ais_bench/benchmark/openicl/icl_evaluator/__init__.py | Exposes the new evaluator for import-side registration. |
| ais_bench/benchmark/datasets/refcoco/refcoco.py | New RefCOCO dataset loader + bbox postprocessor and prompt template. |
| ais_bench/benchmark/datasets/refcoco/refcoco_plus.py | RefCOCO+ dataset variant registered via inheritance. |
| ais_bench/benchmark/datasets/refcoco/refcoco_g.py | RefCOCOg dataset variant registered via inheritance. |
| ais_bench/benchmark/datasets/refcoco/__init__.py | RefCOCO package exports. |
| ais_bench/benchmark/datasets/__init__.py | Imports RefCOCO package to ensure registration. |
| ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen.py | New RefCOCO generation config using BBoxIoUEvaluator + bbox postprocessor. |
| ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen.py | New RefCOCOg generation config. |
| ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py | New RefCOCO+ generation config. |
| tests/UT/openicl/icl_evaluator/test_bbox_iou_evaluator.py | Unit tests for IoU evaluator scoring, clipping, error paths, and registry registration. |
| tests/UT/datasets/refcoco/test_refcoco.py | Unit tests for RefCOCO loader behavior and bbox postprocessor registration. |
| tests/UT/datasets/refcoco/test_refcoco_plus.py | Unit tests for RefCOCO+ delegation + registry registration. |
| tests/UT/datasets/refcoco/test_refcocog.py | Unit tests for RefCOCOg delegation + registry registration. |
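The "registered via inheritance" pattern in the table can be illustrated with a toy registry. This is only a sketch: the `Registry` class below is a hypothetical stand-in written for this example, not the framework's real registry, and the `load` signature is assumed.

```python
class Registry:
    """Hypothetical minimal registry mapping class names to classes."""

    def __init__(self):
        self._modules = {}

    def register_module(self, cls):
        self._modules[cls.__name__] = cls
        return cls

    def get(self, name):
        return self._modules[name]


LOAD_DATASET = Registry()  # stand-in name, assumed for illustration


@LOAD_DATASET.register_module
class RefCOCODataset:
    @staticmethod
    def load(path, split="val"):
        # Placeholder for the real loading logic.
        return {"path": path, "split": split}


@LOAD_DATASET.register_module
class RefCOCOPlusDataset(RefCOCODataset):
    """Inherits load() unchanged; the subclass exists to get its own registry key."""


print(LOAD_DATASET.get("RefCOCOPlusDataset").load("data/refcoco_plus"))
```

The variant classes add no behavior of their own in this sketch; a config can reference each dataset by a distinct name while the loading code stays in one place.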
ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py (outdated, resolved)
41f9c02 to e2ce2b5
Pull request overview
Adds RefCOCO/RefCOCO+/RefCOCOg visual grounding dataset support to the ais_bench benchmarking framework, including dataset loaders, evaluation via bbox IoU, prompt/postprocessing integration, and accompanying unit tests.
Changes:
- Introduce RefCOCO-family dataset loaders with image path/base64 modes and bbox-answer normalization.
- Add `BBoxIoUEvaluator` for IoU-thresholded accuracy plus framework registry integration.
- Add dataset config presets (path + base64 variants) and unit tests for loaders/evaluator/registry wiring.
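IoU-thresholded accuracy counts a prediction as correct when its IoU with the ground-truth box reaches a threshold. The sketch below assumes a threshold of 0.5, which is the conventional setting for RefCOCO-style benchmarks but is not confirmed as this PR's default; the function name `accuracy_at_iou` and the use of `None` for unparseable predictions are assumptions made for illustration.

```python
def accuracy_at_iou(ious, threshold=0.5):
    """Percentage of samples whose IoU is at least `threshold`.

    `None` entries stand for predictions that could not be parsed into a
    box; they count as misses rather than being dropped.
    """
    if not ious:
        return 0.0
    hits = sum(1 for v in ious if v is not None and v >= threshold)
    return 100.0 * hits / len(ious)


print(accuracy_at_iou([0.9, 0.4, None, 0.5]))  # 50.0
```

Counting unparseable outputs in the denominator avoids rewarding a model for producing fewer valid boxes.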
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/UT/openicl/icl_evaluator/test_bbox_iou_evaluator.py | Unit tests for IoU evaluator scoring, scaling/clipping, invalid cases, and registry registration. |
| tests/UT/datasets/refcoco/test_refcocog.py | Unit tests for RefCOCOg loader delegation and registry registration. |
| tests/UT/datasets/refcoco/test_refcoco.py | Unit tests for RefCOCO loader row expansion, image handling (path/base64), and bbox postprocessor registration. |
| tests/UT/datasets/refcoco/test_refcoco_plus.py | Unit tests for RefCOCO+ loader delegation and registry registration. |
| ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py | New IoU-based evaluator for bbox predictions with scaling/clipping and per-sample details. |
| ais_bench/benchmark/openicl/icl_evaluator/__init__.py | Expose BBoxIoUEvaluator from evaluator package. |
| ais_bench/benchmark/datasets/refcoco/refcoco.py | Core RefCOCO loader, image resolver strategies (path/base64), bbox postprocessor, and prompt generation. |
| ais_bench/benchmark/datasets/refcoco/refcoco_plus.py | RefCOCO+ dataset class reusing RefCOCO loader. |
| ais_bench/benchmark/datasets/refcoco/refcoco_g.py | RefCOCOg dataset class reusing RefCOCO loader. |
| ais_bench/benchmark/datasets/refcoco/__init__.py | Package exports for RefCOCO datasets and helpers. |
| ais_bench/benchmark/datasets/__init__.py | Register RefCOCO datasets into datasets package star-imports. |
| ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen.py | RefCOCOg generation config (file-path images). |
| ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen_base64.py | RefCOCOg generation config (base64 images). |
| ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen.py | RefCOCO generation config (file-path images). |
| ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen_base64.py | RefCOCO generation config (base64 images). |
| ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py | RefCOCO+ generation config (file-path images). |
| ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen_base64.py | RefCOCO+ generation config (base64 images). |
ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py (outdated, resolved)
ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py (outdated, resolved)
e2ce2b5 to b2f1bac
Pull request overview
This PR adds RefCOCO-family visual grounding benchmark support to the AISBench benchmarking framework, including dataset loaders, configs, and an IoU-based bounding-box evaluator for localization-style multimodal tasks.
Changes:
- Added `RefCOCODataset` loader with image caching/base64 options plus a bbox extraction postprocessor; introduced `RefCOCOgDataset` / `RefCOCOPlusDataset` variants.
- Added `BBoxIoUEvaluator` to score predicted boxes against ground-truth boxes using IoU.
- Added dataset config files (path + base64 variants) and registered new datasets/evaluator via package `__init__` imports.
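A bbox extraction postprocessor of the kind listed above typically pulls four numbers out of free-form model output. The following is a rough sketch, assuming the model emits comma-separated `x1, y1, x2, y2` values somewhere in its answer; the name `extract_bbox` and the regular expression are illustrative and are not the PR's `refcoco_bbox_postprocess` implementation.

```python
import re

_NUM = r"-?\d+(?:\.\d+)?"
# Four comma-separated numbers, with optional whitespace around commas.
_BOX_RE = re.compile(rf"({_NUM})\s*,\s*({_NUM})\s*,\s*({_NUM})\s*,\s*({_NUM})")


def extract_bbox(text):
    """Return [x1, y1, x2, y2] from the first match in `text`, or None."""
    match = _BOX_RE.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    # Normalize corner order so that x1 <= x2 and y1 <= y2.
    return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]


print(extract_bbox("The box is [12, 34.5, 120, 200]."))  # [12.0, 34.5, 120.0, 200.0]
print(extract_bbox("I cannot find the object."))         # None
```

Returning `None` instead of raising lets the evaluator treat unparseable outputs as incorrect predictions.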
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py | New IoU evaluator for bbox localization tasks. |
| ais_bench/benchmark/openicl/icl_evaluator/__init__.py | Exposes the new evaluator for discovery/import. |
| ais_bench/benchmark/datasets/refcoco/refcoco.py | Core RefCOCO loader + bbox postprocessor + image resolution logic. |
| ais_bench/benchmark/datasets/refcoco/refcoco_g.py | RefCOCOg dataset variant (reuses RefCOCO loader). |
| ais_bench/benchmark/datasets/refcoco/refcoco_plus.py | RefCOCOPlus dataset variant (reuses RefCOCO loader). |
| ais_bench/benchmark/datasets/refcoco/__init__.py | Re-exports new datasets/constants/postprocessor. |
| ais_bench/benchmark/datasets/__init__.py | Imports RefCOCO module to register/expose datasets. |
| ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen.py | RefCOCO path-based generation config wired to bbox evaluator. |
| ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen_base64.py | RefCOCO base64 generation config wired to bbox evaluator. |
| ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen.py | RefCOCOg path-based generation config. |
| ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen_base64.py | RefCOCOg base64 generation config. |
| ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py | RefCOCOPlus path-based generation config. |
| ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen_base64.py | RefCOCOPlus base64 generation config. |
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.
PR Type
Related Issue
Fixes #(issue ID) / Relates to #(issue ID)
🔍 Motivation
This pull request introduces comprehensive support for the RefCOCO, RefCOCOg, and RefCOCOplus visual grounding datasets in the benchmarking framework. It adds dataset loaders, configuration files, and a bounding box IoU evaluator for multimodal tasks involving object localization in images. The main changes include new dataset classes, prompt templates, evaluation logic, and integration into the registry system.
📝 Modification
This pull request adds comprehensive support for the RefCOCO, RefCOCOg, and RefCOCOplus referring expression comprehension datasets to the benchmarking suite. It introduces modular and configurable dataset loaders, evaluation configurations, and prompt templates for both file-based and base64-encoded image formats. The changes also include dataset registration, utility functions for image handling, and integration with the evaluation pipeline.
Key changes include:

New Dataset Loaders and Utilities:
- `RefCOCODataset` and its variants (`RefCOCOgDataset`, `RefCOCOPlusDataset`) with support for both file path and base64 image encoding, including modular image resolver strategies and methods for loading, normalizing, and expanding dataset rows.

Benchmark Configuration Additions:

Evaluation and Postprocessing Integration:
- An evaluator (`BBoxIoUEvaluator`) and a postprocessing function (`refcoco_bbox_postprocess`) for extracting and normalizing predicted bounding boxes from model outputs.

These changes provide a robust and extensible foundation for benchmarking multimodal models on referring expression comprehension tasks using the RefCOCO family of datasets.
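The file-path and base64 image modes can be pictured as interchangeable resolver strategies selected by a mode string. This is a minimal sketch under that assumption; the names `RESOLVERS`, `resolve_image`, and `encode_image_bytes` are hypothetical and do not mirror the loader's actual API.

```python
import base64
from pathlib import Path


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def resolve_as_path(image_path):
    # Path mode: hand the model a file location and let it read the image.
    return str(Path(image_path))


def resolve_as_base64(image_path):
    # Base64 mode: embed the image content directly in the prompt payload.
    return encode_image_bytes(Path(image_path).read_bytes())


# Hypothetical strategy table keyed by mode name.
RESOLVERS = {"path": resolve_as_path, "base64": resolve_as_base64}


def resolve_image(image_path, mode="path"):
    if mode not in RESOLVERS:
        raise ValueError(f"unknown image mode: {mode!r}")
    return RESOLVERS[mode](image_path)


print(encode_image_bytes(b"abc"))  # YWJj
```

Keeping the strategies in a lookup table means a new image mode can be added without touching the row-loading code.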
📐 Associated Test Results
[dataset usage]
[ut test]