
Feature/refcoco/+/g benchmark support #201

Open
zhongzhouTan-coder wants to merge 8 commits into AISBench:master from zhongzhouTan-coder:feature/refcoco


@zhongzhouTan-coder
Collaborator

zhongzhouTan-coder commented Mar 18, 2026

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

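PR Type

  • Feature (new feature)
  • Bugfix (bug fix)
  • Docs (documentation update)
  • CI/CD (continuous integration / continuous deployment)
  • Refactor (code refactoring)
  • Perf (performance optimization)
  • Dependency (dependency update)
  • Test-Cases (test case update)
  • Other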

Related Issue
Fixes #(issue ID) / Relates to #(issue ID)

🔍 Motivation

This pull request introduces comprehensive support for the RefCOCO, RefCOCOg, and RefCOCOplus visual grounding datasets in the benchmarking framework. It adds dataset loaders, configuration files, and a bounding box IoU evaluator for multimodal tasks involving object localization in images. The main changes include new dataset classes, prompt templates, evaluation logic, and integration into the registry system.

📝 Modification

This pull request adds comprehensive support for the RefCOCO, RefCOCOg, and RefCOCOplus referring expression comprehension datasets to the benchmarking suite. It introduces modular and configurable dataset loaders, evaluation configurations, and prompt templates for both file-based and base64-encoded image formats. The changes also include dataset registration, utility functions for image handling, and integration with the evaluation pipeline.

Key changes include:

New Dataset Loaders and Utilities:

  • Added RefCOCODataset and its variants (RefCOCOgDataset, RefCOCOPlusDataset) with support for both file path and base64 image encoding, including modular image resolver strategies and methods for loading, normalizing, and expanding dataset rows.
  • Registered the dataset classes and utility constants in the package __init__.py files for easy import and discovery.
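To illustrate what "modular image resolver strategies" could look like, here is a minimal sketch in Python (the project's language) of one resolver class per image mode. The class names, the image_root argument and the "path"/"base64" mode strings are hypothetical and not taken from the PR's refcoco.py; the sketch only shows the general pattern of choosing between a file path and a base64 data URI.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Protocol


class ImageResolver(Protocol):
    """Hypothetical interface: turn an image file name into a prompt-ready value."""

    def resolve(self, image_name: str) -> str: ...


class FilePathResolver:
    """Return an absolute path under a (hypothetical) image root directory."""

    def __init__(self, image_root: str) -> None:
        self.image_root = Path(image_root)

    def resolve(self, image_name: str) -> str:
        return str((self.image_root / image_name).resolve())


class Base64Resolver:
    """Read the file and return a data URI carrying base64-encoded bytes."""

    def __init__(self, image_root: str) -> None:
        self.image_root = Path(image_root)

    def resolve(self, image_name: str) -> str:
        path = self.image_root / image_name
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"data:{mime};base64,{payload}"


def make_resolver(mode: str, image_root: str) -> ImageResolver:
    """Pick a strategy from a config string ("path" or "base64")."""
    resolvers = {"path": FilePathResolver, "base64": Base64Resolver}
    if mode not in resolvers:
        raise ValueError(f"unknown image mode: {mode!r}")
    return resolvers[mode](image_root)
```

Keeping every mode behind the same resolve() method lets the loader stay unaware of how images are delivered to the model; supporting another mode means registering one more class.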

Benchmark Configuration Additions:

  • Added configuration files for each dataset and image encoding variant, specifying prompt templates, retriever/inferencer setup, and evaluation logic for RefCOCO, RefCOCOg, and RefCOCOplus.

Evaluation and Postprocessing Integration:

  • Implemented and registered a bounding box IoU evaluator (BBoxIoUEvaluator) and a postprocessing function (refcoco_bbox_postprocess) for extracting and normalizing predicted bounding boxes from model outputs.
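The PR text does not show how refcoco_bbox_postprocess parses model output. A minimal sketch of one common approach, assuming the model replies with four comma-separated numbers optionally wrapped in brackets or parentheses, might look like this; extract_bbox is a hypothetical name, not the PR's function.

```python
import re
from typing import List, Optional

# An integer or decimal number, optionally negative.
_NUMBER = r"-?\d+(?:\.\d+)?"
# Four comma-separated numbers, optionally wrapped in [] or ().
_BOX = re.compile(
    rf"[\[\(]?\s*({_NUMBER})\s*,\s*({_NUMBER})\s*,\s*({_NUMBER})\s*,\s*({_NUMBER})\s*[\]\)]?"
)


def extract_bbox(text: str) -> Optional[List[float]]:
    """Return the first [x1, y1, x2, y2] found in a model reply, or None."""
    match = _BOX.search(text)
    if match is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in match.groups())
    # Normalize corner order so that x1 <= x2 and y1 <= y2.
    return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]
```

Returning None rather than raising lets an evaluator count an unparseable reply as a miss instead of aborting the whole run.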

These changes provide a robust and extensible foundation for benchmarking multimodal models on referring expression comprehension tasks using the RefCOCO family of datasets.

📐 Associated Test Results

[Screenshot: dataset usage (refcoco_file_path)]

[Screenshot: unit test results]

Copilot AI review requested due to automatic review settings March 18, 2026 09:40
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the benchmarking framework's capabilities by integrating the RefCOCO family of visual grounding datasets. It provides a complete pipeline from data loading and prompt generation to evaluation, enabling comprehensive assessment of models on multimodal object localization tasks. The changes introduce new dataset classes, configuration files, and a dedicated bounding box Intersection over Union (IoU) evaluator, along with necessary post-processing logic.

Highlights

  • New Dataset Support: Added comprehensive support for RefCOCO, RefCOCOg, and RefCOCOplus visual grounding datasets, including dataset loaders and configuration files for various splits.
  • Bounding Box IoU Evaluation: Introduced a new BBoxIoUEvaluator to accurately score predicted bounding boxes against ground truth using Intersection over Union (IoU), with configurable thresholds and coordinate scaling.
  • Bounding Box Post-processing: Implemented a refcoco_bbox_postprocess function to extract and parse bounding box coordinates from model outputs, ensuring compatibility with the new evaluator.
  • Framework Integration: Integrated all new datasets and the IoU evaluator into the benchmarking framework's registry, making them discoverable and usable within the existing system.
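For reference, IoU is the area where two boxes overlap divided by the area of their union, and the usual RefCOCO metric counts a prediction as correct when IoU >= 0.5. The sketch below assumes corner-format [x1, y1, x2, y2] boxes and treats unparseable predictions (None) as misses; it is not the PR's BBoxIoUEvaluator, and its default threshold of 0.5 is an assumption about how that evaluator is configured.

```python
from typing import Optional, Sequence

Box = Sequence[float]  # [x1, y1, x2, y2]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_accuracy(preds: Sequence[Optional[Box]], golds: Sequence[Box],
                 threshold: float = 0.5) -> float:
    """Percentage of samples whose predicted box reaches IoU >= threshold."""
    if len(preds) != len(golds):
        raise ValueError("predictions and references differ in length")
    hits = sum(
        1 for p, g in zip(preds, golds)
        if p is not None and box_iou(p, g) >= threshold
    )
    return 100.0 * hits / len(golds) if golds else 0.0
```

For example, two 10x10 boxes offset by half their width overlap on 50 units of area out of a 150-unit union, so their IoU is 1/3 and the prediction would count as a miss at the 0.5 threshold.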


Contributor

gemini-code-assist bot left a comment


Code Review

This pull request introduces comprehensive support for the RefCOCO, RefCOCOg, and RefCOCOplus datasets, which is a great addition to the benchmark framework. The implementation, including the new dataset loaders, the BBoxIoUEvaluator, and the associated unit tests, is well-structured and robust.

My review has identified a couple of issues. There's a critical configuration mismatch in the refcoco_plus and refcocog dataset configurations where the input_columns do not align with the variables required by the prompt template, which would cause runtime errors. I've also pointed out a minor stylistic issue concerning the lack of a final newline character in several of the newly added files.
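A mismatch like this can be caught before inference by comparing the template's placeholders with the reader's input_columns. A minimal sketch, assuming the prompt template uses str.format-style {column} placeholders; template_fields and check_columns are hypothetical helpers, not part of the framework.

```python
import string
from typing import Iterable, Set


def template_fields(template: str) -> Set[str]:
    """Collect the {placeholder} names used in a str.format-style template."""
    return {
        field for _, field, _, _ in string.Formatter().parse(template)
        if field  # skip literal-only chunks (None) and positional "{}" ("")
    }


def check_columns(template: str, input_columns: Iterable[str]) -> None:
    """Raise if the template needs a column the reader config does not provide."""
    missing = template_fields(template) - set(input_columns)
    if missing:
        raise KeyError(f"template placeholders not in input_columns: {sorted(missing)}")
```

Running such a check on each config, for example from a unit test, would turn the runtime error described in the review into an early, explicit failure.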

After addressing these points, the PR should be in excellent shape.

Contributor

Copilot AI left a comment


Pull request overview

Adds RefCOCO-family dataset support and a bounding-box IoU evaluator to enable referring-expression grounding benchmarks (RefCOCO / RefCOCO+ / RefCOCOg) within the OpenICL evaluation pipeline.

Changes:

  • Introduces BBoxIoUEvaluator (registry-registered) for IoU-based accuracy scoring with optional coordinate scaling/clipping.
  • Adds RefCOCODataset loader plus RefCOCOPlusDataset / RefCOCOgDataset variants, along with bbox prediction postprocessing.
  • Adds dataset config entries and unit tests covering dataset loading, registry wiring, and evaluator behavior.
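The overview does not say how the evaluator scales or clips coordinates. One plausible scheme, sketched below under stated assumptions, maps boxes emitted on a normalized grid into pixel space and clips them to the image bounds before IoU is computed. The default grid of 0 to 1000 (used by some vision-language models) is an assumption, and scale_and_clip and coord_range are hypothetical names.

```python
from typing import List, Sequence


def scale_and_clip(box: Sequence[float], width: float, height: float,
                   coord_range: float = 1000.0) -> List[float]:
    """Map a box from a normalized [0, coord_range] grid to pixels, then clip.

    Pass coord_range=1.0 for boxes normalized to [0, 1].
    """
    sx, sy = width / coord_range, height / coord_range
    x1, y1, x2, y2 = box[0] * sx, box[1] * sy, box[2] * sx, box[3] * sy
    # Clip every coordinate into the image, so the part of a prediction
    # that falls outside the frame does not enlarge the IoU union.
    return [
        min(max(x1, 0.0), width), min(max(y1, 0.0), height),
        min(max(x2, 0.0), width), min(max(y2, 0.0), height),
    ]
```

Scaling the prediction into pixel space (rather than normalizing the ground truth) keeps reported boxes comparable to the original COCO annotations.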

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

File: Description
  • ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py: New IoU evaluator for bbox predictions (with scaling/clipping + detailed per-sample results).
  • ais_bench/benchmark/openicl/icl_evaluator/__init__.py: Exposes the new evaluator for import-side registration.
  • ais_bench/benchmark/datasets/refcoco/refcoco.py: New RefCOCO dataset loader + bbox postprocessor and prompt template.
  • ais_bench/benchmark/datasets/refcoco/refcoco_plus.py: RefCOCO+ dataset variant registered via inheritance.
  • ais_bench/benchmark/datasets/refcoco/refcoco_g.py: RefCOCOg dataset variant registered via inheritance.
  • ais_bench/benchmark/datasets/refcoco/__init__.py: RefCOCO package exports.
  • ais_bench/benchmark/datasets/__init__.py: Imports RefCOCO package to ensure registration.
  • ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen.py: New RefCOCO generation config using BBoxIoUEvaluator + bbox postprocessor.
  • ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen.py: New RefCOCOg generation config.
  • ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py: New RefCOCO+ generation config.
  • tests/UT/openicl/icl_evaluator/test_bbox_iou_evaluator.py: Unit tests for IoU evaluator scoring, clipping, error paths, and registry registration.
  • tests/UT/datasets/refcoco/test_refcoco.py: Unit tests for RefCOCO loader behavior and bbox postprocessor registration.
  • tests/UT/datasets/refcoco/test_refcoco_plus.py: Unit tests for RefCOCO+ delegation + registry registration.
  • tests/UT/datasets/refcoco/test_refcocog.py: Unit tests for RefCOCOg delegation + registry registration.


Copilot AI review requested due to automatic review settings March 19, 2026 11:28
Contributor

Copilot AI left a comment


Pull request overview

Adds RefCOCO/RefCOCO+/RefCOCOg visual grounding dataset support to the ais_bench benchmarking framework, including dataset loaders, evaluation via bbox IoU, prompt/postprocessing integration, and accompanying unit tests.

Changes:

  • Introduce RefCOCO-family dataset loaders with image path/base64 modes and bbox-answer normalization.
  • Add BBoxIoUEvaluator for IoU-thresholded accuracy plus framework registry integration.
  • Add dataset config presets (path + base64 variants) and unit tests for loaders/evaluator/registry wiring.
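RefCOCO-style annotations typically attach several referring expressions to one object, and COCO stores boxes as [x, y, width, height]. Assuming the loader's "row expansion" and "bbox-answer normalization" mean one sample per expression plus a conversion to corner format, a minimal sketch could look like this; the field names image, bbox, sentences, question and answer are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def xywh_to_xyxy(bbox: List[float]) -> List[float]:
    """COCO-style [x, y, width, height] -> corner-style [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def expand_rows(rows: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Emit one sample per referring expression.

    Each raw row is assumed to carry an image name, one COCO-format box and
    a list of sentences that all describe that same box.
    """
    for row in rows:
        answer = xywh_to_xyxy(row["bbox"])
        for sentence in row["sentences"]:
            yield {"image": row["image"], "question": sentence, "answer": answer}
```

Expanding per expression means each sentence is scored independently, so an object with three descriptions contributes three samples to the accuracy.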

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

File: Description
  • tests/UT/openicl/icl_evaluator/test_bbox_iou_evaluator.py: Unit tests for IoU evaluator scoring, scaling/clipping, invalid cases, and registry registration.
  • tests/UT/datasets/refcoco/test_refcocog.py: Unit tests for RefCOCOg loader delegation and registry registration.
  • tests/UT/datasets/refcoco/test_refcoco.py: Unit tests for RefCOCO loader row expansion, image handling (path/base64), and bbox postprocessor registration.
  • tests/UT/datasets/refcoco/test_refcoco_plus.py: Unit tests for RefCOCO+ loader delegation and registry registration.
  • ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py: New IoU-based evaluator for bbox predictions with scaling/clipping and per-sample details.
  • ais_bench/benchmark/openicl/icl_evaluator/__init__.py: Expose BBoxIoUEvaluator from evaluator package.
  • ais_bench/benchmark/datasets/refcoco/refcoco.py: Core RefCOCO loader, image resolver strategies (path/base64), bbox postprocessor, and prompt generation.
  • ais_bench/benchmark/datasets/refcoco/refcoco_plus.py: RefCOCO+ dataset class reusing RefCOCO loader.
  • ais_bench/benchmark/datasets/refcoco/refcoco_g.py: RefCOCOg dataset class reusing RefCOCO loader.
  • ais_bench/benchmark/datasets/refcoco/__init__.py: Package exports for RefCOCO datasets and helpers.
  • ais_bench/benchmark/datasets/__init__.py: Register RefCOCO datasets into datasets package star-imports.
  • ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen.py: RefCOCOg generation config (file-path images).
  • ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen_base64.py: RefCOCOg generation config (base64 images).
  • ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen.py: RefCOCO generation config (file-path images).
  • ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen_base64.py: RefCOCO generation config (base64 images).
  • ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py: RefCOCO+ generation config (file-path images).
  • ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen_base64.py: RefCOCO+ generation config (base64 images).


Copilot AI review requested due to automatic review settings March 20, 2026 02:17
Contributor

Copilot AI left a comment


Pull request overview

This PR adds RefCOCO-family visual grounding benchmark support to the AISBench benchmarking framework, including dataset loaders, configs, and an IoU-based bounding-box evaluator for localization-style multimodal tasks.

Changes:

  • Added RefCOCODataset loader with image caching/base64 options plus a bbox extraction postprocessor; introduced RefCOCOgDataset / RefCOCOPlusDataset variants.
  • Added BBoxIoUEvaluator to score predicted boxes against ground-truth boxes using IoU.
  • Added dataset config files (path + base64 variants) and registered new datasets/evaluator via package __init__ imports.
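Each COCO image is typically shared by many referring expressions, so base64-encoding it once and reusing the result can save repeated disk reads. A minimal sketch using functools.lru_cache; the function name and cache size are assumptions, and the PR's own caching mechanism may work differently.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=256)  # arbitrary size chosen for this sketch
def encode_image_cached(path: str) -> str:
    """Read and base64-encode an image; repeat calls for the same path hit the cache."""
    with open(path, "rb") as fh:
        return base64.b64encode(fh.read()).decode("ascii")
```

Keying the cache on the path string is cheap, but it assumes image files do not change during a run; lru_cache has no invalidation beyond eviction and cache_clear().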

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

File: Description
  • ais_bench/benchmark/openicl/icl_evaluator/bbox_iou_evaluator.py: New IoU evaluator for bbox localization tasks.
  • ais_bench/benchmark/openicl/icl_evaluator/__init__.py: Exposes the new evaluator for discovery/import.
  • ais_bench/benchmark/datasets/refcoco/refcoco.py: Core RefCOCO loader + bbox postprocessor + image resolution logic.
  • ais_bench/benchmark/datasets/refcoco/refcoco_g.py: RefCOCOg dataset variant (reuses RefCOCO loader).
  • ais_bench/benchmark/datasets/refcoco/refcoco_plus.py: RefCOCOPlus dataset variant (reuses RefCOCO loader).
  • ais_bench/benchmark/datasets/refcoco/__init__.py: Re-exports new datasets/constants/postprocessor.
  • ais_bench/benchmark/datasets/__init__.py: Imports RefCOCO module to register/expose datasets.
  • ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen.py: RefCOCO path-based generation config wired to bbox evaluator.
  • ais_bench/benchmark/configs/datasets/refcoco/refcoco_gen_base64.py: RefCOCO base64 generation config wired to bbox evaluator.
  • ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen.py: RefCOCOg path-based generation config.
  • ais_bench/benchmark/configs/datasets/refcocog/refcocog_gen_base64.py: RefCOCOg base64 generation config.
  • ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen.py: RefCOCOPlus path-based generation config.
  • ais_bench/benchmark/configs/datasets/refcoco_plus/refcoco_plus_gen_base64.py: RefCOCOPlus base64 generation config.


3 participants