
AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation
ICLR 2026 Poster




💡 Main Contributions

  • We propose AirQA, a human-annotated, multi-modal, multi-task, multi-paper QA dataset with function-based, instance-specific evaluation. To the best of our knowledge, AirQA is the first dataset to encompass multiple question types and the first to bring function-based evaluation into the QA domain, enabling convenient and systematic assessment of research capabilities.
  • We introduce ExTrActor, a document-based framework that synthesizes QA examples, interaction trajectories, and instruction data, providing an empirical method for improving an agent's multi-turn tool use without any manual annotation.
  • We evaluate various LLMs and QA baselines on AirQA, demonstrating the quality of the dataset and exposing the insufficiency of current methods. Extensive instruction-tuning experiments show that small models benefit significantly from our synthetic instruction data, validating the effectiveness of the proposed ExTrActor framework.
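The function-based, instance-level evaluation described above can be sketched as follows. This is an illustrative assumption, not the actual AirQA schema: the function names (`exact_match`, `contains_all`) and example fields (`eval_func`, `eval_args`) are hypothetical stand-ins; see documents/evaluation.md for the real evaluation functions and arguments.

```python
# Hedged sketch of function-based, instance-level evaluation.
# Function names and example fields are illustrative, NOT the
# actual AirQA schema (see documents/evaluation.md for that).

def eval_exact_match(pred: str, gold: str) -> float:
    """Score 1.0 iff the prediction matches the gold answer (case-insensitive)."""
    return float(pred.strip().lower() == gold.strip().lower())

def eval_contains_all(pred: str, keywords: list[str]) -> float:
    """Score 1.0 iff every required keyword appears in the prediction."""
    return float(all(k.lower() in pred.lower() for k in keywords))

# Each QA instance names its own evaluation function and arguments,
# so different questions can be scored by different criteria.
EVAL_REGISTRY = {
    "exact_match": eval_exact_match,
    "contains_all": eval_contains_all,
}

def evaluate_instance(example: dict, prediction: str) -> float:
    """Dispatch to the instance-specific evaluation function."""
    fn = EVAL_REGISTRY[example["eval_func"]]
    return fn(prediction, **example["eval_args"])

example = {
    "question": "Which optimizer does the paper use?",
    "eval_func": "exact_match",
    "eval_args": {"gold": "AdamW"},
}
score = evaluate_instance(example, "AdamW")
```

The key design point is that the evaluation criterion travels with each instance rather than being fixed globally, which is what allows a single dataset to mix extraction, reasoning, and list-style questions.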

🔍 Quick Start

  1. Create the conda environment and install dependencies:

    conda create -n airqa python=3.10
    conda activate airqa
    pip install -r requirements.txt
  2. (Optional) Download related files:

    We have included the full QA data in this repository at data/test_data.jsonl, so it is sufficient to download only the paper-related metadata, processed data, and PDFs as needed. Note that evaluation itself does not require this paper-related data, but you can use it to help answer the questions.

    python utils/download_utils.py --datatype metadata processed_data
    # Add 'papers' if you want to run ExTrActor
    # Note that this requires additional disk space (~60G)

    You can also download these files manually from our official Hugging Face repository and organize them into the following folder structure:

    Click to see the folder structure 👇🏻
    AirQA
    |── data/
    |   |── metadata/
    |   |   |── .gitkeep
    |   |   |── 000ab6db-4b65-5dc0-8393-fbc2c05843c8.json
    |   |   └── ... # more metadata dicts
    |   |── papers/
    |   |   |── .gitkeep
    |   |   |── acl2016/
    |   |   |   └── 16c3a7ad-d638-5ebf-a72a-bd58f06c16d7.pdf
    |   |   |── acl2019/
    |   |   |   └── c7563d97-695f-5c77-8021-334bf2ff9ddb.pdf
    |   |   |── acl2023/
    |   |   |   |── 001ab93b-7665-5d56-a28e-eac95d2a9d7e.pdf
    |   |   |   └── ... # more .pdf published in ACL 2023
    |   |   └── ... # other sub-folders of paper collections
    |   └── processed_data/
    |       |── .gitkeep
    |       |── 000ab6db-4b65-5dc0-8393-fbc2c05843c8.json # cached data for PDF parsing
    |       └── ... # more cached data for PDFs
    └── ... # other folders and files
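    Once the files are organized, each paper's metadata can be looked up by its UUID filename. The snippet below is a minimal sketch under stated assumptions: the `title` and `conference` fields are hypothetical stand-ins for the real schema (see documents/data_format.md), and a temporary directory stands in for data/metadata/.

```python
# Hedged sketch: loading one cached metadata dict by paper UUID.
# The "title"/"conference" fields are hypothetical; the real schema
# is described in documents/data_format.md.
import json
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp())  # stands in for data/metadata/
uid = "000ab6db-4b65-5dc0-8393-fbc2c05843c8"

# Create a stand-in metadata file so the sketch is self-contained.
(root / f"{uid}.json").write_text(
    json.dumps({"title": "Example Paper", "conference": "acl2023"})
)

def load_metadata(metadata_dir: pathlib.Path, paper_uuid: str) -> dict:
    """Load the metadata dict stored as <uuid>.json in the metadata folder."""
    return json.loads((metadata_dir / f"{paper_uuid}.json").read_text())

meta = load_metadata(root, uid)
```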
  3. Evaluate your answers on AirQA.

    export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxx"
    export OPENAI_BASE_URL="https://api.openai.com/v1"
    python utils/eval_utils.py \
        --gold data/test_data.jsonl \
        --dir results/example

    See the Evaluation document for more details.
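    The gold file is JSONL, one question per line. The sketch below shows one plausible way to iterate it and write per-question answer files into a results directory; the `uuid`/`question`/`answer` field names and the one-file-per-question layout are assumptions for illustration, and the Evaluation document defines the actual expected format.

```python
# Hedged sketch: iterate a JSONL gold file and write one answer file
# per question into a results directory. Field names and file layout
# are assumptions; see documents/evaluation.md for the real format.
import json
import pathlib
import tempfile

# Inline stand-in for data/test_data.jsonl so the sketch is self-contained.
gold_jsonl = "\n".join([
    json.dumps({"uuid": "q-001", "question": "Which model is largest?"}),
    json.dumps({"uuid": "q-002", "question": "What is the reported score?"}),
])

results_dir = pathlib.Path(tempfile.mkdtemp())  # stands in for results/example/
for line in gold_jsonl.splitlines():
    ex = json.loads(line)
    # In practice the answer would come from your QA system.
    answer = {"uuid": ex["uuid"], "answer": "TODO: model output"}
    (results_dir / f'{ex["uuid"]}.json').write_text(json.dumps(answer))

written = sorted(p.name for p in results_dir.iterdir())
```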

  4. (Optional) Use ExTrActor to automatically generate examples. Running ExTrActor requires further configuration and preparation; please refer to the ExTrActor document for more details.

📚 Detailed Documents and Tutorials

Fine-grained documentation for this project lives in the documents/ folder. Here is the checklist:

Document                      Description
📗 documents/data_format.md   Example and paper data formats for AirQA.
📕 documents/evaluation.md    Evaluation functions, arguments, and scripts for AirQA.
📘 documents/extractor.md     Details on automatically generating examples with ExTrActor.

✍🏻 Citation

If you find this dataset useful, please cite our work:

@misc{huang2025airqacomprehensiveqadataset,
      title={AirQA: A Comprehensive QA Dataset for AI Research with Instance-Level Evaluation}, 
      author={Tiancheng Huang and Ruisheng Cao and Yuxin Zhang and Zhangyi Kang and Zijian Wang and Chenrun Wang and Yijie Luo and Hang Zheng and Lirong Qian and Lu Chen and Kai Yu},
      year={2025},
      eprint={2509.16952},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.16952}, 
}
