
🧠 GENIUS: Generative Fluid Intelligence Evaluation Suite

Paper Dataset

Ruichuan An*, Sihan Yang*, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li
Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang

PKU, CUHK, StepFun, PolyU, MSRA.

* Equal Contribution     Project Leader     Corresponding Author

📄 Blog | 🚀 Quick Start | 📦 Dataset | 📜 License | 📝 Citation | 📬 Contact

An overview of the GENIUS benchmark.

🕙 Timeline

  • 2026.02.11: 🌟 Release of the evaluation code and the core test dataset.
  • TBD: Integration of more model inference scripts.

🚀 Quick Start

1. Download the Test Dataset

The dataset is available on multiple platforms for your convenience; see the 📦 Dataset link above.

2. Installation & Directory Setup

Clone the repository and prepare your local environment:

```bash
git clone https://github.com/arctanxarc/GENIUS.git
cd GENIUS
```

After downloading the dataset, ensure your directory structure matches the following:

```
./
├── cal_score.py           # Scoring script
├── dataset/               # Test dataset
│   ├── implicit_pattern
│   ├── multi_semantic
│   ├── prior_conflicting
│   ├── symbolic_constraint
│   └── visual_constraint
├── eval_prompt.py         # Prompt management
├── eval.py                # Main evaluation logic
├── eval.sh                # Entry script
├── GENIUS.pdf             # Paper
└── README.md
```

3. Prepare Model Outputs

Place the images generated by your models into the `outputs` directory. Organize them using the following hierarchy: `outputs/<model_name>/<task_name>/{id}.png`.

Important

The `{id}` must correspond strictly to the `id` field in `test_data.json`. (Note: IDs are unique identifiers, not necessarily a continuous sequence starting from 0.)

Example Structure:

```
./outputs/
└── nanobanana/              # Example: Model Name
    ├── implicit_pattern/
    │   ├── 002.png          # Matches ID=002 in ./dataset/implicit_pattern/test_data.json
    │   ├── 003.png
    │   └── ...
    ├── multi_semantic/
    └── ...
```

4. Running the Evaluation

Configure your credentials and target models in eval.sh:

  1. Set your `API_URL` and `API_KEY` for LMM-as-a-judge.
  2. Define the evaluation scope:

     ```bash
     DIMENSIONS=("implicit_pattern" "symbolic_constraint" "visual_constraint" "prior_conflicting" "multi_semantic")
     MODELS=("your_model_name")
     ```

  3. Execute the evaluation script:

     ```bash
     bash eval.sh
     ```
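Once `eval.sh` finishes, `cal_score.py` aggregates the judge's per-sample ratings into final scores. Its exact rubric is not documented here, but the aggregation shape can be sketched as below; the `aggregate_scores` helper and the assumption of one numeric score per sample, keyed by dimension, are illustrative, not the repository's actual implementation:

```python
from statistics import mean

def aggregate_scores(judge_scores):
    """Average per-sample judge scores into per-dimension and overall scores.

    `judge_scores` maps a dimension name (e.g. "implicit_pattern") to a
    list of per-sample numeric ratings from the LMM judge. The real
    cal_score.py may weight or normalize differently; this only shows
    the basic mean-of-means aggregation.
    """
    per_dim = {dim: mean(scores) for dim, scores in judge_scores.items() if scores}
    overall = mean(per_dim.values()) if per_dim else 0.0
    return per_dim, overall
```

Averaging within each dimension first, then across dimensions, keeps a dimension with many samples from dominating the overall score.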

📜 License

The dataset and code are released under CC-BY-NC 4.0 and are intended for academic research only. Commercial use is not permitted.

📝 Citation

```bibtex
@misc{an2026geniusgenerativefluidintelligence,
      title={GENIUS: Generative Fluid Intelligence Evaluation Suite},
      author={Ruichuan An and Sihan Yang and Ziyu Guo and Wei Dai and Zijun Shen and Haodong Li and Renrui Zhang and Xinyu Wei and Guopeng Li and Wenshan Wu and Wentao Zhang},
      year={2026},
      eprint={2602.11144},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.11144},
}
```

📬 Contact
