Add wbench eval framework #2200

Open
KainingYing wants to merge 1 commit into huggingface:main from KainingYing:add-wbench-eval-framework


KainingYing commented May 29, 2026

Summary

Adds wbench to the list of supported evaluation frameworks that benchmark datasets can declare in their eval.yaml files.

WBench is a comprehensive multi-turn benchmark for interactive video world model evaluation, assessing models across 5 dimensions (video quality, setting adherence, interaction adherence, consistency, physics compliance) and 22 metrics over 289 multi-turn interaction cases.

Dataset prepared for the Hub Evaluation Results feature:
https://huggingface.co/datasets/meituan-longcat/WBench

The dataset repo already includes an eval.yaml.


Note

Low Risk
Registry-only metadata change with no runtime logic, auth, or data path changes.

Overview
Registers wbench in EVALUATION_FRAMEWORKS (packages/tasks/src/eval.ts) so benchmark datasets can declare it in eval.yaml and surface correctly on Hub Evaluation Results.

The new entry includes display name, a short description of WBench (multi-turn interactive video world model evaluation), and a link to the upstream repo.
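For readers unfamiliar with this kind of registry, here is a minimal sketch of what such an entry might look like. It is not copied from packages/tasks/src/eval.ts: the field names (`name`, `description`, `url`), the `EvalFrameworkEntry` interface, the `EVALUATION_FRAMEWORKS_SKETCH` constant, and the `isKnownFramework` helper are all assumptions made for illustration, and the URL is a placeholder rather than the real upstream link.

```typescript
// Hypothetical shape of one registry entry. The real field names in
// packages/tasks/src/eval.ts may differ; these are assumptions.
interface EvalFrameworkEntry {
  name: string; // display name
  description: string; // short description shown alongside results
  url: string; // link to the upstream repo
}

// Illustrative stand-in for EVALUATION_FRAMEWORKS, holding only the new entry.
const EVALUATION_FRAMEWORKS_SKETCH: Record<string, EvalFrameworkEntry> = {
  wbench: {
    name: "WBench",
    description:
      "Multi-turn benchmark for interactive video world model evaluation.",
    url: "https://example.com/wbench", // placeholder, not the actual upstream URL
  },
};

// Hypothetical helper: checks whether a framework id declared in an
// eval.yaml file is present in the registry (own keys only).
function isKnownFramework(id: string): boolean {
  return Object.prototype.hasOwnProperty.call(EVALUATION_FRAMEWORKS_SKETCH, id);
}
```

Under these assumptions, a dataset declaring `wbench` would pass the lookup, while an unregistered id would not. Because the change is a single keyed entry, it adds no runtime logic beyond what the registry already supports.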

Reviewed by Cursor Bugbot for commit 30f6c1c.
