Add wbench eval framework by KainingYing · Pull Request #2200 · huggingface/huggingface.js

KainingYing · 2026-05-29T09:14:34Z

Summary

Adds wbench to the evaluation frameworks that benchmark datasets can declare in their eval.yaml files.

WBench is a comprehensive multi-turn benchmark for interactive video world model evaluation, assessing models across 5 dimensions (video quality, setting adherence, interaction adherence, consistency, physics compliance) and 22 metrics over 289 multi-turn interaction cases.
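Tooling that consumes WBench results may find it useful to capture the five dimensions above as a typed constant. The following is a minimal TypeScript sketch, not code from this PR. The names WBENCH_DIMENSIONS, WBenchDimension, WBENCH_TOTALS and isWBenchDimension are hypothetical. The 22 individual metrics are not modeled because the PR does not list them.

```typescript
// Hypothetical sketch: the five WBench dimensions named in the PR description,
// kept as a readonly tuple so TypeScript can derive a union type from it.
// The 22 individual metrics are not listed in the PR, so they are not modeled.
const WBENCH_DIMENSIONS = [
  "video quality",
  "setting adherence",
  "interaction adherence",
  "consistency",
  "physics compliance",
] as const;

// Union of the dimension names above.
type WBenchDimension = (typeof WBENCH_DIMENSIONS)[number];

// Aggregate counts as stated in the PR description.
const WBENCH_TOTALS = {
  dimensions: WBENCH_DIMENSIONS.length,
  metrics: 22,
  cases: 289,
};

// Narrow an arbitrary string (e.g. a key read from a results file) to a known dimension.
function isWBenchDimension(value: string): value is WBenchDimension {
  return (WBENCH_DIMENSIONS as readonly string[]).indexOf(value) !== -1;
}

console.log(isWBenchDimension("physics compliance")); // true
console.log(isWBenchDimension("audio quality")); // false
```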

Dataset prepared for the Hub Evaluation Results feature:
https://huggingface.co/datasets/meituan-longcat/WBench

The dataset repo already includes an eval.yaml.

Note

Low Risk
Registry-only metadata change with no runtime logic, auth, or data path changes.

Overview
Registers wbench in EVALUATION_FRAMEWORKS (packages/tasks/src/eval.ts) so that benchmark datasets can declare it in eval.yaml and their results surface correctly in Hub Evaluation Results.

The new entry includes a display name, a short description of WBench (multi-turn interactive video world model evaluation), and a link to the upstream repo.

Reviewed by Cursor Bugbot for commit 30f6c1c. Bugbot is set up for automated code reviews on this repo.
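The registration described in the overview can be pictured with a minimal TypeScript sketch. It is not the diff from this PR. The entry shape (name, description, url), the display name "WBench", the placeholder URL and the helper isRegisteredFramework are all assumptions made for illustration, since the actual contents of packages/tasks/src/eval.ts are not shown on this page.

```typescript
// Hypothetical sketch of a framework registry modeled on EVALUATION_FRAMEWORKS
// in packages/tasks/src/eval.ts. The real entry shape is not shown in this PR
// page, so the field names below (name, description, url) are assumptions.
interface EvaluationFramework {
  name: string; // display name shown on the Hub
  description: string; // one-line summary of the framework
  url: string; // link to the upstream repository
}

const EVALUATION_FRAMEWORKS: Record<string, EvaluationFramework> = {
  // Other registered frameworks are omitted from this sketch.
  wbench: {
    name: "WBench", // assumed display name
    description: "Multi-turn benchmark for interactive video world model evaluation.",
    // Placeholder: the PR description does not quote the upstream repo URL.
    url: "UPSTREAM_REPO_URL",
  },
};

// Check whether a framework id declared in a dataset's eval.yaml is registered.
// hasOwnProperty avoids matching inherited keys such as "constructor" or "toString".
function isRegisteredFramework(id: string): boolean {
  return Object.prototype.hasOwnProperty.call(EVALUATION_FRAMEWORKS, id);
}

console.log(isRegisteredFramework("wbench")); // true
console.log(isRegisteredFramework("constructor")); // false
```

Using hasOwnProperty rather than the in operator keeps inherited object keys from being mistaken for registered framework ids.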

Commit 30f6c1c: Add wbench eval framework

KainingYing requested review from NathanHB, gary149, julien-c, krampstudio and pcuenca as code owners on May 29, 2026 at 09:14.

KainingYing wants to merge 1 commit into huggingface:main from KainingYing:add-wbench-eval-framework.
