How to Evaluate Qwen3-VL on the TraceSpatial Benchmark? #2

@k00dj-19

Thanks for the great work.

I’m trying to reproduce your reported results by running Qwen3-VL inference on the TraceSpatial benchmark. However, when I use the system prompt provided in your repository, the model's output format differs from what the evaluation script expects. As a result, the metric is either 0.0 or does not match the scores reported in the paper.

Could you please share the exact prompting or evaluation setup you used (e.g., system prompt, output format constraints, or post-processing steps) to obtain the reported results?
