
[Question] Observation on Wan 3x extrapolation: Gap in subject_consistency between PE and UltraViCo #30

@bxuanz


Hi authors,

First of all, thanks for the amazing work and for open-sourcing the code!

I am currently trying to reproduce the VBench metrics for the Wan base model (3x extrapolation) as reported in Table 1. However, I noticed that the trend for the subject_consistency metric in my local tests doesn't perfectly align with the paper's results, and the gap between PE and UltraViCo is quite large.

Here are the results from my two comparative experiments:

📊 My Experimental Results

Experiment 1:

  • UltraViCo:
    {
        "subject_consistency": 0.9141,
        "dynamic_degree": 32.0,
        "imaging_quality": 63.957,
        "overall_consistency": 19.4289
    }
  • PE:
    {
        "subject_consistency": 0.9759,
        "dynamic_degree": 9.0,
        "imaging_quality": 58.7393,
        "overall_consistency": 16.7371
    }

Experiment 2:

  • UltraViCo:
    {
        "subject_consistency": 0.9147,
        "dynamic_degree": 32.0,
        "imaging_quality": 65.1345,
        "overall_consistency": 21.4484
    }
  • PE:
    {
        "subject_consistency": 0.9821,
        "dynamic_degree": 10.0,
        "imaging_quality": 58.4083,
        "overall_consistency": 18.5526
    }

💻 My Evaluation Code

For reference, here is the VBench evaluation script I am currently using to calculate the metrics:

import os
import json
import argparse
import re
from vbench import VBench

def main():
    parser = argparse.ArgumentParser(description="VBench Evaluation for UltraViCo Wan2.1")
    parser.add_argument("--prompt_file", type=str, default="/data/proj/zbx/DiT-Extrapolation/assets/all_dimension_100_set2.txt")
    parser.add_argument("--video_dir", type=str, default="/data/proj/zbx/DiT-Extrapolation/set2/output/vbench_results_wan_ultra_3x")
    parser.add_argument("--output_path", type=str, default="set2/output/results_ultra/vbench_wan_metrics.json")
    parser.add_argument("--vbench_info", type=str, default="/data/proj/zbx/VBench/vbench/VBench_full_info.json")
    args = parser.parse_args()

    output_dir = os.path.dirname(args.output_path) or "."
    os.makedirs(output_dir, exist_ok=True)

    run_name = "wan2.1_extrapolation"
    dimension_list = ["subject_consistency", "dynamic_degree", "imaging_quality", "overall_consistency"]

    # 1. Read prompts
    with open(args.prompt_file, 'r', encoding='utf-8') as f:
        prompts_list = [line.strip() for line in f.read().splitlines() if line.strip()]

    print(f">>> Successfully read Prompt file, total {len(prompts_list)} valid texts.")

    # 2. Build mapping dictionary
    video_prompt_mapping = {}
    abs_video_dir = os.path.abspath(args.video_dir)
    video_files = [f for f in os.listdir(abs_video_dir) if f.endswith(".mp4")]

    for filename in video_files:
        try:
            match = re.search(r'\d+', filename)
            if not match:
                continue
            
            video_id = int(match.group())
            index = video_id - 1  # 001.mp4 -> index 0
            
            if 0 <= index < len(prompts_list):
                real_prompt = prompts_list[index]
                video_key = os.path.join(abs_video_dir, filename)
                video_prompt_mapping[video_key] = real_prompt
            else:
                print(f"Warning: Video {filename} (ID: {video_id}) index out of bounds. Total prompts: {len(prompts_list)}")
        except Exception as e:
            print(f"Error processing file {filename}: {e}")

    print(f">>> Mapping complete, {len(video_prompt_mapping)} videos matched with prompts.")

    # 3. Initialize VBench
    my_VBench = VBench(device='cuda', full_info_dir=args.vbench_info, output_path=os.path.abspath(output_dir))

    # 4. Evaluate
    my_VBench.evaluate(
        videos_path=abs_video_dir,
        name=run_name,
        dimension_list=dimension_list,
        mode='custom_input', 
        prompt_list=video_prompt_mapping 
    )

    # 5. Summarize Results
    print("\n" + "="*50)
    eval_results_file = os.path.join(output_dir, f"{run_name}_eval_results.json")
    if os.path.exists(eval_results_file):
        with open(eval_results_file, 'r') as f:
            all_data = json.load(f)
        summary = {}
        for dim in dimension_list:
            # VBench stores each dimension as [aggregate_score, per_video_results]
            val = float(all_data[dim][0] if isinstance(all_data[dim], list) else all_data[dim])
            if "subject_consistency" in dim:
                summary[dim] = round(val, 4)
            else:
                summary[dim] = round(val * 100, 4)
        for k, v in summary.items():
            print(f"{k:25}: {v:.4f}")
        with open(args.output_path, 'w') as f:
            json.dump(summary, f, indent=4)
        print(f"Final results exported to: {args.output_path}")
    else:
        print(f"Warning: expected results file not found: {eval_results_file}")

if __name__ == "__main__":
    main()

🤔 My Questions:

  1. Metric Collapse on PE? Is the extremely high subject_consistency (~0.98) for PE expected due to the videos being nearly static (as indicated by the very low dynamic_degree of ~9.0)? Did you observe a similar "metric collapse" for PE during your evaluations?
  2. Aligning UltraViCo Consistency: My UltraViCo subject_consistency is ~0.91, somewhat lower than the ~0.94 reported in the paper. I noticed that Appendix C.2 mentions: "For UltraViCo, the first frame's decay factor is set negative to fix its blurring." Could you provide some guidance on how I should modify my evaluation code or the generation script to correctly implement this and align with the paper's results?
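
To make question 2 concrete, here is a minimal sketch of my current reading of that sentence. Everything in it is my assumption, not your implementation: the function name `build_decay_factors`, the geometric schedule, and the magnitude of the negative first-frame value are all hypothetical placeholders.

```python
import numpy as np

def build_decay_factors(num_frames, base_decay=0.95, first_frame_decay=-0.95):
    """Hypothetical per-frame decay schedule: geometric decay across frames,
    with the first frame's factor flipped to a negative value (my reading of
    "the first frame's decay factor is set negative to fix its blurring")."""
    factors = base_decay ** np.arange(num_frames, dtype=np.float64)
    factors[0] = first_frame_decay  # assumed: negative factor only on frame 0
    return factors

print(build_decay_factors(5))
```

Is this roughly the right shape, or is the negative factor applied somewhere else entirely (e.g. to attention logits rather than a per-frame scale)? A pointer to the relevant function in the released code would be enough.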

Looking forward to your insights. Thanks again for the great contribution!
