Hi authors,
First of all, thanks for the amazing work and for open-sourcing the code!
I am currently trying to reproduce the VBench metrics for the Wan base model (3x extrapolation) as reported in Table 1. However, I noticed that the trend for the subject_consistency metric in my local tests doesn't perfectly align with the paper's results, and the gap between PE and UltraViCo is quite large.
Here are the results from my two comparative experiments:
📊 My Experimental Results
| Experiment | Method | subject_consistency | dynamic_degree | imaging_quality | overall_consistency |
|---|---|---|---|---|---|
| 1 | UltraViCo | 0.9141 | 32.0 | 63.957 | 19.4289 |
| 1 | PE | 0.9759 | 9.0 | 58.7393 | 16.7371 |
| 2 | UltraViCo | 0.9147 | 32.0 | 65.1345 | 21.4484 |
| 2 | PE | 0.9821 | 10.0 | 58.4083 | 18.5526 |
💻 My Evaluation Code
For reference, here is the VBench evaluation script I am currently using to calculate the metrics:
```python
import os
import json
import argparse
import re

from vbench import VBench


def main():
    parser = argparse.ArgumentParser(description="VBench Evaluation for UltraViCo Wan2.1")
    parser.add_argument("--prompt_file", type=str, default="/data/proj/zbx/DiT-Extrapolation/assets/all_dimension_100_set2.txt")
    parser.add_argument("--video_dir", type=str, default="/data/proj/zbx/DiT-Extrapolation/set2/output/vbench_results_wan_ultra_3x")
    parser.add_argument("--output_path", type=str, default="set2/output/results_ultra/vbench_wan_metrics.json")
    parser.add_argument("--vbench_info", type=str, default="/data/proj/zbx/VBench/vbench/VBench_full_info.json")
    args = parser.parse_args()

    output_dir = os.path.dirname(args.output_path)
    if not os.path.exists(output_dir):
        os.makedirs(output_dir, exist_ok=True)

    run_name = "wan2.1_extrapolation"
    dimension_list = ["subject_consistency", "dynamic_degree", "imaging_quality", "overall_consistency"]

    # 1. Read prompts (one prompt per non-empty line)
    with open(args.prompt_file, 'r', encoding='utf-8') as f:
        prompts_list = [line.strip() for line in f.read().splitlines() if line.strip()]
    print(f">>> Successfully read prompt file, {len(prompts_list)} valid prompts in total.")

    # 2. Build video -> prompt mapping (videos are named by 1-based prompt index, e.g. 001.mp4 -> prompt 0)
    video_prompt_mapping = {}
    abs_video_dir = os.path.abspath(args.video_dir)
    video_files = [f for f in os.listdir(abs_video_dir) if f.endswith(".mp4")]
    for filename in video_files:
        try:
            match = re.search(r'\d+', filename)
            if not match:
                continue
            video_id = int(match.group())
            index = video_id - 1  # 001.mp4 -> index 0
            if 0 <= index < len(prompts_list):
                real_prompt = prompts_list[index]
                video_key = os.path.join(abs_video_dir, filename)
                video_prompt_mapping[video_key] = real_prompt
            else:
                print(f"Warning: Video {filename} (ID: {video_id}) index out of bounds. Total prompts: {len(prompts_list)}")
        except Exception as e:
            print(f"Error processing file {filename}: {e}")
    print(f">>> Mapping complete, {len(video_prompt_mapping)} videos matched with prompts.")

    # 3. Initialize VBench
    my_VBench = VBench(device='cuda', full_info_dir=args.vbench_info, output_path=os.path.abspath(output_dir))

    # 4. Evaluate
    my_VBench.evaluate(
        videos_path=abs_video_dir,
        name=run_name,
        dimension_list=dimension_list,
        mode='custom_input',
        prompt_list=video_prompt_mapping,
    )

    # 5. Summarize results from the per-run JSON written by VBench
    print("\n" + "=" * 50)
    eval_results_file = os.path.join(output_dir, f"{run_name}_eval_results.json")
    if os.path.exists(eval_results_file):
        with open(eval_results_file, 'r') as f:
            all_data = json.load(f)
        summary = {}
        for dim in dimension_list:
            # Each dimension is stored as [overall_score, per_video_results]
            val = float(all_data[dim][0] if isinstance(all_data[dim], list) else all_data[dim])
            if "subject_consistency" in dim:
                summary[dim] = round(val, 4)
            else:
                summary[dim] = round(val * 100, 4)  # report the other dimensions as percentages
        for k, v in summary.items():
            print(f"{k:25}: {v:.4f}")
        with open(args.output_path, 'w') as f:
            json.dump(summary, f, indent=4)
        print(f"Final results exported to: {args.output_path}")


if __name__ == "__main__":
    main()
```
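As a related sanity check (relevant to my first question below), I use something like the following to see whether PE's high `subject_consistency` is driven by the clips that `dynamic_degree` flags as static. This is only a rough sketch: I'm assuming the per-video layout I see in my local `*_eval_results.json` (each dimension stored as `[overall_score, per_video_list]` with `video_path` / `video_results` fields), and the PE results path is just a placeholder for my local directory layout.

```python
import json

def load_per_video(eval_results_path, dimension):
    """Return {video_path: per-video result} for one VBench dimension.

    Assumes each dimension is stored as [overall_score, per_video_list],
    as in my local *_eval_results.json files.
    """
    with open(eval_results_path, "r") as f:
        data = json.load(f)
    return {entry["video_path"]: entry["video_results"] for entry in data[dimension][1]}

results_file = "set2/output/results_pe/wan2.1_extrapolation_eval_results.json"  # placeholder path
consistency = load_per_video(results_file, "subject_consistency")
is_dynamic = load_per_video(results_file, "dynamic_degree")  # per-video boolean in my output

static_clips = [v for v in consistency if not is_dynamic.get(v, True)]
dynamic_clips = [v for v in consistency if is_dynamic.get(v, True)]
for name, clips in [("static", static_clips), ("dynamic", dynamic_clips)]:
    if clips:
        mean_sc = sum(consistency[v] for v in clips) / len(clips)
        print(f"{name:>7} clips: {len(clips):3d}, mean subject_consistency = {mean_sc:.4f}")
```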
🤔 My Questions:
- Metric Collapse on PE? Is the extremely high `subject_consistency` (~0.98) for PE expected, given that the generated videos are nearly static (as indicated by the very low `dynamic_degree` of ~9.0)? Did you observe a similar "metric collapse" for PE during your evaluations? (The per-video check above is how I've been trying to verify this locally.)
- Aligning UltraViCo Consistency: My UltraViCo `subject_consistency` is around ~0.91, a bit lower than the ~0.94 reported in the paper. I noticed that Appendix C.2 mentions: "For UltraViCo, the first frame's decay factor is set negative to fix its blurring." Could you give some guidance on how to modify the generation script (or my evaluation code) to implement this correctly and align with the paper's results? My current guess is sketched right after this list.
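To make the second question concrete, here is my current (possibly wrong) reading of the Appendix C.2 sentence: wherever the per-frame decay factors are constructed in the generation script, the entry for frame 0 is replaced with a negative value before the factors are applied. Everything below is a hypothetical sketch with made-up names (`build_decay_factors`, `base_decay`, `first_frame_decay`), not identifiers from the released code.

```python
import torch

def build_decay_factors(num_frames: int, base_decay: float,
                        first_frame_decay: float = -0.5) -> torch.Tensor:
    # Hypothetical sketch of my reading of Appendix C.2: keep the usual decay
    # factor for frames 1..N-1, but set the first frame's factor to a negative
    # value to fix its blurring. The names and the -0.5 value are guesses.
    decay = torch.full((num_frames,), base_decay)
    decay[0] = first_frame_decay
    return decay
```

If this is off-base, a pointer to the relevant part of the generation script would already be very helpful.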
Looking forward to your insights. Thanks again for the great contribution!