Issue with Pretrained Weights Evaluation #6

@zhiqing0205

Hi @MaticFuc and team,

Thank you for the excellent work on SALAD and for providing the pretrained weights. I've been trying to evaluate the pretrained models on the MVTec LOCO dataset, but ran into two issues that I'd like to report.

Issue 1: Corrupted Weight File for breakfast_box

When attempting to evaluate the breakfast_box category, I encountered a RuntimeError with the autoencoder weights:

RuntimeError: [enforce fail at inline_container.cc:316] . file in archive is not in a subdirectory autoencoder_final/: breakfast_box/comp_autoencoder_final.pth

Upon investigation, I found that the breakfast_box/autoencoder_final.pth file in the pretrained weights package (salad_loco.zip) appears to be corrupted or mispackaged. The archive contains 56 internal files instead of the expected 32, and alongside autoencoder_final/ it includes model files for multiple categories. Its top-level directories are:

autoencoder_final/
breakfast_box/
juice_bottle/
pushpins/
screw_bag/
splicing_connectors/

All other categories have the correct structure with only 32 files under autoencoder_final/.

Could you please verify and re-upload the breakfast_box weights?
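
For reference, here is a minimal sketch of how the archive layout can be checked. It assumes the checkpoint was saved with PyTorch's zip-based serialization (the default for torch.save since PyTorch 1.6), so the .pth file can be opened as an ordinary zip archive. The helper name summarize_checkpoint is hypothetical and not part of the SALAD repository.

```python
import zipfile
from collections import Counter


def summarize_checkpoint(path):
    """Return (number of file entries, Counter of top-level directory names).

    Assumes `path` is a zip-format checkpoint; zipfile raises BadZipFile
    for legacy (non-zip) checkpoints.
    """
    with zipfile.ZipFile(path) as archive:
        # Skip explicit directory entries so only real files are counted.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    # The first path component is the top-level directory of each entry.
    top_level = Counter(n.split("/", 1)[0] for n in names)
    return len(names), top_level


# Example usage (the path is illustrative):
# total, top_level = summarize_checkpoint("breakfast_box/autoencoder_final.pth")
# print(total, dict(top_level))
```

A well-formed checkpoint should report a single top-level directory; more than one indicates that entries from other archives were mixed in.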

Issue 2: Significant Performance Gap on Some Categories

I successfully evaluated the other four categories, but noticed significant discrepancies between the pretrained weights' performance and the results reported in the paper:

Results Comparison

Reported in the paper:

class                 logical  struct  mean
breakfast_box         92.9     85.7    89.3
juice_bottle          99.7     99.6    99.7
pushpins              100.0    98.8    99.4
screw_bag             93.9     96.0    95.0
splicing_connectors   96.0     98.6    97.3
avg                   96.5     95.7    96.1

My evaluation results:

class                     logical  struct  mean
breakfast_box             N/A      N/A     N/A
juice_bottle              99.91    99.89   99.90
pushpins                  93.02    98.17   95.60
screw_bag                 79.44    87.97   83.71
splicing_connectors       95.23    98.28   96.76
avg (w/o breakfast_box)   91.90    96.08   93.99

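Key Observations (mean score, paper vs my evaluation):

  • juice_bottle: Results match closely (99.7 vs 99.90, +0.2 points)
  • splicing_connectors: Results match closely (97.3 vs 96.76, -0.54 points)
  • pushpins: Notable gap (99.4 vs 95.60, -3.8 points), almost entirely in the logical score (100.0 vs 93.02)
  • screw_bag: Significant gap (95.0 vs 83.71, -11.29 points), in both the logical (93.9 vs 79.44) and structural (96.0 vs 87.97) scores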
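
Because the paper's average covers five categories and mine covers four, the two "avg" rows are not directly comparable. The sketch below recomputes both averages over the same four categories and prints the per-category gaps. The numbers are copied from the two tables; the function names are illustrative only, and the mean is taken here as the average of the logical and structural columns.

```python
# (logical, structural) scores per category, copied from the tables.
paper = {
    "juice_bottle": (99.7, 99.6),
    "pushpins": (100.0, 98.8),
    "screw_bag": (93.9, 96.0),
    "splicing_connectors": (96.0, 98.6),
}
mine = {
    "juice_bottle": (99.91, 99.89),
    "pushpins": (93.02, 98.17),
    "screw_bag": (79.44, 87.97),
    "splicing_connectors": (95.23, 98.28),
}


def column_means(scores):
    """Average the logical and structural scores over all categories."""
    n = len(scores)
    logical = sum(v[0] for v in scores.values()) / n
    struct = sum(v[1] for v in scores.values()) / n
    return logical, struct, (logical + struct) / 2


def per_category_gap(reference, measured):
    """Measured minus reference per category, as (logical, structural)."""
    return {
        name: (measured[name][0] - reference[name][0],
               measured[name][1] - reference[name][1])
        for name in reference
    }


paper_avg = column_means(paper)
my_avg = column_means(mine)
gaps = per_category_gap(paper, mine)

for name, (d_logical, d_struct) in gaps.items():
    print(f"{name:20s} logical {d_logical:+6.2f}  struct {d_struct:+6.2f}")
print("paper avg (4 classes):", "  ".join(f"{x:.2f}" for x in paper_avg))
print("my avg    (4 classes):", "  ".join(f"{x:.2f}" for x in my_avg))
```

Restricted to these four categories, the paper's averages are 97.40 (logical) and 98.25 (structural), compared with 91.90 and 96.08 in my evaluation, so the like-for-like gap is larger than the "avg" rows suggest.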

Evaluation Setup:

  • Dataset: MVTec LOCO (downloaded from official source)
  • Composition maps: Used the provided pretrained composition maps from mvtec_loco_composition_maps.zip
  • Evaluation script: test_salad.py with default arguments
  • Environment: Followed the installation instructions in README

Questions:

  1. Could there be any specific configuration or preprocessing steps I might have missed?
  2. Were the pretrained weights trained with the exact hyperparameters mentioned in the paper?
  3. Could you provide guidance on reproducing the exact results from the paper?

I'd be happy to provide more detailed logs or information if needed. Thank you for your time and assistance!


Environment Details:

  • PyTorch version: [from conda env]
  • Python version: 3.10
  • Evaluation command: python test_salad.py --category <category_name>
