Issue with Pretrained Weights Evaluation #6

Hi @MaticFuc and team,

Thank you for the excellent work on SALAD and for providing the pretrained weights. I have been evaluating the pretrained models on the MVTec LOCO dataset and ran into two issues that I'd like to report.

## Issue 1: Corrupted Weight File for breakfast_box

Evaluating the `breakfast_box` category fails with a `RuntimeError` while loading the autoencoder weights:

```
RuntimeError: [enforce fail at inline_container.cc:316] . file in archive is not in a subdirectory autoencoder_final/: breakfast_box/comp_autoencoder_final.pth
```

On inspection, the `breakfast_box/autoencoder_final.pth` file in the pretrained weights package (`salad_loco.zip`) appears to be corrupted. PyTorch checkpoints are zip archives, and this one contains 56 entries instead of the expected 32, including model files from multiple categories under these top-level directories:

```
autoencoder_final/
breakfast_box/
juice_bottle/
pushpins/
screw_bag/
splicing_connectors/
```

All other categories have the correct structure with only 32 files under `autoencoder_final/`.
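
The entry counts above can be checked with Python's standard `zipfile` module, since the default `torch.save` format is a zip file. The sketch below uses a hypothetical helper name (`summarize_archive`, not part of SALAD) and assumes the weights were unzipped into per-category folders; it counts entries per top-level directory so a file like the one above stands out immediately:

```python
import sys
import zipfile
from collections import Counter


def summarize_archive(path):
    """Count entries in a zip-based .pth checkpoint, grouped by top-level directory."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    # Judging by the RuntimeError above, torch.load expects every entry to share
    # a single root directory (here "autoencoder_final/"); more than one root
    # indicates a mixed-up archive.
    per_root = Counter(name.split("/", 1)[0] for name in names)
    return len(names), per_root


if __name__ == "__main__":
    # Example (assumed layout): python inspect_pth.py salad_loco/*/autoencoder_final.pth
    for path in sys.argv[1:]:
        total, per_root = summarize_archive(path)
        print(f"{path}: {total} entries")
        for root, count in sorted(per_root.items()):
            print(f"  {root}/: {count}")
```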

**Could you please verify and re-upload the breakfast_box weights?**

## Issue 2: Significant Performance Gap on Some Categories

I evaluated the remaining four categories successfully, but on some of them the pretrained weights fall well short of the results reported in the paper:

### Results Comparison

**Reported in the paper:**

| class | logical | struct | mean |
| --- | --- | --- | --- |
| breakfast_box | 92.9 | 85.7 | 89.3 |
| juice_bottle | 99.7 | 99.6 | 99.7 |
| pushpins | 100.0 | 98.8 | 99.4 |
| screw_bag | 93.9 | 96.0 | 95.0 |
| splicing_connectors | 96.0 | 98.6 | 97.3 |
| avg | 96.5 | 95.7 | 96.1 |

**My evaluation results:**

| class | logical | struct | mean |
| --- | --- | --- | --- |
| breakfast_box | N/A | N/A | N/A |
| juice_bottle | 99.91 | 99.89 | 99.90 |
| pushpins | 93.02 | 98.17 | 95.60 |
| screw_bag | 79.44 | 87.97 | 83.71 |
| splicing_connectors | 95.23 | 98.28 | 96.76 |
| avg (w/o breakfast_box) | 91.90 | 96.08 | 93.99 |

### Key Observations:

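- ✅ **juice_bottle**: Results match closely (paper 99.7 vs mine 99.90)
- ✅ **splicing_connectors**: Results match closely (97.3 vs 96.76, -0.54 points)
- ⚠️ **pushpins**: Notable gap (99.4 vs 95.60, -3.8 points), almost entirely in the logical score (100.0 vs 93.02)
- ❌ **screw_bag**: Significant gap (95.0 vs 83.71, -11.29 points), affecting both logical (93.9 vs 79.44) and structural (96.0 vs 87.97) anomalies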
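
Note that the two `avg` rows cover different category sets (five classes in the paper, four in my run), so they are not directly comparable. The short sketch below recomputes the per-class gaps and the paper's average over the same four categories, using only the numbers from the tables above. By this arithmetic the paper's four-class mean is 97.85, about 3.9 points above my 93.99. The helper names are purely illustrative:

```python
# Scores copied from the two tables above: (logical, struct, mean).
PAPER = {
    "juice_bottle": (99.7, 99.6, 99.7),
    "pushpins": (100.0, 98.8, 99.4),
    "screw_bag": (93.9, 96.0, 95.0),
    "splicing_connectors": (96.0, 98.6, 97.3),
}
MINE = {
    "juice_bottle": (99.91, 99.89, 99.90),
    "pushpins": (93.02, 98.17, 95.60),
    "screw_bag": (79.44, 87.97, 83.71),
    "splicing_connectors": (95.23, 98.28, 96.76),
}
COLUMNS = ("logical", "struct", "mean")


def column_average(table):
    """Average each column over the categories present in the table."""
    rows = list(table.values())
    return tuple(round(sum(col) / len(rows), 2) for col in zip(*rows))


def per_class_gap(paper, mine):
    """Mine minus paper for each shared category, rounded to 2 decimals."""
    return {
        name: tuple(round(m - p, 2) for p, m in zip(paper[name], mine[name]))
        for name in mine
    }


if __name__ == "__main__":
    for name, gap in per_class_gap(PAPER, MINE).items():
        print(name, dict(zip(COLUMNS, gap)))
    print("paper avg (same 4 classes):", dict(zip(COLUMNS, column_average(PAPER))))
    print("my avg:", dict(zip(COLUMNS, column_average(MINE))))
```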

### Evaluation Setup:

- Dataset: MVTec LOCO (downloaded from official source)
- Composition maps: Used the provided pretrained composition maps from `mvtec_loco_composition_maps.zip`
- Evaluation script: `test_salad.py` with default arguments
- Environment: Followed the installation instructions in README
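
For anyone trying to reproduce these numbers, a small driver can run the evaluation command from the Environment Details once per category. This is a sketch that assumes `test_salad.py` sits in the working directory and needs only the `--category` flag shown in this issue; `build_commands` and `run_all` are hypothetical helper names:

```python
import subprocess
import sys

# The five MVTec LOCO categories discussed in this issue.
CATEGORIES = [
    "breakfast_box",
    "juice_bottle",
    "pushpins",
    "screw_bag",
    "splicing_connectors",
]


def build_commands(categories, script="test_salad.py"):
    """One evaluation command per category, using the current Python interpreter."""
    return [[sys.executable, script, "--category", name] for name in categories]


def run_all(categories):
    """Run each evaluation in turn, recording exit codes instead of stopping on failure."""
    results = {}
    for name, cmd in zip(categories, build_commands(categories)):
        completed = subprocess.run(cmd)
        results[name] = completed.returncode
    return results


if __name__ == "__main__":
    for name, code in run_all(CATEGORIES).items():
        print(f"{name}: exit code {code}")
```

Recording exit codes rather than raising means a failure such as the `breakfast_box` loading error does not prevent the other categories from being evaluated.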

### Questions:

1. Could there be any specific configuration or preprocessing steps I might have missed?
2. Are the pretrained weights trained with the exact hyperparameters mentioned in the paper?
3. Could you provide guidance on reproducing the exact results from the paper?

I'd be happy to provide more detailed logs or information if needed. Thank you for your time and assistance!

---

**Environment Details:**
- PyTorch version: [from conda env]
- Python version: 3.10
- Evaluation command: `python test_salad.py --category <category_name>`
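
To fill in exact versions for the list above, the sketch below reads them with the standard `importlib.metadata` module; the helper name `collect_env` is hypothetical, and further package names can be passed in as needed. PyTorch also ships `python -m torch.utils.collect_env`, which prints a more complete report.

```python
import platform
from importlib import metadata


def collect_env(packages=("torch",)):
    """Gather the interpreter version, platform, and installed package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            info[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            info[pkg] = "not installed"
    return info


if __name__ == "__main__":
    # Print in the same "- key: value" style as the Environment Details list.
    for key, value in collect_env().items():
        print(f"- {key}: {value}")
```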