Hello, thanks for releasing PassNet.
I have a reproducibility question about the leaderboard / Table 1 aggregation formulas.
From the paper and code, I understand the within-sample computation of ES(t) and AS. However, I could not find the exact cross-sample aggregation used to produce the final leaderboard metrics.
Could you please clarify:
- How is model-level AS Score aggregated from the 200 evaluation samples?
- Are Correctness and fast_1 computed globally over all 2,060 subgraphs, or averaged from per-sample values?
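To make the second question concrete, here is a minimal sketch of the two aggregations I have in mind. The function names and the toy data are hypothetical and are not taken from the PassNet code; each inner list holds one per-subgraph pass/fail flag for a single evaluation sample.

```python
from typing import List


def micro_average(per_sample_flags: List[List[bool]]) -> float:
    """Pool every subgraph across all samples, then take one global rate."""
    total = sum(len(flags) for flags in per_sample_flags)
    hits = sum(sum(flags) for flags in per_sample_flags)
    return hits / total


def macro_average(per_sample_flags: List[List[bool]]) -> float:
    """Compute one rate per sample, then average those rates with equal weight."""
    rates = [sum(flags) / len(flags) for flags in per_sample_flags]
    return sum(rates) / len(rates)


# Hypothetical toy data: sample 0 has 1 subgraph, sample 1 has 3 subgraphs.
toy = [[True], [False, False, True]]

print(micro_average(toy))  # 2 passes out of 4 subgraphs
print(macro_average(toy))  # mean of the per-sample rates 1.0 and 1/3
```

Because samples contain different numbers of subgraphs, the two conventions can give noticeably different values, which is why I would like to know which one Table 1 uses.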
Thanks a lot.