What a great benchmark!
I have a question about the recall@20 metric. The article says that recall@k is reported with k = 20, and that the VSS + LLM Reranker baseline reranks the top-20 results from VSS. Shouldn't VSS and VSS + LLM Reranker then yield identical recall@20, since the LLM reranking is performed on the same top-20 results retrieved by VSS?
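To make the question concrete, here is a minimal sketch of what I mean. The document ids, the relevant set, and the `recall_at_k` helper are all made up for illustration, and the shuffle is only a stand-in for the LLM reranker, assuming it reorders the 20 candidates without dropping or adding any:

```python
import random


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear among the first k ranked ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


# Hypothetical data: 20 candidates from VSS and a made-up relevant set.
vss_top20 = [f"doc{i}" for i in range(20)]
relevant = {"doc3", "doc17", "doc42"}  # doc42 was never retrieved by VSS

# Stand-in for the LLM reranker (assumption): a pure reordering of the
# same 20 candidates, with nothing filtered out and nothing added.
reranked = vss_top20[:]
random.Random(0).shuffle(reranked)

recall_vss = recall_at_k(vss_top20, relevant, k=20)
recall_reranked = recall_at_k(reranked, relevant, k=20)

# Same set of 20 documents, so the two values are equal.
print(recall_vss, recall_reranked)
```

Under that assumption only order-sensitive metrics, or recall at a cutoff smaller than 20, could change after reranking. If the reranker sees more than 20 candidates or can discard some of them, the numbers could legitimately differ, so maybe that is what I am missing.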