Description
Currently, the search engine's ranking quality is limited because the inverted index (RocksDB) stores only a list of doc_ids for each token. As a result, every match is treated as having a Term Frequency (TF) of 1, which prevents effective BM25 ranking.
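For reference, the standard Okapi BM25 score shows where TF enters the calculation:

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

Here f(q, D) is the term frequency of q in D, |D| is the document length in tokens, avgdl is the average document length, IDF(q) is the inverse document frequency of q, and k_1 and b are tuning parameters (commonly k_1 between 1.2 and 2.0, b = 0.75). With f(q, D) fixed at 1 for every match, the fraction depends only on document length, so a document that mentions a term five times scores the same as one that mentions it once.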
We need to update the Indexer to calculate and store TF, and update the Ranker to utilize it.
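A minimal sketch of the Indexer side, under stated assumptions: `build_postings` and `encode_varint` are hypothetical helpers, tokenization is a naive lowercase/whitespace split, the returned dict stands in for the RocksDB key/value pairs, and each posting value is assumed to be a flat sequence of (doc_id, tf) pairs encoded as unsigned LEB128 varints. The real on-disk format is not specified by this ticket.

```python
from collections import Counter, defaultdict


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as unsigned LEB128 (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def build_postings(docs: dict[int, str]) -> dict[str, bytes]:
    """Map each token to an encoded posting list of (doc_id, tf) pairs.

    Hypothetical: the split() tokenizer and the plain dict (standing in
    for RocksDB) are placeholders for the real Indexer components.
    """
    postings: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for doc_id in sorted(docs):  # keep posting lists ordered by doc_id
        tf = Counter(docs[doc_id].lower().split())  # count occurrences per token
        for token, count in tf.items():
            postings[token].append((doc_id, count))
    return {
        token: b"".join(encode_varint(d) + encode_varint(c) for d, c in pairs)
        for token, pairs in postings.items()
    }
```

For example, `build_postings({1: "rust rust rust search", 2: "rust search"})["rust"]` yields `b"\x01\x03\x02\x01"`: doc 1 with TF 3, then doc 2 with TF 1. Varints keep small TFs to a single byte; delta-encoding the doc_ids would shrink the lists further but is left out here.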
Tasks
Acceptance Criteria
- Indexer stores TF information for each token-document pair.
- Ranker successfully parses the new format without crashing.
- Search scores reflect term frequency: a document with 5 occurrences of a term should rank higher than one with 1 occurrence, all else (including document length) being equal.
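A minimal sketch of the Ranker side that targets the last two criteria, under stated assumptions: `decode_postings` and `bm25_term_score` are hypothetical names; the posting value is assumed to be a flat sequence of (doc_id, tf) pairs, each an unsigned LEB128 varint; malformed input raises `ValueError` instead of reading past the buffer; and the IDF term uses the common `ln(1 + (N - n + 0.5) / (n + 0.5))` variant.

```python
import math


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned LEB128 varint; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated posting list")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            return value, pos
        shift += 7


def decode_postings(data: bytes) -> list[tuple[int, int]]:
    """Parse an encoded posting list into (doc_id, tf) pairs."""
    pairs, pos = [], 0
    while pos < len(data):
        doc_id, pos = decode_varint(data, pos)
        tf, pos = decode_varint(data, pos)
        pairs.append((doc_id, tf))
    return pairs


def bm25_term_score(tf: int, doc_len: int, avgdl: float,
                    n_docs: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 contribution of one query term to one document."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * doc_len / avgdl)  # document-length normalization
    return idf * tf * (k1 + 1) / (tf + norm)
```

With equal document lengths, `bm25_term_score(5, 100, 100.0, 10, 2)` is greater than `bm25_term_score(1, 100, 100.0, 10, 2)`, which is the behavior the last criterion asks for. Because the TF component saturates as `tf` grows (bounded by `k1 + 1`), a document with 50 occurrences does not outscore one with 5 by a factor of ten.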