Implement Term Frequency (TF) Support in Indexer and Ranker #24

# Description 

Currently, the search engine's ranking quality is limited because we only store a list of doc_ids in the inverted index (RocksDB). This means we assume a Term Frequency (TF) of 1 for all matches, preventing effective BM25 ranking.

We need to update the Indexer to calculate and store TF, and update the Ranker to utilize it.

# Tasks 

- [ ] Indexer (C++): Modify `indexer/src/main.cpp` to calculate TF during tokenization.
- [ ] Indexer (C++): Update the RocksDB storage format to store `doc_id:tf` pairs (e.g., `123:4,125:1`) instead of just doc_ids.
- [ ] Ranker (Python): Update `ranker/engine.py` to parse the new value format from RocksDB.
- [ ] Ranker (Python): Update the BM25 calculation in `search()` to use the actual TF value.
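
To make the proposed value format concrete, here is a minimal Python sketch of counting TF per document and of serializing and parsing `doc_id:tf` pairs. The function names (`term_frequencies`, `encode_postings`, `parse_postings`) are hypothetical and not taken from the existing code, and the fallback that reads a bare doc_id as TF 1 is an assumption about how entries in the old format might be handled.

```python
from collections import Counter


def term_frequencies(tokens):
    """Count how often each token occurs in a single document."""
    return Counter(tokens)


def encode_postings(postings):
    """Serialize {doc_id: tf} as 'doc_id:tf,doc_id:tf', sorted by doc_id."""
    return ",".join(f"{doc_id}:{tf}" for doc_id, tf in sorted(postings.items()))


def parse_postings(value):
    """Parse a posting-list value into {doc_id: tf}.

    Assumption: an entry without ':tf' (the old format) is read as tf = 1.
    """
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    postings = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty values and trailing commas
        doc_id, sep, tf = entry.partition(":")
        postings[int(doc_id)] = int(tf) if sep else 1
    return postings
```

For example, `parse_postings(b"123:4,125:1")` yields `{123: 4, 125: 1}`. The indexer side would produce the same string in C++; the Python encoder here only illustrates the format.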

# Acceptance Criteria 

- Indexer stores TF information for each token-document pair.
- Ranker successfully parses the new format without crashing.
- Search scores increase with term frequency: a document with 5 occurrences of a term should score higher for that term than a document with 1 occurrence, all else being equal (same document length and corpus statistics).
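
The last criterion can be checked with a small sketch of a per-term BM25 score that takes the stored TF as input. This is an illustration, not the current `search()` implementation: the function name `bm25_term_score`, the parameter defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document's score.

    tf          occurrences of the term in the document
    doc_len     length of the document in tokens
    avg_doc_len average document length in the corpus
    n_docs      number of documents in the corpus
    doc_freq    number of documents containing the term
    """
    # Assumed IDF variant: log(1 + ...) keeps the value non-negative.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer-than-average documents are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # The TF component saturates as tf grows but is strictly increasing.
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Same document length and corpus statistics, different TF.
stats = dict(doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
high = bm25_term_score(tf=5, **stats)
low = bm25_term_score(tf=1, **stats)
assert high > low
```

Because the TF component saturates, 5 occurrences score higher than 1 but not 5 times higher, so a test for this criterion should compare ordering rather than exact ratios.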
