
Added benchmarking methods for cross-referencing when needed #9

Merged
lucifer-ux merged 3 commits into main from adding-benchmarking on Apr 8, 2026

@lucifer-ux (Owner) commented Apr 7, 2026

Added benchmarking methods so anyone can benchmark the repo and see how much efficiency the search and token reduction are providing.

Comment thread cli/commands/benchmark.py
    return f"https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{dataset}.zip"


def _download_and_unzip_dataset(dataset: str, datasets_root: Path) -> Path:
Collaborator


What type is Path here? If possible, please add a comment.

Comment thread cli/commands/benchmark.py

    url = _dataset_url(dataset)
    try:
        from beir import util  # type: ignore
@abhiedward001 (Collaborator) Apr 7, 2026


Is this a kind of lazy loading? I want to understand why we deferred the import here.

Comment thread cli/commands/benchmark.py
def _load_corpus(corpus_path: Path) -> dict[str, str]:
    corpus: dict[str, str] = {}
    with corpus_path.open("r", encoding="utf-8") as fh:
        for line in fh:
Collaborator


What is fh here?

Comment thread cli/commands/benchmark.py
            doc_id = str(row.get("_id", "")).strip()
            if not doc_id:
                continue
            title = str(row.get("title") or "").strip()
Collaborator


What is the difference between str(row.get("_id", "")).strip() and str(row.get("title") or "").strip()? Why are we using "or" here, and what does row.get("title") return if there is no title?
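Both rely on dict.get. row.get("title") returns None when there is no "title" key. row.get("_id", "") returns "" only when the key is missing; if the JSON contains "_id": null, Python sees None, the default is ignored, and str(None) becomes the non-empty string "None". The or form falls back to "" in both cases. A small demonstration; get_with_default and get_with_or are hypothetical names:

```python
def get_with_default(row: dict, key: str) -> str:
    # dict.get(key, "") uses "" only when the key is MISSING.
    return str(row.get(key, "")).strip()


def get_with_or(row: dict, key: str) -> str:
    # dict.get(key) returns None when the key is missing; `or ""` then
    # replaces None (and any other falsy value such as "" or 0) with "".
    return str(row.get(key) or "").strip()


row = {"_id": None, "title": None}  # keys present, values null in the JSON
print(get_with_default(row, "_id"))  # None  (the string "None", not empty)
print(repr(get_with_or(row, "title")))  # ''
print({}.get("title"))  # None  (missing key, no default given)
```

So the two forms only diverge when a field is present but null or falsy: with the default form a null _id would pass the `if not doc_id` check as the string "None", while the or form would also turn a numeric id of 0 into an empty string.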

Comment thread cli/commands/benchmark.py
def _load_queries(queries_path: Path) -> dict[str, str]:
    queries: dict[str, str] = {}
    with queries_path.open("r", encoding="utf-8") as fh:
        for line in fh:
Collaborator


^^ Same here: fh is not an understandable name.

Comment thread cli/commands/benchmark.py
    with queries_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            row = json.loads(line)
            qid = str(row.get("_id", "")).strip()
Collaborator


^^ Same as above.

@abhiedward001 (Collaborator) left a comment


Please ask AI to add comments describing what each of these functions does; as a new dev I can't understand the use case.

@lucifer-ux merged commit 52aec66 into main on Apr 8, 2026
8 checks passed
2 participants