golden-set

Here are 2 public repositories matching this topic...

SidharthKriplani / goldensetauditor

Evaluation dataset quality auditor for LLM and RAG applications. Checks golden sets for conflicting labels, duplicate prompts, weak reference answers, ambiguous questions, over-easy examples, and category coverage gaps.

python nlp benchmark evaluation audit data-quality rag llm retrieval-augmented-generation llm-evaluation dataset-quality golden-set

Updated May 16, 2026
Python
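
Two of the checks named in the description above, duplicate prompts and conflicting labels, can be illustrated with a short sketch. This is not goldensetauditor's code or API: the function names `normalize` and `audit_golden_set`, the `{"prompt", "answer"}` record shape, and the whitespace and case normalization rule are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(prompt: str) -> str:
    """Lowercase and collapse whitespace so trivially different prompts compare equal."""
    return " ".join(prompt.lower().split())


def audit_golden_set(examples):
    """Return (duplicates, conflicts) for a list of {"prompt", "answer"} dicts.

    duplicates: normalized prompts that appear more than once.
    conflicts:  the subset of duplicates whose reference answers disagree.
    The record shape is hypothetical, not taken from any specific tool.
    """
    answers_by_prompt = defaultdict(list)
    for ex in examples:
        answers_by_prompt[normalize(ex["prompt"])].append(ex["answer"].strip())

    duplicates = sorted(p for p, a in answers_by_prompt.items() if len(a) > 1)
    conflicts = sorted(p for p in duplicates if len(set(answers_by_prompt[p])) > 1)
    return duplicates, conflicts


# Hypothetical golden set: one conflicting pair and one harmless exact duplicate.
examples = [
    {"prompt": "What is the capital of France?", "answer": "Paris"},
    {"prompt": "what is the  capital of france?", "answer": "Lyon"},
    {"prompt": "Who wrote Hamlet?", "answer": "Shakespeare"},
    {"prompt": "Who wrote Hamlet?", "answer": "Shakespeare"},
]
duplicates, conflicts = audit_golden_set(examples)
```

Here both prompts are reported as duplicates, but only the France question is flagged as a conflict, because its two reference answers differ. A real auditor would likely need fuzzier matching (for example, embedding similarity) to catch paraphrased duplicates.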

infrixo-systems / rag-evaluation-starter

Minimal Python script to evaluate your RAG pipeline against a golden set.

python ai production evaluation openai rag guardrails hallucinations llm langchain faithfulness retrieval-augmented-generation llm-evaluation golden-set

Updated Mar 23, 2026
Python
