SidharthKriplani / goldensetauditor
Evaluation dataset quality auditor for LLM and RAG applications. Checks golden sets for conflicting labels, duplicate prompts, weak reference answers, ambiguous questions, over-easy examples, and category coverage gaps.
Topics: python, nlp, benchmark, evaluation, audit, data-quality, rag, llm, retrieval-augmented-generation, llm-evaluation, dataset-quality, golden-set
Language: Python. Stars: 0. Updated May 16, 2026.

infrixo-systems / rag-evaluation-starter
Minimal Python script to evaluate your RAG pipeline against a golden set.
Topics: python, ai, production, evaluation, openai, rag, guardrails, hallucinations, llm, langchain, faithfulness, retrieval-augmented-generation, llm-evaluation, golden-set
Language: Python. Stars: 0. Updated Mar 23, 2026.