🔍 Run efficient prompt and LLM regression tests with this lightweight, secret-free evaluation harness.
python workflow benchmark pytest dataset hacktoberfest ai-framework ai-safety synthetic-data mlops ai-automation ai-evaluation llm prompt-engineering red-teaming-tools llm-evaluation llm-evaluation-framework ai-evaluation-framework tinybenchmarks keyword-metrics
Updated Jun 8, 2026 - Python