
Fix/test api key enforcement #13

Merged
apundhir merged 6 commits into main from
fix/test-api-key-enforcement
Apr 11, 2026

Conversation

@apundhir
Collaborator

Summary

Changes

Testing

  • Tests pass
  • Manual testing completed
  • No breaking changes (or migration path documented)

Related Issues

Checklist

  • Code follows project style guidelines
  • Documentation updated (if applicable)
  • No secrets or credentials in this PR

apundhir and others added 6 commits April 11, 2026 20:08
fix: conftest.py prevents .env API_KEY from breaking legacy tests

Adding ENFORCE_API_KEY=true to .env caused 4 legacy tests to fail with
403 instead of expected 422/200. Root cause: pydantic-settings reads .env
directly, bypassing os.environ patches.

Fix: conftest.py autouse fixture sets env_file=None on AppSettings during
tests and clears the lru_cache, so each test gets fresh settings from
os.environ only. Security tests that explicitly test enforcement are
unaffected — they use their own mock.patch contexts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
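A minimal, self-contained sketch of the caching problem this fixture manages. In the real project, AppSettings is a pydantic-settings BaseSettings that reads .env; the stand-in below reads only os.environ so the lru_cache mechanics (the part the autouse fixture has to reset) are visible without dependencies. All names here are illustrative, not the project's actual code.

```python
# Stand-in for the project's cached settings accessor. The real AppSettings
# is a pydantic-settings class with env_file=".env"; this one reads only
# os.environ, which is what the test fixture forces.
import os
from functools import lru_cache


class AppSettings:
    """Stand-in for the project's pydantic-settings AppSettings."""

    def __init__(self):
        self.enforce_api_key = os.environ.get("ENFORCE_API_KEY", "false") == "true"


@lru_cache
def get_settings() -> AppSettings:
    return AppSettings()


# conftest.py then uses an autouse fixture along these lines (sketch):
#
# @pytest.fixture(autouse=True)
# def _fresh_settings(monkeypatch):
#     monkeypatch.setitem(AppSettings.model_config, "env_file", None)
#     get_settings.cache_clear()   # rebuild settings from os.environ per test
#     yield
#     get_settings.cache_clear()   # don't leak patched settings across tests
```

Without the cache_clear() calls, the first test to touch get_settings() pins its settings for every later test, which is exactly why patching os.environ alone was not enough.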
fix: route /v1/evaluate through EvaluationRunner, not raw RAGAS

Root cause: /v1/evaluate was calling rr.run_evaluation() directly,
which immediately requires GEMINI_API_KEY even for deterministic metrics
like source_attribution_accuracy.

Fix: route through EvaluationRunner which correctly dispatches:
- Deterministic metrics (source_attribution_accuracy) → no LLM needed
- RAGAS metrics (faithfulness, answer_relevancy, etc.) → calls Gemini/OpenAI
- Retrieval metrics (precision@k, etc.) → no LLM needed

Updated response shape: metrics are at top level (not nested under "result").
Updated 4 tests to patch harness.runner.run_evaluation (correct mock path)
and assert against new response shape.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
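The dispatch behaviour described above can be sketched as follows. EvaluationRunner and the metric names come from the commit message; the implementation and the metric sets are an illustrative stand-in, not the project's actual code.

```python
# Illustrative dispatcher: only LLM-judged RAGAS metrics need an API key;
# deterministic and retrieval metrics are pure computation over the dataset.
DETERMINISTIC = {"source_attribution_accuracy"}
RETRIEVAL = {"precision_at_k", "recall_at_k"}
RAGAS_LLM = {"faithfulness", "answer_relevancy"}


class EvaluationRunner:
    def __init__(self, llm_api_key=None):
        self.llm_api_key = llm_api_key

    def run_evaluation(self, metrics, dataset):
        results = {}
        for m in metrics:
            if m in DETERMINISTIC or m in RETRIEVAL:
                # No LLM involved: never touches GEMINI_API_KEY.
                results[m] = self._compute_local(m, dataset)
            elif m in RAGAS_LLM:
                # Only this branch requires a configured LLM key.
                if not self.llm_api_key:
                    raise RuntimeError(f"{m} requires GEMINI_API_KEY")
                results[m] = self._compute_ragas(m, dataset)
            else:
                raise ValueError(f"unknown metric: {m}")
        return results  # metrics at top level, not nested under "result"

    def _compute_local(self, metric, dataset):
        return 1.0  # placeholder score for the sketch

    def _compute_ragas(self, metric, dataset):
        return 0.9  # placeholder; real path calls Gemini/OpenAI via RAGAS
```

Calling the endpoint for source_attribution_accuracy alone now works with no key configured, which was the observable bug before this change.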
fix: ragas_runner reads GEMINI_API_KEY from app settings, not os.getenv

os.getenv('GEMINI_API_KEY') returns None when the key is set via .env file
loaded by pydantic-settings — pydantic-settings reads .env into Python
attributes but does NOT export them to os.environ.

Fix: use get_settings().gemini_api_key (reads .env via pydantic-settings)
with os.getenv as fallback for environments where the var is already exported.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
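The lookup order described above, as a small sketch: prefer the pydantic-settings value (which sees .env) and fall back to os.getenv for environments where the variable really is exported. The helper name is illustrative; `gemini_api_key` is the attribute named in the commit message.

```python
import os
from typing import Optional


def resolve_gemini_api_key(settings) -> Optional[str]:
    # `settings` is expected to be the result of get_settings(), i.e. an
    # object with a gemini_api_key attribute populated from .env by
    # pydantic-settings. os.getenv covers shells that export the var directly.
    return getattr(settings, "gemini_api_key", None) or os.getenv("GEMINI_API_KEY")
```

Note the asymmetry the commit relies on: pydantic-settings reads os.environ as one of its sources, but never writes to it, so checking only os.getenv misses .env-sourced keys.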
fix: update Gemini model to gemini-2.0-flash, raise_exceptions=True in RAGAS

- gemini-1.5-flash returns 404 NOT_FOUND on the v1beta API — updated to gemini-2.0-flash
- raise_exceptions=False was silently returning NaN scores; changed to True
  so actual Gemini/RAGAS errors surface as real error messages
- ragas_runner now reads gemini_model from app settings (defaults to gemini-2.0-flash)
- Updated .env.example default model name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
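A self-contained illustration of why raise_exceptions=False was hiding the real problem: swallowed errors come back as NaN scores, indistinguishable from a genuinely bad evaluation. The stub below mimics that behaviour; it is not the RAGAS implementation.

```python
import math


def score_sample(call_llm, *, raise_exceptions: bool) -> float:
    try:
        return call_llm()
    except Exception:
        if raise_exceptions:
            raise  # surface the actual Gemini/RAGAS error to the caller
        return float("nan")  # old behaviour: silent NaN score


def broken_llm():
    # Simulates the kind of error the stale model name produced.
    raise RuntimeError("404 NOT_FOUND: models/gemini-1.5-flash")
```

With the flag off, a wrong model name scores every sample NaN and looks like a metrics bug; with it on, the 404 lands in the logs where it points at the real cause.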
fix: robust RAGAS result handling — NaN→None, to_pandas fallback, better LLM config

- Handle NaN scores: replace with None so JSON serialises cleanly (null not NaN)
- Add try/except around result.to_pandas() — fall back to scores dict if it fails
- Clean up LLM provider config: gemini via LangChainWrapper, openai via ChatOpenAI
- Log the actual exception when to_pandas() fails for debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
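The NaN handling and to_pandas() fallback described above could be sketched like this. sanitize_scores and scores_payload are illustrative names; `result` stands in for a RAGAS EvaluationResult-like object with a to_pandas() method and a scores mapping.

```python
import logging
import math


def sanitize_scores(scores: dict) -> dict:
    # NaN is not valid JSON; map it to None so it serialises as null.
    return {k: (None if isinstance(v, float) and math.isnan(v) else v)
            for k, v in scores.items()}


def scores_payload(result):
    try:
        return result.to_pandas().to_dict(orient="records")
    except Exception as exc:
        # Log the actual failure instead of masking it, then fall back
        # to the raw scores dict.
        logging.warning("to_pandas() failed: %s", exc)
        return sanitize_scores(dict(result.scores))
```

The sanitisation matters because json.dumps emits a bare `NaN` token for NaN floats, which strict JSON parsers on the client side reject.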
@apundhir apundhir merged commit c331cf2 into main Apr 11, 2026
3 checks passed
@apundhir apundhir deleted the fix/test-api-key-enforcement branch April 11, 2026 17:22
apundhir added a commit that referenced this pull request Apr 11, 2026
* fix: conftest.py prevents .env API_KEY from breaking legacy tests
* fix: route /v1/evaluate through EvaluationRunner, not raw RAGAS
* fix: ragas_runner reads GEMINI_API_KEY from app settings, not os.getenv
* fix: update Gemini model to gemini-2.0-flash, raise_exceptions=True in RAGAS
* fix: robust RAGAS result handling — NaN→None, to_pandas fallback, better LLM config
