feat: add Gemini provider support#17
Conversation
GeminiClient (:generateContent) and GeminiEmbedder (:batchEmbedContents, requesting 1536 dims via output_dimensionality) talk to the Google AI Studio REST API directly via httpx, matching the Claude/OpenAI provider style. Both are registered in their package factories behind the existing LLMClient / EmbeddingProvider protocols and gated on GEMINI_API_KEY (falling back to GOOGLE_API_KEY). A neutral gemini_common.raise_for_gemini_error surfaces the status + Google error message + model on non-2xx without leaking the key, so embeddings never imports from llm/.
Header, request shape, response parsing, timeout forwarding, base-URL normalization, dimension/count validation, error-without-key-leak, and factory wiring (incl. GOOGLE_API_KEY fallback). No network: httpx.post is monkeypatched with a typed fake, keeping CI offline.
Resolve llm_model / embedding_model from the selected provider via new Settings.active_llm_model / active_embedding_model, so a Gemini (or fake) eval run no longer mislabels itself as claude-sonnet-4-6 / text-embedding-3-small in RESULTS.md (Golden Rule #5). No metric logic changes.
Add a gemini_api_key SSM SecureString (provisioned by default, consumed only when a provider is set to gemini), grant the task-execution role read on it, and add GEMINI_API_KEY to the backend task secrets. Provider/model env values become Terraform variables (validated; defaults preserve the Anthropic+OpenAI stack) so 'terraform apply -var=llm_provider=gemini ...' deploys the Gemini path. Docs and egress comments updated to name the external model APIs generically.
Add Gemini env options to .env.example (with gemini-2.5-flash fallback note), a README Google-only quickstart + Gemini eval recipe, Gemini notes in architecture/evaluation/demo, and the reseed warning that embeddings from different providers are not comparable. Record the feature in CLAUDE.md / PROGRESS.md / HANDOFF.md and a role-aware-embedding-prefix backlog item in MILESTONES.md (retrieval prefixes deliberately deferred).
|
Warning Review limit reached
More reviews will be available in 21 minutes and 16 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughThis PR adds Gemini as a first-class LLM and embeddings provider. The configuration schema gains Gemini fields and active model properties; implementations follow with HTTP clients for both services; comprehensive offline tests validate factory wiring and request/response handling; eval reporting becomes provider-agnostic; infrastructure provisions the secret and wires environment variables; and user documentation covers setup, quickstart paths, and evaluation workflows. ChangesGemini Provider Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06ba64cde0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
eval/harness.py (1)
495-507:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd return type annotation.
The function is missing a return type annotation. Per coding guidelines, type hints must be implemented everywhere.
🔧 Proposed fix
-def _settings_summary(settings: Settings) -> dict[str, Any]: +def _settings_summary(settings: Settings) -> dict[str, Any]:Wait, I need to check if it's actually missing. Let me look at the code again...
Looking at line 495 in the annotated code:
495 def _settings_summary(settings: Settings) -> dict[str, Any]:Actually the return type IS present:
-> dict[str, Any]. I was confused. Let me skip this comment.Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ddb44fa0-d008-4a02-bfa1-5211caaebf0b
📒 Files selected for processing (28)
.env.exampleCLAUDE.mdHANDOFF.mdMILESTONES.mdPROGRESS.mdREADME.mdbackend/app/config.pybackend/app/embeddings/__init__.pybackend/app/embeddings/gemini_provider.pybackend/app/gemini_common.pybackend/app/llm/__init__.pybackend/app/llm/gemini.pybackend/tests/test_eval_harness.pybackend/tests/test_gemini_embeddings.pybackend/tests/test_gemini_llm.pydocs/architecture.mddocs/demo.mddocs/evaluation.mdeval/harness.pyeval/results.pyeval/run.pyinfra/README.mdinfra/main.tfinfra/modules/ecs/main.tfinfra/modules/ecs/variables.tfinfra/modules/network/main.tfinfra/modules/secrets/main.tfinfra/modules/secrets/outputs.tf
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement type hints everywhere in Python code.
mypymust be clean.Include docstrings on all public functions in Python code. Comments should explain why, not what.
Seed any randomness in Python code; pin temperatures for LLM calls used in evaluation. Deterministic-by-default.
Files:
backend/app/embeddings/gemini_provider.pyeval/run.pyeval/harness.pybackend/app/gemini_common.pyeval/results.pybackend/app/llm/__init__.pybackend/app/embeddings/__init__.pybackend/app/config.pybackend/tests/test_eval_harness.pybackend/app/llm/gemini.pybackend/tests/test_gemini_embeddings.pybackend/tests/test_gemini_llm.py
backend/app/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Enforce citation-or-refuse guardrail: never answer without a supporting retrieved source above threshold.
Enforce PII redaction: apply redaction before LLM calls and before storage in the backend.
Enforce confidence gating: low-confidence extractions route to human review; never auto-apply.
Audit everything: every model suggestion and every human decision must write an audit event.
Files:
backend/app/embeddings/gemini_provider.pybackend/app/gemini_common.pybackend/app/llm/__init__.pybackend/app/embeddings/__init__.pybackend/app/config.pybackend/app/llm/gemini.py
backend/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Hard-test all deterministic logic: chunking, the workflow engine, guardrails, and the audit log.
Mock LLM and embeddings in tests via the
fakeimplementations; CI runs offline with no API keys.Include explicit determinism and idempotency tests for the workflow engine: same input produces same output; re-run produces no duplicate side effects.
Audit log must be append-only. Include a test that reconstructs current state by replaying events.
Files:
backend/tests/test_eval_harness.pybackend/tests/test_gemini_embeddings.pybackend/tests/test_gemini_llm.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Never commit or push to `main`. All work happens on a feature branch enforced by a `no-commit-to-branch` pre-commit hook and GitHub branch protection.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Use branch naming convention `feat/mNN-slug` where NN is the milestone number and slug is defined per milestone in `MILESTONES.md`.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Use Conventional Commits on feature branches with prefixes: `feat:`, `fix:`, `test:`, `docs:`, `refactor:`, `chore:`, `ci:`.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: A milestone is DONE only when: every item in its Definition of Done is met, `make check` passes (lint + types + tests), and `PROGRESS.md` is updated.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Never fabricate evaluation numbers. Metrics exist only after the evaluation harness runs against the labeled benchmark. Report real numbers; if not yet measured, write 'not measured'.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: No secrets in git. Use `.env` (gitignored) locally and CI / cloud secret stores otherwise.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Keep small, reviewable diffs. If a milestone balloons past approximately 400 changed lines of real logic, split the PR and note the split in `MILESTONES.md`.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Do not merge your own PRs. Opening the PR is the implementer's job; squash-merging is the human's gate.
Learnt from: CR
Repo: div0rce/sentinel
Timestamp: 2026-06-07T18:54:03.653Z
Learning: Never include dead code. No `TODO`s left in merged PRs; move them to the `MILESTONES.md` backlog.
🪛 Checkov (3.2.530)
infra/modules/secrets/main.tf
[high] 50-59: Ensure SSM parameters are using KMS CMK
(CKV_AWS_337)
🪛 dotenv-linter (4.0.0)
.env.example
[warning] 3-3: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 4-4: [UnorderedKey] The CLAUDE_MODEL key should go before the DATABASE_URL key
(UnorderedKey)
[warning] 5-5: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 6-6: [UnorderedKey] The LLM_MAX_TOKENS key should go before the LLM_PROVIDER key
(UnorderedKey)
[warning] 7-7: [UnorderedKey] The EMBEDDINGS_PROVIDER key should go before the LLM_MAX_TOKENS key
(UnorderedKey)
[warning] 7-7: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 8-8: [UnorderedKey] The EMBEDDING_DIM key should go before the LLM_MAX_TOKENS key
(UnorderedKey)
[warning] 8-8: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 17-17: [ValueWithoutQuotes] This value needs to be surrounded in quotes
(ValueWithoutQuotes)
[warning] 18-18: [UnorderedKey] The GEMINI_EMBEDDING_MODEL key should go before the GEMINI_MODEL key
(UnorderedKey)
[warning] 19-19: [UnorderedKey] The GEMINI_BASE_URL key should go before the GEMINI_EMBEDDING_MODEL key
(UnorderedKey)
🔇 Additional comments (55)
.env.example (1)
3-3: LGTM!Also applies to: 7-8, 13-20
README.md (1)
124-124: LGTM!Also applies to: 146-167, 209-220
HANDOFF.md (1)
44-47: LGTM!Also applies to: 344-356
CLAUDE.md (1)
56-60: LGTM!docs/architecture.md (1)
126-132: LGTM!Also applies to: 134-140, 421-421, 475-477
docs/demo.md (1)
22-22: LGTM!Also applies to: 26-29, 322-324
docs/evaluation.md (1)
47-52: LGTM!Also applies to: 170-170, 176-184
MILESTONES.md (1)
163-168: LGTM!PROGRESS.md (1)
31-31: LGTM!backend/app/config.py (6)
46-46: LGTM!
51-58: LGTM!
78-78: LGTM!
87-94: LGTM!
95-98: LGTM!
120-127: LGTM!backend/app/gemini_common.py (1)
14-39: LGTM!backend/app/llm/gemini.py (1)
43-45: LGTM!backend/app/llm/__init__.py (2)
8-8: LGTM!Also applies to: 19-19, 24-24
47-52: LGTM!backend/app/embeddings/gemini_provider.py (1)
52-54: LGTM!backend/app/embeddings/__init__.py (2)
8-8: LGTM!Also applies to: 19-19, 26-26
58-64: LGTM!backend/tests/test_gemini_llm.py (11)
1-6: LGTM!
7-16: LGTM!
18-31: LGTM!
33-57: LGTM!
62-65: LGTM!
67-71: LGTM!
73-81: LGTM!
83-86: LGTM!
91-107: LGTM!
109-114: LGTM!
116-183: LGTM!backend/tests/test_gemini_embeddings.py (8)
1-17: LGTM!
19-55: LGTM!
60-90: LGTM!
95-98: LGTM!
100-120: LGTM!
122-137: LGTM!
139-151: LGTM!
153-175: LGTM!backend/tests/test_eval_harness.py (1)
524-541: LGTM!eval/harness.py (1)
495-507: LGTM!eval/results.py (1)
35-39: LGTM!eval/run.py (1)
40-46: LGTM!infra/modules/secrets/main.tf (1)
4-9: LGTM!infra/modules/secrets/outputs.tf (1)
9-11: LGTM!infra/modules/ecs/variables.tf (1)
66-123: LGTM!infra/modules/ecs/main.tf (4)
61-61: LGTM!
214-220: LGTM!
229-229: LGTM!
301-301: LGTM!infra/main.tf (1)
79-79: LGTM!infra/modules/network/main.tf (1)
60-61: LGTM!infra/README.md (1)
23-24: LGTM!Also applies to: 52-53, 96-96, 157-165, 175-191
| @property | ||
| def active_llm_model(self) -> str: | ||
| """Model id of the *currently selected* LLM provider. | ||
|
|
||
| Used by the eval harness so RESULTS.md reports the model that actually ran | ||
| rather than always labelling it with ``claude_model`` (Golden Rule #5). | ||
| """ | ||
| if self.llm_provider == "anthropic": | ||
| return self.claude_model | ||
| if self.llm_provider == "gemini": | ||
| return self.gemini_model | ||
| return "fake-llm" | ||
|
|
||
| @property | ||
| def active_embedding_model(self) -> str: | ||
| """Embedding model id of the *currently selected* embeddings provider.""" | ||
| if self.embeddings_provider == "openai": | ||
| return self.openai_embedding_model | ||
| if self.embeddings_provider == "gemini": | ||
| return self.gemini_embedding_model | ||
| # 'voyage' has no model field yet (provider unimplemented) and 'fake' is | ||
| # non-semantic; fall back to the provider name so the label is never wrong. | ||
| return self.embeddings_provider |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider caching the active model properties.
The AI summary describes these as "cached-by-instance properties," but they lack a @cached_property decorator. While the simple if/else logic is fast, adding @functools.cached_property would align with the description and prevent redundant evaluations if these properties are called multiple times within a request lifecycle.
♻️ Proposed refactor
Import cached_property at the top:
-from functools import lru_cache
+from functools import cached_property, lru_cacheThen decorate the properties:
- `@property`
+ `@cached_property`
def active_llm_model(self) -> str:- `@property`
+ `@cached_property`
def active_embedding_model(self) -> str:| # Provisioned by default but only consumed when llm_provider or embeddings_provider | ||
| # is set to "gemini". Overwrite out-of-band like the other keys. | ||
| resource "aws_ssm_parameter" "gemini_api_key" { | ||
| name = "${local.prefix}/gemini_api_key" | ||
| description = "Gemini API key consumed by the backend at task start. Overwrite out-of-band." | ||
| type = "SecureString" | ||
| value = "REPLACE_ME" | ||
|
|
||
| lifecycle { | ||
| ignore_changes = [value] | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff
Consider customer-managed KMS keys for production deployments.
The Gemini API key parameter uses the AWS-managed alias/aws/ssm key, consistent with the existing Anthropic and OpenAI parameters. For the stated demo-only scope, this is acceptable. However, if this infrastructure is ever lifted to production, consider migrating all three API key parameters to customer-managed KMS keys (CMKs) for:
- Granular key rotation policies
- Tighter IAM access controls
- Enhanced audit trails via CloudTrail key usage events
This would require creating a CMK, updating the kms_key_id argument on all three aws_ssm_parameter resources, and adjusting the task execution policy's KMS decrypt condition.
🧰 Tools
🪛 Checkov (3.2.530)
[high] 50-59: Ensure SSM parameters are using KMS CMK
(CKV_AWS_337)
Source: Linters/SAST tools
The documented 'terraform apply -var=llm_provider=gemini ...' recipe referenced variables that existed only in modules/ecs; root -var values must be declared in the root module. Declare llm_provider, embeddings_provider, embedding_dim, gemini_model, and gemini_embedding_model at the root (validated; defaults preserve the Anthropic+OpenAI stack) and thread them into module "ecs". Addresses PR review (chatgpt-codex).
…allback Per house style (docstrings on public functions), document GeminiClient.__init__/ complete and GeminiEmbedder.__init__/embed (params, base-url normalisation, empty handling, error contract). Also document gemini-embedding-001 as a fallback when gemini-embedding-2 is unavailable. Addresses PR review (CodeRabbit).
Move generateContent/batchEmbedContents response parsing into module-private _parse_generate_content / _parse_batch_embeddings helpers. Separates the HTTP call from response handling, clears the CodeScene 'Complex Method' findings on GeminiClient.complete and GeminiEmbedder.embed, and lifts llm/gemini.py to a 10.0 Code Health score. No behaviour change (the 21 mocked Gemini tests pass).
…P base GeminiEmbedder keeps the 5-arg keyword-only constructor that mirrors OpenAIEmbedder and ClaudeClient. Document why at the flag site and record the real cross-cutting fix (a shared provider HTTP base) as a MILESTONES.md backlog item rather than diverging one provider from its siblings to satisfy a 4-arg heuristic.
Milestone
Net-new feature (outside the M0–M11 roadmap) — Google AI Studio / Gemini provider support
Summary
Adds first-class Gemini support for both the LLM (
GeminiClient,:generateContent) and embeddings (GeminiEmbedder,:batchEmbedContents, requesting 1536 dims via the RESToutput_dimensionalityfield), so the whole RAG + extraction + eval stack can run on a single free Google AI Studio key — no Anthropic/OpenAI key required. Both providers follow the repo's existing pattern: a small class behind theLLMClient/EmbeddingProviderprotocol, registered in the package factory, talking to the REST API directly viahttpx(no vendor SDK).fakestays the CI/test default; no live API calls in CI. Optionally wired through Terraform/ECS so the deployed demo can also run on Gemini.Definition of Done
make checkpasses — 222 backend pytest (+21 Gemini), 7 frontend Vitest + build,ruff/ruff-format/mypyclean,terraform fmt -check+validatecleanhttpxmocked): factory wiring (incl.GOOGLE_API_KEYfallback), header, request shape, response parsing, timeout forwarding, base-URL normalization, dimension/count validation, error-without-key-leakSettings.active_llm_model/active_embedding_model) so a Gemini run no longer mislabels itself as Claude/OpenAI inRESULTS.mdEnv vars added
GEMINI_API_KEY(+GOOGLE_API_KEYfallback),GEMINI_MODEL(defaultgemini-3.5-flash;gemini-2.5-flashdocumented as a fallback),GEMINI_EMBEDDING_MODEL(defaultgemini-embedding-2),GEMINI_BASE_URL. Provider literals widened:LLM_PROVIDER ∈ {anthropic, gemini, fake},EMBEDDINGS_PROVIDER ∈ {openai, voyage, gemini, fake}. Committed defaults stayanthropic/openai.Notes / decisions
output_dimensionalityin the REST body (the JS SDK uses camelCase). A key-gated, uncommitted real-provider smoke is documented to settle it empirically — inspect the live error/docs before flipping casing.gemini-embedding-2recommends instruction-prefixing queries vs. documents, butEmbeddingProvider.embed(texts)is role-agnostic and shared by ingest + query paths. Adding prefixes cleanly needs a role parameter threaded through every provider — deferred to aMILESTONES.mdbacklog item rather than silently changing retrieval semantics for OpenAI/fake.gemini_api_keySSM parameter is provisioned by default but only consumed whenllm_provider/embeddings_provideris set togemini; defaults preserve today's Anthropic+OpenAI deployment.EMBEDDINGS_PROVIDERrequires a re-seed (documented in README/architecture).