Skip to content

feat: add Gemini provider support#17

Merged
div0rce merged 9 commits into
mainfrom
feat/gemini-provider
Jun 8, 2026
Merged

feat: add Gemini provider support#17
div0rce merged 9 commits into
mainfrom
feat/gemini-provider

Conversation

@div0rce

@div0rce div0rce commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Milestone

Net-new feature (outside the M0–M11 roadmap) — Google AI Studio / Gemini provider support

Summary

Adds first-class Gemini support for both the LLM (GeminiClient, :generateContent) and embeddings (GeminiEmbedder, :batchEmbedContents, requesting 1536 dims via the REST output_dimensionality field), so the whole RAG + extraction + eval stack can run on a single free Google AI Studio key — no Anthropic/OpenAI key required. Both providers follow the repo's existing pattern: a small class behind the LLMClient / EmbeddingProvider protocol, registered in the package factory, talking to the REST API directly via httpx (no vendor SDK). fake stays the CI/test default; no live API calls in CI. Optionally wired through Terraform/ECS so the deployed demo can also run on Gemini.

Definition of Done

  • make check passes — 222 backend pytest (+21 Gemini), 7 frontend Vitest + build, ruff/ruff-format/mypy clean, terraform fmt -check + validate clean
  • Tests added for all new logic (offline, httpx mocked): factory wiring (incl. GOOGLE_API_KEY fallback), header, request shape, response parsing, timeout forwarding, base-URL normalization, dimension/count validation, error-without-key-leak
  • PROGRESS.md updated
  • No secrets committed; key never sent to the frontend; sample data is synthetic
  • Guardrails intact — provider swap only; citation-or-refuse, PII redaction, confidence gating, and audit logging paths are unchanged
  • Honesty discipline preserved — no eval numbers fabricated; real-provider eval not run. Eval run-metadata now reports the active provider's model (new Settings.active_llm_model / active_embedding_model) so a Gemini run no longer mislabels itself as Claude/OpenAI in RESULTS.md

Env vars added

GEMINI_API_KEY (+ GOOGLE_API_KEY fallback), GEMINI_MODEL (default gemini-3.5-flash; gemini-2.5-flash documented as a fallback), GEMINI_EMBEDDING_MODEL (default gemini-embedding-2), GEMINI_BASE_URL. Provider literals widened: LLM_PROVIDER ∈ {anthropic, gemini, fake}, EMBEDDINGS_PROVIDER ∈ {openai, voyage, gemini, fake}. Committed defaults stay anthropic/openai.

Notes / decisions

  • REST contract verified against current Google docs (Context7): the embeddings size field is snake_case output_dimensionality in the REST body (the JS SDK uses camelCase). A key-gated, uncommitted real-provider smoke is documented to settle it empirically — inspect the live error/docs before flipping casing.
  • Retrieval prefixes deferred (with reason). gemini-embedding-2 recommends instruction-prefixing queries vs. documents, but EmbeddingProvider.embed(texts) is role-agnostic and shared by ingest + query paths. Adding prefixes cleanly needs a role parameter threaded through every provider — deferred to a MILESTONES.md backlog item rather than silently changing retrieval semantics for OpenAI/fake.
  • Infra is optional/non-required. The gemini_api_key SSM parameter is provisioned by default but only consumed when llm_provider/embeddings_provider is set to gemini; defaults preserve today's Anthropic+OpenAI deployment.
  • ⚠️ Embeddings from different providers/models are not comparable — switching EMBEDDINGS_PROVIDER requires a re-seed (documented in README/architecture).

div0rce added 5 commits June 7, 2026 14:52
GeminiClient (:generateContent) and GeminiEmbedder (:batchEmbedContents,
requesting 1536 dims via output_dimensionality) talk to the Google AI Studio
REST API directly via httpx, matching the Claude/OpenAI provider style. Both
are registered in their package factories behind the existing LLMClient /
EmbeddingProvider protocols and gated on GEMINI_API_KEY (falling back to
GOOGLE_API_KEY). A neutral gemini_common.raise_for_gemini_error surfaces the
status + Google error message + model on non-2xx without leaking the key, so
embeddings never imports from llm/.
Header, request shape, response parsing, timeout forwarding, base-URL
normalization, dimension/count validation, error-without-key-leak, and factory
wiring (incl. GOOGLE_API_KEY fallback). No network: httpx.post is monkeypatched
with a typed fake, keeping CI offline.
Resolve llm_model / embedding_model from the selected provider via new
Settings.active_llm_model / active_embedding_model, so a Gemini (or fake) eval
run no longer mislabels itself as claude-sonnet-4-6 / text-embedding-3-small in
RESULTS.md (Golden Rule #5). No metric logic changes.
Add a gemini_api_key SSM SecureString (provisioned by default, consumed only
when a provider is set to gemini), grant the task-execution role read on it, and
add GEMINI_API_KEY to the backend task secrets. Provider/model env values become
Terraform variables (validated; defaults preserve the Anthropic+OpenAI stack) so
'terraform apply -var=llm_provider=gemini ...' deploys the Gemini path. Docs and
egress comments updated to name the external model APIs generically.
Add Gemini env options to .env.example (with gemini-2.5-flash fallback note), a
README Google-only quickstart + Gemini eval recipe, Gemini notes in
architecture/evaluation/demo, and the reseed warning that embeddings from
different providers are not comparable. Record the feature in CLAUDE.md /
PROGRESS.md / HANDOFF.md and a role-aware-embedding-prefix backlog item in
MILESTONES.md (retrieval prefixes deliberately deferred).
@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@div0rce, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 21 minutes and 16 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2e002fc9-cb7c-43dc-8b70-52ed3178dea8

📥 Commits

Reviewing files that changed from the base of the PR and between 06ba64c and 8eec849.

📒 Files selected for processing (7)
  • .env.example
  • MILESTONES.md
  • backend/app/config.py
  • backend/app/embeddings/gemini_provider.py
  • backend/app/llm/gemini.py
  • infra/main.tf
  • infra/variables.tf
📝 Walkthrough

Walkthrough

This PR adds Gemini as a first-class LLM and embeddings provider. The configuration schema gains Gemini fields and active model properties; implementations follow with HTTP clients for both services; comprehensive offline tests validate factory wiring and request/response handling; eval reporting becomes provider-agnostic; infrastructure provisions the secret and wires environment variables; and user documentation covers setup, quickstart paths, and evaluation workflows.

Changes

Gemini Provider Support

Layer / File(s) Summary
Settings schema for Gemini provider configuration
backend/app/config.py
Settings expands embeddings_provider and llm_provider to include "gemini", adds Gemini-specific fields (gemini_api_key, gemini_model, gemini_embedding_model, gemini_base_url), introduces google_api_key as an alias/fallback, and adds cached active_llm_model and active_embedding_model properties to resolve the selected provider's model name.
Shared Gemini REST error handling
backend/app/gemini_common.py
New module with raise_for_gemini_error() that parses Gemini API error responses from JSON, raises RuntimeError with status and message context, and avoids leaking API keys.
Gemini LLM client implementation
backend/app/llm/gemini.py, backend/app/llm/__init__.py
GeminiClient posts to generateContent endpoint with user/system/generationConfig payload; parses candidates and text parts; factory in __init__.py selects API key with fallback logic and wires through model/base URL.
Gemini embeddings provider implementation
backend/app/embeddings/gemini_provider.py, backend/app/embeddings/__init__.py
GeminiEmbedder batches texts to batchEmbedContents endpoint with output_dimensionality; validates response vector count and length; factory in __init__.py wires with API key fallback and configuration.
Offline test coverage for Gemini implementations
backend/tests/test_gemini_llm.py, backend/tests/test_gemini_embeddings.py, backend/tests/test_eval_harness.py
Tests monkeypatch httpx.post, validate factory provider selection and API key fallback, assert request formation (headers, JSON, timeout, URL normalization), response parsing (text concatenation, finish reason, embedding vectors), and error handling without live calls; also updates eval harness test to use new llm_model/embedding_model keys.
Provider-agnostic eval harness reporting
eval/harness.py, eval/results.py, eval/run.py
Replace provider-specific field references (claude_model, openai_embedding_model) with calls to settings.active_llm_model and settings.active_embedding_model, enabling results to reflect whichever provider is configured.
AWS infrastructure for Gemini secret provisioning
infra/modules/secrets/main.tf, infra/modules/secrets/outputs.tf
Adds aws_ssm_parameter.gemini_api_key SSM SecureString resource with lifecycle.ignore_changes to preserve out-of-band secrets; exports gemini_key_arn output.
ECS task environment and policy for Gemini provider
infra/modules/ecs/variables.tf, infra/modules/ecs/main.tf
Introduces Terraform variables for llm_provider, embeddings_provider, model names, and embedding dimension; wires environment variables and GEMINI_API_KEY secret to backend container; updates task execution policy to permit access to Gemini secret ARN.
Root infrastructure and network updates for Gemini
infra/main.tf, infra/modules/network/main.tf, infra/README.md
Passes gemini_key_secret_arn through root module to ECS; updates security group egress comments to include external Gemini API access; documents Gemini deploy section with Terraform variables and secret setup.
User and operator documentation for Gemini setup
.env.example, README.md, HANDOFF.md, CLAUDE.md, docs/architecture.md, docs/demo.md, docs/evaluation.md, MILESTONES.md, PROGRESS.md
Adds .env.example Gemini variables and comments; documents Google-only quickstart in README with reseeding guidance; updates HANDOFF with Gemini API-keys setup instructions; expands architecture/demo/evaluation docs with Gemini provider options and configuration; notes planned role-aware embedding behavior in backlog and M11 progress.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 19.15% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main feature addition: Gemini provider support for both LLM and embeddings.
Description check ✅ Passed The PR description comprehensively addresses all template sections: clear milestone, detailed summary, complete Definition of Done checklist with all items marked complete, and thorough notes on key decisions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06ba64cde0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread backend/app/embeddings/gemini_provider.py
Comment thread backend/app/config.py
Comment thread infra/README.md
Comment thread backend/app/config.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
eval/harness.py (1)

495-507: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add return type annotation.

The function is missing a return type annotation. Per coding guidelines, type hints must be implemented everywhere.

🔧 Proposed fix
-def _settings_summary(settings: Settings) -> dict[str, Any]:
+def _settings_summary(settings: Settings) -> dict[str, Any]:

Wait, I need to check if it's actually missing. Let me look at the code again...

Looking at line 495 in the annotated code:

495
def _settings_summary(settings: Settings) -> dict[str, Any]:

Actually the return type IS present: -> dict[str, Any]. I was confused. Let me skip this comment.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ddb44fa0-d008-4a02-bfa1-5211caaebf0b

📥 Commits

Reviewing files that changed from the base of the PR and between f4ba21d and 06ba64c.

📒 Files selected for processing (28)
  • .env.example
  • CLAUDE.md
  • HANDOFF.md
  • MILESTONES.md
  • PROGRESS.md
  • README.md
  • backend/app/config.py
  • backend/app/embeddings/__init__.py
  • backend/app/embeddings/gemini_provider.py
  • backend/app/gemini_common.py
  • backend/app/llm/__init__.py
  • backend/app/llm/gemini.py
  • backend/tests/test_eval_harness.py
  • backend/tests/test_gemini_embeddings.py
  • backend/tests/test_gemini_llm.py
  • docs/architecture.md
  • docs/demo.md
  • docs/evaluation.md
  • eval/harness.py
  • eval/results.py
  • eval/run.py
  • infra/README.md
  • infra/main.tf
  • infra/modules/ecs/main.tf
  • infra/modules/ecs/variables.tf
  • infra/modules/network/main.tf
  • infra/modules/secrets/main.tf
  • infra/modules/secrets/outputs.tf
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement type hints everywhere in Python code. mypy must be clean.

Include docstrings on all public functions in Python code. Comments should explain why, not what.

Seed any randomness in Python code; pin temperatures for LLM calls used in evaluation. Deterministic-by-default.

Files:

  • backend/app/embeddings/gemini_provider.py
  • eval/run.py
  • eval/harness.py
  • backend/app/gemini_common.py
  • eval/results.py
  • backend/app/llm/__init__.py
  • backend/app/embeddings/__init__.py
  • backend/app/config.py
  • backend/tests/test_eval_harness.py
  • backend/app/llm/gemini.py
  • backend/tests/test_gemini_embeddings.py
  • backend/tests/test_gemini_llm.py
backend/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Enforce citation-or-refuse guardrail: never answer without a supporting retrieved source above threshold.

Enforce PII redaction: apply redaction before LLM calls and before storage in the backend.

Enforce confidence gating: low-confidence extractions route to human review; never auto-apply.

Audit everything: every model suggestion and every human decision must write an audit event.

Files:

  • backend/app/embeddings/gemini_provider.py
  • backend/app/gemini_common.py
  • backend/app/llm/__init__.py
  • backend/app/embeddings/__init__.py
  • backend/app/config.py
  • backend/app/llm/gemini.py
backend/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Hard-test all deterministic logic: chunking, the workflow engine, guardrails, and the audit log.

Mock LLM and embeddings in tests via the fake implementations; CI runs offline with no API keys.

Include explicit determinism and idempotency tests for the workflow engine: same input produces same output; re-run produces no duplicate side effects.

Audit log must be append-only. Include a test that reconstructs current state by replaying events.

Files:

  • backend/tests/test_eval_harness.py
  • backend/tests/test_gemini_embeddings.py
  • backend/tests/test_gemini_llm.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Never commit or push to `main`. All work happens on a feature branch enforced by a `no-commit-to-branch` pre-commit hook and GitHub branch protection.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Use branch naming convention `feat/mNN-slug` where NN is the milestone number and slug is defined per milestone in `MILESTONES.md`.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Use Conventional Commits on feature branches with prefixes: `feat:`, `fix:`, `test:`, `docs:`, `refactor:`, `chore:`, `ci:`.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: A milestone is DONE only when: every item in its Definition of Done is met, `make check` passes (lint + types + tests), and `PROGRESS.md` is updated.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Never fabricate evaluation numbers. Metrics exist only after the evaluation harness runs against the labeled benchmark. Report real numbers; if not yet measured, write 'not measured'.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: No secrets in git. Use `.env` (gitignored) locally and CI / cloud secret stores otherwise.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Keep small, reviewable diffs. If a milestone balloons past approximately 400 changed lines of real logic, split the PR and note the split in `MILESTONES.md`.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Do not merge your own PRs. Opening the PR is the implementer's job; squash-merging is the human's gate.
Learnt from: CR
Repo: div0rce/sentinel

Timestamp: 2026-06-07T18:54:03.653Z
Learning: Never include dead code. No `TODO`s left in merged PRs; move them to the `MILESTONES.md` backlog.
🪛 Checkov (3.2.530)
infra/modules/secrets/main.tf

[high] 50-59: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

🪛 dotenv-linter (4.0.0)
.env.example

[warning] 3-3: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 4-4: [UnorderedKey] The CLAUDE_MODEL key should go before the DATABASE_URL key

(UnorderedKey)


[warning] 5-5: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 6-6: [UnorderedKey] The LLM_MAX_TOKENS key should go before the LLM_PROVIDER key

(UnorderedKey)


[warning] 7-7: [UnorderedKey] The EMBEDDINGS_PROVIDER key should go before the LLM_MAX_TOKENS key

(UnorderedKey)


[warning] 7-7: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 8-8: [UnorderedKey] The EMBEDDING_DIM key should go before the LLM_MAX_TOKENS key

(UnorderedKey)


[warning] 8-8: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 17-17: [ValueWithoutQuotes] This value needs to be surrounded in quotes

(ValueWithoutQuotes)


[warning] 18-18: [UnorderedKey] The GEMINI_EMBEDDING_MODEL key should go before the GEMINI_MODEL key

(UnorderedKey)


[warning] 19-19: [UnorderedKey] The GEMINI_BASE_URL key should go before the GEMINI_EMBEDDING_MODEL key

(UnorderedKey)

🔇 Additional comments (55)
.env.example (1)

3-3: LGTM!

Also applies to: 7-8, 13-20

README.md (1)

124-124: LGTM!

Also applies to: 146-167, 209-220

HANDOFF.md (1)

44-47: LGTM!

Also applies to: 344-356

CLAUDE.md (1)

56-60: LGTM!

docs/architecture.md (1)

126-132: LGTM!

Also applies to: 134-140, 421-421, 475-477

docs/demo.md (1)

22-22: LGTM!

Also applies to: 26-29, 322-324

docs/evaluation.md (1)

47-52: LGTM!

Also applies to: 170-170, 176-184

MILESTONES.md (1)

163-168: LGTM!

PROGRESS.md (1)

31-31: LGTM!

backend/app/config.py (6)

46-46: LGTM!


51-58: LGTM!


78-78: LGTM!


87-94: LGTM!


95-98: LGTM!


120-127: LGTM!

backend/app/gemini_common.py (1)

14-39: LGTM!

backend/app/llm/gemini.py (1)

43-45: LGTM!

backend/app/llm/__init__.py (2)

8-8: LGTM!

Also applies to: 19-19, 24-24


47-52: LGTM!

backend/app/embeddings/gemini_provider.py (1)

52-54: LGTM!

backend/app/embeddings/__init__.py (2)

8-8: LGTM!

Also applies to: 19-19, 26-26


58-64: LGTM!

backend/tests/test_gemini_llm.py (11)

1-6: LGTM!


7-16: LGTM!


18-31: LGTM!


33-57: LGTM!


62-65: LGTM!


67-71: LGTM!


73-81: LGTM!


83-86: LGTM!


91-107: LGTM!


109-114: LGTM!


116-183: LGTM!

backend/tests/test_gemini_embeddings.py (8)

1-17: LGTM!


19-55: LGTM!


60-90: LGTM!


95-98: LGTM!


100-120: LGTM!


122-137: LGTM!


139-151: LGTM!


153-175: LGTM!

backend/tests/test_eval_harness.py (1)

524-541: LGTM!

eval/harness.py (1)

495-507: LGTM!

eval/results.py (1)

35-39: LGTM!

eval/run.py (1)

40-46: LGTM!

infra/modules/secrets/main.tf (1)

4-9: LGTM!

infra/modules/secrets/outputs.tf (1)

9-11: LGTM!

infra/modules/ecs/variables.tf (1)

66-123: LGTM!

infra/modules/ecs/main.tf (4)

61-61: LGTM!


214-220: LGTM!


229-229: LGTM!


301-301: LGTM!

infra/main.tf (1)

79-79: LGTM!

infra/modules/network/main.tf (1)

60-61: LGTM!

infra/README.md (1)

23-24: LGTM!

Also applies to: 52-53, 96-96, 157-165, 175-191

Comment thread backend/app/config.py
Comment on lines +149 to +171
@property
def active_llm_model(self) -> str:
"""Model id of the *currently selected* LLM provider.

Used by the eval harness so RESULTS.md reports the model that actually ran
rather than always labelling it with ``claude_model`` (Golden Rule #5).
"""
if self.llm_provider == "anthropic":
return self.claude_model
if self.llm_provider == "gemini":
return self.gemini_model
return "fake-llm"

@property
def active_embedding_model(self) -> str:
"""Embedding model id of the *currently selected* embeddings provider."""
if self.embeddings_provider == "openai":
return self.openai_embedding_model
if self.embeddings_provider == "gemini":
return self.gemini_embedding_model
# 'voyage' has no model field yet (provider unimplemented) and 'fake' is
# non-semantic; fall back to the provider name so the label is never wrong.
return self.embeddings_provider

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider caching the active model properties.

The AI summary describes these as "cached-by-instance properties," but they lack a @cached_property decorator. While the simple if/else logic is fast, adding @functools.cached_property would align with the description and prevent redundant evaluations if these properties are called multiple times within a request lifecycle.

♻️ Proposed refactor

Import cached_property at the top:

-from functools import lru_cache
+from functools import cached_property, lru_cache

Then decorate the properties:

-    `@property`
+    `@cached_property`
     def active_llm_model(self) -> str:
-    `@property`
+    `@cached_property`
     def active_embedding_model(self) -> str:

Comment thread backend/app/embeddings/gemini_provider.py
Comment thread backend/app/embeddings/gemini_provider.py Outdated
Comment thread backend/app/llm/gemini.py
Comment thread backend/app/llm/gemini.py Outdated
Comment on lines +48 to +59
# Provisioned by default but only consumed when llm_provider or embeddings_provider
# is set to "gemini". Overwrite out-of-band like the other keys.
resource "aws_ssm_parameter" "gemini_api_key" {
name = "${local.prefix}/gemini_api_key"
description = "Gemini API key consumed by the backend at task start. Overwrite out-of-band."
type = "SecureString"
value = "REPLACE_ME"

lifecycle {
ignore_changes = [value]
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚖️ Poor tradeoff

Consider customer-managed KMS keys for production deployments.

The Gemini API key parameter uses the AWS-managed alias/aws/ssm key, consistent with the existing Anthropic and OpenAI parameters. For the stated demo-only scope, this is acceptable. However, if this infrastructure is ever lifted to production, consider migrating all three API key parameters to customer-managed KMS keys (CMKs) for:

  • Granular key rotation policies
  • Tighter IAM access controls
  • Enhanced audit trails via CloudTrail key usage events

This would require creating a CMK, updating the kms_key_id argument on all three aws_ssm_parameter resources, and adjusting the task execution policy's KMS decrypt condition.

🧰 Tools
🪛 Checkov (3.2.530)

[high] 50-59: Ensure SSM parameters are using KMS CMK

(CKV_AWS_337)

Source: Linters/SAST tools

div0rce added 4 commits June 7, 2026 15:14
The documented 'terraform apply -var=llm_provider=gemini ...' recipe referenced
variables that existed only in modules/ecs; root -var values must be declared in
the root module. Declare llm_provider, embeddings_provider, embedding_dim,
gemini_model, and gemini_embedding_model at the root (validated; defaults
preserve the Anthropic+OpenAI stack) and thread them into module "ecs".
Addresses PR review (chatgpt-codex).
…allback

Per house style (docstrings on public functions), document GeminiClient.__init__/
complete and GeminiEmbedder.__init__/embed (params, base-url normalisation, empty
handling, error contract). Also document gemini-embedding-001 as a fallback when
gemini-embedding-2 is unavailable. Addresses PR review (CodeRabbit).
Move generateContent/batchEmbedContents response parsing into module-private
_parse_generate_content / _parse_batch_embeddings helpers. Separates the HTTP
call from response handling, clears the CodeScene 'Complex Method' findings on
GeminiClient.complete and GeminiEmbedder.embed, and lifts llm/gemini.py to a 10.0
Code Health score. No behaviour change (the 21 mocked Gemini tests pass).
…P base

GeminiEmbedder keeps the 5-arg keyword-only constructor that mirrors OpenAIEmbedder
and ClaudeClient. Document why at the flag site and record the real cross-cutting fix
(a shared provider HTTP base) as a MILESTONES.md backlog item rather than diverging one
provider from its siblings to satisfy a 4-arg heuristic.
@div0rce div0rce merged commit 8fc2eac into main Jun 8, 2026
4 checks passed
@div0rce div0rce deleted the feat/gemini-provider branch June 8, 2026 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant