Merged
15 changes: 12 additions & 3 deletions .env.example
@@ -1,14 +1,23 @@
DATABASE_URL=postgresql+psycopg://sentinel:sentinel@localhost:5432/sentinel
ANTHROPIC_API_KEY=
LLM_PROVIDER=anthropic # one of: anthropic, fake (tests/CI use 'fake')
LLM_PROVIDER=anthropic # one of: anthropic, gemini, fake (tests/CI use 'fake')
CLAUDE_MODEL=claude-sonnet-4-6
LLM_TEMPERATURE=0.0 # pin to 0.0 for determinism in eval (M9)
LLM_MAX_TOKENS=1024
EMBEDDINGS_PROVIDER=openai # one of: openai, voyage, fake (tests/CI use 'fake')
EMBEDDING_DIM=1536 # 1536 = text-embedding-3-small; 1024 = voyage-3-lite
EMBEDDINGS_PROVIDER=openai # one of: openai, voyage, gemini, fake (tests/CI use 'fake')
EMBEDDING_DIM=1536 # 1536 = text-embedding-3-small / gemini-embedding-2; 1024 = voyage-3-lite
OPENAI_API_KEY=
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
VOYAGE_API_KEY=

# Google AI Studio / Gemini (free key path). Set LLM_PROVIDER=gemini and/or
# EMBEDDINGS_PROVIDER=gemini above to run the stack on a single free Google key.
GEMINI_API_KEY=
# GOOGLE_API_KEY= # fallback if GEMINI_API_KEY is unset
GEMINI_MODEL=gemini-3.5-flash # fallback: gemini-2.5-flash if 3.5 is unavailable to your account/region
GEMINI_EMBEDDING_MODEL=gemini-embedding-2 # fallback: gemini-embedding-001 if -2 is unavailable
GEMINI_BASE_URL=https://generativelanguage.googleapis.com/v1beta

CHUNK_SIZE_TOKENS=512
CHUNK_OVERLAP_TOKENS=64
RETRIEVAL_TOP_K=5
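
The comments in `.env.example` describe a precedence rule: `GEMINI_API_KEY` is the primary key and `GOOGLE_API_KEY` is only a fallback. A minimal sketch of that rule follows. The helper name `resolve_gemini_key` and the decision to raise when both are missing are assumptions made for illustration; the PR itself resolves the key inline in the factories.

```python
from collections.abc import Mapping


def resolve_gemini_key(env: Mapping[str, str]) -> str:
    """Return the key to use for Gemini calls (hypothetical helper, not in the PR).

    GEMINI_API_KEY takes precedence; GOOGLE_API_KEY is consulted only when the
    former is unset or empty. Raising on a missing key is this sketch's choice.
    """
    key = env.get("GEMINI_API_KEY", "") or env.get("GOOGLE_API_KEY", "")
    if not key:
        raise ValueError("set GEMINI_API_KEY (or GOOGLE_API_KEY) to use a gemini provider")
    return key
```

Passing `os.environ` as `env` would apply the same rule to the running process.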
8 changes: 5 additions & 3 deletions CLAUDE.md
@@ -53,9 +53,11 @@ This is a portfolio project demonstrating enterprise-grade, auditable AI for reg

- **Backend:** Python 3.12, FastAPI, Pydantic v2, SQLAlchemy 2.x, Alembic
- **DB:** PostgreSQL 16 + `pgvector`
- **AI:** Anthropic Claude API for generation/extraction; embeddings via a hosted provider
(`text-embedding-3-small` or `voyage-3-lite`) **behind an interface** in `backend/app/llm/` and
`backend/app/embeddings/` so both are swappable and **mocked in tests** (no live API calls in CI).
- **AI:** hosted LLM for generation/extraction (Anthropic Claude **or** Google Gemini, via
`LLM_PROVIDER`); embeddings via a hosted provider (`text-embedding-3-small`, `gemini-embedding-2`,
or `voyage-3-lite`, via `EMBEDDINGS_PROVIDER`) — all **behind an interface** in `backend/app/llm/`
and `backend/app/embeddings/` so both are swappable and **mocked in tests** (no live API calls in
CI). A single free Google AI Studio key can drive both LLM and embeddings.
- **Frontend:** React + TypeScript (Vite), Recharts
- **Infra:** Docker + docker-compose (dev); Terraform → AWS ECS Fargate + RDS (M10)
- **CI/CD:** GitHub Actions
13 changes: 10 additions & 3 deletions HANDOFF.md
@@ -41,8 +41,10 @@ draft the case-study writeup or polish the résumé — but the engineering is C
-F enforce_admins=false -F required_status_checks=null -F restrictions=null
```
(Or do it in GitHub → Settings → Branches → add rule on `main`: "Require a pull request before merging".)
6. **API keys:** have an Anthropic API key (the app's LLM) and an embeddings key (OpenAI or Voyage). You'll
put them in `.env` (gitignored) during M2/M3. **CI needs none** — tests mock both providers.
6. **API keys:** have an Anthropic API key (the app's LLM) and an embeddings key (OpenAI or Voyage) —
**or** a single free Google AI Studio key, which drives both the LLM and embeddings when you set
`LLM_PROVIDER=gemini` + `EMBEDDINGS_PROVIDER=gemini` (see the README "Google-only quickstart"). You'll
put them in `.env` (gitignored) during M2/M3. **CI needs none** — tests mock every provider.

---

@@ -339,14 +341,19 @@ node_modules/ dist/
```
DATABASE_URL=postgresql+psycopg://sentinel:sentinel@localhost:5432/sentinel
ANTHROPIC_API_KEY=
EMBEDDINGS_PROVIDER=openai # or: voyage
LLM_PROVIDER=anthropic # or: gemini, fake
EMBEDDINGS_PROVIDER=openai # or: voyage, gemini, fake
OPENAI_API_KEY=
VOYAGE_API_KEY=
GEMINI_API_KEY= # free Google AI Studio key drives both LLM + embeddings
RETRIEVAL_TOP_K=5
RETRIEVAL_MIN_SCORE=0.30
CONFIDENCE_REVIEW_THRESHOLD=0.75
```

> The committed `.env.example` is the source of truth and has the full set
> (Gemini model/base-url options included); this is an abridged illustration.

### `docker-compose.yml`
```yaml
services:
12 changes: 12 additions & 0 deletions MILESTONES.md
@@ -160,5 +160,17 @@ and presentable. Do not skip ahead — later milestones assume earlier ones exis
- Eval expansion (larger labeled set, per-category breakdown).
- Observability: OpenTelemetry traces, dashboards.
- Reranking stage before generation.
- Shared provider HTTP base. `ClaudeClient`, `GeminiClient`, `OpenAIEmbedder`, and
`GeminiEmbedder` each repeat api-key validation, base-URL normalization, the `httpx`
POST + headers + error handling, and the timeout knob. Extract a small shared base (or
transport helper) to DRY all four and shrink each constructor's argument surface — done
across all providers together so they stay consistent (deliberately out of scope for the
Gemini-provider PR, which keeps the new classes parallel to the existing ones).
- Role-aware embedding API + Gemini retrieval prefixes. `gemini-embedding-2` recommends
instruction-prefixing queries vs. documents (no `task_type` for text retrieval), but the
`EmbeddingProvider.embed(texts)` interface is role-agnostic and shared by ingest and query
paths. Adding prefixes cleanly needs a query/document role threaded through the Protocol and
all providers — deferred from the Gemini-provider PR to avoid changing retrieval semantics
for OpenAI/fake. Requires tests before any behaviour change.
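
The role-aware backlog item above could be sketched roughly as follows. Every name here (`EmbeddingRole`, `RoleAwareEmbedder`, `apply_role_prefix`) is hypothetical, and the prefix strings are placeholders: the real instruction text would have to come from the provider's documentation, not from this sketch.

```python
from enum import Enum
from typing import Protocol


class EmbeddingRole(str, Enum):
    """Hypothetical tag for which side of retrieval a text belongs to."""

    QUERY = "query"
    DOCUMENT = "document"


class RoleAwareEmbedder(Protocol):
    """Hypothetical successor to the role-agnostic embed(texts) interface."""

    def embed(self, texts: list[str], *, role: EmbeddingRole) -> list[list[float]]: ...


# Placeholder prefixes only (assumption); they are not the provider's real strings.
_PLACEHOLDER_PREFIXES = {
    EmbeddingRole.QUERY: "query: ",
    EmbeddingRole.DOCUMENT: "document: ",
}


def apply_role_prefix(texts: list[str], role: EmbeddingRole) -> list[str]:
    """Prefix each text for its role; role-agnostic providers would skip this step."""
    prefix = _PLACEHOLDER_PREFIXES[role]
    return [prefix + text for text in texts]
```

Making `role` keyword-only means an ingest path and a query path cannot silently swap roles by argument position, which is one reason to change the Protocol rather than add a second method.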

> Do not pull backlog items into earlier PRs. Park ideas here.
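
For the shared provider HTTP base listed in the backlog, a minimal sketch of the part that is independent of any HTTP library is shown below. `ProviderTransport` and its fields are hypothetical names, and the actual `httpx` POST and error handling are deliberately left out because the diff shown here does not include them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderTransport:
    """Hypothetical shared base for provider clients; names are illustrative."""

    api_key: str
    base_url: str
    timeout_s: float = 30.0  # assumed default, not taken from the PR

    def __post_init__(self) -> None:
        if not self.api_key:
            raise ValueError("api_key must be non-empty")
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        # Normalize once so callers can join paths without double slashes.
        object.__setattr__(self, "base_url", self.base_url.rstrip("/"))

    def url(self, path: str) -> str:
        """Join a provider-relative path onto the normalized base URL."""
        return f"{self.base_url}/{path.lstrip('/')}"
```

Each of the four clients could then accept one `ProviderTransport` instead of separate key, URL, and timeout arguments, which is the constructor-surface reduction the backlog item asks for.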
1 change: 1 addition & 0 deletions PROGRESS.md
@@ -28,6 +28,7 @@
- **#13** — record real-provider eval numbers (M9 follow-up). Stays open until keys are wired and `make eval` is run for real.
- **Backlog (MILESTONES.md):** multi-tenant + RBAC, eval set expansion, OTel traces, Multi-AZ + private subnets + ACM TLS + S3/DynamoDB Terraform backend.
- **Design system** — dual-theme (dark default + light) audit-grade visual layer for the frontend + a real `GET /dashboard/kpis` endpoint, on branch `claude/serene-maxwell-54yMC` (draft PR). Net-new work beyond the M0–M11 roadmap; `make check` green (201 backend pytest, 7 frontend Vitest, ruff/mypy/tsc/build clean).
- **Gemini provider** — first-class Google AI Studio / Gemini support for **both** LLM (`GeminiClient`, `:generateContent`) and embeddings (`GeminiEmbedder`, `:batchEmbedContents`, requesting 1536 dims), so the whole stack runs on a single free Google key; wired through config, both factories, eval run-metadata (active-model labels), `.env.example`/README/eval/architecture/demo docs, and Terraform/ECS (optional `gemini_api_key` SSM param + provider env vars). On branch `feat/gemini-provider`. New offline tests mock `httpx`; `fake` stays the CI default (no live calls). `make check` green (222 backend pytest [+21 Gemini], 7 frontend Vitest, ruff/mypy/terraform clean). No fabricated eval numbers — real-provider eval not run.

---

37 changes: 36 additions & 1 deletion README.md
@@ -121,7 +121,7 @@ The full step-by-step is in [`docs/demo.md`](docs/demo.md). Short version
# 1. clone
git clone https://github.com/div0rce/sentinel.git
cd sentinel
cp .env.example .env # set ANTHROPIC_API_KEY and OPENAI_API_KEY
cp .env.example .env # set ANTHROPIC_API_KEY + OPENAI_API_KEY (or use the Google-only path below)

# 2. start Postgres + the API
docker compose up -d db
@@ -143,6 +143,28 @@ curl -s http://localhost:8000/query \
Open <http://localhost:5173> for the SPA: **Query**, **Review**, and
**Dashboard** views.

### Google-only quickstart (one free Google AI Studio key)

Sentinel speaks Gemini for both the LLM and embeddings, so you can run the whole
stack on a single free [Google AI Studio](https://aistudio.google.com/apikey) key
— no Anthropic or OpenAI key required. After `cp .env.example .env`, set:

```bash
GEMINI_API_KEY=... # GOOGLE_API_KEY also works
LLM_PROVIDER=gemini
GEMINI_MODEL=gemini-3.5-flash # fallback: gemini-2.5-flash if 3.5 isn't available to your account
EMBEDDINGS_PROVIDER=gemini
GEMINI_EMBEDDING_MODEL=gemini-embedding-2
EMBEDDING_DIM=1536
```

Then continue with `docker compose up -d db && make dev && make migrate && make seed`.

> **Switching embedding providers?** Embeddings from different providers/models are
> **not comparable** — never mix them in one seeded DB. After changing
> `EMBEDDINGS_PROVIDER`/`GEMINI_EMBEDDING_MODEL`/`EMBEDDING_DIM`, reset and reseed:
> `docker compose down -v && docker compose up -d db && make migrate && make seed`.
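
The warning above matters most when two providers share a dimension: OpenAI and Gemini are both configured for 1536 here, so a dimension check alone would not catch a mixed database. One way to guard against that, sketched under the assumption that the seeding step records what produced the vectors (`EmbeddingFingerprint` and `assert_same_embedding_space` are hypothetical and not part of this PR):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingFingerprint:
    """Hypothetical record of what produced the vectors in a seeded database."""

    provider: str
    model: str
    dim: int


def assert_same_embedding_space(
    stored: EmbeddingFingerprint, current: EmbeddingFingerprint
) -> None:
    """Raise if the running config would mix vectors from different embedding spaces."""
    if stored != current:
        raise RuntimeError(
            f"database was seeded with {stored}, but the app is configured for "
            f"{current}; reset and reseed before querying"
        )
```

Comparing the whole fingerprint rather than only `dim` is the design point: equal dimensions do not imply comparable vectors.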

### Run the test suite

```bash
@@ -184,6 +206,19 @@ export EMBEDDINGS_PROVIDER=openai
make migrate && make seed && make eval
```

…or on a single free Google AI Studio key (re-seed first, since Gemini embeddings
are not comparable to OpenAI's):

```bash
export GEMINI_API_KEY=...
export LLM_PROVIDER=gemini
export EMBEDDINGS_PROVIDER=gemini
export GEMINI_MODEL=gemini-3.5-flash
export GEMINI_EMBEDDING_MODEL=gemini-embedding-2
export EMBEDDING_DIM=1536
make migrate && make seed && make eval
```

## Governance & guardrails

Three pillars, all deterministic and tested:
59 changes: 57 additions & 2 deletions backend/app/config.py
@@ -43,11 +43,20 @@ class Settings(BaseSettings):
"against the canonical database schema dimension before storing vectors."
),
)
embeddings_provider: Literal["openai", "voyage", "fake"] = "openai"
embeddings_provider: Literal["openai", "voyage", "gemini", "fake"] = "openai"
openai_embedding_model: str = Field(
default="text-embedding-3-small",
description="OpenAI embedding model id used when embeddings_provider='openai'.",
)
gemini_embedding_model: str = Field(
default="gemini-embedding-2",
description=(
"Gemini embedding model id used when embeddings_provider='gemini'. Supports "
"flexible output dimensions (128–3072); EMBEDDING_DIM must still equal the "
"database schema dimension (1536). If 'gemini-embedding-2' is unavailable to "
"your account/region, 'gemini-embedding-001' is a compatible alternative."
),
)

# --- Chunking (consumed from M2 onward) ---------------------------------------

@@ -67,7 +76,7 @@ class Settings(BaseSettings):

# --- LLM (consumed from M3 onward) --------------------------------------------

llm_provider: Literal["anthropic", "fake"] = "anthropic"
llm_provider: Literal["anthropic", "gemini", "fake"] = "anthropic"
claude_model: str = Field(
default="claude-sonnet-4-6",
description=(
@@ -76,6 +85,18 @@ class Settings(BaseSettings):
"model-versioning docs); bumping this default is intentional."
),
)
gemini_model: str = Field(
default="gemini-3.5-flash",
description=(
"Gemini model id used when llm_provider='gemini'. If 'gemini-3.5-flash' is "
"not available to your account/region, 'gemini-2.5-flash' is a stable "
"fallback."
),
)
gemini_base_url: str = Field(
default="https://generativelanguage.googleapis.com/v1beta",
description="Base URL for the Gemini (Google AI Studio) REST API.",
)
llm_temperature: float = Field(
default=0.0,
ge=0.0,
@@ -97,6 +118,14 @@ class Settings(BaseSettings):
anthropic_api_key: str = ""
openai_api_key: str = ""
voyage_api_key: str = ""
gemini_api_key: str = ""
google_api_key: str = Field(
default="",
description=(
"Fallback for GEMINI_API_KEY. Google AI Studio keys work under either name; "
"GEMINI_API_KEY is the documented one and takes precedence."
),
)

# --- Retrieval and review thresholds (consumed from M3/M5 onward) -------------

@@ -116,6 +145,32 @@ class Settings(BaseSettings):
),
)

# --- Resolved-by-provider model labels (consumed by the eval harness) ---------

@property
def active_llm_model(self) -> str:
"""Model id of the *currently selected* LLM provider.

Used by the eval harness so RESULTS.md reports the model that actually ran
rather than always labelling it with ``claude_model`` (Golden Rule #5).
"""
if self.llm_provider == "anthropic":
return self.claude_model
if self.llm_provider == "gemini":
return self.gemini_model
return "fake-llm"

@property
def active_embedding_model(self) -> str:
"""Embedding model id of the *currently selected* embeddings provider."""
if self.embeddings_provider == "openai":
return self.openai_embedding_model
if self.embeddings_provider == "gemini":
return self.gemini_embedding_model
# 'voyage' has no model field yet (provider unimplemented) and 'fake' is
# non-semantic; fall back to the provider name so the label is never wrong.
return self.embeddings_provider
Comment on lines +150 to +172


🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider caching the active model properties.

The AI summary describes these as "cached-by-instance properties," but they lack a @cached_property decorator. While the simple if/else logic is fast, adding @functools.cached_property would align with the description and prevent redundant evaluations if these properties are called multiple times within a request lifecycle.

♻️ Proposed refactor

Import cached_property at the top:

-from functools import lru_cache
+from functools import cached_property, lru_cache

Then decorate the properties:

-    @property
+    @cached_property
     def active_llm_model(self) -> str:
-    @property
+    @cached_property
     def active_embedding_model(self) -> str:
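
One trade-off of the proposed refactor is worth making concrete: `functools.cached_property` stores the first computed value on the instance, so it does not follow later changes to the attribute it was derived from. The sketch below uses a plain stand-in class (`DemoSettings`, hypothetical, not the pydantic `Settings` from this PR) to show the difference. Whether it matters for the real class depends on whether settings instances are ever mutated after creation, which this diff does not show.

```python
from functools import cached_property


class DemoSettings:
    """Plain stand-in for the real Settings class (illustration only)."""

    def __init__(self, llm_provider: str) -> None:
        self.llm_provider = llm_provider

    @property
    def live_label(self) -> str:
        # Recomputed on every access, so it always reflects the current provider.
        return f"model-for-{self.llm_provider}"

    @cached_property
    def cached_label(self) -> str:
        # Computed once per instance, then stored in the instance __dict__.
        return f"model-for-{self.llm_provider}"


s = DemoSettings("anthropic")
first = (s.live_label, s.cached_label)
s.llm_provider = "gemini"  # e.g. a test overriding the provider on an existing instance
second = (s.live_label, s.cached_label)
```

After the override, `live_label` follows the new provider while `cached_label` still reports the original one.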



@lru_cache(maxsize=1)
def get_settings() -> Settings:
10 changes: 10 additions & 0 deletions backend/app/embeddings/__init__.py
@@ -5,6 +5,7 @@
* :class:`EmbeddingProvider` — the protocol all providers implement.
* :class:`FakeEmbedder` — deterministic, no-API embedder for tests/CI.
* :class:`OpenAIEmbedder` — hosted ``text-embedding-3-*`` via OpenAI's REST API.
* :class:`GeminiEmbedder` — hosted ``gemini-embedding-*`` via Google's REST API.
* :func:`get_embedder` — factory that maps :class:`backend.app.config.Settings` to the
right provider, validating that the runtime ``embedding_dim`` matches the canonical
database schema dimension before any vector is generated.
@@ -15,12 +16,14 @@
from backend.app.config import Settings, get_settings
from backend.app.embeddings.base import EmbeddingProvider
from backend.app.embeddings.fake import FakeEmbedder
from backend.app.embeddings.gemini_provider import GeminiEmbedder
from backend.app.embeddings.openai_provider import OpenAIEmbedder
from backend.app.models import SCHEMA_EMBEDDING_DIM

__all__ = [
"EmbeddingProvider",
"FakeEmbedder",
"GeminiEmbedder",
"OpenAIEmbedder",
"get_embedder",
]
@@ -52,6 +55,13 @@ def get_embedder(settings: Settings | None = None) -> EmbeddingProvider:
model=settings.openai_embedding_model,
dim=SCHEMA_EMBEDDING_DIM,
)
if provider == "gemini":
return GeminiEmbedder(
api_key=settings.gemini_api_key or settings.google_api_key,
model=settings.gemini_embedding_model,
dim=SCHEMA_EMBEDDING_DIM,
base_url=settings.gemini_base_url,
)
if provider == "voyage":
# Voyage support arrives in a later milestone; fail loudly so misconfiguration
# in CI or production is surfaced before any ingest work runs.