diff --git a/apps/git-second-brain/README.md b/apps/git-second-brain/README.md
new file mode 100644
index 00000000..9bebfc7e
--- /dev/null
+++ b/apps/git-second-brain/README.md
@@ -0,0 +1,93 @@
+# Git Second Brain
+
+A RAG (Retrieval-Augmented Generation) application that lets you ask
+natural-language questions about **any Git repository** by analysing its
+commit history. The included example uses the **FastAPI** open-source project.
+
+Commits are embedded as vectors and stored in **Oracle AI Database 26ai**.
+At query time the most relevant commits are retrieved via `VECTOR_DISTANCE`
+and passed as context to an OpenAI model through **LangChain**, producing
+grounded answers with commit citations.
+
+## Project structure
+
+```
+git-second-brain/
+├── database/            # SQL scripts: user creation + schema setup
+├── data-loader/         # One-time ETL: parse commits, embed, load into Oracle 26ai
+├── app/                 # Streamlit chat UI + LangChain RAG chain
+├── diffs/               # Pre-extracted per-commit diff files
+└── fastapi_commits.txt  # Delimited commit metadata
+```
+
+| Folder           | Purpose                                                                                                                                                                                     | Details                                        |
+| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------- |
+| **database/**    | SQL scripts to create the Oracle user, table, indexes, and (optionally) the vector index.                                                                                                   | [database/README.md](database/README.md)       |
+| **data-loader/** | Reads the extracted commit metadata and diff files, generates 384-dim vector embeddings with `sentence-transformers`, and bulk-inserts everything into Oracle 26ai.                         | [data-loader/README.md](data-loader/README.md) |
+| **app/**         | Streamlit chat interface where users ask questions. A custom LangChain retriever queries Oracle 26ai vector search, and the retrieved commits are sent to OpenAI to generate a cited answer. | [app/README.md](app/README.md)                 |
+
+## Extracting repo data
+
+The examples below use **FastAPI**, but this works with **any Git repository**.
+
+```bash
+# Clone the target repo
+git clone https://github.com/tiangolo/fastapi.git
+mkdir diffs
+cd fastapi
+
+# Extract commit metadata with safe delimiters
+git log --all --no-merges \
+    --pretty=format:"<<<COMMIT>>>%n%H%n%an%n%aI%n%s%n<<<BODY>>>%n%b%n<<<END>>>%n" \
+    > ../fastapi_commits.txt
+
+# Extract diff stats as a single file
+git log --all --no-merges \
+    --pretty=format:"===SHA:%H===" --stat \
+    > ../diffs/all_diffs.txt
+
+cd ..
+```
+
+> **Tip:** The data loader caps at 3 000 commits by default, which keeps
+> indexing time under 10 minutes and still covers the most recent years of
+> FastAPI's history.
+
+## Prerequisites
+
+- Python 3.10+
+- Oracle AI Database 26ai (running and accessible)
+- OpenAI API key (for the chat app)
+
+## Quick start
+
+> **Important:** Load the environment variables from each folder's `.env` file
+> before running Python scripts. See each folder's README for details.
+
+```bash
+# 0. Set up the database
+cd database
+sqlplus system/Welcome_123@//localhost:1521/FREEPDB1 @01_create_user.sql
+sqlplus system/Welcome_123@//localhost:1521/FREEPDB1 @02_create_schema.sql
+cd ..
+
+# 1. Extract repo data (see "Extracting repo data" above)
+
+# 2. Load data into Oracle 26ai
+cd data-loader
+python -m venv .venv && .venv\Scripts\activate   # or source .venv/bin/activate
+pip install -r requirements.txt
+cp .env.example .env   # fill in your Oracle credentials
+# load env vars, then:
+python load_data.py
+cd ..
+
+# 3. 
Run the app +cd app +python -m venv .venv && .venv\Scripts\activate +pip install -r requirements.txt +cp .env.example .env # fill in Oracle + OpenAI credentials +# load env vars, then: +streamlit run app.py +``` + +See each folder's README for full setup and configuration details. diff --git a/apps/git-second-brain/app/.env.example b/apps/git-second-brain/app/.env.example new file mode 100644 index 00000000..8d6c6570 --- /dev/null +++ b/apps/git-second-brain/app/.env.example @@ -0,0 +1,7 @@ +# Oracle AI Database 26ai connection +ORACLE_USER=GITHUB_SECOND_BRAIN +ORACLE_PASSWORD= +ORACLE_DSN=localhost:1521/FREEPDB1 + +# OpenAI (can also be entered in the Streamlit sidebar) +OPENAI_API_KEY=sk-... diff --git a/apps/git-second-brain/app/README.md b/apps/git-second-brain/app/README.md new file mode 100644 index 00000000..f3582246 --- /dev/null +++ b/apps/git-second-brain/app/README.md @@ -0,0 +1,101 @@ +# Git Second Brain — App + +Streamlit chat UI that lets you ask natural-language questions about a +repository's commit history, powered by **Oracle AI Database 26ai Vector Search**, +LangChain, and OpenAI. + +## Architecture + +``` +User question + │ + ▼ +┌────────────────────┐ ┌──────────────────────────┐ +│ Streamlit (app.py)│─────▶│ OracleCommitRetriever │ +│ Chat interface │ │ sentence-transformers │ +└────────┬───────────┘ │ + Oracle 26ai vector │ + │ │ VECTOR_DISTANCE search │ + │ context docs └──────────────────────────┘ + ▼ +┌────────────────────┐ +│ LangChain RAG │ +│ ChatOpenAI (GPT) │ +└────────────────────┘ +``` + +## Prerequisites + +| Requirement | Version | +| ----------------------- | ----------------------------- | +| Python | 3.10+ | +| Oracle AI Database 26ai | Running and accessible | +| OpenAI API key | Any `gpt-4o-mini` capable key | + +The `data-loader/` must have been run first so the `FASTAPI_COMMITS` table is +populated with embeddings. + +## Setup + +```bash +cd app +python -m venv .venv + +# Windows +.venv\Scripts\activate +# Linux / macOS +source .venv/bin/activate + +pip install -r requirements.txt +``` + +Copy `.env.example` to `.env` and fill in your credentials: + +```bash +cp .env.example .env +``` + +## Running + +The app reads Oracle credentials from environment variables. Load them before +starting Streamlit: + +```bash +# Load env vars from .env (use your preferred method) +# Windows PowerShell: +Get-Content .env | ForEach-Object { if ($_ -match '^([^#].+?)=(.*)$') { [Environment]::SetEnvironmentVariable($Matches[1], $Matches[2]) } } + +# Linux / macOS: +# export $(grep -v '^#' .env | xargs) + +streamlit run app.py +``` + +The app opens at . + +## Smoke test + +A standalone script that verifies the vector-search round trip without +Streamlit or OpenAI. 
Requires the same environment variables: + +```bash +python smoke_test.py +``` + +## Files + +| File | Purpose | +| ------------------ | ------------------------------------------------------------- | +| `app.py` | Streamlit chat UI + LangChain RAG chain | +| `retriever.py` | LangChain `BaseRetriever` backed by Oracle 26ai vector search | +| `smoke_test.py` | Minimal end-to-end connectivity & vector-search test | +| `requirements.txt` | Pinned Python dependencies | +| `.env.example` | Template for required environment variables | + +## Environment variables + +| Variable | Required | Default | Description | +| ----------------- | -------- | ------- | ---------------------------------------------- | +| `ORACLE_USER` | Yes | — | Database username | +| `ORACLE_PASSWORD` | Yes | — | Database password | +| `ORACLE_DSN` | Yes | — | Connect string, e.g. `localhost:1521/FREEPDB1` | +| `OPENAI_API_KEY` | No | — | Can also be entered in the Streamlit sidebar | diff --git a/apps/git-second-brain/app/app.py b/apps/git-second-brain/app/app.py new file mode 100644 index 00000000..308ff44c --- /dev/null +++ b/apps/git-second-brain/app/app.py @@ -0,0 +1,171 @@ +""" +Git Second Brain - Streamlit Chat UI +Ask natural-language questions about FastAPI's commit history, +powered by Oracle AI Database 26ai Vector Search + LangChain + OpenAI. + +Run: + streamlit run app.py +""" + +import os + +import streamlit as st +from langchain_core.output_parsers import StrOutputParser +from langchain_core.prompts import ChatPromptTemplate +from langchain_openai import ChatOpenAI +from retriever import OracleCommitRetriever + +# ========================= Page config ========================= +st.set_page_config( + page_title="Git Second Brain", + page_icon="🧠", + layout="wide", +) + +# ========================= Sidebar ============================= +with st.sidebar: + st.title("Git Second Brain") + st.caption("Oracle AI Database 26ai + LangChain + OpenAI") + + openai_key = st.text_input( + "OpenAI API Key", + type="password", + value=os.getenv("OPENAI_API_KEY", ""), + help="Stored only in this session, never persisted.", + ) + + model_name = st.selectbox( + "Model", + ["gpt-4o-mini", "gpt-4o", "gpt-4.1-mini", "gpt-4.1-nano"], + index=0, + ) + + top_k = st.slider("Commits to retrieve", min_value=3, max_value=15, value=8) + + temperature = st.slider("Temperature", min_value=0.0, max_value=1.0, value=0.2, step=0.05) + + st.divider() + st.markdown( + "**How it works**\n\n" + "1. Your question is embedded with sentence-transformers\n" + "2. Oracle 26ai runs `VECTOR_DISTANCE` to find the most relevant commits\n" + "3. LangChain passes those commits as context to OpenAI\n" + "4. You get a grounded answer with commit citations" + ) + + st.divider() + st.markdown("**Sample questions**") + sample_questions = [ + "Why did FastAPI switch to Pydantic v2?", + "How has dependency injection evolved?", + "What were the biggest breaking changes in the last 2 years?", + "When did lifespan replace startup/shutdown events?", + "What security fixes were applied recently?", + ] + for q in sample_questions: + if st.button(q, use_container_width=True): + st.session_state["prefill"] = q + +# ========================= System prompt ======================= +SYSTEM_PROMPT = """\ +You are Git Second Brain, an AI assistant that answers questions about the +FastAPI open-source project by analyzing its Git commit history. + +You will receive a set of relevant commits retrieved from Oracle AI Database 26ai +via vector similarity search. 
Use ONLY these commits to answer the question. +If the commits do not contain enough information, say so honestly. + +Rules: +- Cite specific commits by their short SHA and date when supporting a claim. +- Summarize the narrative arc when multiple commits tell a story. +- Keep answers concise but thorough (3-6 paragraphs max). +- If you are unsure, say "Based on the commits I found..." to hedge. +- Never invent commit SHAs or dates. +""" + +RAG_TEMPLATE = ChatPromptTemplate.from_messages( + [ + ("system", SYSTEM_PROMPT), + ("human", "Retrieved commits:\n\n{context}\n\n---\nQuestion: {question}"), + ] +) + +# ========================= Init state ========================== +if "messages" not in st.session_state: + st.session_state.messages = [] + +if "retriever" not in st.session_state: + with st.spinner("Connecting to Oracle AI Database 26ai ..."): + st.session_state.retriever = OracleCommitRetriever(top_k=top_k) + +# ========================= Chat display ======================== +st.header("Ask your repo anything") + +for msg in st.session_state.messages: + with st.chat_message(msg["role"]): + st.markdown(msg["content"]) + if msg.get("sources"): + with st.expander(f"Retrieved commits ({len(msg['sources'])})"): + for doc in msg["sources"]: + meta = doc.metadata + st.markdown( + f"**`{meta['sha'][:10]}`** | {meta['date']} | " + f"*{meta['author']}*\n\n" + f"> {meta['subject']}" + ) + st.divider() + +# ========================= Chat input ========================== +prefill = st.session_state.pop("prefill", None) +user_input = st.chat_input("Ask about FastAPI's history ...") or prefill + +if user_input: + if not openai_key: + st.error("Please enter your OpenAI API key in the sidebar.") + st.stop() + + # Show user message + st.session_state.messages.append({"role": "user", "content": user_input}) + with st.chat_message("user"): + st.markdown(user_input) + + # Retrieve from Oracle 26ai + with st.chat_message("assistant"): + with st.spinner("Searching Oracle 26ai Vector Search ..."): + retriever = st.session_state.retriever + retriever.top_k = top_k + docs = retriever.invoke(user_input) + + context = "\n\n---\n\n".join(doc.page_content for doc in docs) + + # LangChain RAG chain + llm = ChatOpenAI( + model=model_name, + temperature=temperature, + api_key=openai_key, + ) + chain = RAG_TEMPLATE | llm | StrOutputParser() + + with st.spinner("Generating answer ..."): + answer = chain.invoke({"context": context, "question": user_input}) + + st.markdown(answer) + + # Show retrieved commits + with st.expander(f"Retrieved commits ({len(docs)})"): + for doc in docs: + meta = doc.metadata + st.markdown( + f"**`{meta['sha'][:10]}`** | {meta['date']} | " + f"*{meta['author']}*\n\n" + f"> {meta['subject']}" + ) + st.divider() + + st.session_state.messages.append( + { + "role": "assistant", + "content": answer, + "sources": docs, + } + ) diff --git a/apps/git-second-brain/app/requirements.txt b/apps/git-second-brain/app/requirements.txt new file mode 100644 index 00000000..e387f469 --- /dev/null +++ b/apps/git-second-brain/app/requirements.txt @@ -0,0 +1,6 @@ +oracledb>=2.2.0,<4 +sentence-transformers>=5.0,<6 +langchain>=1.2,<2 +langchain-core>=1.2,<2 +langchain-openai>=1.1,<2 +streamlit>=1.38,<2 diff --git a/apps/git-second-brain/app/retriever.py b/apps/git-second-brain/app/retriever.py new file mode 100644 index 00000000..49c5c817 --- /dev/null +++ b/apps/git-second-brain/app/retriever.py @@ -0,0 +1,107 @@ +""" +LangChain custom retriever backed by Oracle AI Database 26ai Vector Search. 
+ +This retriever embeds the user query with sentence-transformers, runs a +VECTOR_DISTANCE query against the FASTAPI_COMMITS table, and returns +LangChain Document objects with commit metadata. +""" + +import array +import os + +import oracledb +from langchain_core.callbacks import CallbackManagerForRetrieverRun +from langchain_core.documents import Document +from langchain_core.retrievers import BaseRetriever +from pydantic import Field, PrivateAttr +from sentence_transformers import SentenceTransformer + + +class OracleCommitRetriever(BaseRetriever): + """Retrieve FastAPI commits from Oracle AI Database 26ai via vector similarity. + + Configuration is read from environment variables: + ORACLE_USER – database username + ORACLE_PASSWORD – database password + ORACLE_DSN – Oracle connect string (host:port/service) + """ + + db_user: str = Field(default_factory=lambda: os.environ["ORACLE_USER"]) + db_password: str = Field(default_factory=lambda: os.environ["ORACLE_PASSWORD"], repr=False) + db_dsn: str = Field(default_factory=lambda: os.environ["ORACLE_DSN"]) + embed_model_name: str = "sentence-transformers/all-MiniLM-L6-v2" + top_k: int = 8 + + _embed_model: SentenceTransformer = PrivateAttr() + _conn: oracledb.Connection = PrivateAttr() + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self._embed_model = SentenceTransformer(self.embed_model_name) + self._conn = oracledb.connect( + user=self.db_user, + password=self.db_password, + dsn=self.db_dsn, + ) + + def _get_relevant_documents( + self, + query: str, + *, + run_manager: CallbackManagerForRetrieverRun, + ) -> list[Document]: + """Embed the query and run vector search in Oracle 26ai.""" + vec = array.array( + "f", + self._embed_model.encode(query, normalize_embeddings=True).tolist(), + ) + + cur = self._conn.cursor() + cur.execute( + """ + SELECT sha, + TO_CHAR(commit_date, 'YYYY-MM-DD'), + author, + subject, + body, + files_changed + FROM FASTAPI_COMMITS + ORDER BY VECTOR_DISTANCE(embedding, :1, COSINE) + FETCH FIRST :2 ROWS ONLY + """, + [vec, self.top_k], + ) + + docs = [] + for sha, date_str, author, subject, body, files in cur: + body_text = body.read() if hasattr(body, "read") else (body or "") + files_text = files.read() if hasattr(files, "read") else (files or "") + + content = ( + f"Commit: {sha[:10]}\n" + f"Date: {date_str}\n" + f"Author: {author}\n" + f"Subject: {subject}\n" + f"Body: {body_text}\n" + f"Files changed:\n{files_text[:800]}" + ) + + docs.append( + Document( + page_content=content, + metadata={ + "sha": sha, + "date": date_str, + "author": author, + "subject": subject, + }, + ) + ) + + cur.close() + return docs + + def close(self): + """Clean up the database connection.""" + if self._conn: + self._conn.close() diff --git a/apps/git-second-brain/app/smoke_test.py b/apps/git-second-brain/app/smoke_test.py new file mode 100644 index 00000000..82ee99f8 --- /dev/null +++ b/apps/git-second-brain/app/smoke_test.py @@ -0,0 +1,62 @@ +""" +Smoke test: verify Python can query 26ai vector search end to end. 
+Run: python smoke_test.py +""" + +import array +import os +import sys + +import oracledb +from sentence_transformers import SentenceTransformer + +_REQUIRED_ENV = ("ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN") +_missing = [v for v in _REQUIRED_ENV if v not in os.environ] +if _missing: + sys.exit(f"ERROR: missing environment variables: {', '.join(_missing)}") + +DB_USER = os.environ["ORACLE_USER"] +DB_PASSWORD = os.environ["ORACLE_PASSWORD"] +DB_DSN = os.environ["ORACLE_DSN"] + +EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2" + +QUESTION = "Why did FastAPI adopt Pydantic v2?" + + +def main(): + print(f"Loading model: {EMBED_MODEL}") + model = SentenceTransformer(EMBED_MODEL) + + print(f"Encoding question: {QUESTION}") + vec = array.array("f", model.encode(QUESTION, normalize_embeddings=True).tolist()) + + print(f"Connecting to {DB_DSN} ...") + conn = oracledb.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN) + cur = conn.cursor() + + print("Running vector search ...\n") + cur.execute( + """ + SELECT sha, commit_date, subject + FROM FASTAPI_COMMITS + ORDER BY VECTOR_DISTANCE(embedding, :1, COSINE) + FETCH FIRST 5 ROWS ONLY + """, + [vec], + ) + + print(f"{'#':<4} {'SHA':<12} {'DATE':<22} {'SUBJECT'}") + print("-" * 90) + for i, (sha, dt, subject) in enumerate(cur, 1): + short_sha = sha[:10] + date_str = dt.strftime("%Y-%m-%d %H:%M") if dt else "unknown" + print(f"{i:<4} {short_sha:<12} {date_str:<22} {subject[:60]}") + + cur.close() + conn.close() + print("\nSmoke test passed.") + + +if __name__ == "__main__": + main() diff --git a/apps/git-second-brain/data-loader/.env.example b/apps/git-second-brain/data-loader/.env.example new file mode 100644 index 00000000..1e3d2ee0 --- /dev/null +++ b/apps/git-second-brain/data-loader/.env.example @@ -0,0 +1,7 @@ +# Oracle AI Database 26ai connection +ORACLE_USER=GITHUB_SECOND_BRAIN +ORACLE_PASSWORD= +ORACLE_DSN=localhost:1521/FREEPDB1 + +# Optional: override the target schema (defaults to GITHUB_SECOND_BRAIN) +# ORACLE_SCHEMA=GITHUB_SECOND_BRAIN diff --git a/apps/git-second-brain/data-loader/README.md b/apps/git-second-brain/data-loader/README.md new file mode 100644 index 00000000..15f49411 --- /dev/null +++ b/apps/git-second-brain/data-loader/README.md @@ -0,0 +1,81 @@ +# Git Second Brain — Data Loader + +Reads a repository's commit history from a plain-text dump, generates vector +embeddings with `sentence-transformers`, and bulk-inserts everything into an +**Oracle AI Database 26ai** table. + +## How it works + +1. Parses `fastapi_commits.txt` (delimited commit metadata) and + `diffs/all_diffs.txt` (per-commit file-change stats). +2. For each commit, builds a combined text blob and encodes it with the + `all-MiniLM-L6-v2` model (384-dimensional vectors). +3. Inserts rows in batches into `FASTAPI_COMMITS`, handling duplicate SHAs + gracefully. 
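+
+In code, steps 2 and 3 boil down to encoding the combined commit text and
+binding the resulting 384-dim vector as a float array. A minimal sketch, with
+placeholder commit text and credentials (the real logic, including batching
+and duplicate-SHA handling, lives in `load_data.py`):
+
+```python
+import array
+
+import oracledb
+from sentence_transformers import SentenceTransformer
+
+# Same 384-dim model the loader uses
+model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+
+# Hypothetical commit blob; load_data.py builds the real one in build_content()
+commit_text = "Subject: Fix typo in docs\nAuthor: Jane Doe\nBody: (no body)"
+vec = model.encode(commit_text, normalize_embeddings=True)  # numpy array, length 384
+
+# Placeholder credentials -- use the values from your .env
+with oracledb.connect(
+    user="GITHUB_SECOND_BRAIN", password="...", dsn="localhost:1521/FREEPDB1"
+) as conn:
+    cur = conn.cursor()
+    cur.execute(
+        "INSERT INTO FASTAPI_COMMITS (sha, subject, content_for_embedding, embedding) "
+        "VALUES (:1, :2, :3, :4)",
+        ["0123abc...", "Fix typo in docs", commit_text, array.array("f", vec.tolist())],
+    )
+    conn.commit()
+```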
+ +## Prerequisites + +| Requirement | Version | +| ----------------------- | -------------------------------------------------------- | +| Python | 3.10+ | +| Oracle AI Database 26ai | Running, with the schema created via `database/` scripts | + +The following data files must exist relative to this folder: + +- `../fastapi_commits.txt` — commit metadata +- `../diffs/all_diffs.txt` — diff / file-change information (optional but recommended) + +## Setup + +```bash +cd data-loader +python -m venv .venv + +# Windows +.venv\Scripts\activate +# Linux / macOS +source .venv/bin/activate + +pip install -r requirements.txt +``` + +Copy `.env.example` to `.env` and fill in your credentials: + +```bash +cp .env.example .env +``` + +## Running + +Load your environment variables before running the script: + +```bash +# Load env vars from .env (use your preferred method) +# Windows PowerShell: +Get-Content .env | ForEach-Object { if ($_ -match '^([^#].+?)=(.*)$') { [Environment]::SetEnvironmentVariable($Matches[1], $Matches[2]) } } + +# Linux / macOS: +# export $(grep -v '^#' .env | xargs) + +python load_data.py +``` + +Progress is printed to stdout. A full run (~3 000 commits) takes a few minutes +depending on hardware and network latency to the database. + +## Environment variables + +| Variable | Required | Default | Description | +| ----------------- | -------- | --------------------- | ---------------------------------------------- | +| `ORACLE_USER` | Yes | — | Database username | +| `ORACLE_PASSWORD` | Yes | — | Database password | +| `ORACLE_DSN` | Yes | — | Connect string, e.g. `localhost:1521/FREEPDB1` | +| `ORACLE_SCHEMA` | No | `GITHUB_SECOND_BRAIN` | Target schema for the table | + +## Files + +| File | Purpose | +| ------------------ | ------------------------------------------- | +| `load_data.py` | Main loader script | +| `requirements.txt` | Pinned Python dependencies | +| `.env.example` | Template for required environment variables | diff --git a/apps/git-second-brain/data-loader/load_data.py b/apps/git-second-brain/data-loader/load_data.py new file mode 100644 index 00000000..4ea3b530 --- /dev/null +++ b/apps/git-second-brain/data-loader/load_data.py @@ -0,0 +1,239 @@ +""" +Load FastAPI commit history into Oracle AI Database 26ai with vector embeddings. + +Prerequisites: + 1. ../fastapi_commits.txt (delimited commit metadata, one block per commit) + 2. ../diffs/all_diffs.txt (git log --stat dump, delimited by ===SHA:hash===) + 3. pip install -r requirements.txt + 4. 
Schema created by running the SQL scripts in ../database/
+
+Run:
+    python load_data.py
+"""
+
+import array
+import os
+import re
+import sys
+
+import oracledb
+from sentence_transformers import SentenceTransformer
+
+# =========================== Config ===========================
+_REQUIRED_ENV = ("ORACLE_USER", "ORACLE_PASSWORD", "ORACLE_DSN")
+_missing = [v for v in _REQUIRED_ENV if v not in os.environ]
+if _missing:
+    print(f"ERROR: missing environment variables: {', '.join(_missing)}")
+    sys.exit(1)
+
+DB_USER = os.environ["ORACLE_USER"]
+DB_PASSWORD = os.environ["ORACLE_PASSWORD"]
+DB_DSN = os.environ["ORACLE_DSN"]
+DB_SCHEMA = os.getenv("ORACLE_SCHEMA", "GITHUB_SECOND_BRAIN")
+
+COMMITS_FILE = "../fastapi_commits.txt"
+DIFFS_FILE = "../diffs/all_diffs.txt"
+
+MAX_COMMITS = 3000
+BATCH_SIZE = 100
+EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # 384 dims
+# ==============================================================
+
+
+def parse_diffs(path):
+    """Parse the single-file diff dump into a dict keyed by SHA."""
+    diffs = {}
+    if not os.path.exists(path):
+        print(f"WARNING: {path} not found. Continuing without file-change info.")
+        return diffs
+
+    current_sha = None
+    buffer = []
+    with open(path, encoding="utf-8", errors="replace") as f:
+        for line in f:
+            m = re.match(r"===SHA:([0-9a-f]+)===", line.strip())
+            if m:
+                if current_sha:
+                    diffs[current_sha] = "".join(buffer).strip()
+                current_sha = m.group(1)
+                buffer = []
+            else:
+                buffer.append(line)
+    if current_sha:
+        diffs[current_sha] = "".join(buffer).strip()
+    return diffs
+
+
+def load_commits(path, limit):
+    """Load commits from a delimited plain-text dump.
+
+    Each block looks like:
+        <<<COMMIT>>>
+        <sha>
+        <author>
+        <date>
+        <subject>
+        <<<BODY>>>
+        <body>
+        <<<END>>>
+    """
+    commits = []
+    with open(path, encoding="utf-8", errors="replace") as f:
+        raw = f.read()
+
+    blocks = raw.split("<<<COMMIT>>>")
+    for block in blocks:
+        block = block.strip()
+        if not block:
+            continue
+        if len(commits) >= limit:
+            break
+
+        try:
+            header, rest = block.split("<<<BODY>>>", 1)
+        except ValueError:
+            continue
+
+        body, _, _ = rest.partition("<<<END>>>")
+        header_lines = header.strip().splitlines()
+        if len(header_lines) < 4:
+            continue
+
+        commits.append(
+            {
+                "sha": header_lines[0].strip(),
+                "author": header_lines[1].strip(),
+                "date": header_lines[2].strip(),
+                "subject": header_lines[3].strip(),
+                "body": body.strip(),
+            }
+        )
+
+    return commits
+
+
+def build_content(commit, files_changed):
+    """Combine commit fields into a single string for embedding."""
+    body = (commit.get("body") or "").strip() or "(no body)"
+    files = (files_changed or "").strip()[:1500] or "(unknown)"
+    return (
+        f"Subject: {commit.get('subject', '')}\n"
+        f"Author: {commit.get('author', '')}\n"
+        f"Date: {commit.get('date', '')}\n"
+        f"Body: {body}\n"
+        f"Files changed:\n{files}"
+    )
+
+
+def normalize_date(raw):
+    """Turn a git ISO date like 2024-03-12T10:15:30+01:00 into a clean string."""
+    if not raw:
+        return "1970-01-01T00:00:00"
+    # Strip timezone suffix, keep first 19 chars
+    cleaned = raw.split("+")[0].split("Z")[0][:19]
+    return cleaned if "T" in cleaned else "1970-01-01T00:00:00"
+
+
+def flush_batch(cursor, sql, batch, model):
+    """Encode texts in batch, insert via executemany."""
+    texts = [item[1] for item in batch]
+    vectors = model.encode(texts, normalize_embeddings=True, show_progress_bar=False)
+
+    rows = []
+    for (commit, content, files_changed), vec in zip(batch, vectors, strict=True):
+        rows.append(
+            (
+                commit.get("sha"),
+                (commit.get("author") or "")[:200],
+                normalize_date(commit.get("date", "")),
+                
(commit.get("subject") or "")[:1000], + commit.get("body") or "", + files_changed, + content, + array.array("f", vec.tolist()), + ) + ) + + try: + cursor.executemany(sql, rows) + except oracledb.IntegrityError: + # Retry one-by-one so a duplicate SHA does not kill the whole batch + inserted = 0 + for row in rows: + try: + cursor.execute(sql, row) + inserted += 1 + except oracledb.IntegrityError: + pass + return inserted + + return len(rows) + + +def main(): + if not os.path.exists(COMMITS_FILE): + print(f"ERROR: {COMMITS_FILE} not found in current directory.") + sys.exit(1) + + print("Loading embedding model (first run downloads ~90 MB)...") + model = SentenceTransformer(EMBED_MODEL) + + print(f"Parsing {DIFFS_FILE} ...") + diffs = parse_diffs(DIFFS_FILE) + print(f" parsed {len(diffs)} diffs") + + print(f"Loading {COMMITS_FILE} ...") + commits = load_commits(COMMITS_FILE, MAX_COMMITS) + print(f" loaded {len(commits)} commits") + + print(f"Connecting to Oracle AI Database 26ai at {DB_DSN} ...") + conn = oracledb.connect(user=DB_USER, password=DB_PASSWORD, dsn=DB_DSN) + cursor = conn.cursor() + + # Point all unqualified object references at the target schema. + # Schema names cannot be parameterised in DDL; validate against a strict + # allowlist pattern to prevent SQL injection. + if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$#]{0,127}", DB_SCHEMA): + print(f"ERROR: ORACLE_SCHEMA value '{DB_SCHEMA}' is not a valid Oracle identifier.") + sys.exit(1) + + cursor.execute(f"ALTER SESSION SET CURRENT_SCHEMA = {DB_SCHEMA}") + + insert_sql = f""" + INSERT INTO {DB_SCHEMA}.FASTAPI_COMMITS + (sha, author, commit_date, subject, body, + files_changed, content_for_embedding, embedding) + VALUES + (:1, :2, + TO_TIMESTAMP(:3, 'YYYY-MM-DD"T"HH24:MI:SS'), + :4, :5, :6, :7, :8) + """ + + batch = [] + total = 0 + for commit in commits: + sha = commit.get("sha") + if not sha: + continue + files_changed = diffs.get(sha, "") + content = build_content(commit, files_changed) + batch.append((commit, content, files_changed)) + + if len(batch) >= BATCH_SIZE: + total += flush_batch(cursor, insert_sql, batch, model) + conn.commit() + batch = [] + print(f" inserted {total} commits...") + + if batch: + total += flush_batch(cursor, insert_sql, batch, model) + conn.commit() + + print(f"Done. 
Inserted {total} commits.") + + cursor.close() + conn.close() + + +if __name__ == "__main__": + main() diff --git a/apps/git-second-brain/data-loader/requirements.txt b/apps/git-second-brain/data-loader/requirements.txt new file mode 100644 index 00000000..02397586 --- /dev/null +++ b/apps/git-second-brain/data-loader/requirements.txt @@ -0,0 +1,2 @@ +oracledb>=2.2.0,<4 +sentence-transformers>=5.0,<6 diff --git a/apps/git-second-brain/database/01_create_user.sql b/apps/git-second-brain/database/01_create_user.sql new file mode 100644 index 00000000..020954b6 --- /dev/null +++ b/apps/git-second-brain/database/01_create_user.sql @@ -0,0 +1,34 @@ +-- ===================================================================== +-- Git Second Brain – Oracle AI Database 26ai user setup +-- +-- Run as SYS / SYSTEM (or any DBA) against the target PDB: +-- sqlplus system/Welcome_123@//localhost:1521/FREEPDB1 @01_create_user.sql +-- ===================================================================== + +-- Drop the user if it already exists (CASCADE removes all owned objects) +BEGIN + EXECUTE IMMEDIATE 'DROP USER GITHUB_SECOND_BRAIN CASCADE'; +EXCEPTION + WHEN OTHERS THEN + IF SQLCODE != -01918 THEN -- ORA-01918: user does not exist + RAISE; + END IF; +END; +/ + +CREATE USER GITHUB_SECOND_BRAIN +IDENTIFIED BY "ChangeMe_123!" +DEFAULT TABLESPACE USERS +TEMPORARY TABLESPACE TEMP +PROFILE DEFAULT +ACCOUNT UNLOCK; + +GRANT CREATE SESSION TO GITHUB_SECOND_BRAIN; +GRANT CREATE TABLE TO GITHUB_SECOND_BRAIN; +GRANT CREATE VIEW TO GITHUB_SECOND_BRAIN; +GRANT CREATE SEQUENCE TO GITHUB_SECOND_BRAIN; +GRANT CREATE PROCEDURE TO GITHUB_SECOND_BRAIN; +GRANT CREATE TRIGGER TO GITHUB_SECOND_BRAIN; +GRANT CREATE TYPE TO GITHUB_SECOND_BRAIN; + +ALTER USER GITHUB_SECOND_BRAIN QUOTA UNLIMITED ON USERS; diff --git a/apps/git-second-brain/database/02_create_schema.sql b/apps/git-second-brain/database/02_create_schema.sql new file mode 100644 index 00000000..1b859f61 --- /dev/null +++ b/apps/git-second-brain/database/02_create_schema.sql @@ -0,0 +1,56 @@ +-- ===================================================================== +-- Git Second Brain – Oracle AI Database 26ai schema +-- +-- All objects live under the GITHUB_SECOND_BRAIN schema. +-- +-- Connect as a user with privileges on GITHUB_SECOND_BRAIN, for example: +-- sqlplus system/Welcome_123@//localhost:1521/FREEPDB1 @02_create_schema.sql +-- Or connect directly as GITHUB_SECOND_BRAIN if you have the password. 
+-- ===================================================================== + +ALTER SESSION SET CURRENT_SCHEMA = GITHUB_SECOND_BRAIN; + +-- Drop old objects if this script is rerun +BEGIN + EXECUTE IMMEDIATE 'DROP TABLE GITHUB_SECOND_BRAIN.FASTAPI_COMMITS PURGE'; +EXCEPTION WHEN OTHERS THEN NULL; +END; +/ + +-- ===================================================================== +-- Main table: one row per commit, with a 384-dim vector column +-- ===================================================================== +CREATE TABLE GITHUB_SECOND_BRAIN.FASTAPI_COMMITS ( + id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, + sha VARCHAR2(64) UNIQUE NOT NULL, + author VARCHAR2(200), + commit_date TIMESTAMP, + subject VARCHAR2(1000), + body CLOB, + files_changed CLOB, + content_for_embedding CLOB, + embedding VECTOR(384, FLOAT32) +); + +-- Supporting indexes for metadata filters (date range, author) +CREATE INDEX GITHUB_SECOND_BRAIN.FASTAPI_COMMITS_DATE_IDX + ON GITHUB_SECOND_BRAIN.FASTAPI_COMMITS(commit_date); + +CREATE INDEX GITHUB_SECOND_BRAIN.FASTAPI_COMMITS_AUTHOR_IDX + ON GITHUB_SECOND_BRAIN.FASTAPI_COMMITS(author); + +COMMIT; + +-- ===================================================================== +-- Vector index +-- Create AFTER loading the data (faster build, better quality). +-- Run the block below once load_data.py finishes. +-- ===================================================================== +-- +-- CREATE VECTOR INDEX GITHUB_SECOND_BRAIN.FASTAPI_COMMITS_VEC_IDX +-- ON GITHUB_SECOND_BRAIN.FASTAPI_COMMITS (embedding) +-- ORGANIZATION INMEMORY NEIGHBOR GRAPH +-- DISTANCE COSINE +-- WITH TARGET ACCURACY 95; +-- +-- ===================================================================== diff --git a/apps/git-second-brain/database/README.md b/apps/git-second-brain/database/README.md new file mode 100644 index 00000000..440c7583 --- /dev/null +++ b/apps/git-second-brain/database/README.md @@ -0,0 +1,33 @@ +# Git Second Brain — Database + +SQL scripts for setting up the Oracle AI Database 26ai schema used by the +data loader and the app. + +## Prerequisites + +- Oracle AI Database 26ai (e.g. the free container image) +- A DBA connection to the target PDB (e.g. `SYSTEM`) + +## Scripts + +Run in order: + +| Script | Run as | Purpose | +| ---------------------- | ----------------------------- | ------------------------------------------------------------------------------- | +| `01_create_user.sql` | SYS / SYSTEM | Creates the `GITHUB_SECOND_BRAIN` user with required grants | +| `02_create_schema.sql` | SYSTEM or GITHUB_SECOND_BRAIN | Creates the `FASTAPI_COMMITS` table, indexes, and (optionally) the vector index | + +## Usage + +```bash +# Connect as SYSTEM to the pluggable database +sqlplus system/Welcome_123@//localhost:1521/FREEPDB1 + +# Then run each script +@01_create_user.sql +@02_create_schema.sql +``` + +> **Note:** The vector index creation is commented out in `02_create_schema.sql`. +> Create it **after** loading data with `data-loader/` for faster build times +> and better index quality.
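+
+If you prefer to do the post-load step from Python instead of SQL*Plus, the
+sketch below reuses the commented-out DDL from `02_create_schema.sql`
+(credentials are placeholders; run it only after `load_data.py` has finished):
+
+```python
+import oracledb
+
+# Placeholder credentials -- use the same values as the data loader's .env
+conn = oracledb.connect(
+    user="GITHUB_SECOND_BRAIN", password="...", dsn="localhost:1521/FREEPDB1"
+)
+cur = conn.cursor()
+
+# Sanity check: the loader should already have populated the table
+cur.execute("SELECT COUNT(*) FROM GITHUB_SECOND_BRAIN.FASTAPI_COMMITS")
+print("commits loaded:", cur.fetchone()[0])
+
+# Same DDL as the commented-out block in 02_create_schema.sql
+cur.execute("""
+    CREATE VECTOR INDEX GITHUB_SECOND_BRAIN.FASTAPI_COMMITS_VEC_IDX
+    ON GITHUB_SECOND_BRAIN.FASTAPI_COMMITS (embedding)
+    ORGANIZATION INMEMORY NEIGHBOR GRAPH
+    DISTANCE COSINE
+    WITH TARGET ACCURACY 95
+""")
+
+cur.close()
+conn.close()
+```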