21 changes: 19 additions & 2 deletions README.md
@@ -242,7 +242,7 @@ On VS Code's 2.45M‑line codebase, SocratiCode answers architectural questions
- **Hybrid code search** — Built on Qdrant, a purpose-built vector database with HNSW indexing, concurrent read/write, and payload filtering. Each chunk stores both a dense vector and a BM25 sparse vector; the Query API runs both sub-queries in a single round-trip and fuses results with Reciprocal Rank Fusion (RRF). Semantic search handles conceptual queries like "authentication middleware" even when those exact words don't appear in the code. BM25 handles exact identifier and keyword lookups. You get the best of both in every query with no tuning required.
- **Configurable Qdrant** — Use the built-in Docker Qdrant (default, zero config) or connect to your own instance (self-hosted, remote server, or Qdrant Cloud). Configure via `QDRANT_MODE`, `QDRANT_URL`, and `QDRANT_API_KEY` environment variables.
- **Configurable Ollama** — Use the built-in Docker Ollama (default, zero config) or point to your own Ollama instance (native install -GPU access-, remote server, etc.). Configure via `OLLAMA_MODE`, `OLLAMA_URL`, `EMBEDDING_MODEL` and `EMBEDDING_DIMENSIONS` environment variables.
- **Multi-provider embeddings** — Switch between Local Ollama (private, GPU access), Docker Ollama (zero-config), OpenAI (`text-embedding-3-small`, fastest), Google Gemini (`gemini-embedding-001`, free tier), LM Studio (local OpenAI-compatible server), or LiteLLM (proxy gateway in front of 100+ providers) with a single environment variable. No provider-specific configuration files.
- **Multi-provider embeddings** — Switch between Text Embedder (deterministic Go binary, default, zero-dependency), Local Ollama (private, GPU access), Docker Ollama (zero-config), OpenAI (`text-embedding-3-small`, fastest), Google Gemini (`gemini-embedding-001`, free tier), LM Studio (local OpenAI-compatible server), or LiteLLM (proxy gateway in front of 100+ providers) with a single environment variable. No provider-specific configuration files.
- **Private & secure** — Everything runs on your machine — your code never leaves your network. The default Docker setup includes Ollama (embeddings) and Qdrant (vector storage) with no external API calls. No API costs, no token limits. Suitable for air-gapped and on-premises environments. Optional cloud providers (OpenAI, Google Gemini, Qdrant Cloud) are available but never required.
- **AST-aware chunking** — Files are split at function/class boundaries using AST parsing (ast-grep), not arbitrary line counts. This produces higher-quality search results. Falls back to line-based chunking for unsupported languages.
- **Polyglot code dependency graph** — Static analysis of import/require/use/include statements using ast-grep for 18+ languages. No external tools like dependency-cruiser required. Detects circular dependencies and generates visual Mermaid diagrams.
@@ -1154,7 +1154,7 @@ The rest of this section documents the variables themselves. Pass them using whi

| Variable | Default | Description |
|----------|---------|-------------|
| `EMBEDDING_PROVIDER` | `ollama` | Embedding backend: `ollama` (local, default), `openai`, `google`, `lmstudio`, or `litellm` |
| `EMBEDDING_PROVIDER` | `textembedder` | Embedding backend: `textembedder` (deterministic Go binary, default, no Docker/GPU needed), `ollama` (local), `openai`, `google`, `lmstudio`, or `litellm` |
| `EMBEDDING_MODEL` | *(per provider)* | Model name. Defaults: `nomic-embed-text` (ollama), `text-embedding-3-small` (openai), `gemini-embedding-001` (google). **Required** for `lmstudio` and `litellm` (no default). |
| `EMBEDDING_DIMENSIONS` | *(per provider)* | Vector dimensions. Defaults: `768` (ollama), `1536` (openai), `3072` (google). **Required** for `lmstudio` and `litellm` (no default; varies per loaded model / proxy alias). |
| `EMBEDDING_CONTEXT_LENGTH` | *(auto-detected)* | Model context window in tokens. Auto-detected for known model names (works for LiteLLM aliases that match the underlying model name). Set manually for custom LM Studio models or arbitrary LiteLLM aliases. |
@@ -1191,6 +1191,23 @@ The rest of this section documents the variables themselves. Pass them using whi
| `LITELLM_API_KEY` | *(none)* | **Required.** Master key (`general_settings.master_key` in the proxy's `config.yaml`) or a virtual key issued via LiteLLM's `/key/generate` endpoint. Unlike LM Studio, LiteLLM always authenticates — `/v1/models` itself is gated. |
| `LITELLM_SEND_DIMENSIONS` | `false` | Opt-in (`true` / `1` / `yes`). Forwards the OpenAI-style `dimensions` parameter through the proxy. Safe only for Matryoshka-aware backends (`text-embedding-3-*`, `voyage-3`); other backends (BGE, `nomic-embed-text`, Cohere v3) reject the request. Leave unset unless you know your alias resolves to a Matryoshka model. |

### Text-Embedder Configuration (when `EMBEDDING_PROVIDER=textembedder`)

The default embedding provider. It uses a deterministic Go binary (`text-embedder`) that returns bit-identical 768-dim vectors for identical input, with no Docker, GPU, API keys, or network calls. The binary ships gzip-compressed (`text-embedder-*.gz`) and is extracted automatically on first use.

| Variable | Default | Description |
|----------|---------|-------------|
| `TEXTEMBEDDER_BIN_PATH` | *(none)* | Absolute path to an already-decompressed `text-embedder` binary. Useful when you manage the binary yourself or the auto-discovery fails. |
| `TEXTEMBEDDER_URL` | *(none)* | When set, skips binary management entirely and connects to an externally running instance (e.g. `http://localhost:8089`). Useful for sharing one instance across multiple MCP hosts. |
| `TEXTEMBEDDER_PORT` | `8089` | Port the binary listens on when spawned as a subprocess. |
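
The three variables above imply a precedence: an external URL wins, then an explicit binary path, then auto-discovery. The following is a minimal sketch of that resolution, not the PR's implementation. The helper name `resolveTextEmbedderTarget`, its return shape, and the `http://localhost:<port>` base URL for a spawned process are assumptions made for illustration.

```typescript
// Hypothetical helper: names and return shape are illustrative, not the PR's code.
type TextEmbedderTarget =
  | { kind: "external"; url: string }
  | { kind: "spawn"; binPath: string | null; port: number; url: string };

function resolveTextEmbedderTarget(
  env: Record<string, string | undefined>,
): TextEmbedderTarget {
  // TEXTEMBEDDER_URL takes precedence: no binary is managed at all.
  const url = env.TEXTEMBEDDER_URL?.trim();
  if (url) return { kind: "external", url: url.replace(/\/+$/, "") };

  // TEXTEMBEDDER_PORT falls back to the documented default of 8089.
  const rawPort = env.TEXTEMBEDDER_PORT?.trim() || "8089";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid TEXTEMBEDDER_PORT: "${rawPort}"`);
  }

  // null means "fall back to auto-discovery of the bundled binary".
  const binPath = env.TEXTEMBEDDER_BIN_PATH?.trim() || null;

  // Assumption: the spawned binary is reached on localhost at that port.
  return { kind: "spawn", binPath, port, url: `http://localhost:${port}` };
}
```

Calling it with `process.env` would yield either an external target or the parameters needed to spawn the subprocess; validating the port up front keeps a typo from surfacing later as an opaque connection error.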

**Platform binaries:**
- `text-embedder-linux.gz` — linux/amd64
- `text-embedder-darwin.gz` — darwin/arm64 (Apple Silicon)
- `text-embedder-win.gz` — windows/amd64

The provider selects the correct binary based on `process.platform`. To generate them, run `make deploy-all` from the [`text-embedder`](https://github.com/guiperry/text-embedder) directory.
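
A platform-keyed lookup along the lines described above could be sketched as follows. The archive names come from the list; the `archiveForPlatform` helper and its error message are hypothetical, and `win32` is the value Node.js reports in `process.platform` on Windows.

```typescript
// Archive names are from the "Platform binaries" list; the helper is hypothetical.
const PLATFORM_ARCHIVES = new Map<string, string>([
  ["linux", "text-embedder-linux.gz"],   // linux/amd64
  ["darwin", "text-embedder-darwin.gz"], // darwin/arm64 (Apple Silicon)
  ["win32", "text-embedder-win.gz"],     // windows/amd64
]);

function archiveForPlatform(platform: string): string {
  const archive = PLATFORM_ARCHIVES.get(platform);
  if (archive === undefined) {
    throw new Error(
      `No text-embedder binary for platform "${platform}". ` +
        "Set TEXTEMBEDDER_BIN_PATH or TEXTEMBEDDER_URL instead.",
    );
  }
  return archive;
}

// Typical call site: archiveForPlatform(process.platform)
```

Because the lookup keys on the platform only, a host whose architecture is not in the list (for example linux/arm64) would presumably need `TEXTEMBEDDER_BIN_PATH` or `TEXTEMBEDDER_URL`.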

### Qdrant Configuration

| Variable | Default | Description |
57 changes: 37 additions & 20 deletions src/services/embedding-config.ts
@@ -4,7 +4,11 @@
* Embedding provider configuration — loaded from environment variables (MCP config).
*
* EMBEDDING_PROVIDER:
* - "ollama" (default): Use Ollama for embeddings (Docker or external).
* - "textembedder" (default): Use the deterministic text-embedder Go binary
* (Landmark Lattice v1). Zero dependencies, bit-perfect,
* 768-dim int32 vectors. The binary is spawned automatically
* or you can run it externally and set TEXTEMBEDDER_URL.
* - "ollama": Use Ollama for embeddings (Docker or external).
* - "openai": Use OpenAI Embeddings API. Requires OPENAI_API_KEY.
* - "google": Use Google Generative AI Embedding API. Requires GOOGLE_API_KEY.
* - "lmstudio": Use a local LM Studio server (OpenAI-compatible). Requires
@@ -14,6 +18,13 @@
* EMBEDDING_MODEL (must match an alias in the proxy's config.yaml),
* and EMBEDDING_DIMENSIONS (the alias's underlying dim).
*
* Text-Embedder-specific:
* TEXTEMBEDDER_URL: HTTP URL of an external text-embedder instance.
* When set, the provider skips spawning a local binary.
* TEXTEMBEDDER_BIN_PATH: Path to the text-embedder binary.
* Default: <cwd>/text-embedder
* TEXTEMBEDDER_PORT: Port for the subprocess (default: 8089).
*
* Ollama-specific:
* OLLAMA_MODE:
* - "auto" (default): Auto-detect. If Ollama is already running natively on port 11434,
@@ -61,7 +72,7 @@ import { logger } from "./logger.js";

// ── Types ─────────────────────────────────────────────────────────────────

export type EmbeddingProvider = "ollama" | "openai" | "google" | "lmstudio" | "litellm";
export type EmbeddingProvider = "textembedder" | "ollama" | "openai" | "google" | "lmstudio" | "litellm";
export type OllamaMode = "docker" | "external" | "auto";

export interface EmbeddingConfig {
@@ -91,11 +102,12 @@ export interface EmbeddingConfig {
* selected without explicit EMBEDDING_MODEL / EMBEDDING_DIMENSIONS.
*/
const PROVIDER_DEFAULTS: Record<EmbeddingProvider, { model: string; dimensions: number }> = {
ollama: { model: "nomic-embed-text", dimensions: 768 },
openai: { model: "text-embedding-3-small", dimensions: 1536 },
google: { model: "gemini-embedding-001", dimensions: 3072 },
lmstudio: { model: "", dimensions: 0 },
litellm: { model: "", dimensions: 0 },
textembedder: { model: "landmark-lattice-v1", dimensions: 768 },
ollama: { model: "nomic-embed-text", dimensions: 768 },
openai: { model: "text-embedding-3-small", dimensions: 1536 },
google: { model: "gemini-embedding-001", dimensions: 3072 },
lmstudio: { model: "", dimensions: 0 },
litellm: { model: "", dimensions: 0 },
};

// ── Ollama mode defaults ──────────────────────────────────────────────────
@@ -114,6 +126,8 @@ const MODE_DEFAULTS: Record<OllamaMode, { url: string }> = {
* and to stay within cloud provider limits.
*/
const MODEL_CONTEXT_LENGTHS: Record<string, number> = {
// Text-Embedder (Landmark Lattice — no real context limit, use generous default)
"landmark-lattice-v1": 8192,
// Ollama models
"nomic-embed-text": 2048,
"mxbai-embed-large": 512,
@@ -145,16 +159,17 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
if (_config) return _config;

// ── Provider ────────────────────────────────────────────────────────
const rawProvider = process.env.EMBEDDING_PROVIDER || "ollama";
const rawProvider = process.env.EMBEDDING_PROVIDER || "textembedder";
if (
rawProvider !== "textembedder" &&
rawProvider !== "ollama" &&
rawProvider !== "openai" &&
rawProvider !== "google" &&
rawProvider !== "lmstudio" &&
rawProvider !== "litellm"
) {
throw new Error(
`Invalid EMBEDDING_PROVIDER: "${rawProvider}". Must be "ollama", "openai", "google", "lmstudio", or "litellm".`,
`Invalid EMBEDDING_PROVIDER: "${rawProvider}". Must be "textembedder", "ollama", "openai", "google", "lmstudio", or "litellm".`,
);
}
const embeddingProvider: EmbeddingProvider = rawProvider;
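
The chain of `!==` checks and the error message above each repeat the provider list, so adding a provider means editing the union type, the condition, and the message. One possible alternative, sketched here under the assumption that a single constant tuple may serve as the source of truth, derives all three from one place. `EMBEDDING_PROVIDERS`, `isEmbeddingProvider`, and `parseEmbeddingProvider` are hypothetical names that do not appear in the PR.

```typescript
// Hypothetical refactor: one tuple drives the type, the check, and the message.
const EMBEDDING_PROVIDERS = [
  "textembedder",
  "ollama",
  "openai",
  "google",
  "lmstudio",
  "litellm",
] as const;

type EmbeddingProvider = (typeof EMBEDDING_PROVIDERS)[number];

// Type guard: narrows an arbitrary string to the EmbeddingProvider union.
function isEmbeddingProvider(value: string): value is EmbeddingProvider {
  return EMBEDDING_PROVIDERS.some((p) => p === value);
}

function parseEmbeddingProvider(raw: string | undefined): EmbeddingProvider {
  const value = raw || "textembedder"; // same default as the diff
  if (!isEmbeddingProvider(value)) {
    const allowed = EMBEDDING_PROVIDERS.map((p) => `"${p}"`).join(", ");
    throw new Error(`Invalid EMBEDDING_PROVIDER: "${value}". Must be one of ${allowed}.`);
  }
  return value;
}
```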
@@ -274,17 +289,19 @@ export function loadEmbeddingConfig(): EmbeddingConfig {
embeddingModel: _config.embeddingModel,
embeddingDimensions: _config.embeddingDimensions,
embeddingContextLength: _config.embeddingContextLength || "auto",
hasApiKey: !!(embeddingProvider === "ollama"
? _config.ollamaApiKey
: embeddingProvider === "openai"
? process.env.OPENAI_API_KEY
: embeddingProvider === "google"
? process.env.GOOGLE_API_KEY
: embeddingProvider === "lmstudio"
? process.env.LMSTUDIO_API_KEY
: embeddingProvider === "litellm"
? process.env.LITELLM_API_KEY
: undefined),
hasApiKey: !!(embeddingProvider === "textembedder"
? true // binary / external URL; no key needed
: embeddingProvider === "ollama"
? _config.ollamaApiKey
: embeddingProvider === "openai"
? process.env.OPENAI_API_KEY
: embeddingProvider === "google"
? process.env.GOOGLE_API_KEY
: embeddingProvider === "lmstudio"
? process.env.LMSTUDIO_API_KEY
: embeddingProvider === "litellm"
? process.env.LITELLM_API_KEY
: undefined),
Comment on lines +292 to +304
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

hasApiKey is inverted for textembedder in config logs.

This currently logs hasApiKey: true for textembedder even though the provider explicitly does not use an API key, which makes diagnostics misleading.

Suggested fix
-    hasApiKey: !!(embeddingProvider === "textembedder"
-      ? true // binary / external URL; no key needed
+    hasApiKey: !!(embeddingProvider === "textembedder"
+      ? false // binary / external URL; no key needed
       : embeddingProvider === "ollama"
         ? _config.ollamaApiKey
         : embeddingProvider === "openai"
Suggested change
-    hasApiKey: !!(embeddingProvider === "textembedder"
-      ? true // binary / external URL; no key needed
-      : embeddingProvider === "ollama"
-        ? _config.ollamaApiKey
-        : embeddingProvider === "openai"
-          ? process.env.OPENAI_API_KEY
-          : embeddingProvider === "google"
-            ? process.env.GOOGLE_API_KEY
-            : embeddingProvider === "lmstudio"
-              ? process.env.LMSTUDIO_API_KEY
-              : embeddingProvider === "litellm"
-                ? process.env.LITELLM_API_KEY
-                : undefined),
+    hasApiKey: !!(embeddingProvider === "textembedder"
+      ? false // binary / external URL; no key needed
+      : embeddingProvider === "ollama"
+        ? _config.ollamaApiKey
+        : embeddingProvider === "openai"
+          ? process.env.OPENAI_API_KEY
+          : embeddingProvider === "google"
+            ? process.env.GOOGLE_API_KEY
+            : embeddingProvider === "lmstudio"
+              ? process.env.LMSTUDIO_API_KEY
+              : embeddingProvider === "litellm"
+                ? process.env.LITELLM_API_KEY
+                : undefined),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/embedding-config.ts` around lines 292 - 304, The logged
hasApiKey value is inverted for the textembedder case; update the hasApiKey
computation (referencing hasApiKey and embeddingProvider) so textembedder yields
false (no API key) and all other providers evaluate whether their respective
env/config keys exist (e.g., _config.ollamaApiKey, process.env.OPENAI_API_KEY,
process.env.GOOGLE_API_KEY, process.env.LMSTUDIO_API_KEY,
process.env.LITELLM_API_KEY); ensure you return a boolean (use explicit
Boolean(...) or !!) and keep the same provider switch/conditional structure in
embedding-config.ts.
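
For comparison with the nested ternary, the same check could be written as a per-provider lookup. This is only a sketch of the behaviour the review asks for (textembedder reports `false`, every other provider reports whether its key is set). The standalone `hasApiKey` function and its `ollamaApiKey` parameter, which stands in for `_config.ollamaApiKey`, are hypothetical and not part of the PR.

```typescript
type EmbeddingProvider =
  | "textembedder"
  | "ollama"
  | "openai"
  | "google"
  | "lmstudio"
  | "litellm";

// Hypothetical helper: `ollamaApiKey` mirrors `_config.ollamaApiKey` from the diff.
function hasApiKey(
  provider: EmbeddingProvider,
  env: Record<string, string | undefined>,
  ollamaApiKey?: string,
): boolean {
  // Record<EmbeddingProvider, ...> makes the compiler flag a missing provider.
  const keyByProvider: Record<EmbeddingProvider, string | undefined> = {
    textembedder: undefined, // binary / external URL; no key is used
    ollama: ollamaApiKey,
    openai: env.OPENAI_API_KEY,
    google: env.GOOGLE_API_KEY,
    lmstudio: env.LMSTUDIO_API_KEY,
    litellm: env.LITELLM_API_KEY,
  };
  // Empty strings and undefined both count as "no key".
  return Boolean(keyByProvider[provider]);
}
```

With this shape, `hasApiKey("textembedder", process.env)` is always `false`, so the config log no longer suggests that a key is in use.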

});

return _config;
10 changes: 8 additions & 2 deletions src/services/embedding-provider.ts
@@ -8,7 +8,8 @@
* about which backend generates the vectors.
*
* Providers:
* - ollama (default) — local Ollama (Docker or external)
* - textembedder (default) — deterministic Go binary (zero deps, bit-perfect)
* - ollama — local Ollama (Docker or external)
* - openai — OpenAI Embeddings API (text-embedding-3-small, etc.)
* - google — Google Generative AI Embedding API (gemini-embedding-001, etc.)
* - lmstudio — local LM Studio server via OpenAI-compatible API
Expand Down Expand Up @@ -46,6 +47,11 @@ export async function getEmbeddingProvider(onProgress?: InfraProgressCallback):
logger.info("Initializing embedding provider", { provider: name });

switch (name) {
case "textembedder": {
const { TextEmbedderEmbeddingProvider } = await import("./provider-textembedder.js");
_provider = new TextEmbedderEmbeddingProvider();
break;
}
case "ollama": {
// Dynamic imports avoid loading all provider SDKs at startup.
const { OllamaEmbeddingProvider } = await import("./provider-ollama.js");
@@ -74,7 +80,7 @@ export async function getEmbeddingProvider(onProgress?: InfraProgressCallback):
}
default:
throw new Error(
`Unknown embedding provider: "${name}". Must be "ollama", "openai", "google", "lmstudio", or "litellm".`,
`Unknown embedding provider: "${name}". Must be "textembedder", "ollama", "openai", "google", "lmstudio", or "litellm".`,
);
}
