Litellm#57
Conversation
merging with upstream
merging with upstream
merging with upstream
LiteLLM Proxy Server (https://docs.litellm.ai/docs/simple_proxy) exposes an OpenAI-compatible /v1/embeddings endpoint and fans out to 100+ underlying providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock, Vertex AI, Ollama, ...).

Mirroring the lmstudio strategy (PR giancarloerra#42 + commit bb141a0) but with three meaningful differences that justify a dedicated provider rather than a flag on provider-openai:

- Authentication is mandatory. LiteLLM gates /v1/models with the master key or a virtual key, unlike LM Studio (no auth by default) and OpenAI (cloud key). LITELLM_API_KEY is checked at config-load time; the provider also duck-types 401/403 in ensureReady/healthCheck via err.status to surface a distinct "auth rejected" message vs. "proxy unreachable".
- Model aliases come from the proxy's config.yaml, so EMBEDDING_MODEL and EMBEDDING_DIMENSIONS have no sensible defaults. Fail-fast in loadEmbeddingConfig with provider-specific error messages pointing at litellm_params.model in the proxy config and at the underlying alias's output dim.
- Whether dimensions can be forwarded depends on the underlying provider: Matryoshka-aware models (text-embedding-3-*, voyage-3) accept it, non-Matryoshka backends (BGE, nomic, Cohere v3) reject. Made opt-in via LITELLM_SEND_DIMENSIONS=true rather than hardcoded like provider-openai does for text-embedding-3-*, since LiteLLM aliases are user-defined.

Encoding-format=float fix from bb141a0 ports verbatim: the OpenAI SDK 6.x base64-decode path corrupts any backend that returns plain JSON float arrays (many LiteLLM aliases do, including Ollama-routed and tei-wrapped ones).

Files:

- src/services/provider-litellm.ts: new LiteLLMEmbeddingProvider with the same OpenAI-SDK + custom baseURL pattern. Default baseURL http://localhost:4000/v1 (LiteLLM's default port, /v1 prefix required). Batch size 256, between OpenAI's 512 and LM Studio's 64, since the practical ceiling depends on whichever provider the alias resolves to. ensureReady distinguishes proxy-unreachable / auth-rejected / alias-not-registered. Lists up to 10 currently-registered models in the alias-missing error so the operator can sanity-check their config.yaml without leaving the log.
- src/services/embedding-config.ts: extends EmbeddingProvider union with "litellm", adds litellmUrl to EmbeddingConfig, fail-fast validation for LITELLM_API_KEY + EMBEDDING_MODEL + EMBEDDING_DIMENSIONS (key first so a virtual-key user fixes the easy problem before touching the proxy config), updates Invalid EMBEDDING_PROVIDER message and hasApiKey log expression.
- src/services/embedding-provider.ts: factory case for litellm with dynamic import to avoid loading the OpenAI SDK at startup for non-litellm users.
- README.md: dedicated LiteLLM section, MCP host config example, env-var table entries for EMBEDDING_PROVIDER / EMBEDDING_MODEL / EMBEDDING_DIMENSIONS / EMBEDDING_CONTEXT_LENGTH (clarifying which require manual values for litellm), new LiteLLM Configuration table.
- tests/unit/embedding-config.test.ts: 9 new cases (model + dim + key required, error-ordering, URL default + override, dimensions parsing, EMBEDDING_CONTEXT_LENGTH override for unknown aliases, auto-detection when alias matches a known model name) plus updated "full external config" expected object and updated invalid-provider error message.
- tests/unit/embedding-provider.test.ts: factory test for litellm, plus 4 cases against a deliberately-closed port (config rejects construction without API_KEY, ensureReady unreachable error format, healthCheck short-circuits on missing key without a network call, healthCheck reaches "Not reachable" path without throwing).

Backward compatible. The litellm provider is opt-in via EMBEDDING_PROVIDER=litellm. Existing ollama, openai, google, and lmstudio paths are untouched.

Verified: 64/64 unit tests pass on the touched suites; biome lint clean; tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
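The fail-fast ordering described in this commit (key first, then model, then dimensions) could look roughly like the sketch below. This is not the PR's `loadEmbeddingConfig`: the function name `validateLiteLLMEnv`, the `LiteLLMSettings` shape, the `LITELLM_URL` variable name, and the error wording are all assumptions made for illustration. Only `LITELLM_API_KEY`, `EMBEDDING_MODEL`, `EMBEDDING_DIMENSIONS`, `LITELLM_SEND_DIMENSIONS`, and the default URL come from the commit message.

```typescript
// Hypothetical sketch; not the actual loadEmbeddingConfig from the PR.
interface LiteLLMSettings {
  apiKey: string;
  model: string;
  dimensions: number;
  url: string;
  sendDimensions: boolean;
}

type Env = Record<string, string | undefined>;

function validateLiteLLMEnv(env: Env): LiteLLMSettings {
  // Key first: the cheapest problem for a virtual-key user to fix.
  const apiKey = env.LITELLM_API_KEY?.trim();
  if (!apiKey) {
    throw new Error("LITELLM_API_KEY is required when EMBEDDING_PROVIDER=litellm");
  }

  // Aliases are user-defined in the proxy's config.yaml, so there is no default.
  const model = env.EMBEDDING_MODEL?.trim();
  if (!model) {
    throw new Error("EMBEDDING_MODEL is required: use an alias defined in the proxy's config.yaml");
  }

  // The output dimension depends on whichever backend the alias resolves to.
  const dimensions = Number.parseInt(env.EMBEDDING_DIMENSIONS ?? "", 10);
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new Error("EMBEDDING_DIMENSIONS is required: set it to the underlying model's output dimension");
  }

  return {
    apiKey,
    model,
    dimensions,
    // LITELLM_URL is an assumed variable name; the default is the one the commit states.
    url: env.LITELLM_URL?.trim() || "http://localhost:4000/v1",
    // Opt-in only: forwarding dimensions is rejected by non-Matryoshka backends.
    sendDimensions: env.LITELLM_SEND_DIMENSIONS === "true",
  };
}
```

Checking the key before the model and dimensions means an operator with an empty environment sees the easiest fix first, which matches the ordering rationale given in the commit message.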
📝 Walkthrough
Adds LiteLLM as a new embedding provider: extends config types and validation, wires provider into factory, implements LiteLLM provider (client caching, auth/reachability checks, batching/truncation, optional dimensions), updates README, and adds unit tests.
Changes: LiteLLM Embedding Provider
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (inconclusive)
🧹 Nitpick comments (2)
src/services/provider-litellm.ts (2)
144-181: ⚡ Quick win
`models.list().data` is first-page only: replace with auto-pagination in both `ensureReady` and `healthCheck`

List methods in the OpenAI SDK are paginated; `.data` holds only the current page. Use `for await … of` to automatically fetch more pages as needed. For a LiteLLM proxy that returns all registered aliases in a single response this is safe today, but a large enterprise deployment whose `/v1/models` response spans multiple pages could produce a false "alias not registered" error on every `ensureReady()` and `healthCheck()` call.

♻️ Proposed fix: use auto-pagination in both methods

`ensureReady()` (line 146):

    - let modelList: Awaited<ReturnType<typeof client.models.list>>;
    - try {
    -   modelList = await client.models.list();
    - } catch (err) {
    + const allModelIds: string[] = [];
    + try {
    +   for await (const m of await client.models.list()) {
    +     allModelIds.push(m.id);
    +   }
    + } catch (err) {
        // ... error-handling block unchanged ...
      }
    - const modelRegistered = modelList.data.some((m) => m.id === config.embeddingModel);
    + const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);
      if (!modelRegistered) {
    -   const known = modelList.data.map((m) => m.id).slice(0, 10).join(", ");
    +   const known = allModelIds.slice(0, 10).join(", ");

`healthCheck()` (line 239):

    - const models = await client.models.list();
    - lines.push(`${icon(true)} LiteLLM: Reachable at ${config.litellmUrl}`);
    - const modelRegistered = models.data.some((m) => m.id === config.embeddingModel);
    + const allModelIds: string[] = [];
    + for await (const m of await client.models.list()) {
    +   allModelIds.push(m.id);
    + }
    + lines.push(`${icon(true)} LiteLLM: Reachable at ${config.litellmUrl}`);
    + const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);

Also applies to: 237-260
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/provider-litellm.ts` around lines 144 - 181, The code incorrectly assumes client.models.list().data contains all models; update both ensureReady and healthCheck to use the SDK's auto-pagination (for await … of) to iterate all returned pages/items from client.models.list(), accumulating model ids or checking membership across pages instead of only checking modelList.data; reference the existing client.models.list call and config.embeddingModel when implementing the iteration and replace the .data-based membership check (and any slicing of modelList.data) with logic that gathers ids (or tests equality) from all pages/items before throwing the "model not registered" error.
144-181: ⚡ Quick win
Future-proof model alias check to handle paginated LiteLLM proxies

The OpenAI SDK's `models.list()` returns paginated results; `.data` contains only the first page. While most LiteLLM deployments are non-paginated (all aliases returned at once), larger enterprise proxies could have models on later pages, causing a false "not registered" error.

Iterate through all pages using `for await`:

Suggested fix

In `ensureReady()` (lines 144–150):

    - let modelList: Awaited<ReturnType<typeof client.models.list>>;
    - try {
    -   modelList = await client.models.list();
    + const allModelIds: string[] = [];
    + try {
    +   for await (const m of await client.models.list()) {
    +     allModelIds.push(m.id);
    +   }
    - const modelRegistered = modelList.data.some((m) => m.id === config.embeddingModel);
    + const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);
    - const known = modelList.data.map((m) => m.id).slice(0, 10).join(", ");
    + const known = allModelIds.slice(0, 10).join(", ");

In `healthCheck()` (line 239):

    - const models = await client.models.list();
    - const modelRegistered = models.data.some((m) => m.id === config.embeddingModel);
    + const allModelIds: string[] = [];
    + for await (const m of await client.models.list()) {
    +   allModelIds.push(m.id);
    + }
    + const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/provider-litellm.ts` around lines 144 - 181, The model registration check in ensureReady (and the similar check in healthCheck) only inspects modelList.data from a single page returned by client.models.list(), which can miss models on subsequent pages; replace the single-page logic by iterating all pages from client.models.list() (use for await over client.models.list() or the SDK's pager) to collect or search every model id before setting modelRegistered and building the known list; update references in ensureReady, healthCheck, client.models.list, modelRegistered, and config.embeddingModel so the check and the "Currently registered models" message reflect the merged results of all pages.
📒 Files selected for processing (6)
- README.md
- src/services/embedding-config.ts
- src/services/embedding-provider.ts
- src/services/provider-litellm.ts
- tests/unit/embedding-config.test.ts
- tests/unit/embedding-provider.test.ts
The OpenAI SDK's `client.models.list()` returns a `PagePromise` that implements `AsyncIterable<Model>` and auto-paginates on demand. The previous implementation read `modelList.data` directly, which only contains the first page. Today's LiteLLM proxy returns the entire `model_list` from `config.yaml` in a single response so the bug is latent, but a future LiteLLM build (or an upstream proxy in front of it) that paginates `/v1/models` would cause `ensureReady` and `healthCheck` to throw a spurious "alias not registered" error for any alias landing on a non-first page.

Switch both checks to `for await (const m of client.models.list())` and accumulate ids into a single array. Equivalent to the SDK's documented async-iteration pattern; `PagePromise` is itself the iterable, so no extra `await` is needed before the loop.

Inline comment explains why the iteration matters even though today's LiteLLM doesn't paginate, so the pattern survives future drive-by "simplifications".

Surfaced by CodeRabbit on PR review of 1708510.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
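The accumulate-then-check pattern this commit describes can be sketched without the SDK. In the sketch below, `fakePagedModels` is a hypothetical stand-in for a paginated `/v1/models` listing, and `collectModelIds` / `isAliasRegistered` are illustrative names rather than functions from the PR. In the real provider, the iterable passed in would be the value returned by `client.models.list()`.

```typescript
// Hypothetical stand-in types and helpers; none of these names come from the PR.
interface ModelLike {
  id: string;
}

// Simulates a listing that arrives in several pages, fetched lazily.
async function* fakePagedModels(pages: string[][]): AsyncGenerator<ModelLike> {
  for (const page of pages) {
    // Each inner array plays the role of one page fetched on demand.
    for (const id of page) {
      yield { id };
    }
  }
}

// Drains every item from the iterable, regardless of how many pages back it.
async function collectModelIds(models: AsyncIterable<ModelLike>): Promise<string[]> {
  const ids: string[] = [];
  for await (const m of models) {
    ids.push(m.id);
  }
  return ids;
}

// Checks membership across all pages and builds the "known models" hint
// (capped at 10 entries, as the commit message describes for the error text).
async function isAliasRegistered(
  models: AsyncIterable<ModelLike>,
  alias: string,
): Promise<{ registered: boolean; known: string }> {
  const ids = await collectModelIds(models);
  return {
    registered: ids.includes(alias),
    known: ids.slice(0, 10).join(", "),
  };
}
```

An alias that only appears on the second page is still found, because the membership test runs after the iterable has been fully drained rather than against a single page's `.data`.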
🧹 Nitpick comments (2)
src/services/provider-litellm.ts (2)
78-107: `getClient()` creates a new `OpenAI` instance on every API-key or URL change; consider the concurrent-call edge case

`getClient()` checks three variables to decide whether to reuse or replace the cached client. This is correct for sequential callers, but if two concurrent `embed()` calls reach the `if (!litellmClient || ...)` branch simultaneously (e.g., after `resetLiteLLMClient()`), both will construct a new `OpenAI` instance and the second write will silently win. In practice this is unlikely to matter (both instances are functionally identical for the same key/URL), but it means `litellmApiKey` could briefly hold a stale reference if env changes mid-flight. This is a design-level observation, not a correctness bug in the typical single-threaded Node.js event-loop model.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/provider-litellm.ts` around lines 78 - 107, getClient() can race when two callers concurrently detect the cache miss and both construct an OpenAI instance; change getClient to use a double-checked assignment or a short critical section so only one constructed client is stored: compute the apiKey/baseUrl as now, then if the cached litellmClient looks stale create a local newClient, but before writing set litellmClient re-check (litellmClient, litellmBaseUrl, litellmApiKey) and only assign the newClient if the cache is still stale; alternatively implement a small Promise/mutex guard around the cache-update so resetLiteLLMClient(), getClient(), and the variables litellmClient, litellmBaseUrl, litellmApiKey are updated atomically.
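For context on the finding above, here is a minimal sketch of a key/URL-aware client cache. It assumes a plain object `FakeClient` in place of the OpenAI SDK client, and the names `cachedClient`, `constructions`, and `resetClient` are hypothetical. The actual `getClient()` in the PR tracks `litellmClient`, `litellmBaseUrl`, and `litellmApiKey` separately and may differ in detail.

```typescript
// Hypothetical sketch; FakeClient stands in for the OpenAI SDK client.
interface FakeClient {
  apiKey: string;
  baseURL: string;
}

let cachedClient: FakeClient | null = null;
// Counts how many clients were built; only here to make the behaviour observable.
let constructions = 0;

function getClient(apiKey: string, baseURL: string): FakeClient {
  // The staleness check and the assignment run in one synchronous turn of the
  // event loop: no await sits between them, so two callers cannot interleave here.
  if (!cachedClient || cachedClient.apiKey !== apiKey || cachedClient.baseURL !== baseURL) {
    constructions += 1;
    cachedClient = { apiKey, baseURL };
  }
  return cachedClient;
}

function resetClient(): void {
  cachedClient = null;
}
```

Because the check-and-assign is fully synchronous, a second caller with the same key and URL always observes the client stored by the first. Storing the key and URL on the cached object itself, as this sketch does, is one way to keep the three pieces of state from drifting apart; it is a design alternative, not a description of the PR.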
275-306: 💤 Low value
`response.data.sort()` mutates the response object in place

`Array.prototype.sort` is destructive. While `response.data` is a local, not-shared reference here, a defensive spread avoids surprising any future code that retains a reference to `response`.

♻️ Proposed fix

    - const sorted = response.data.sort((a, b) => a.index - b.index);
    + const sorted = [...response.data].sort((a, b) => a.index - b.index);

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/services/provider-litellm.ts` around lines 275 - 306, In _embedBatch, avoid mutating response.data with Array.prototype.sort; instead make a shallow copy (e.g., via [...response.data] or response.data.slice()) and sort that copy before mapping so the original response object is not mutated; update the code around the variable `response` and the `sorted` assignment to sort the copied array and then return the embeddings from the sorted copy.
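The copy-then-sort fix can be illustrated in isolation. The sketch below assumes a simplified `EmbeddingItem` shape with only `index` and `embedding` fields, and `orderedEmbeddings` is a hypothetical helper name, not the PR's `_embedBatch`.

```typescript
// Simplified, assumed shape of one entry in an embeddings response.
interface EmbeddingItem {
  index: number;
  embedding: number[];
}

function orderedEmbeddings(data: readonly EmbeddingItem[]): number[][] {
  // Copy first: Array.prototype.sort reorders its receiver in place,
  // so sorting the spread copy leaves the caller's array untouched.
  return [...data]
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}
```

Typing the parameter as `readonly` also makes the compiler reject an accidental in-place `data.sort(...)` inside the function.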
📒 Files selected for processing (1)
src/services/provider-litellm.ts
Addressed CodeRabbit's pagination concern on …
Concern (lines 144–181 of …
Fix: Both …
Diff: 1 file changed, +19 / −6. No behavioural change against the current LiteLLM proxy; future-proofs against paginated …
Summary
Adds LiteLLM Proxy Server as a first-class embedding provider. LiteLLM exposes an OpenAI-compatible `/v1/embeddings` endpoint and fans out to 100+ underlying providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock, Vertex AI, Ollama, ...), so a single SocratiCode install can target any of them without per-provider code paths. The implementation mirrors the LM Studio strategy but is a dedicated provider rather than a flag on `provider-openai`, because three behaviours diverge meaningfully from both OpenAI and LM Studio: mandatory authentication (master key / virtual key gates `/v1/models`), no sensible defaults for `EMBEDDING_MODEL` / `EMBEDDING_DIMENSIONS` (model aliases come from the operator's `config.yaml`), and provider-dependent acceptance of the `dimensions` parameter (Matryoshka-aware backends accept it, BGE / nomic / Cohere v3 reject; opt-in via `LITELLM_SEND_DIMENSIONS=true`).

Changes
- `src/services/provider-litellm.ts` (+303 LOC): `LiteLLMEmbeddingProvider` built on the OpenAI SDK with a custom `baseURL` (default `http://localhost:4000/v1`). Batch size 256 (between OpenAI's 512 and LM Studio's 64). `ensureReady` distinguishes proxy-unreachable / auth-rejected / alias-not-registered by duck-typing `err.status` (401/403) and lists up to 10 currently-registered aliases in the missing-alias error so the operator can sanity-check `config.yaml` from the log alone.
- `src/services/embedding-config.ts`: extends `EmbeddingProvider` union with `"litellm"`, adds `litellmUrl`, fail-fast validation for `LITELLM_API_KEY` + `EMBEDDING_MODEL` + `EMBEDDING_DIMENSIONS` (key checked first so a virtual-key user fixes the easy problem before touching proxy config), updates the Invalid EMBEDDING_PROVIDER error and the `hasApiKey` log expression.
- `src/services/embedding-provider.ts`: factory case for `litellm` with dynamic import, which keeps the OpenAI SDK out of startup for non-LiteLLM users.
- `encoding_format=float` fix from `bb141a0` ported verbatim; without it the OpenAI SDK 6.x base64-decode path corrupts backends that return plain JSON float arrays (many LiteLLM aliases do, including Ollama-routed and TEI-wrapped ones).
- `README.md`: dedicated LiteLLM section, MCP host config example, env-var table entries clarifying which values are mandatory under `litellm`, new LiteLLM Configuration table.
- `tests/unit/embedding-config.test.ts`: +9 cases (model/dim/key required, error-ordering, URL default + override, dimensions parsing, `EMBEDDING_CONTEXT_LENGTH` override for unknown aliases, auto-detection when alias matches a known model name) + updated full external config fixture and invalid-provider error message.
- `tests/unit/embedding-provider.test.ts`: factory test for `litellm`, plus 4 cases against a deliberately-closed port (config rejects construction without `API_KEY`; `ensureReady` unreachable error format; `healthCheck` short-circuits on missing key without a network call; `healthCheck` reaches Not reachable without throwing).

Backward compatible. Provider is opt-in via `EMBEDDING_PROVIDER=litellm`. Existing `ollama` / `openai` / `google` / `lmstudio` paths are untouched.

Diffstat: 6 files changed, +638 / −14.
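Two of the provider-side behaviours listed under Changes can be sketched as small pure functions: building the embeddings request with `encoding_format` pinned to `float` and `dimensions` included only on opt-in, and duck-typing `err.status` to tell an auth rejection from an unreachable proxy. The names `buildEmbeddingRequest`, `classifyProxyError`, and `ProxyFailure` are hypothetical, and treating every non-401/403 failure as "unreachable" is a simplifying assumption rather than the PR's exact logic.

```typescript
// Hypothetical helpers; names and behaviour are illustrative, not the PR's code.
interface EmbeddingRequest {
  model: string;
  input: string[];
  encoding_format: "float";
  dimensions?: number;
}

function buildEmbeddingRequest(
  model: string,
  input: string[],
  dimensions: number,
  sendDimensions: boolean,
): EmbeddingRequest {
  // Always request plain floats so backends returning JSON float arrays are
  // not pushed through a base64-decode path.
  const request: EmbeddingRequest = { model, input, encoding_format: "float" };
  // Forward dimensions only when the operator opted in: non-Matryoshka
  // backends reject the parameter.
  if (sendDimensions) {
    request.dimensions = dimensions;
  }
  return request;
}

type ProxyFailure = "auth-rejected" | "proxy-unreachable";

function classifyProxyError(err: unknown): ProxyFailure {
  // Duck-type the status field instead of relying on instanceof checks
  // against SDK error classes.
  const status =
    typeof err === "object" && err !== null && "status" in err
      ? (err as { status?: unknown }).status
      : undefined;
  return status === 401 || status === 403 ? "auth-rejected" : "proxy-unreachable";
}
```

Keeping both helpers free of network calls makes the opt-in and classification rules straightforward to unit test, in the same spirit as the closed-port tests described above.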
Type of change
Testing
- Unit tests (`npm run test:unit`): 64/64 on the touched suites (`embedding-config`, `embedding-provider`); also `biome` lint clean per the commit author.
- Integration tests (`npm run test:integration`): 153/154 with `QDRANT_MODE=external` against a remote Qdrant; 150/153 with the default Docker-managed Qdrant. Both runs were performed against this commit (1708510). Detail on the residual failures:
  - External Qdrant (1 failure): `tests/integration/docker-ollama.test.ts > Qdrant container management > reports Qdrant as running after ensure`. The test asserts that a Docker-managed Qdrant container is up; with `QDRANT_MODE=external` no such container is started, so the assertion is structurally incompatible with the config. This is not a regression introduced by this PR (this PR doesn't touch the Docker-Qdrant code path).
  - Docker-managed Qdrant (3 failures): `tools.test.ts > impact-analysis tool handlers` setup, `graph tool handlers > codebase_graph_query`, and `graph tool handlers > codebase_graph_remove` all surface as `ECONNREFUSED 127.0.0.1:16333` from the `@qdrant/js-client-rest` client, while the parallel `qdrant.test.ts` / `code-graph.test.ts` / `indexer.test.ts` suites use Qdrant successfully in the same run. The same three tests pass cleanly against the external Qdrant. Looks like a pre-existing race / suite-ordering issue in the local managed-Qdrant test infrastructure (container teardown vs. cross-suite client reuse), not embeddings-related, and also not introduced by this PR.
- Type check (`npx tsc --noEmit`): per commit author's verification.
- New tests: 9 in `embedding-config.test.ts` covering provider validation; 5 in `embedding-provider.test.ts` covering the factory and the closed-port error paths.
Checklist
- … `provider-lmstudio` / `provider-openai` shape; OpenAI-SDK + custom `baseURL` pattern; ESM `.js` import extensions; SPDX header).
- … `LiteLLMEmbeddingProvider` and the new fields on `EmbeddingConfig`).
- `README.md` gains a dedicated LiteLLM section and a configuration table; env-var table updated for `litellm`-specific mandatory values.
Related issues
Summary by CodeRabbit
New Features
Documentation
Tests
Bug Fixes / Validation