Skip to content

Litellm#57

Merged
giancarloerra merged 5 commits into
giancarloerra:mainfrom
airmonitor:litellm
May 8, 2026
Merged

Litellm#57
giancarloerra merged 5 commits into
giancarloerra:mainfrom
airmonitor:litellm

Conversation

@airmonitor
Copy link
Copy Markdown
Contributor

@airmonitor airmonitor commented May 8, 2026

Summary

Adds LiteLLM Proxy Server as a first-class embedding provider. LiteLLM exposes an OpenAI-compatible /v1/embeddings endpoint and fans out to 100+ underlying providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock, Vertex AI, Ollama, ...), so a single SocratiCode install can target any of them without per-provider code paths. The implementation mirrors the LM Studio strategy but is a dedicated provider rather than a flag on provider-openai, because three behaviours diverge meaningfully from both OpenAI and LM Studio: mandatory authentication (master key / virtual key gates /v1/models), no sensible defaults for EMBEDDING_MODEL / EMBEDDING_DIMENSIONS (model aliases come from the operator's config.yaml), and provider-dependent acceptance of the dimensions parameter (Matryoshka-aware backends accept it, BGE / nomic / Cohere v3 reject — opt-in via LITELLM_SEND_DIMENSIONS=true).

Changes

  • New src/services/provider-litellm.ts (+303 LOC): LiteLLMEmbeddingProvider built on the OpenAI SDK with a custom baseURL (default http://localhost:4000/v1). Batch size 256 (between OpenAI's 512 and LM Studio's 64). ensureReady distinguishes proxy-unreachable / auth-rejected / alias-not-registered by duck-typing err.status (401/403) and lists up to 10 currently-registered aliases in the missing-alias error so the operator can sanity-check config.yaml from the log alone.
  • src/services/embedding-config.ts: extends EmbeddingProvider union with "litellm", adds litellmUrl, fail-fast validation for LITELLM_API_KEY + EMBEDDING_MODEL + EMBEDDING_DIMENSIONS (key checked first so a virtual-key user fixes the easy problem before touching proxy config), updates the Invalid EMBEDDING_PROVIDER error and the hasApiKey log expression.
  • src/services/embedding-provider.ts: factory case for litellm with dynamic import — keeps the OpenAI SDK out of startup for non-LiteLLM users.
  • encoding_format=float fix from bb141a0 ported verbatim — without it the OpenAI SDK 6.x base64-decode path corrupts backends that return plain JSON float arrays (many LiteLLM aliases do, including Ollama-routed and TEI-wrapped ones).
  • README.md: dedicated LiteLLM section, MCP host config example, env-var table entries clarifying which values are mandatory under litellm, new LiteLLM Configuration table.
  • tests/unit/embedding-config.test.ts: +9 cases (model/dim/key required, error-ordering, URL default + override, dimensions parsing, EMBEDDING_CONTEXT_LENGTH override for unknown aliases, auto-detection when alias matches a known model name) + updated full external config fixture and invalid-provider error message.
  • tests/unit/embedding-provider.test.ts: factory test for litellm, plus 4 cases against a deliberately-closed port (config rejects construction without API_KEY; ensureReady unreachable error format; healthCheck short-circuits on missing key without a network call; healthCheck reaches Not reachable without throwing).

Backward compatible. Provider is opt-in via EMBEDDING_PROVIDER=litellm. Existing ollama / openai / google / lmstudio paths are untouched.

Diffstat: 6 files changed, +638 / −14.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Test coverage improvement

Testing

  • Unit tests pass (npm run test:unit) — 64/64 on the touched suites (embedding-config, embedding-provider); also biome lint clean per the commit author.
  • Integration tests pass (npm run test:integration) — 153/154 with QDRANT_MODE=external against a remote Qdrant; 150/153 with the default Docker-managed Qdrant. Both runs were performed against this commit (1708510). Detail on the residual failures:
    • The 1 failure on the external-Qdrant run is tests/integration/docker-ollama.test.ts > Qdrant container management > reports Qdrant as running after ensure. The test asserts that a Docker-managed Qdrant container is up; with QDRANT_MODE=external no such container is started, so the assertion is structurally incompatible with the config — not a regression introduced by this PR (this PR doesn't touch the Docker-Qdrant code path).
    • The 3 failures on the managed-Qdrant run (tools.test.ts > impact-analysis tool handlers setup + graph tool handlers > codebase_graph_query + graph tool handlers > codebase_graph_remove) all surface as ECONNREFUSED 127.0.0.1:16333 from the @qdrant/js-client-rest client, while the parallel qdrant.test.ts / code-graph.test.ts / indexer.test.ts suites use Qdrant successfully in the same run. The same three tests pass cleanly against the external Qdrant. Looks like a pre-existing race / suite-ordering issue in the local managed-Qdrant test infrastructure (container teardown vs. cross-suite client reuse), not embeddings-related — also not introduced by this PR.
  • TypeScript compiles cleanly (npx tsc --noEmit) — per commit author's verification.
  • New tests added for new/changed functionality — 9 new cases in embedding-config.test.ts covering provider validation; 5 in embedding-provider.test.ts covering the factory and the closed-port error paths.

Checklist

  • My code follows the existing code style and conventions (mirrors the existing provider-lmstudio / provider-openai shape; OpenAI-SDK + custom baseURL pattern; ESM .js import extensions; SPDX header).
  • I have added/updated JSDoc comments where appropriate (public methods on LiteLLMEmbeddingProvider and the new fields on EmbeddingConfig).
  • I have updated documentation (README.md / DEVELOPER.md) if needed — README.md gains a dedicated LiteLLM section and a configuration table; env-var table updated for litellm-specific mandatory values.
  • I have addressed all CodeRabbit review comments (or marked as resolved with explanation) — to be confirmed once CodeRabbit runs against the opened PR.
  • I have read the Contributing Guide
  • I agree to the Contributor License Agreement

Related issues

Summary by CodeRabbit

  • New Features

    • Added LiteLLM as a supported embedding provider, enabling access to 100+ providers via a single environment-variable selection.
  • Documentation

    • Expanded configuration docs with LiteLLM setup, example JSON, required env vars, and optional dimension-forwarding guidance.
  • Tests

    • Added unit tests covering LiteLLM configuration, provider selection, readiness, and health-check scenarios.
  • Bug Fixes / Validation

    • Improved fail-fast validation and clearer error messages for missing LiteLLM-related configuration.

airmonitor and others added 4 commits May 4, 2026 12:05
LiteLLM Proxy Server (https://docs.litellm.ai/docs/simple_proxy) exposes an
OpenAI-compatible /v1/embeddings endpoint and fans out to 100+ underlying
providers (OpenAI, Anthropic, Cohere, Voyage, HuggingFace, Bedrock, Vertex AI,
Ollama, ...). Mirroring the lmstudio strategy (PR giancarloerra#42 + commit bb141a0) but
with three meaningful differences that justify a dedicated provider rather
than a flag on provider-openai:

- Authentication is mandatory. LiteLLM gates /v1/models with the master key
  or a virtual key, unlike LM Studio (no auth by default) and OpenAI (cloud
  key). LITELLM_API_KEY is checked at config-load time; the provider also
  duck-types 401/403 in ensureReady/healthCheck via err.status to surface a
  distinct "auth rejected" message vs. "proxy unreachable".
- Model aliases come from the proxy's config.yaml, so EMBEDDING_MODEL and
  EMBEDDING_DIMENSIONS have no sensible defaults. Fail-fast in
  loadEmbeddingConfig with provider-specific error messages pointing at
  litellm_params.model in the proxy config and at the underlying alias's
  output dim.
- Whether dimensions can be forwarded depends on the underlying provider:
  Matryoshka-aware models (text-embedding-3-*, voyage-3) accept it,
  non-Matryoshka backends (BGE, nomic, Cohere v3) reject. Made opt-in via
  LITELLM_SEND_DIMENSIONS=true rather than hardcoded like provider-openai
  does for text-embedding-3-*, since LiteLLM aliases are user-defined.

Encoding-format=float fix from bb141a0 ports verbatim — the OpenAI SDK 6.x
base64-decode path corrupts any backend that returns plain JSON float arrays
(many LiteLLM aliases do, including Ollama-routed and tei-wrapped ones).

Files:

- src/services/provider-litellm.ts: new LiteLLMEmbeddingProvider with the
  same OpenAI-SDK + custom baseURL pattern. Default baseURL
  http://localhost:4000/v1 (LiteLLM's default port, /v1 prefix required).
  Batch size 256 — between OpenAI's 512 and LM Studio's 64, since the
  practical ceiling depends on whichever provider the alias resolves to.
  ensureReady distinguishes proxy-unreachable / auth-rejected /
  alias-not-registered. Lists up to 10 currently-registered models in the
  alias-missing error so the operator can sanity-check their config.yaml
  without leaving the log.
- src/services/embedding-config.ts: extends EmbeddingProvider union with
  "litellm", adds litellmUrl to EmbeddingConfig, fail-fast validation for
  LITELLM_API_KEY + EMBEDDING_MODEL + EMBEDDING_DIMENSIONS (key first so a
  virtual-key user fixes the easy problem before touching the proxy
  config), updates Invalid EMBEDDING_PROVIDER message and hasApiKey log
  expression.
- src/services/embedding-provider.ts: factory case for litellm with dynamic
  import to avoid loading the OpenAI SDK at startup for non-litellm users.
- README.md: dedicated LiteLLM section, MCP host config example, env-var
  table entries for EMBEDDING_PROVIDER / EMBEDDING_MODEL /
  EMBEDDING_DIMENSIONS / EMBEDDING_CONTEXT_LENGTH (clarifying which require
  manual values for litellm), new LiteLLM Configuration table.
- tests/unit/embedding-config.test.ts: 9 new cases (model + dim + key
  required, error-ordering, URL default + override, dimensions parsing,
  EMBEDDING_CONTEXT_LENGTH override for unknown aliases, auto-detection
  when alias matches a known model name) plus updated "full external
  config" expected object and updated invalid-provider error message.
- tests/unit/embedding-provider.test.ts: factory test for litellm, plus 4
  cases against a deliberately-closed port (config rejects construction
  without API_KEY, ensureReady unreachable error format, healthCheck
  short-circuits on missing key without a network call, healthCheck
  reaches "Not reachable" path without throwing).

Backward compatible. The litellm provider is opt-in via
EMBEDDING_PROVIDER=litellm. Existing ollama, openai, google, and lmstudio
paths are untouched.

Verified: 64/64 unit tests pass on the touched suites; biome lint clean;
tsc --noEmit clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds LiteLLM as a new embedding provider: extends config types and validation, wires provider into factory, implements LiteLLM provider (client caching, auth/reachability checks, batching/truncation, optional dimensions), updates README, and adds unit tests.

Changes

LiteLLM Embedding Provider

Layer / File(s) Summary
Configuration Types & Validation
src/services/embedding-config.ts
EmbeddingProvider type extended to include "litellm". EmbeddingConfig interface gains litellmUrl and sendDimensions fields. Provider validation recognizes litellm.
Environment Validation & Config Loading
src/services/embedding-config.ts
LiteLLM-specific validation requires LITELLM_API_KEY, EMBEDDING_MODEL, and EMBEDDING_DIMENSIONS. Resolver sets litellmUrl from env (defaults to /v1) and sendDimensions from LITELLM_SEND_DIMENSIONS. hasApiKey derivation extended to check LITELLM_API_KEY.
Provider Factory Wiring
src/services/embedding-provider.ts
getEmbeddingProvider adds a case "litellm" branch that dynamically imports and instantiates LiteLLMEmbeddingProvider. Allowed-provider messaging updated to include litellm.
LiteLLM Provider Implementation
src/services/provider-litellm.ts
Implements LiteLLMEmbeddingProvider with cached SDK client keyed by base URL+API key, exported resetLiteLLMClient(), ensureReady() that distinguishes auth vs reachability via models.list(), embed()/embedSingle() with truncation and batching, healthCheck() reporting, and _embedBatch() using encoding_format: "float" with optional dimensions forwarding.
Configuration Tests
tests/unit/embedding-config.test.ts
Test setup clears LiteLLM env vars. Full-config expectation includes litellmUrl. Invalid provider error list includes litellm. Adds litellm test suite for defaults, required env validation, error precedence, example-dimensions in errors, and EMBEDDING_CONTEXT_LENGTH behavior.
Provider Factory & Instance Tests
tests/unit/embedding-provider.test.ts
Factory tests include litellm instantiation. New LiteLLMEmbeddingProvider test suite covers construction-time env validation, ensureReady() unreachable-proxy behavior, and healthCheck() reporting for missing API key and unreachable proxy.
Documentation
README.md
Updates features wording to mention LiteLLM, adds "LiteLLM (proxy gateway, 100+ providers)" configuration subsection with MCP JSON example and env-var docs, updates embedding-provider env table to include litellm, and adds a dedicated LiteLLM env-var reference section.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • giancarloerra/SocratiCode#42: Both PRs add a new embedding provider backend and make parallel changes across configuration, factory, provider implementation, tests, and README.
  • giancarloerra/SocratiCode#24: Both PRs modify src/services/embedding-config.ts for environment parsing and validation of embedding model/dimensions.

Poem

🐰 I hopped in with a tiny key,

LiteLLM waved, "Come play with me!"
Batches hum and tokens shrink,
Health checks chirp and logs all sync.
Config set, tests run free — embeddings bloom for you and me.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Litellm' is vague and generic, using only a single word that names the feature without describing the change (addition of a new embedding provider). Consider a more descriptive title like 'Add LiteLLM as embedding provider' or 'Support LiteLLM proxy gateway for embeddings' to clarify the main change.
✅ Passed checks (4 passed)
Check name Status Explanation
Description check ✅ Passed The pull request description is comprehensive and complete, covering all required template sections with detailed context about the implementation, testing results, and backward compatibility.
Docstring Coverage ✅ Passed Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/services/provider-litellm.ts (2)

144-181: ⚡ Quick win

models.list().data is first-page only — replace with auto-pagination in both ensureReady and healthCheck

List methods in the OpenAI SDK are paginated; .data holds only the current page — use for await … of to automatically fetch more pages as needed. For a LiteLLM proxy that returns all registered aliases in a single response this is safe today, but a large enterprise deployment whose /v1/models response spans multiple pages could produce a false "alias not registered" error on every ensureReady() and healthCheck() call.

♻️ Proposed fix — use auto-pagination in both methods

ensureReady() (line 146):

-    let modelList: Awaited<ReturnType<typeof client.models.list>>;
-    try {
-      modelList = await client.models.list();
-    } catch (err) {
+    const allModelIds: string[] = [];
+    try {
+      for await (const m of await client.models.list()) {
+        allModelIds.push(m.id);
+      }
+    } catch (err) {
       // ... error-handling block unchanged ...
     }
-    const modelRegistered = modelList.data.some((m) => m.id === config.embeddingModel);
+    const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);
     if (!modelRegistered) {
-      const known = modelList.data.map((m) => m.id).slice(0, 10).join(", ");
+      const known = allModelIds.slice(0, 10).join(", ");

healthCheck() (line 239):

-      const models = await client.models.list();
-      lines.push(`${icon(true)} LiteLLM: Reachable at ${config.litellmUrl}`);
-      const modelRegistered = models.data.some((m) => m.id === config.embeddingModel);
+      const allModelIds: string[] = [];
+      for await (const m of await client.models.list()) {
+        allModelIds.push(m.id);
+      }
+      lines.push(`${icon(true)} LiteLLM: Reachable at ${config.litellmUrl}`);
+      const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);

Also applies to: 237-260

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/provider-litellm.ts` around lines 144 - 181, The code
incorrectly assumes client.models.list().data contains all models; update both
ensureReady and healthCheck to use the SDK's auto-pagination (for await … of) to
iterate all returned pages/items from client.models.list(), accumulating model
ids or checking membership across pages instead of only checking modelList.data;
reference the existing client.models.list call and config.embeddingModel when
implementing the iteration and replace the .data-based membership check (and any
slicing of modelList.data) with logic that gathers ids (or tests equality) from
all pages/items before throwing the "model not registered" error.

144-181: ⚡ Quick win

Future-proof model alias check to handle paginated LiteLLM proxies

The OpenAI SDK's models.list() returns paginated results; .data contains only the first page. While most LiteLLM deployments are non-paginated (all aliases returned at once), larger enterprise proxies could have models on later pages, causing a false "not registered" error.

Iterate through all pages using for await:

Suggested fix

In ensureReady() (lines 144–150):

-    let modelList: Awaited<ReturnType<typeof client.models.list>>;
-    try {
-      modelList = await client.models.list();
+    const allModelIds: string[] = [];
+    try {
+      for await (const m of await client.models.list()) {
+        allModelIds.push(m.id);
+      }
-    const modelRegistered = modelList.data.some((m) => m.id === config.embeddingModel);
+    const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);
-      const known = modelList.data.map((m) => m.id).slice(0, 10).join(", ");
+      const known = allModelIds.slice(0, 10).join(", ");

In healthCheck() (line 239):

-      const models = await client.models.list();
-      const modelRegistered = models.data.some((m) => m.id === config.embeddingModel);
+      const allModelIds: string[] = [];
+      for await (const m of await client.models.list()) {
+        allModelIds.push(m.id);
+      }
+      const modelRegistered = allModelIds.some((id) => id === config.embeddingModel);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/provider-litellm.ts` around lines 144 - 181, The model
registration check in ensureReady (and the similar check in healthCheck) only
inspects modelList.data from a single page returned by client.models.list(),
which can miss models on subsequent pages; replace the single-page logic by
iterating all pages from client.models.list() (use for await over
client.models.list() or the SDK's pager) to collect or search every model id
before setting modelRegistered and building the known list; update references in
ensureReady, healthCheck, client.models.list, modelRegistered, and
config.embeddingModel so the check and the "Currently registered models" message
reflect the merged results of all pages.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/services/provider-litellm.ts`:
- Around line 144-181: The code incorrectly assumes client.models.list().data
contains all models; update both ensureReady and healthCheck to use the SDK's
auto-pagination (for await … of) to iterate all returned pages/items from
client.models.list(), accumulating model ids or checking membership across pages
instead of only checking modelList.data; reference the existing
client.models.list call and config.embeddingModel when implementing the
iteration and replace the .data-based membership check (and any slicing of
modelList.data) with logic that gathers ids (or tests equality) from all
pages/items before throwing the "model not registered" error.
- Around line 144-181: The model registration check in ensureReady (and the
similar check in healthCheck) only inspects modelList.data from a single page
returned by client.models.list(), which can miss models on subsequent pages;
replace the single-page logic by iterating all pages from client.models.list()
(use for await over client.models.list() or the SDK's pager) to collect or
search every model id before setting modelRegistered and building the known
list; update references in ensureReady, healthCheck, client.models.list,
modelRegistered, and config.embeddingModel so the check and the "Currently
registered models" message reflect the merged results of all pages.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 514cf71c-320e-4089-9103-b23143a31b10

📥 Commits

Reviewing files that changed from the base of the PR and between aedfa85 and 1708510.

📒 Files selected for processing (6)
  • README.md
  • src/services/embedding-config.ts
  • src/services/embedding-provider.ts
  • src/services/provider-litellm.ts
  • tests/unit/embedding-config.test.ts
  • tests/unit/embedding-provider.test.ts

The OpenAI SDK's `client.models.list()` returns a `PagePromise` that
implements `AsyncIterable<Model>` and auto-paginates on demand. The
previous implementation read `modelList.data` directly, which only
contains the first page. Today's LiteLLM proxy returns the entire
`model_list` from `config.yaml` in a single response so the bug is
latent, but a future LiteLLM build (or an upstream proxy in front of
it) that paginates `/v1/models` would cause `ensureReady` and
`healthCheck` to throw a spurious "alias not registered" error for
any alias landing on a non-first page.

Switch both checks to `for await (const m of client.models.list())`
and accumulate ids into a single array. Equivalent to the SDK's
documented async-iteration pattern; `PagePromise` is itself the
iterable, so no extra `await` is needed before the loop. Inline
comment explains why the iteration matters even though today's
LiteLLM doesn't paginate, so the pattern survives future drive-by
"simplifications".

Surfaced by CodeRabbit on PR review of 1708510.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/services/provider-litellm.ts (2)

78-107: getClient() creates a new OpenAI instance on every API-key or URL change — consider the concurrent-call edge case

getClient() checks three variables to decide whether to reuse or replace the cached client. This is correct for sequential callers, but if two concurrent embed() calls reach the if (!litellmClient || ...) branch simultaneously (e.g., after resetLiteLLMClient()), both will construct a new OpenAI instance and the second write will silently win. In practice this is unlikely to matter (both instances are functionally identical for the same key/URL), but it means litellmApiKey could briefly hold a stale reference if env changes mid-flight. This is a design-level observation, not a correctness bug in the typical single-threaded Node.js event-loop model.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/provider-litellm.ts` around lines 78 - 107, getClient() can race
when two callers concurrently detect the cache miss and both construct an OpenAI
instance; change getClient to use a double-checked assignment or a short
critical section so only one constructed client is stored: compute the
apiKey/baseUrl as now, then if the cached litellmClient looks stale create a
local newClient, but before writing set litellmClient re-check (litellmClient,
litellmBaseUrl, litellmApiKey) and only assign the newClient if the cache is
still stale; alternatively implement a small Promise/mutex guard around the
cache-update so resetLiteLLMClient(), getClient(), and the variables
litellmClient, litellmBaseUrl, litellmApiKey are updated atomically.

275-306: 💤 Low value

response.data.sort() mutates the response object in place

Array.prototype.sort is destructive. While response.data is a local, not-shared reference here, a defensive spread avoids surprising any future code that retains a reference to response.

♻️ Proposed fix
-    const sorted = response.data.sort((a, b) => a.index - b.index);
+    const sorted = [...response.data].sort((a, b) => a.index - b.index);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/services/provider-litellm.ts` around lines 275 - 306, In _embedBatch,
avoid mutating response.data with Array.prototype.sort; instead make a shallow
copy (e.g., via [...response.data] or response.data.slice()) and sort that copy
before mapping so the original response object is not mutated; update the code
around the variable `response` and the `sorted` assignment to sort the copied
array and then return the embeddings from the sorted copy.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/services/provider-litellm.ts`:
- Around line 78-107: getClient() can race when two callers concurrently detect
the cache miss and both construct an OpenAI instance; change getClient to use a
double-checked assignment or a short critical section so only one constructed
client is stored: compute the apiKey/baseUrl as now, then if the cached
litellmClient looks stale create a local newClient, but before writing set
litellmClient re-check (litellmClient, litellmBaseUrl, litellmApiKey) and only
assign the newClient if the cache is still stale; alternatively implement a
small Promise/mutex guard around the cache-update so resetLiteLLMClient(),
getClient(), and the variables litellmClient, litellmBaseUrl, litellmApiKey are
updated atomically.
- Around line 275-306: In _embedBatch, avoid mutating response.data with
Array.prototype.sort; instead make a shallow copy (e.g., via [...response.data]
or response.data.slice()) and sort that copy before mapping so the original
response object is not mutated; update the code around the variable `response`
and the `sorted` assignment to sort the copied array and then return the
embeddings from the sorted copy.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0a9e765-2e80-4591-aeac-89b9051fc06a

📥 Commits

Reviewing files that changed from the base of the PR and between 1708510 and 6c67965.

📒 Files selected for processing (1)
  • src/services/provider-litellm.ts

@airmonitor
Copy link
Copy Markdown
Contributor Author

Addressed CodeRabbit's pagination concern on client.models.list() in commit 6c67965.

Concern (lines 144–181 of src/services/provider-litellm.ts): modelList.data only contains the first page of /v1/models. Today's LiteLLM proxy returns the full model_list from config.yaml in a single response, but a future LiteLLM build — or an upstream proxy in front of it — that paginates /v1/models would cause ensureReady() and healthCheck() to throw a spurious "alias not registered" error for any alias landing on a non-first page.

Fix: Both ensureReady() and healthCheck() now use the SDK's documented async-iteration pattern (for await (const m of client.models.list())), which transparently fetches subsequent pages on demand. Model ids are accumulated into a single array before the membership check, so the "Currently registered models" log line in the missing-alias error reflects the merged result across all pages. An inline comment explains why iteration matters even though today's LiteLLM doesn't paginate, so the pattern survives future drive-by simplifications.

Diff: 1 file changed, +19 / −6. No behavioural change against the current LiteLLM proxy; future-proofs against paginated /v1/models responses.

@giancarloerra giancarloerra self-assigned this May 8, 2026
@giancarloerra giancarloerra merged commit f87f529 into giancarloerra:main May 8, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants