Merged
26 changes: 22 additions & 4 deletions src/providers/embedding/openai.ts
@@ -1,16 +1,33 @@
 import type { EmbeddingProvider } from "../../types.js";
 import { getEnvVar } from "../../config.js";
 
-const API_URL = "https://api.openai.com/v1/embeddings";
-
+const DEFAULT_BASE_URL = "https://api.openai.com";
+const DEFAULT_MODEL = "text-embedding-3-small";
+
+/**
+ * OpenAI-compatible embedding provider.
+ *
+ * Required env vars:
+ *   OPENAI_API_KEY — API key
+ *
+ * Optional:
+ *   OPENAI_BASE_URL — base URL without path (default: https://api.openai.com)
+ *   OPENAI_EMBEDDING_MODEL — model name (default: text-embedding-3-small)
+ */
 export class OpenAIEmbeddingProvider implements EmbeddingProvider {
   readonly name = "openai";
   readonly dimensions = 1536;
   private apiKey: string;
+  private baseUrl: string;
+  private model: string;
 
   constructor(apiKey?: string) {
     this.apiKey = apiKey || getEnvVar("OPENAI_API_KEY") || "";
     if (!this.apiKey) throw new Error("OPENAI_API_KEY is required");
+    this.baseUrl =
+      getEnvVar("OPENAI_BASE_URL") || DEFAULT_BASE_URL;
+    this.model =
+      getEnvVar("OPENAI_EMBEDDING_MODEL") || DEFAULT_MODEL;
Comment on lines 19 to +30
⚠️ Potential issue | 🟠 Major

dimensions is hardcoded to 1536 but model is now configurable.

If a user sets OPENAI_EMBEDDING_MODEL=text-embedding-3-large (3072 dims) or any other compatible model whose dimensionality differs, provider.dimensions will report a value inconsistent with the vectors actually returned by embed(). Downstream consumers that rely on dimensions (e.g. vector store schemas) will silently get the wrong sizing.

Consider either:

  • deriving dimensions from the configured model (lookup table with a sensible default), or
  • exposing OPENAI_EMBEDDING_DIMENSIONS env var, or
  • at minimum documenting the constraint that callers overriding the model are responsible for ensuring it returns 1536-dim vectors.
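A minimal sketch of the first two options combined, assuming a lookup table plus an env override; the names KNOWN_MODEL_DIMENSIONS and resolveDimensions are illustrative, not part of this PR:

```typescript
// Hypothetical helper, not from the PR: resolve the embedding dimensionality
// from an explicit override, then a model lookup, then the historical default.
const KNOWN_MODEL_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536,
};

function resolveDimensions(model: string, envOverride?: string): number {
  // An explicit OPENAI_EMBEDDING_DIMENSIONS value wins when it parses cleanly.
  const parsed = envOverride ? Number.parseInt(envOverride, 10) : Number.NaN;
  if (Number.isFinite(parsed) && parsed > 0) return parsed;
  // Otherwise fall back to the lookup table, then the current hardcoded value.
  return KNOWN_MODEL_DIMENSIONS[model] ?? 1536;
}
```

The constructor could then assign this.dimensions = resolveDimensions(this.model, getEnvVar("OPENAI_EMBEDDING_DIMENSIONS")) instead of hardcoding 1536, which keeps unknown models working while letting proxy users state the truth explicitly.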
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In src/providers/embedding/openai.ts around lines 19-30: the class hardcodes readonly dimensions = 1536 while the model is configurable (via the constructor argument, getEnvVar("OPENAI_EMBEDDING_MODEL"), or DEFAULT_MODEL), so a different model such as text-embedding-3-large produces a mismatch. Compute dimensions from the configured model: in the constructor, read OPENAI_EMBEDDING_DIMENSIONS first, then map the model name to its dimension via a small lookup table, and finally default to 1536 if neither applies. Keep the existing symbols dimensions, constructor, model, getEnvVar and DEFAULT_MODEL for locating the change.

   }
 
   async embed(text: string): Promise<Float32Array> {
@@ -19,14 +36,15 @@ export class OpenAIEmbeddingProvider implements EmbeddingProvider {
   }
 
   async embedBatch(texts: string[]): Promise<Float32Array[]> {
-    const response = await fetch(API_URL, {
+    const url = `${this.baseUrl}/v1/embeddings`;
+    const response = await fetch(url, {
       method: "POST",
       headers: {
         Authorization: `Bearer ${this.apiKey}`,
         "Content-Type": "application/json",
       },
       body: JSON.stringify({
-        model: "text-embedding-3-small",
+        model: this.model,
         input: texts,
       }),
     });
57 changes: 57 additions & 0 deletions test/embedding-provider.test.ts
@@ -47,3 +47,60 @@ describe("createEmbeddingProvider", () => {
     expect(provider).toBeInstanceOf(OpenAIEmbeddingProvider);
   });
 });
+
+describe("OpenAIEmbeddingProvider", () => {
+  const originalEnv = { ...process.env };
+
+  beforeEach(() => {
+    process.env = { ...originalEnv };
+    delete process.env["OPENAI_BASE_URL"];
+    delete process.env["OPENAI_EMBEDDING_MODEL"];
+  });
+
+  afterEach(() => {
+    process.env = originalEnv;
+  });
+
+  it("uses default base URL and model when env vars are not set", () => {
+    const provider = new OpenAIEmbeddingProvider("test-key");
+    expect(provider.name).toBe("openai");
+    expect(provider.dimensions).toBe(1536);
+  });
+
+  it("throws when no API key is provided", () => {
+    delete process.env["OPENAI_API_KEY"];
+    expect(() => new OpenAIEmbeddingProvider()).toThrow("OPENAI_API_KEY is required");
+  });
+
+  it("respects OPENAI_BASE_URL env var", async () => {
+    process.env["OPENAI_BASE_URL"] = "https://my-proxy.example.com";
+    const provider = new OpenAIEmbeddingProvider("test-key");
+
+    const fetchSpy = vi.spyOn(globalThis, "fetch").mockResolvedValue(
+      new Response(JSON.stringify({ data: [{ embedding: [0.1, 0.2, 0.3] }] }), { status: 200 }),
+    );
+
+    await provider.embed("hello");
+    expect(fetchSpy).toHaveBeenCalledWith(
+      "https://my-proxy.example.com/v1/embeddings",
+      expect.any(Object),
+    );
+
+    fetchSpy.mockRestore();
+  });
+
+  it("respects OPENAI_EMBEDDING_MODEL env var", async () => {
+    process.env["OPENAI_EMBEDDING_MODEL"] = "text-embedding-3-large";
+    const provider = new OpenAIEmbeddingProvider("test-key");
+
+    const fetchSpy = vi.spyOn(globalThis, "fetch").mockResolvedValue(
+      new Response(JSON.stringify({ data: [{ embedding: [0.1, 0.2, 0.3] }] }), { status: 200 }),
+    );
+
+    await provider.embed("hello");
+    const body = JSON.parse((fetchSpy.mock.calls[0][1] as RequestInit).body as string);
+    expect(body.model).toBe("text-embedding-3-large");
+
+    fetchSpy.mockRestore();
+  });
+});