fix(embedding): derive OpenAI provider dimensions from model #189
Follow-up to #186. The PR made model configurable via OPENAI_EMBEDDING_MODEL but left dimensions hardcoded at 1536. A user pointing at text-embedding-3-large (3072 dims) would see provider.dimensions return 1536, which vector-store schemas and hybrid-search weights rely on for correct sizing.
- Add MODEL_DIMENSIONS table: text-embedding-3-small=1536, text-embedding-3-large=3072, text-embedding-ada-002=1536.
- Make dimensions a computed readonly field.
- New OPENAI_EMBEDDING_DIMENSIONS env var for custom / self-hosted OpenAI-compatible endpoints not in the table. Positive integers only; reject non-numeric / non-positive values with a clear error.
- Unknown model names fall back to 1536 with the explicit override available if the server returns a different size.
- Tests cover known models, dimension override, unknown-model fallback, and validation of the override env var.
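The resolution order in this commit message (explicit override, then table lookup, then the 1536 fallback) can be sketched roughly as below. This is an illustration under assumptions, not the code in src/providers/embedding/openai.ts: the class name `SketchEmbeddingProvider` and its constructor signature are hypothetical, and the override is passed in as an already-validated number so that env-var parsing stays out of the picture. Only the table values and the fallback come from the description above.

```typescript
// Table values and the 1536 fallback are taken from the commit message.
const MODEL_DIMENSIONS: Record<string, number> = {
  "text-embedding-3-small": 1536,
  "text-embedding-3-large": 3072,
  "text-embedding-ada-002": 1536,
};
const DEFAULT_DIMENSIONS = 1536;

// Hypothetical provider shape: `dimensions` is computed once in the
// constructor and cannot be reassigned afterwards.
class SketchEmbeddingProvider {
  readonly model: string;
  readonly dimensions: number;

  // `overrideDimensions` stands in for a validated OPENAI_EMBEDDING_DIMENSIONS value.
  constructor(model: string, overrideDimensions?: number) {
    this.model = model;
    // Precedence: explicit override, then known-model lookup, then default.
    this.dimensions =
      overrideDimensions ?? MODEL_DIMENSIONS[model] ?? DEFAULT_DIMENSIONS;
  }
}

const large = new SketchEmbeddingProvider("text-embedding-3-large");
const custom = new SketchEmbeddingProvider("my-self-hosted-model", 768);
const unknown = new SketchEmbeddingProvider("my-self-hosted-model");
```

Under these assumptions `large.dimensions` is 3072, `custom.dimensions` is 768, and `unknown.dimensions` falls back to 1536, which is why the explicit override matters for endpoints that are not in the table.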
📝 Walkthrough
The OpenAI embedding provider now dynamically resolves embedding vector dimensions from environment configuration or model defaults. A lookup table and validation helper determine dimensions based on the model and an optional …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/embedding-provider.test.ts (1)
108-150: Good coverage of the new behavior. The four new cases (known-model derivation, override, unknown-model fallback, invalid-override rejection) map cleanly onto the new logic. One gap worth considering: add a case for a partial-numeric override like "768abc" or "1.5" to lock down stricter integer validation (see the comment on resolveDimensions).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/embedding-provider.test.ts` around lines 108 - 150, Add tests for partial-numeric and decimal OPENAI_EMBEDDING_DIMENSIONS to ensure resolveDimensions rejects them: in embedding-provider.test.ts extend the "rejects invalid OPENAI_EMBEDDING_DIMENSIONS values" case to set process.env["OPENAI_EMBEDDING_DIMENSIONS"] = "768abc" and = "1.5" (and assert new OpenAIEmbeddingProvider("test-key") throws the same /OPENAI_EMBEDDING_DIMENSIONS must be a positive integer/), referencing the OpenAIEmbeddingProvider constructor and resolveDimensions logic to validate stricter integer parsing.
📒 Files selected for processing (2)
- src/providers/embedding/openai.ts
- test/embedding-provider.test.ts
function resolveDimensions(model: string, override: string | undefined): number {
  if (override !== undefined && override.trim().length > 0) {
    const parsed = parseInt(override, 10);
    if (!Number.isFinite(parsed) || parsed <= 0) {
      throw new Error(
        `OPENAI_EMBEDDING_DIMENSIONS must be a positive integer, got: ${override}`,
      );
    }
    return parsed;
  }
  return MODEL_DIMENSIONS[model] ?? DEFAULT_DIMENSIONS;
}
parseInt silently accepts partial-numeric and fractional inputs.
parseInt("768abc", 10) returns 768 and parseInt("1.5", 10) returns 1 — both bypass the validation and produce a silently-wrong dimensions value that will later cause dimension mismatches against the vector store. The current tests ("not-a-number", "-5", "0") don't catch this because fully non-numeric strings yield NaN.
Consider using Number() (or a regex pre-check) so any non-integer string is rejected.
🔧 Proposed fix
 function resolveDimensions(model: string, override: string | undefined): number {
-  if (override !== undefined && override.trim().length > 0) {
-    const parsed = parseInt(override, 10);
-    if (!Number.isFinite(parsed) || parsed <= 0) {
+  if (override !== undefined && override.trim().length > 0) {
+    const parsed = Number(override);
+    if (!Number.isInteger(parsed) || parsed <= 0) {
       throw new Error(
         `OPENAI_EMBEDDING_DIMENSIONS must be a positive integer, got: ${override}`,
       );
     }
     return parsed;
   }
   return MODEL_DIMENSIONS[model] ?? DEFAULT_DIMENSIONS;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/providers/embedding/openai.ts` around lines 20 - 31, The
resolveDimensions function currently uses parseInt which accepts
partial/fractional inputs; change the override parsing to reject any non-integer
or extra-character strings by converting the trimmed override with Number() and
validating with Number.isInteger(parsed) && parsed > 0 (or use a strict /^\d+$/
regex before Number conversion) instead of parseInt, and throw the same error
message when validation fails so fractional values like "1.5" or "768abc" are
rejected.
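To make the difference between the three parsing strategies mentioned in this comment concrete, here is a small side-by-side sketch. The helper names (`viaParseInt`, `viaNumber`, `viaRegex`) are hypothetical and exist only for this comparison; "768abc" and "1.5" come from the review, while "1e3", "0x10" and " 768 " are extra edge cases added here for illustration.

```typescript
// Each helper returns the accepted dimension, or null where the real code would throw.

// Strategy 1: the parseInt-based check currently in resolveDimensions.
function viaParseInt(raw: string): number | null {
  const parsed = parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : null;
}

// Strategy 2: Number() plus Number.isInteger, as in the proposed fix.
function viaNumber(raw: string): number | null {
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : null;
}

// Strategy 3: digits-only regex on the trimmed string, then Number().
function viaRegex(raw: string): number | null {
  const trimmed = raw.trim();
  if (!/^\d+$/.test(trimmed)) return null;
  const parsed = Number(trimmed);
  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : null;
}

const samples = ["768", "768abc", "1.5", "1e3", "0x10", " 768 "];
for (const raw of samples) {
  // Columns: input, parseInt result, Number result, regex result.
  console.log(JSON.stringify(raw), viaParseInt(raw), viaNumber(raw), viaRegex(raw));
}
```

One design point worth weighing: `Number()` rejects "768abc" and "1.5" as the review intends, but it still accepts exponent and hex notation ("1e3" becomes 1000, "0x10" becomes 16). If only plain decimal digits should be allowed, the regex pre-check is the stricter of the two options.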
- README provider table: surface the no-op default and call out the opt-in Claude-subscription fallback with AGENTMEMORY_ALLOW_AGENT_SDK (from #187) instead of listing 'Claude subscription' as the default.
- README env block: document OPENAI_BASE_URL / OPENAI_EMBEDDING_MODEL (#186) and OPENAI_EMBEDDING_DIMENSIONS (#189), plus MINIMAX_API_KEY.
- Hero stat-tests SVG: 654 -> 827 (both dark and light variants) to match current suite size after recursion guard + idempotent lesson/crystal tests + openai dimension tests landed.
- website/lib/generated-meta.json regenerated by website/scripts/gen-meta.mjs (v0.9.1, 51 tools, 12 hooks, 119 endpoints, 848 tests).
Rolls up #186 (OPENAI_BASE_URL / OPENAI_EMBEDDING_MODEL), #187 (Stop-hook recursion 5-layer defense + NoopProvider + AGENTMEMORY_ALLOW_AGENT_SDK opt-in), #188 (viewer empty-tabs + import-jsonl synthetic compression + auto-derived lessons/crystals + richer session detail + audit/replay/frontier shape fixes), #189 (OPENAI_EMBEDDING_DIMENSIONS + model-dimensions table), and #190 (README/website docs refresh). Bumps: package.json, plugin/.claude-plugin/plugin.json, src/version.ts, src/types.ts ExportData.version union, src/functions/export-import.ts supportedVersions, test/export-import.test.ts assertion, and packages/mcp/package.json shim (was stuck at 0.9.0).
Summary
CodeRabbit flagged this on #186: the PR made OPENAI_EMBEDDING_MODEL configurable but left dimensions hardcoded at 1536. A user pointing the embedding provider at text-embedding-3-large (3072 dims) would still see provider.dimensions === 1536, silently breaking downstream vector-store schemas and hybrid-search weighting.
Fix
- MODEL_DIMENSIONS lookup for known OpenAI models:
  - text-embedding-3-small → 1536
  - text-embedding-3-large → 3072
  - text-embedding-ada-002 → 1536
- dimensions becomes a computed readonly field set from the lookup in the constructor.
- OPENAI_EMBEDDING_DIMENSIONS env var explicitly overrides for custom / self-hosted OpenAI-compatible endpoints. Positive integers only; not-a-number, -5, 0 throw a clear error at construction.
Tests
827 / 827 pass (+4 new).
Notes
Non-breaking. Preserves #186's defaults for users who did not set any embedding env vars.
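As an illustration of the failure mode described in the Summary, the sketch below shows how a mismatch between the reported dimensions and the actual vector width could be surfaced early instead of silently. The `assertDimensions` helper is hypothetical and not part of this PR; it only assumes that a consumer can compare an embedding's length with `provider.dimensions`.

```typescript
// Hypothetical guard: compare the width of a returned embedding with the
// dimension count the provider reports.
function assertDimensions(vector: readonly number[], expected: number): void {
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but provider.dimensions is ${expected}; ` +
        `check OPENAI_EMBEDDING_MODEL / OPENAI_EMBEDDING_DIMENSIONS`,
    );
  }
}

// A 3072-wide placeholder vector (the width of text-embedding-3-large)
// checked against the previously hardcoded 1536.
const fakeLargeVector = new Array<number>(3072).fill(0);
let mismatchMessage = "";
try {
  assertDimensions(fakeLargeVector, 1536);
} catch (err) {
  mismatchMessage = err instanceof Error ? err.message : String(err);
}

// With the dimension derived from the model, the same vector passes.
assertDimensions(fakeLargeVector, 3072);
```

Here `mismatchMessage` ends up non-empty for the 1536 case, while the 3072 check passes without throwing, which mirrors the before/after behavior this PR is aiming for.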
Summary by CodeRabbit
- OPENAI_EMBEDDING_DIMENSIONS environment variable to customize embedding vector dimensions for OpenAI embeddings.