Skip to content

fix(embedding): derive OpenAI provider dimensions from model#189

Merged
rohitg00 merged 1 commit intomainfrom
fix/openai-embedding-dimensions
Apr 22, 2026
Merged

fix(embedding): derive OpenAI provider dimensions from model#189
rohitg00 merged 1 commit intomainfrom
fix/openai-embedding-dimensions

Conversation

@rohitg00
Copy link
Copy Markdown
Owner

@rohitg00 rohitg00 commented Apr 22, 2026

Summary

CodeRabbit flagged this on #186: the PR made OPENAI_EMBEDDING_MODEL configurable but left dimensions hardcoded at 1536. A user pointing the embedding provider at text-embedding-3-large (3072 dims) would still see provider.dimensions === 1536, silently breaking downstream vector-store schemas and hybrid-search weighting.

Fix

  • Add a MODEL_DIMENSIONS lookup for known OpenAI models:
    • text-embedding-3-small → 1536
    • text-embedding-3-large → 3072
    • text-embedding-ada-002 → 1536
  • dimensions becomes a computed readonly field set from the lookup in the constructor.
  • New OPENAI_EMBEDDING_DIMENSIONS env var explicitly overrides for custom / self-hosted OpenAI-compatible endpoints. Positive integers only; not-a-number, -5, 0 throw a clear error at construction.
  • Unknown model names fall back to 1536 (the old default) so existing deployments are unchanged.

Tests

827 / 827 pass (+4 new):

  • dimensions derived from 3 known models
  • override via env var
  • unknown-model fallback
  • invalid override values rejected

Notes

Non-breaking. Preserves #186's defaults for users who did not set any embedding env vars.

Summary by CodeRabbit

  • New Features
    • Added OPENAI_EMBEDDING_DIMENSIONS environment variable to customize embedding vector dimensions for OpenAI embeddings.
    • Embedding dimensions are now automatically determined based on your selected OpenAI embedding model, with built-in support for standard model types.
    • Enhanced configuration validation to detect invalid dimension settings with informative error messages, ensuring proper setup.

Follow-up to #186. The PR made model configurable via OPENAI_EMBEDDING_MODEL
but left dimensions hardcoded at 1536. A user pointing at
text-embedding-3-large (3072 dims) would see provider.dimensions
return 1536, which vector-store schemas and hybrid-search weights
rely on for correct sizing.

- Add MODEL_DIMENSIONS table: text-embedding-3-small=1536,
  text-embedding-3-large=3072, text-embedding-ada-002=1536.
- Make dimensions a computed readonly field.
- New OPENAI_EMBEDDING_DIMENSIONS env var for custom / self-hosted
  OpenAI-compatible endpoints not in the table. Positive integers only;
  reject non-numeric / non-positive values with a clear error.
- Unknown model names fall back to 1536 with the explicit override
  available if the server returns a different size.
- Tests cover known models, dimension override, unknown-model fallback,
  and validation of the override env var.
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agentmemory Ready Ready Preview, Comment Apr 22, 2026 11:39am

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

The OpenAI embedding provider now dynamically resolves embedding vector dimensions from environment configuration or model defaults. A lookup table and validation helper determine dimensions based on the model and an optional OPENAI_EMBEDDING_DIMENSIONS environment variable, which must be a positive integer when provided.

Changes

Cohort / File(s) Summary
Embedding Provider Implementation
src/providers/embedding/openai.ts
Added MODEL_DIMENSIONS lookup table, resolveDimensions() helper function, and runtime validation for OPENAI_EMBEDDING_DIMENSIONS environment variable. Changed dimensions from a fixed constant to dynamically initialized value with fallback to 1536.
Embedding Provider Tests
test/embedding-provider.test.ts
Extended test setup to clear OPENAI_EMBEDDING_DIMENSIONS before each test. Added comprehensive unit tests for dimension selection across known models, override behavior, unknown model fallback, and validation error cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Dimensions dance in a rabbit's way,
Embeddings configured day by day,
With lookup tables crisp and clean,
The finest vectors ever seen!
Environment whispers 1536 or more,
Validation guards the provider's door.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: deriving OpenAI embedding provider dimensions from the configured model instead of using a hardcoded value.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/openai-embedding-dimensions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/embedding-provider.test.ts (1)

108-150: Good coverage of the new behavior.

The four new cases (known-model derivation, override, unknown-model fallback, invalid-override rejection) map cleanly onto the new logic. One gap worth considering: add a case for a partial-numeric override like "768abc" or "1.5" to lock down stricter integer validation (see the comment on resolveDimensions).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/embedding-provider.test.ts` around lines 108 - 150, Add tests for
partial-numeric and decimal OPENAI_EMBEDDING_DIMENSIONS to ensure
resolveDimensions rejects them: in embedding-provider.test.ts extend the
"rejects invalid OPENAI_EMBEDDING_DIMENSIONS values" case to set
process.env["OPENAI_EMBEDDING_DIMENSIONS"] = "768abc" and = "1.5" (and assert
new OpenAIEmbeddingProvider("test-key") throws the same
/OPENAI_EMBEDDING_DIMENSIONS must be a positive integer/), referencing the
OpenAIEmbeddingProvider constructor and resolveDimensions logic to validate
stricter integer parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/providers/embedding/openai.ts`:
- Around line 20-31: The resolveDimensions function currently uses parseInt
which accepts partial/fractional inputs; change the override parsing to reject
any non-integer or extra-character strings by converting the trimmed override
with Number() and validating with Number.isInteger(parsed) && parsed > 0 (or use
a strict /^\d+$/ regex before Number conversion) instead of parseInt, and throw
the same error message when validation fails so fractional values like "1.5" or
"768abc" are rejected.

---

Nitpick comments:
In `@test/embedding-provider.test.ts`:
- Around line 108-150: Add tests for partial-numeric and decimal
OPENAI_EMBEDDING_DIMENSIONS to ensure resolveDimensions rejects them: in
embedding-provider.test.ts extend the "rejects invalid
OPENAI_EMBEDDING_DIMENSIONS values" case to set
process.env["OPENAI_EMBEDDING_DIMENSIONS"] = "768abc" and = "1.5" (and assert
new OpenAIEmbeddingProvider("test-key") throws the same
/OPENAI_EMBEDDING_DIMENSIONS must be a positive integer/), referencing the
OpenAIEmbeddingProvider constructor and resolveDimensions logic to validate
stricter integer parsing.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9acee8a3-4b72-4363-882c-cc4d02e76848

📥 Commits

Reviewing files that changed from the base of the PR and between 343b51c and a6ce50d.

📒 Files selected for processing (2)
  • src/providers/embedding/openai.ts
  • test/embedding-provider.test.ts

Comment on lines +20 to +31
function resolveDimensions(model: string, override: string | undefined): number {
if (override !== undefined && override.trim().length > 0) {
const parsed = parseInt(override, 10);
if (!Number.isFinite(parsed) || parsed <= 0) {
throw new Error(
`OPENAI_EMBEDDING_DIMENSIONS must be a positive integer, got: ${override}`,
);
}
return parsed;
}
return MODEL_DIMENSIONS[model] ?? DEFAULT_DIMENSIONS;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

parseInt silently accepts partial-numeric and fractional inputs.

parseInt("768abc", 10) returns 768 and parseInt("1.5", 10) returns 1 — both bypass the validation and produce a silently-wrong dimensions value that will later cause dimension mismatches against the vector store. The current tests ("not-a-number", "-5", "0") don't catch this because fully non-numeric strings yield NaN.

Consider using Number() (or a regex pre-check) so any non-integer string is rejected.

🔧 Proposed fix
 function resolveDimensions(model: string, override: string | undefined): number {
-  if (override !== undefined && override.trim().length > 0) {
-    const parsed = parseInt(override, 10);
-    if (!Number.isFinite(parsed) || parsed <= 0) {
+  if (override !== undefined && override.trim().length > 0) {
+    const parsed = Number(override);
+    if (!Number.isInteger(parsed) || parsed <= 0) {
       throw new Error(
         `OPENAI_EMBEDDING_DIMENSIONS must be a positive integer, got: ${override}`,
       );
     }
     return parsed;
   }
   return MODEL_DIMENSIONS[model] ?? DEFAULT_DIMENSIONS;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/embedding/openai.ts` around lines 20 - 31, The
resolveDimensions function currently uses parseInt which accepts
partial/fractional inputs; change the override parsing to reject any non-integer
or extra-character strings by converting the trimmed override with Number() and
validating with Number.isInteger(parsed) && parsed > 0 (or use a strict /^\d+$/
regex before Number conversion) instead of parseInt, and throw the same error
message when validation fails so fractional values like "1.5" or "768abc" are
rejected.

@rohitg00 rohitg00 merged commit 50723e1 into main Apr 22, 2026
5 checks passed
@rohitg00 rohitg00 deleted the fix/openai-embedding-dimensions branch April 22, 2026 13:00
rohitg00 added a commit that referenced this pull request Apr 22, 2026
- README provider table: surface the no-op default and call out the
  opt-in Claude-subscription fallback with AGENTMEMORY_ALLOW_AGENT_SDK
  (from #187) instead of listing 'Claude subscription' as the default.
- README env block: document OPENAI_BASE_URL / OPENAI_EMBEDDING_MODEL
  (#186) and OPENAI_EMBEDDING_DIMENSIONS (#189), plus MINIMAX_API_KEY.
- Hero stat-tests SVG: 654 -> 827 (both dark and light variants) to
  match current suite size after recursion guard + idempotent
  lesson/crystal tests + openai dimension tests landed.
- website/lib/generated-meta.json regenerated by
  website/scripts/gen-meta.mjs (v0.9.1, 51 tools, 12 hooks,
  119 endpoints, 848 tests).
@rohitg00 rohitg00 mentioned this pull request Apr 22, 2026
rohitg00 added a commit that referenced this pull request Apr 22, 2026
Rolls up #186 (OPENAI_BASE_URL / OPENAI_EMBEDDING_MODEL), #187 (Stop-hook
recursion 5-layer defense + NoopProvider + AGENTMEMORY_ALLOW_AGENT_SDK opt-in),
#188 (viewer empty-tabs + import-jsonl synthetic compression + auto-derived
lessons/crystals + richer session detail + audit/replay/frontier shape fixes),
#189 (OPENAI_EMBEDDING_DIMENSIONS + model-dimensions table), and #190
(README/website docs refresh).

Bumps: package.json, plugin/.claude-plugin/plugin.json, src/version.ts,
src/types.ts ExportData.version union, src/functions/export-import.ts
supportedVersions, test/export-import.test.ts assertion, and
packages/mcp/package.json shim (was stuck at 0.9.0).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant