30 commits
caecd86
providers: add OpenAI compatible provider
saem Apr 22, 2026
79b408e
remove the use of mocks
saem Apr 22, 2026
3627620
restore openrouter
saem Apr 24, 2026
e233eaf
merge main and resolve local LLM provider conflicts surgically
saem Apr 24, 2026
bf12968
use actual base urls everywhere
saem Apr 24, 2026
5a5ac74
fix fs-watcher intermitent timeout
saem Apr 24, 2026
909f05a
restore openrouter
saem Apr 24, 2026
b631a7d
remove .gemini directory
saem Apr 24, 2026
ebd45af
undo further openrouter changes
saem Apr 24, 2026
585af16
missed a `!`
saem Apr 24, 2026
2e20a85
fix broken api key check
saem Apr 24, 2026
13da1a1
missed vllm
saem Apr 24, 2026
b5c4961
don't try to use llama3 as an embedding model
saem Apr 24, 2026
edf60f2
fix bad baseurl guidance in readme
saem Apr 24, 2026
b14480f
require url and model parameters for lm studio and vllm
saem Apr 24, 2026
f95cd2c
more forgiving baseUrl handling
saem Apr 24, 2026
0e93384
use a timeout abort signal in OpenAIProvider
saem Apr 24, 2026
e15c9b4
clean-up vllm in embedding providers test setup
saem Apr 24, 2026
22aae21
provide all necessary params
saem Apr 24, 2026
1fabbaf
openai reasoning models set correct max tokens param
saem Apr 24, 2026
144aa4a
set default base url for vllm provider
saem Apr 24, 2026
7acacb9
add a small wait to ensure chokidar is ready
saem Apr 24, 2026
6334687
clean-up mocks
saem Apr 24, 2026
ac16ed5
remove unused `extraHeaders` in OpenAIProviders
saem Apr 24, 2026
2238d12
require api key to connect to openapi proper
saem Apr 24, 2026
3979281
address feedback
saem Apr 25, 2026
e419ed8
Merge branch 'main' into openai-compatible-providers
saem Apr 25, 2026
ba6b298
fix the changelog
saem Apr 25, 2026
43f6780
reasoning model detection if there are prefixes
saem Apr 25, 2026
5994cfa
README: note that OpenAIEmbeddingProvider env var fallback remain
saem Apr 25, 2026
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,22 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),

## [Unreleased]

### Added

- **Support for local OpenAI-compatible providers.** Full support for Ollama, vLLM, and LM Studio as first-class providers. Added new configuration variables:
- **Ollama**: `OLLAMA_BASE_URL` (default: `http://localhost:11434`) and `OLLAMA_MODEL`.
- **LM Studio**: `LMSTUDIO_BASE_URL` and `LMSTUDIO_MODEL`.
- **vLLM**: `VLLM_BASE_URL` and `VLLM_MODEL`.
Auto-detection now recognizes these local endpoints, while the existing legacy keys (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) keep priority so upgrades don't change behavior for current setups.
- **`AGENTMEMORY_PROVIDER` environment variable.** Allows explicit selection of the LLM provider (e.g., `AGENTMEMORY_PROVIDER=ollama`), bypassing auto-detection and supporting setups with multiple keys.
- **Improved `OpenAIEmbeddingProvider` flexibility.** The constructor now accepts optional `baseUrl` and `model` overrides, enabling direct instantiation for custom OpenAI-compatible embedding proxies. Environment variables remain as fallback.
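
  A minimal instantiation sketch (argument order `apiKey, baseUrl, model`, per this PR's constructor; the endpoint, model, and import path are illustrative):

  ```ts
  import { OpenAIEmbeddingProvider } from "./providers/embedding/openai.js";

  // A null apiKey skips the Authorization header entirely, which suits local proxies.
  const embedder = new OpenAIEmbeddingProvider(
    null,
    "http://localhost:11434", // hypothetical local endpoint
    "nomic-embed-text",
  );
  const vectors = await embedder.embedBatch(["first memory", "second memory"]);
  ```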

### Changed

- **Provider detection order and MiniMax priority.** Detection now checks legacy providers (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) before any local auto-detection; `MiniMax` detection honors the same priority and gains `MINIMAX_MODEL` support (see the sketch below).
- **`OpenAIProvider` and `OpenAIEmbeddingProvider` key handling.** Both now accept `null` API keys, so local-only endpoints that don't require an `Authorization` header work without dummy credentials.
- **Neutral OpenAI default model.** The OpenAI provider now defaults to `openai-default` instead of a specific commercial model, avoiding pricing and capability assumptions in the library core.
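
A rough sketch of the resulting resolution, based on this PR's `detectProvider` and its documented defaults:

```ts
// Explicit selection beats every key-based check:
detectProvider({
  AGENTMEMORY_PROVIDER: "ollama",
  OPENAI_API_KEY: "sk-...", // would otherwise win under auto-detection
});
// -> { provider: "ollama", model: "llama3", maxTokens: 4096,
//      baseURL: "http://localhost:11434" }
```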

## [0.9.3] — 2026-04-24

Developer-experience patch. Every disabled feature flag is now visible in the viewer, the CLI, and REST error responses, so devs no longer hit empty tabs wondering whether the install is broken or just opt-in. Adds a `doctor` command that diagnoses the whole stack in one shot and a first-run hero in the viewer that points at the magical-moment `demo` command.
11 changes: 10 additions & 1 deletion README.md
@@ -534,7 +534,7 @@ Memories decay over time (Ebbinghaus curve). Frequently accessed memories strengthen
|------|----------|
| `SessionStart` | Project path, session ID |
| `UserPromptSubmit` | User prompts (privacy-filtered) |
| `PreToolUse` | File access patterns + enriched context |
| `PreToolUse` | File access pattern + enriched context |
| `PostToolUse` | Tool name, input, output |
| `PostToolUseFailure` | Error context |
| `PreCompact` | Re-injects memory before compaction |
@@ -782,6 +782,10 @@ agentmemory auto-detects from your environment. No API key needed if you have a
| **No-op (default)** | No config needed | LLM-backed compress/summarize is DISABLED. Synthetic BM25 compression + recall still work. See `AGENTMEMORY_ALLOW_AGENT_SDK` below if you used to rely on the Claude-subscription fallback. |
| Anthropic API | `ANTHROPIC_API_KEY` | Per-token billing |
| MiniMax | `MINIMAX_API_KEY` | Anthropic-compatible |
| OpenAI | `OPENAI_API_KEY` | Standard API |
| LM Studio | `LMSTUDIO_BASE_URL` | Local OpenAI-compatible |
| Ollama | `OLLAMA_BASE_URL` | Local OpenAI-compatible |
| vLLM | `VLLM_BASE_URL` | Local OpenAI-compatible |
| Gemini | `GEMINI_API_KEY` | Also enables embeddings |
| OpenRouter | `OPENROUTER_API_KEY` | Any model |
| Claude subscription fallback | `AGENTMEMORY_ALLOW_AGENT_SDK=true` | Opt-in only. Spawns `@anthropic-ai/claude-agent-sdk` sessions — used to cause unbounded Stop-hook recursion (#149 follow-up) so it is no longer the default. |
@@ -797,6 +801,9 @@ Create `~/.agentmemory/.env`:
# GEMINI_API_KEY=...
# OPENROUTER_API_KEY=...
# MINIMAX_API_KEY=...
# OPENAI_API_KEY=sk-...
# OLLAMA_MODEL=llama3
# LMSTUDIO_BASE_URL=http://localhost:1234
# Opt-in Claude-subscription fallback (spawns @anthropic-ai/claude-agent-sdk);
# leave OFF unless you understand the Stop-hook recursion risk (#149 follow-up):
# AGENTMEMORY_ALLOW_AGENT_SDK=true
@@ -808,6 +815,8 @@ Create `~/.agentmemory/.env`:
# OPENAI_BASE_URL=https://api.openai.com # Override for Azure / vLLM / LM Studio / proxies
# OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# OPENAI_EMBEDDING_DIMENSIONS=1536 # Required when the model is not in the known-models table
# OLLAMA_EMBEDDING_MODEL=nomic-embed-text
# LMSTUDIO_EMBEDDING_BASE_URL=http://localhost:1234

# Search tuning
# BM25_WEIGHT=0.4
64 changes: 57 additions & 7 deletions src/config.ts
@@ -48,24 +48,27 @@ function hasRealValue(v: string | undefined): v is string {
function detectProvider(env: Record<string, string>): ProviderConfig {
const maxTokens = parseInt(env["MAX_TOKENS"] || "4096", 10);

// MiniMax: Anthropic-compatible API, requires raw fetch to avoid SDK stainless headers
if (hasRealValue(env["MINIMAX_API_KEY"])) {
const explicitProvider = env["AGENTMEMORY_PROVIDER"];

if (explicitProvider === "openai" || (!explicitProvider && hasRealValue(env["OPENAI_API_KEY"]))) {
return {
provider: "minimax",
model: env["MINIMAX_MODEL"] || "MiniMax-M2.7",
provider: "openai",
model: env["OPENAI_MODEL"] || "openai-default",
maxTokens,
baseURL: env["OPENAI_BASE_URL"] || "https://api.openai.com",
};
}

if (hasRealValue(env["ANTHROPIC_API_KEY"])) {
if (explicitProvider === "anthropic" || (!explicitProvider && hasRealValue(env["ANTHROPIC_API_KEY"]))) {
return {
provider: "anthropic",
model: env["ANTHROPIC_MODEL"] || "claude-sonnet-4-20250514",
maxTokens,
baseURL: env["ANTHROPIC_BASE_URL"],
};
}
if (hasRealValue(env["GEMINI_API_KEY"]) || hasRealValue(env["GOOGLE_API_KEY"])) {

if (explicitProvider === "gemini" || (!explicitProvider && (hasRealValue(env["GEMINI_API_KEY"]) || hasRealValue(env["GOOGLE_API_KEY"])))) {
if (!hasRealValue(env["GEMINI_API_KEY"]) && hasRealValue(env["GOOGLE_API_KEY"])) {
process.stderr.write(
"[agentmemory] GOOGLE_API_KEY detected — treating as GEMINI_API_KEY. " +
@@ -78,14 +81,51 @@ function detectProvider(env: Record<string, string>): ProviderConfig {
maxTokens,
};
}
if (hasRealValue(env["OPENROUTER_API_KEY"])) {

if (explicitProvider === "openrouter" || (!explicitProvider && hasRealValue(env["OPENROUTER_API_KEY"]))) {
return {
provider: "openrouter",
model: env["OPENROUTER_MODEL"] || "anthropic/claude-sonnet-4-20250514",
maxTokens,
};
}

// Local/Compatible providers (moved after legacy ones to avoid flipping existing users)
if (explicitProvider === "minimax" || (!explicitProvider && hasRealValue(env["MINIMAX_API_KEY"]))) {
return {
provider: "minimax",
model: env["MINIMAX_MODEL"] || "MiniMax-M2.7",
maxTokens,
};
}

if (explicitProvider === "lmstudio" || (!explicitProvider && (hasRealValue(env["LMSTUDIO_BASE_URL"]) || hasRealValue(env["LMSTUDIO_MODEL"])))) {
return {
provider: "lmstudio",
model: env["LMSTUDIO_MODEL"] || "local-model",
maxTokens,
baseURL: env["LMSTUDIO_BASE_URL"],
};
}

if (explicitProvider === "ollama" || (!explicitProvider && (hasRealValue(env["OLLAMA_BASE_URL"]) || hasRealValue(env["OLLAMA_MODEL"])))) {
return {
provider: "ollama",
model: env["OLLAMA_MODEL"] || "llama3",
maxTokens,
baseURL: env["OLLAMA_BASE_URL"] || "http://localhost:11434",
};
}

if (explicitProvider === "vllm" || (!explicitProvider && (hasRealValue(env["VLLM_BASE_URL"]) || hasRealValue(env["VLLM_MODEL"])))) {
return {
provider: "vllm",
model: env["VLLM_MODEL"] || "local-model",
maxTokens,
baseURL: env["VLLM_BASE_URL"],
};
}

const allowAgentSdk = env["AGENTMEMORY_ALLOW_AGENT_SDK"] === "true";
if (!allowAgentSdk) {
process.stderr.write(
@@ -174,6 +214,12 @@ export function detectEmbeddingProvider(
if (source["VOYAGE_API_KEY"]) return "voyage";
if (source["COHERE_API_KEY"]) return "cohere";
if (source["OPENROUTER_API_KEY"]) return "openrouter";
if (source["OLLAMA_EMBEDDING_BASE_URL"] || source["OLLAMA_EMBEDDING_MODEL"])
return "ollama";
if (source["LMSTUDIO_EMBEDDING_BASE_URL"] || source["LMSTUDIO_EMBEDDING_MODEL"])
return "lmstudio";
if (source["VLLM_EMBEDDING_BASE_URL"] || source["VLLM_EMBEDDING_MODEL"])
return "vllm";
return null;
}
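
For illustration, the new fallthrough resolves like this (a sketch assuming none of the earlier keys are present, not actual test output):

```ts
detectEmbeddingProvider({ OLLAMA_EMBEDDING_MODEL: "nomic-embed-text" }); // -> "ollama"
detectEmbeddingProvider({ VLLM_EMBEDDING_BASE_URL: "http://localhost:8000" }); // -> "vllm"
detectEmbeddingProvider({}); // -> null
```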

@@ -276,6 +322,10 @@ const VALID_PROVIDERS = new Set([
"openrouter",
"agent-sdk",
"minimax",
"lmstudio",
"openai",
"ollama",
"vllm",
]);

export function loadFallbackConfig(): FallbackConfig {
27 changes: 26 additions & 1 deletion src/providers/embedding/index.ts
@@ -35,7 +35,32 @@ export function createEmbeddingProvider(): EmbeddingProvider | null {
case "gemini":
return new GeminiEmbeddingProvider(getEnvVar("GEMINI_API_KEY")!);
case "openai":
return new OpenAIEmbeddingProvider(getEnvVar("OPENAI_API_KEY")!);
return new OpenAIEmbeddingProvider(
getEnvVar("OPENAI_API_KEY") || null,
getEnvVar("OPENAI_EMBEDDING_BASE_URL"),
getEnvVar("OPENAI_EMBEDDING_MODEL"),
);
case "ollama":
return new OpenAIEmbeddingProvider(
null,
getEnvVar("OLLAMA_EMBEDDING_BASE_URL") || "http://localhost:11434",
getEnvVar("OLLAMA_EMBEDDING_MODEL") || "nomic-embed-text",
);
case "lmstudio":
{
const base = getEnvVar("LMSTUDIO_EMBEDDING_BASE_URL") || "http://localhost:1234";
const model = getEnvVar("LMSTUDIO_EMBEDDING_MODEL");
if (!model) throw new Error("LMSTUDIO_EMBEDDING_MODEL is required for the lmstudio embedding provider");
return new OpenAIEmbeddingProvider(null, base, model);
}
case "vllm":
{
const base = getEnvVar("VLLM_EMBEDDING_BASE_URL");
const model = getEnvVar("VLLM_EMBEDDING_MODEL");
if (!base) throw new Error("VLLM_EMBEDDING_BASE_URL is required for the vllm embedding provider");
if (!model) throw new Error("VLLM_EMBEDDING_MODEL is required for the vllm embedding provider");
return new OpenAIEmbeddingProvider(null, base, model);
}
case "voyage":
return new VoyageEmbeddingProvider(getEnvVar("VOYAGE_API_KEY")!);
case "cohere":
23 changes: 13 additions & 10 deletions src/providers/embedding/openai.ts
@@ -46,17 +46,16 @@ function resolveDimensions(model: string, override: string | undefined): number
export class OpenAIEmbeddingProvider implements EmbeddingProvider {
readonly name = "openai";
readonly dimensions: number;
private apiKey: string;
private apiKey: string | null;
private baseUrl: string;
private model: string;

constructor(apiKey?: string) {
this.apiKey = apiKey || getEnvVar("OPENAI_API_KEY") || "";
if (!this.apiKey) throw new Error("OPENAI_API_KEY is required");
constructor(apiKey?: string | null, baseUrl?: string, model?: string) {
this.apiKey = apiKey !== undefined ? apiKey : (getEnvVar("OPENAI_API_KEY") || null);
this.baseUrl =
getEnvVar("OPENAI_BASE_URL") || DEFAULT_BASE_URL;
baseUrl || getEnvVar("OPENAI_BASE_URL") || DEFAULT_BASE_URL;
this.model =
getEnvVar("OPENAI_EMBEDDING_MODEL") || DEFAULT_MODEL;
model || getEnvVar("OPENAI_EMBEDDING_MODEL") || DEFAULT_MODEL;
this.dimensions = resolveDimensions(
this.model,
getEnvVar("OPENAI_EMBEDDING_DIMENSIONS"),
@@ -70,12 +69,16 @@ export class OpenAIEmbeddingProvider implements EmbeddingProvider {

async embedBatch(texts: string[]): Promise<Float32Array[]> {
const url = `${this.baseUrl}/v1/embeddings`;
const headers: Record<string, string> = {
"Content-Type": "application/json",
};
if (this.apiKey) {
headers.Authorization = `Bearer ${this.apiKey}`;
}

const response = await fetch(url, {
method: "POST",
headers: {
Authorization: `Bearer ${this.apiKey}`,
"Content-Type": "application/json",
},
headers,
body: JSON.stringify({
model: this.model,
input: texts,
33 changes: 33 additions & 0 deletions src/providers/index.ts
@@ -7,6 +7,7 @@ import { AgentSDKProvider } from "./agent-sdk.js";
import { AnthropicProvider } from "./anthropic.js";
import { MinimaxProvider } from "./minimax.js";
import { NoopProvider } from "./noop.js";
import { OpenAIProvider } from "./openai.js";
import { OpenRouterProvider } from "./openrouter.js";
import { ResilientProvider } from "./resilient.js";
import { FallbackChainProvider } from "./fallback-chain.js";
@@ -59,6 +60,38 @@ export function createFallbackProvider(

function createBaseProvider(config: ProviderConfig): MemoryProvider {
switch (config.provider) {
case "openai":
return new OpenAIProvider(
"openai",
getEnvVar("OPENAI_API_KEY") || null,
config.model,
config.maxTokens,
config.baseURL || "https://api.openai.com",
);
case "ollama":
return new OpenAIProvider(
"ollama",
null,
config.model,
config.maxTokens,
config.baseURL || "http://localhost:11434",
);
case "vllm":
return new OpenAIProvider(
"vllm",
null,
config.model,
config.maxTokens,
config.baseURL || "http://localhost:8000",
);
case "lmstudio":
return new OpenAIProvider(
"lmstudio",
null,
config.model,
config.maxTokens,
config.baseURL || "http://localhost:1234",
);
case "minimax":
return new MinimaxProvider(
requireEnvVar("MINIMAX_API_KEY"),
81 changes: 81 additions & 0 deletions src/providers/openai.ts
@@ -0,0 +1,81 @@
import type { MemoryProvider } from "../types.js";

/**
* Generic OpenAI-compatible provider.
* Works with OpenAI, LM Studio, Ollama, vLLM, Groq, OpenRouter, etc.
*/
export class OpenAIProvider implements MemoryProvider {
constructor(
public name: string,
private apiKey: string | null,
private model: string,
private maxTokens: number,
private baseUrl: string,
private timeoutMs: number = 60_000,
) {}

async compress(systemPrompt: string, userPrompt: string): Promise<string> {
return this.call(systemPrompt, userPrompt);
}

async summarize(systemPrompt: string, userPrompt: string): Promise<string> {
return this.call(systemPrompt, userPrompt);
}

private async call(
systemPrompt: string,
userPrompt: string,
): Promise<string> {
const base = this.baseUrl.replace(/\/+$/, "");
const path = base.endsWith("/v1") ? "/chat/completions" : "/v1/chat/completions";
const url = `${base}${path}`;
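    // e.g. "http://localhost:11434" -> "http://localhost:11434/v1/chat/completions",
    // while a base already ending in "/v1" only gains "/chat/completions".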

    // Detect reasoning models (o1, o3, etc.), which require max_completion_tokens.
    // Match "o1-" or "o3-" at a word boundary, not just at the start, to handle vendor prefixes.
const isReasoningModel = /\bo[13]-/.test(this.model);
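    // e.g. "o1-mini" and "openai/o1-preview" match; "gpt-4o-mini" does not.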

const body: Record<string, any> = {
model: this.model,
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: userPrompt },
],
};

if (isReasoningModel) {
body.max_completion_tokens = this.maxTokens;
} else {
body.max_tokens = this.maxTokens;
}

const headers: Record<string, string> = {
"Content-Type": "application/json",
};
if (this.apiKey) {
headers.Authorization = `Bearer ${this.apiKey}`;
}

const response = await fetch(url, {
method: "POST",
signal: AbortSignal.timeout(this.timeoutMs),
headers,
body: JSON.stringify(body),
});

if (!response.ok) {
const text = await response.text();
throw new Error(`${this.name} API error (${response.status}): ${text}`);
}

const data = (await response.json()) as {
choices?: Array<{ message?: { content?: string } }>;
};
const content = data.choices?.[0]?.message?.content;
if (!content) {
throw new Error(
`${this.name} returned unexpected response: ${JSON.stringify(data).slice(0, 200)}`,
);
}
return content;
}
}
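
A usage sketch, mirroring the `ollama` branch of `createBaseProvider` above (the prompts are illustrative):

```ts
const ollama = new OpenAIProvider(
  "ollama", // name used in error messages
  null,     // no API key, so no Authorization header is sent
  "llama3",
  4096,
  "http://localhost:11434",
);
const summary = await ollama.summarize(
  "You compress agent session memories.",
  "Summarize the following session notes: ...",
);
```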
3 changes: 1 addition & 2 deletions src/types.ts
@@ -129,7 +129,7 @@ export interface ProviderConfig {
baseURL?: string;
}

export type ProviderType = "agent-sdk" | "anthropic" | "gemini" | "openrouter" | "minimax" | "noop";
export type ProviderType = "agent-sdk" | "anthropic" | "gemini" | "openrouter" | "minimax" | "lmstudio" | "openai" | "ollama" | "vllm" | "noop";

export interface MemoryProvider {
name: string;
@@ -839,7 +839,6 @@ export interface RetentionScore {
reinforcementBoost: number;
lastAccessed: string;
accessCount: number;
source?: "episodic" | "semantic";
}

export interface DecayConfig {