Releases · TadMSTR/ollama-queue-proxy

28 May 12:19 · TadMSTR · v0.3.0 · a8258ff

v0.3.0 — OpenAI-compat embeddings endpoint (Latest)

What's new

Added

POST /v1/embeddings: OpenAI-compatible endpoint. Clients that use the OpenAI Embeddings API (Graphiti, LlamaIndex, LangChain, etc.) can now route through OQP without changing their provider configuration; only the port in the base URL changes. The request is translated to /api/embed internally, and the response is wrapped back into OpenAI format. Auth, the priority ceiling, and the embedding cache apply exactly as they do to native /api/embed requests.

Graphiti example:
```
# Before — direct to Ollama
api_url: http://localhost:11434/v1
# After — through OQP (gains cache + queuing)
api_url: http://localhost:11435/v1
```
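
The request/response translation described above can be pictured as two pure functions. This is a minimal sketch, not OQP's code: both function names are hypothetical, and it assumes Ollama's /api/embed response carries embeddings and prompt_eval_count fields and that OpenAI clients expect the object/data/usage list shape.

```python
from typing import Any


def openai_to_ollama_embed(body: dict[str, Any]) -> dict[str, Any]:
    """Translate an OpenAI-style /v1/embeddings request into an Ollama /api/embed body."""
    if "model" not in body or "input" not in body:
        raise ValueError("request must include 'model' and 'input'")
    text = body["input"]
    # Ollama's /api/embed accepts a single string or a list of strings.
    if isinstance(text, list):
        if not all(isinstance(item, str) for item in text):
            raise ValueError("only string inputs are handled in this sketch")
    elif not isinstance(text, str):
        raise ValueError("'input' must be a string or a list of strings")
    return {"model": body["model"], "input": text}


def ollama_to_openai_embed(resp: dict[str, Any], model: str) -> dict[str, Any]:
    """Wrap an Ollama /api/embed response in the OpenAI embeddings list shape."""
    data = [
        {"object": "embedding", "index": i, "embedding": vector}
        for i, vector in enumerate(resp.get("embeddings", []))
    ]
    # Ollama reports the number of input tokens as prompt_eval_count.
    tokens = int(resp.get("prompt_eval_count", 0))
    return {
        "object": "list",
        "data": data,
        "model": model,
        "usage": {"prompt_tokens": tokens, "total_tokens": tokens},
    }
```

Optional OpenAI fields such as encoding_format or dimensions are simply dropped in this sketch; how OQP itself treats them is not stated in these notes.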

Fixed

asyncio.get_event_loop() → get_running_loop() — eliminates DeprecationWarning in Python 3.10+ and RuntimeError in 3.12+.
Streaming response generator now closes the underlying httpx connection in a finally block — prevents connection leaks when a client disconnects mid-stream.
_apply_env_overrides no longer crashes with a TypeError when an env var contains a numeric path component (e.g. OQP_OLLAMA__HOSTS__0__URL); such keys are now silently skipped, as noted in a comment in config.
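
The env-override fix can be illustrated with a hypothetical apply_env_overrides (not the project's _apply_env_overrides). This sketch assumes OQP_-prefixed variables with a double underscore as the nesting separator, as in the example key, and lowercased config keys; it keeps values as strings and skips any path with a numeric component instead of indexing into a list.

```python
import os
from typing import Any, Mapping

PREFIX = "OQP_"  # matches the example variable OQP_OLLAMA__HOSTS__0__URL
SEPARATOR = "__"  # double underscore separates nesting levels


def apply_env_overrides(
    config: dict[str, Any], environ: Mapping[str, str] = os.environ
) -> dict[str, Any]:
    """Overlay OQP_SECTION__KEY=value variables onto a nested config dict in place.

    Hypothetical sketch: values stay strings (no type coercion), and any path
    with a numeric component such as HOSTS__0 is skipped, because indexing a
    list with a string key would raise TypeError.
    """
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        parts = name[len(PREFIX):].lower().split(SEPARATOR)
        if any(part == "" or part.isdigit() for part in parts):
            continue  # list indices (and malformed names) are skipped, not fatal
        node: Any = config
        for key in parts[:-1]:
            node = node.setdefault(key, {})
            if not isinstance(node, dict):
                break  # path runs through a list or scalar; skip this override
        else:
            node[parts[-1]] = value
    return config
```

The for/else keeps the skip logic in one place: the final assignment only runs if every intermediate step landed on a dict.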

Docker

ghcr.io/tadmstr/ollama-queue-proxy:v0.3.0
ghcr.io/tadmstr/ollama-queue-proxy:latest

Full changelog

See CHANGELOG.md.

23 Apr 01:37 · TadMSTR · v0.2.0 · 1daa615

v0.2.0 — client injection, model-aware routing, embedding cache

Highlights

Client injection — port-based auth bypass for clients that can't send Bearer headers; loopback by default, non-loopback bind requires allow_public_injection: true.
Model-aware routing — weighted round-robin across Ollama hosts that already have the requested model loaded, with fast-path invalidation on miss.
Embedding response cache — SHA256-keyed Valkey/Dragonfly cache for /api/embed and /api/embeddings; hits bypass the queue and upstream entirely.
keep_alive defaulting — proxy-level body injection so Ollama doesn't unload models between bursty requests.
Per-client concurrency caps — max_concurrent on auth.keys[], enforced via per-client async semaphore with fairness bound.
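
A minimal sketch of a SHA256-keyed embedding cache, assuming the key is derived from the model name plus the input serialized as canonical JSON. The key prefix, the EmbeddingCache class, and the in-memory dict standing in for Valkey/Dragonfly are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Union

Embeddings = list[list[float]]
EmbedInput = Union[str, list[str]]


def embedding_cache_key(model: str, inputs: EmbedInput, prefix: str = "oqp:embed:") -> str:
    """Derive a SHA256 cache key from the model name and the exact input."""
    # Canonical JSON (sorted keys, fixed separators) so identical requests hash identically.
    payload = json.dumps(
        {"model": model, "input": inputs},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return prefix + hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EmbeddingCache:
    """In-memory stand-in for the Valkey/Dragonfly store (hypothetical)."""

    def __init__(self) -> None:
        self._store: dict[str, Embeddings] = {}

    def get_or_compute(
        self,
        model: str,
        inputs: EmbedInput,
        compute: Callable[[str, EmbedInput], Embeddings],
    ) -> Embeddings:
        key = embedding_cache_key(model, inputs)
        if key in self._store:
            return self._store[key]  # hit: no queue slot, no upstream call
        result = compute(model, inputs)  # miss: this is where the request would be queued
        self._store[key] = result
        return result
```

Against Valkey or Dragonfly the dict lookups would become GET/SET calls on the same key; the hashing step stays the same.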

Security

Pre-release audit clean (0 critical / high / medium). Two low findings and one informational note were remediated before tagging:

L1: injection on a non-loopback bind now fails config validation unless allow_public_injection: true is set; a warning also fires when auth is enabled, since injection bypasses Bearer auth.
L2: /metrics escapes Prometheus label values, preventing label injection via client-supplied model names.
N1: the Dockerfile CMD now uses the ollama-queue-proxy console script, so main:run() orchestrates the N+1 server gather in containerized deployments.
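
The L2 fix relies on the Prometheus text exposition format, where backslash, double quote, and line feed inside a label value must be escaped. A minimal sketch; the metric name oqp_requests_total and both helper names are hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and line feed in a Prometheus label value."""
    # Backslashes first, so the ones introduced for quotes and newlines are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one exposition line, e.g. oqp_requests_total{model="llama3"} 5.0"""
    rendered = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in labels.items())
    return f"{name}{{{rendered}}} {value}"
```

Without escaping, a model name containing a quote and a newline could close the label set early and forge an extra metric line; with it, the whole name stays inside one quoted label value.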

Compatibility

All v0.1.x configs continue to work unchanged. New fields default to v0.1.x-equivalent behavior.

Full changelog in CHANGELOG.md.

21 Apr 18:39 · TadMSTR · v0.1.2 · 37c2c2f

v0.1.2

What's Changed

Patch release fixing two bugs discovered during claudebox deployment.

Bug fixes

Streaming response detection now handles the application/x-ndjson content type, which Ollama uses for /api/generate and /api/chat streaming responses; the previous check matched only text/event-stream and chunked application/json.
The webhook SSRF check now supports an allowed_hosts list in config, enabling webhook delivery to internal hosts (e.g., ntfy on a LAN IP) without disabling the SSRF guard entirely.
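
A sketch of the content-type check after the streaming fix, assuming upstream headers arrive as a plain mapping; is_streaming_response is a hypothetical name, and the chunked application/json rule mirrors the previous behavior described above.

```python
from typing import Mapping

# Media types treated as streaming regardless of transfer encoding.
STREAMING_TYPES = {"text/event-stream", "application/x-ndjson"}


def is_streaming_response(headers: Mapping[str, str]) -> bool:
    """Return True if upstream response headers indicate a streamed body."""
    lowered = {k.lower(): v for k, v in headers.items()}
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type in STREAMING_TYPES:
        return True
    # Chunked JSON was already treated as streaming before the fix.
    encodings = [e.strip().lower() for e in lowered.get("transfer-encoding", "").split(",")]
    return media_type == "application/json" and "chunked" in encodings
```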

Full Changelog: v0.1.1...v0.1.2

21 Apr 12:11 · TadMSTR · v0.1.1 · 81e9476

v0.1.1

What's Changed

Security fixes

SSRF validation bypass via hostnames: validate_webhook_url() previously checked only raw IP literals, so hostnames (e.g., http://localhost/hook) bypassed the blocklist. It now resolves hostnames to IP addresses via socket.getaddrinfo() before the blocklist comparison. Added 169.254.0.0/16 (link-local / cloud metadata) and fe80::/10 to _PRIVATE_NETWORKS.
Dockerfile missing USER instruction: the container now runs as appuser (non-root) by default, consistent with the compose user: 1000:1000 override.
Queue management tier parameter now validated: ?tier=bogus returns HTTP 400 instead of an unhandled KeyError surfacing as HTTP 500. Accepted values: high, normal, low.
CI action versions updated: actions/checkout → v6.0.2, actions/setup-python → v6.2.0, with correct SHA pins.
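
The resolve-then-check approach can be sketched with the standard library's socket.getaddrinfo() and ipaddress module. The function name is hypothetical, and the blocklist entries other than 169.254.0.0/16 and fe80::/10 are assumptions rather than the contents of the real _PRIVATE_NETWORKS.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical blocklist; only 169.254.0.0/16 and fe80::/10 are named in the notes.
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
        "169.254.0.0/16",  # link-local / cloud metadata
        "::1/128", "fc00::/7",
        "fe80::/10",  # IPv6 link-local
    )
]


def is_webhook_url_allowed(url: str) -> bool:
    """Reject webhook URLs whose host is, or resolves to, a blocklisted address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # getaddrinfo() handles both IP literals and hostnames such as "localhost".
        infos = socket.getaddrinfo(parsed.hostname, parsed.port)
        addresses = {ipaddress.ip_address(info[4][0]) for info in infos}
    except (socket.gaierror, ValueError):
        return False  # unresolvable host or malformed port/address: fail closed
    # Block if ANY resolved address is private, not just the first one.
    return not any(addr in net for addr in addresses for net in PRIVATE_NETWORKS)
```

Resolving at validation time does not by itself stop DNS rebinding, where a hostname resolves to a different address when the webhook is actually delivered; pinning delivery to the address that was checked closes that gap.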

Full Changelog: https://github.com/TadMSTR/ollama-queue-proxy/blob/main/CHANGELOG.md
