Ollama DeProxy

A lightweight, feature-rich proxy for Ollama, designed for development, testing, and staging environments. It simplifies access to remote Ollama instances that sit behind another proxy layer. Anthropic- and OpenAI-compatible endpoints are included.

Why Use It?

If you're a developer working locally and need to access a remote Ollama instance that sits behind an application proxy such as OpenWebUI, you may encounter:

  • Additional authorization requirements
  • Wrapped or modified HTTP headers
  • Response compression or transformation
  • Reverse proxy constraints

Ollama DeProxy provides a clean and simple way to:

  • Bypass extra authorization layers
  • Forward requests transparently
  • Control streaming and decoding behavior
  • Restore direct API-like access to the upstream Ollama service

It acts as a thin, configurable HTTP bridge between your local tools and the remote Ollama instance.
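As a rough illustration of the bridging step, the sketch below prepares the headers for an upstream request. It is not the project's actual code: the function name `build_upstream_headers`, the set of dropped headers, and the Bearer scheme are all assumptions made for the example.

```python
# Hypothetical sketch; names and the Bearer scheme are assumptions.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
})
# Also drop the client's Host (it names localhost) and any client-side auth.
DROPPED = HOP_BY_HOP | {"host", "authorization"}


def build_upstream_headers(incoming: dict[str, str], token: str) -> dict[str, str]:
    """Return a copy of the client headers with auth injected for the upstream."""
    headers = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in DROPPED
    }
    # Assumption: the upstream proxy expects a Bearer token.
    headers["Authorization"] = f"Bearer {token}"
    return headers
```

The local client never sees the token; only the outgoing upstream request carries it.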

Features

  • Transparent Request Forwarding: Acts as a local HTTP server (default port 11434) that forwards all requests to a remote Ollama-compatible API
  • Authentication Handling: Automatically injects custom authentication headers (JWT, API Keys) to bypass upstream proxy layers
  • Response Processing: Supports streaming, decompression (Brotli/Gzip), and header filtering
  • Model Name Correction: Replaces numeric model identifiers with actual model names
  • Response Caching: Caches responses for specific endpoints with TTL-based eviction
  • HTTP/2 Support: Supports HTTP/2 on upstream connections
  • Efficient Decoding: Set DECODE_RESPONSE to choose between automatic decompression (Brotli/Gzip) and raw binary passthrough
  • Compatible Endpoints: Detects Anthropic- and OpenAI-compatible endpoints
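The decoding choice described above can be pictured with a small sketch. It assumes a boolean flag that mirrors DECODE_RESPONSE and handles only Gzip, because Brotli needs a third-party package; the function name `process_body` is hypothetical.

```python
import gzip


def process_body(body: bytes, content_encoding: str, decode_response: bool) -> bytes:
    """Decompress a gzip body when decoding is enabled; otherwise pass it through."""
    if decode_response and content_encoding.lower() == "gzip":
        return gzip.decompress(body)
    # Raw passthrough: the client receives the bytes exactly as the upstream sent them.
    return body
```

In passthrough mode the Content-Encoding header has to reach the client unchanged, while in decoding mode it has to be dropped, since the body is no longer compressed.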

Quick Start

UVX

pip install uv
uvx ollama-deproxy -h

UV

pip install uv
uv venv
uv pip install ollama-deproxy
uv run ollama-deproxy -h

PIP

mkdir ollama-deproxy
cd ollama-deproxy
python -m venv venv
venv\Scripts\activate  # Windows; on macOS / Linux use: source venv/bin/activate
pip install ollama-deproxy
ollama-deproxy -h
usage: ollama-deproxy [-h] [--remote-url REMOTE_URL] [--remote-auth-token REMOTE_AUTH_TOKEN] [--local-port LOCAL_PORT]
                      [--log-level LOG_LEVEL] [--hash-algorithm HASH_ALGORITHM] [--env_path ENV_PATH] [--version]

Run the Ollama DeProxy application.

options:
  -h, --help            show this help message and exit
  --remote-url REMOTE_URL
                        Override REMOTE_URL environment variable
  --remote-auth-token REMOTE_AUTH_TOKEN
                        Override REMOTE_AUTH_TOKEN environment variable
  --local-port LOCAL_PORT
                        Override local_port environment variable
  --log-level LOG_LEVEL
                        Override log level environment variable, default: INFO
  --hash-algorithm HASH_ALGORITHM
                        Override HASH_ALGORITHM environment variable, default: auto
  --env_path ENV_PATH   Override path to .env file
  --version, -v         Version of the application

Start from repository

  1. Clone the repository:
git clone https://github.com/lexxai/ollama-deproxy.git
cd ollama-deproxy
  2. Configure environment variables:
cp .env.example .env
# Edit `.env` with your configuration

Using Docker Compose

Run the following command in your terminal to start the service:

docker compose up -d

This will launch the container with the specified configuration.

Verifying the Connection

You can monitor the initialization and incoming traffic by checking the service logs:

docker compose logs -f
ollama-deproxy-1  | INFO:     Started server process [1]
ollama-deproxy-1  | INFO:     Waiting for application startup.
ollama-deproxy-1  | INFO:     Application startup complete.
ollama-deproxy-1  | INFO:     Uvicorn running on http://0.0.0.0:11434 (Press CTRL+C to quit)
ollama-deproxy-1  | INFO:     172.21.0.1:60700 - "POST /api/generate HTTP/1.1" 200 OK

Zero-Auth Local Access

Once the container is active, your local applications can communicate with the remote Ollama instance via:

Local Address: http://localhost:11434

Security: The proxy handles all necessary authentication headers upstream, allowing your local tools to connect seamlessly without managing API keys or complex auth logic.
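A minimal client sketch, assuming the proxy is running on the default local address: it requests the model list without attaching any credentials. The helper names `build_tags_request` and `list_models` are made up for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # default local address of the proxy


def build_tags_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the model list; no auth header is attached."""
    return urllib.request.Request(f"{base_url}/api/tags", method="GET")


def list_models(base_url: str = BASE_URL) -> list[str]:
    """Return the model names reported through the proxy."""
    with urllib.request.urlopen(build_tags_request(base_url), timeout=10) as resp:
        payload = json.load(resp)
    # Ollama's /api/tags response carries a "models" list with a "name" field.
    return [model["name"] for model in payload.get("models", [])]
```

Calling `list_models()` requires the proxy to be up and reachable; building the request does not.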

Installation

  1. Clone the repository:
git clone https://github.com/lexxai/ollama-deproxy.git
cd ollama-deproxy

Option 1 - Using uv (recommended)

uv is a blazing-fast Python package installer and resolver, written in Rust.

  1. Install uv (if not already installed):
pip install uv
# or
curl -LsSf https://astral.sh/uv/install.sh | sh
  2. Set up and sync the environment:
uv venv
uv sync
  3. Configure environment variables:
cp .env.example .env
# Edit `.env` with your configuration
  4. Run the server:
uv run -m src.ollama_deproxy.main

Option 2 - Using pip (fallback)

If you prefer pip, or uv is unavailable:

Windows

python -m venv .venv && .venv\Scripts\activate

macOS / Linux

python -m venv .venv && source .venv/bin/activate
  1. Install dependencies:
pip install -r requirements.txt
  2. Configure .env:
cp .env.example .env
# Edit `.env` as needed
  3. Run the server:
python -m src.ollama_deproxy.main

If installed as a wheel:

ollama-deproxy

Build as a Package

Build and install as a distributable package:

UV

uv build
# Outputs:
# Successfully built dist/ollama_deproxy-x.y.z.tar.gz
# Successfully built dist/ollama_deproxy-x.y.z-py3-none-any.whl

PIP

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .
Obtaining file:///C:/.../ollama-deproxy
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting cachetools>=7.0.2 (from ollama-deproxy==0.4.1)
  Using cached cachetools-7.0.5-py3-none-any.whl.metadata (5.6 kB)
Collecting fastapi>=0.135.1 (from ollama-deproxy==0.4.1)
  Using cached fastapi-0.135.1-py3-none-any.whl.metadata (30 kB)
Collecting httpx>=0.28.1 (from httpx[brotli,http2,zstd]>=0.28.1->ollama-deproxy==0.4.1)
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
...
Building wheels for collected packages: ollama-deproxy
  Building editable for ollama-deproxy (pyproject.toml) ... done
  Created wheel for ollama-deproxy: filename=ollama_deproxy-0.4.1-py3-none-any.whl size=2640 sha256=a896df60372b3a000cd802335e23a405b0c21ce96c66c8994a139309ea8c0c56
  Stored in directory: ...\Temp\pip-ephem-wheel-cache-4tfkacrk\wheels\4e\77\b5\f2d22f84a99bda20761e769c4abe4d2465331adcc1a67f21a4
Successfully built ollama-deproxy
Installing collected packages: brotli, zstandard, websockets, typing-extensions, types-cachetools, pyyaml, python-multipart, python-dotenv, idna, hyperframe, httptools, hpack, h11, colorama, certifi, cachetools, annotated-types, annotated-doc, typing-inspection, pydantic-core, httpcore, h2, click, anyio, watchfiles, uvicorn, starlette, pydantic, httpx, fastapi, ollama-deproxy
Successfully installed annotated-doc-0.0.4 annotated-types-0.7.0 anyio-4.12.1 brotli-1.2.0 cachetools-7.0.5 certifi-2026.2.25 click-8.3.1 colorama-0.4.6 fastapi-0.135.1 h11-0.16.0 h2-4.3.0 hpack-4.1.0 httpcore-1.0.9 httptools-0.7.1 httpx-0.28.1 hyperframe-6.1.0 idna-3.11 ollama-deproxy-0.4.1 pydantic-2.12.5 pydantic-core-2.41.5 python-dotenv-1.2.2 python-multipart-0.0.22 pyyaml-6.0.3 starlette-0.52.1 types-cachetools-6.2.0.20251022 typing-extensions-4.15.0 typing-inspection-0.4.2 uvicorn-0.41.0 watchfiles-1.1.1 websockets-16.0 zstandard-0.25.0

Then run the CLI directly:

UV

uv run --no-dev ollama-deproxy

PIP

ollama-deproxy

Expected output:

ollama-deproxy --log-level DEBUG

============================================================
🚀 Ollama DeProxy Server vx.y.z
============================================================

2026-03-13 17:58:29 DEBUG:    Starting Ollama DeProxy with DEBUG logging... DEBUG_REQUEST=False,CACHE_ENABLED=True 
2026-03-13 17:58:30 INFO:     Started server process [46908]
2026-03-13 17:58:30 INFO:     Waiting for application startup.
2026-03-13 17:58:30 INFO:     Cache key hash algorithm selected: blake2b
2026-03-13 17:58:30 INFO:     Application startup complete.
2026-03-13 17:58:30 INFO:     Uvicorn running on http://0.0.0.0:11434 (Press CTRL+C to quit)
2026-03-13 17:58:41 DEBUG:    *** Finished response for /ollama/api/tags in 00:00.6
2026-03-13 17:58:41 DEBUG:    Cache set for key: ollama/api/tags:get...
2026-03-13 17:58:41 INFO:     127.0.0.1:37460 - "GET /api/tags HTTP/1.1" 200

Response Caching

The proxy includes a built-in caching system that improves performance for frequently accessed endpoints. It is controlled by the following environment variables:

  • CACHE_ENABLED
  • CACHE_MAXSIZE
  • CACHE_TTL
  • HASH_ALGORITHM: Hash algorithm used to generate cache keys. With the default value, auto, the proxy detects the optimal algorithm for your platform and architecture at startup:
    uv run ollama-deproxy
    Ollama DeProxy vx.y.z
    INFO:     Started server process [29256]
    INFO:     Waiting for application startup.
    INFO:ollama_deproxy.best_hash:Cache key hash algorithm auto-selection...
    INFO:ollama_deproxy.cache_base:Cache key hash algorithm auto-selection complete. Can store it on .env file 'HASH_ALGORITHM=blake2b' for skip autodetection next time.
    INFO:     Application startup complete.
    INFO:     Uvicorn running on http://0.0.0.0:11434 (Press CTRL+C to quit)
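One plausible way to auto-select a hash algorithm is to time a few candidates on the current machine and keep the fastest. The sketch below does that with the standard library; the candidate list, sample size, and function name `pick_fastest_hash` are assumptions, not the project's actual selection logic.

```python
import hashlib
import timeit

# Assumed candidate set; all four are guaranteed to exist in hashlib.
CANDIDATES = ("blake2b", "blake2s", "sha256", "sha512")


def pick_fastest_hash(candidates=CANDIDATES, sample: bytes = b"x" * 4096, rounds: int = 200) -> str:
    """Return the candidate name that hashes `sample` fastest on this machine."""
    timings = {}
    for name in candidates:
        # Bind the name as a default argument so each lambda times its own algorithm.
        timings[name] = timeit.timeit(
            lambda n=name: hashlib.new(n, sample).digest(), number=rounds
        )
    return min(timings, key=timings.get)
```

The result varies by platform, which is why storing the chosen value in `.env` (as the log message suggests) avoids repeating the measurement on every start.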

Cached Endpoints:

  • /api/tags - Model list
  • /api/models - Model information
  • /api/show - Model details
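To make the roles of CACHE_MAXSIZE and CACHE_TTL concrete, here is a small standard-library sketch of a size-capped TTL cache. The project's build output lists cachetools as a dependency, so the real implementation likely differs; the key layout and the class below are illustrative assumptions only.

```python
import hashlib
import time
from typing import Callable, Optional


def make_cache_key(path: str, method: str, body: bytes = b"") -> str:
    """Build a key such as 'ollama/api/tags:get:<digest>' (layout is an assumption)."""
    digest = hashlib.blake2b(body, digest_size=16).hexdigest()
    return f"{path.strip('/')}:{method.lower()}:{digest}"


class TTLCache:
    """Tiny TTL cache with a size cap (maxsize >= 1); the oldest entry is evicted when full."""

    def __init__(self, maxsize: int, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.maxsize, self.ttl, self.clock = maxsize, ttl, clock
        self._store: dict[str, tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # entry outlived its TTL
            return None
        return value

    def set(self, key: str, value: bytes) -> None:
        self._store.pop(key, None)  # re-inserting moves the key to the newest position
        if len(self._store) >= self.maxsize:
            # dicts preserve insertion order, so the first key is the oldest
            del self._store[next(iter(self._store))]
        self._store[key] = (self.clock() + self.ttl, value)
```

The injectable `clock` exists only so that expiry can be exercised without waiting in real time.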

Benefits:

  • Reduces latency for repeated requests
  • Decreases load on remote Ollama instances
  • Improves response times for model metadata queries

Error Logging & Diagnostics

When the remote server returns an error (HTTP 400+), the proxy interrupts the stream to capture the full context. This allows you to see exactly why the upstream rejected your request.

Example Failure: If you query a model that doesn't exist on the remote host:

ERROR:ollama_deproxy.handlers:Remote Error [400] on https://openwebui.example.com/ollama/api/show {"name":"qwen2.5-coder:1.5b-base1"} {"detail":"Model 'qwen2.5-coder:1.5b-base1' was not found"}

Where:

Sent Body: {"name":"qwen2.5-coder:1.5b-base1"}
Recv Body: {"detail":"Model 'qwen2.5-coder:1.5b-base1' was not found"}
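The layout of that log line can be reproduced with a short sketch. The helper names `is_remote_error` and `format_remote_error` are hypothetical; only the message layout is taken from the example above.

```python
def is_remote_error(status: int) -> bool:
    """Treat any HTTP status of 400 or above as an upstream error."""
    return status >= 400


def format_remote_error(status: int, url: str, sent_body: str, recv_body: str) -> str:
    """Compose the one-line message: status, URL, sent body, received body."""
    return f"Remote Error [{status}] on {url} {sent_body} {recv_body}"
```

Keeping the sent and received bodies on the same line makes it possible to match a rejected request with the upstream's explanation from a single log entry.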

Example Debug Log:

LOG_LEVEL=DEBUG
Ollama DeProxy vx.y.z
2026-03-13 15:34:08 DEBUG:    Starting Ollama DeProxy with DEBUG logging... DEBUG_REQUEST=False,CACHE_ENABLED=True
2026-03-13 15:34:08 INFO:     Started server process [43460]
2026-03-13 15:34:08 INFO:     Waiting for application startup.
2026-03-13 15:34:08 INFO:     Cache key hash algorithm selected: blake2b
2026-03-13 15:34:08 INFO:     Application startup complete.
2026-03-13 15:34:08 INFO:     Uvicorn running on http://0.0.0.0:11434 (Press CTRL+C to quit)
2026-03-13 15:34:57 DEBUG:    *** Finished response for /ollama/api/tags in 00:00.6
2026-03-13 15:34:57 DEBUG:    Cache set for key: ollama/api/tags:get...
2026-03-13 15:34:57 INFO:     127.0.0.1:8327 - "GET /api/tags HTTP/1.1" 200
2026-03-13 15:35:37 DEBUG:    Proxying request corrected to 'api/v1/messages' for Anthropic compatibility
2026-03-13 15:35:37 DEBUG:    *** Handling request for path: /api/v1/messages
2026-03-13 15:36:21 INFO:     127.0.0.1:61399 - "POST /v1/messages?beta=true HTTP/1.1" 200
2026-03-13 15:36:23 DEBUG:    *** Finished up stream for /api/v1/messages in 00:46.1
2026-03-13 15:36:37 DEBUG:    Cache hit for key: ollama/api/tags:get...
2026-03-13 15:36:37 INFO:     127.0.0.1:61402 - "GET /api/tags HTTP/1.1" 200
2026-03-13 15:38:25 DEBUG:    Cache hit for key: ollama/api/tags:get...
2026-03-13 15:38:25 INFO:     127.0.0.1:61408 - "GET /api/tags HTTP/1.1" 200
2026-03-13 15:38:26 DEBUG:    Cache hit for key: ollama/api/tags:get...
2026-03-13 15:38:26 INFO:     127.0.0.1:61411 - "GET /api/tags HTTP/1.1" 200
2026-03-13 15:39:03 DEBUG:    Proxying request corrected to 'ollama/v1/chat/completions' for OpenAI compatibility
2026-03-13 15:39:03 DEBUG:    *** Handling request for path: /ollama/v1/chat/completions
2026-03-13 15:39:13 INFO:     127.0.0.1:61414 - "POST /chat/completions HTTP/1.1" 200
2026-03-13 15:39:13 DEBUG:    *** Finished up stream for /ollama/v1/chat/completions in 00:09.3
2026-03-13 15:41:18 INFO:     Shutting down
2026-03-13 15:41:18 INFO:     Waiting for application shutdown.
2026-03-13 15:41:18 INFO:     Application shutdown complete.
2026-03-13 15:41:18 INFO:     Finished server process [43460]


Sleeping for 10 sec before restarting server. Press Ctrl+C to exit.
Restarting server...
2026-03-13 15:41:28 DEBUG:    Using proactor: IocpProactor
2026-03-13 15:41:28 INFO:     Started server process [43460]
2026-03-13 15:41:28 INFO:     Waiting for application startup.
2026-03-13 15:41:28 INFO:     Cache key hash algorithm selected: blake2b
2026-03-13 15:41:28 INFO:     Application startup complete.
2026-03-13 15:41:28 INFO:     Uvicorn running on http://0.0.0.0:11434 (Press CTRL+C to quit)
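The path corrections visible in the debug log can be summarized as a mapping from the client path to the upstream path. The sketch below is inferred from those log lines alone; the function name `correct_path` is hypothetical, and the actual rules (including where the `ollama/` prefix comes from) may differ.

```python
def correct_path(path: str) -> str:
    """Map a client path to an upstream path, following the log lines above."""
    clean = path.split("?", 1)[0].strip("/")  # drop the query string and outer slashes
    if clean == "v1/messages":
        return "api/v1/messages"  # Anthropic-style endpoint
    if clean == "chat/completions":
        return "ollama/v1/chat/completions"  # OpenAI-style endpoint
    return f"ollama/{clean}"  # native Ollama API, e.g. api/tags
```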

CLI Usage

In CLI mode, use the ollama-deproxy command to start the server. Command-line options override the corresponding environment variables.

uv run ollama-deproxy --help           
usage: ollama-deproxy [-h] [--remote-url REMOTE_URL] [--remote-auth-token REMOTE_AUTH_TOKEN]
                      [--local-port LOCAL_PORT] [--log-level LOG_LEVEL] [--env_path ENV_PATH] [--version]

Run the Ollama DeProxy application.

options:
  -h, --help            show this help message and exit
  --remote-url REMOTE_URL
                        Override REMOTE_URL environment variable
  --remote-auth-token REMOTE_AUTH_TOKEN
                        Override REMOTE_AUTH_TOKEN environment variable
  --local-port LOCAL_PORT
                        Override local_port environment variable
  --log-level LOG_LEVEL
                        Override log level environment variable
  --env_path ENV_PATH   Override path to .env file
  --version, -v         Version of the application
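The override order can be sketched as: command-line value first, then environment variable, then default. This is an illustration with argparse, not the project's parser; the function name `resolve_settings` and the variable name LOCAL_PORT are assumptions, while the defaults (port 11434, log level INFO) are the ones stated in this README.

```python
import argparse
from collections.abc import Mapping, Sequence


def resolve_settings(argv: Sequence[str], env: Mapping[str, str]) -> dict[str, str]:
    """Resolve settings with precedence: CLI flag, then environment, then default."""
    parser = argparse.ArgumentParser(prog="ollama-deproxy")
    parser.add_argument("--remote-url")
    parser.add_argument("--local-port")
    parser.add_argument("--log-level")
    args = parser.parse_args(argv)
    return {
        "remote_url": args.remote_url or env.get("REMOTE_URL", ""),
        # Assumption: the environment variable is spelled LOCAL_PORT.
        "local_port": args.local_port or env.get("LOCAL_PORT", "11434"),
        "log_level": args.log_level or env.get("LOG_LEVEL", "INFO"),
    }
```

Passing `os.environ` as `env` gives the real behaviour; passing a plain dict keeps the function easy to test.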

License

MIT License. See the LICENSE file for details.
