LM Gate

maintained by: @hkdb


A high-performance authentication and access-control gateway for LLM API backends such as Ollama. LM Gate sits in front of your upstream model server and adds identity, authorization, rate limiting, and observability without modifying the upstream service.

💡 Why


Most self-hosted LLM frameworks are designed to run on localhost with no authentication or rate limiting. However, the machine a user interacts with directly often lacks the GPU power their use cases demand, so these servers end up exposed over the network to be reachable at all. The result is a quiet but growing crisis: infrastructure built for the desktop, quietly bleeding onto the open internet.

A joint investigation by SentinelOne's SentinelLABS and Censys revealed 175,000 unique Ollama hosts across 130 countries operating without authentication, forming an "unmanaged, publicly accessible layer of AI compute infrastructure." - source

The implications are serious. Security should be the default, not an afterthought requiring the installation of multiple third-party components and expert configuration.

The current makeshift band-aid, adopted only by the more security-conscious and DevOps-savvy, is to put self-hosted LLM frameworks behind an NGINX reverse proxy with basic auth. This is still not enough: in practice, credentials are often embedded directly in the URL, where they leak into shell history, server logs, and browser history, and travel in plaintext for any packet sniffer to read whenever TLS is not in use.

LM Gate is an attempt to change that: a single component that plugs into your existing infrastructure to handle security, logging, and metrics, or deploys as a prepackaged single container bundled with Ollama.

✨ Features


  • Authentication - API token authentication with JWT sessions, managed by a web dashboard with local accounts or OAuth2/OIDC single sign-on
  • Multi-Factor Authentication - TOTP authenticator apps, WebAuthn, and one-time recovery codes, with optional global 2FA enforcement
  • Password Policies - configurable minimum length, complexity rules, expiry, max failed attempts with account lockout, and force-password-change on next login
  • Model ACLs - per-user allow/deny rules with wildcard patterns controlling which models can be used
  • Rate Limiting - per-user and per-token requests-per-minute limits with a configurable global default
  • Allow Lists - IP allow lists for the admin panel and the API proxy path
  • Audit Logging - separate API, admin, and security log streams with independent enable/disable toggles, per-type retention policies, and automatic daily pruning
  • Security Event Logging - fail2ban-ready auth-failure and rate-limit event logging to stdout with a [SECURITY] prefix
  • Usage Metrics - per-user, per-model token and request counts with daily/weekly/monthly aggregation, streaming vs. non-streaming breakdown, and latency tracking
  • Admin Dashboard - embedded SvelteKit SPA for managing users, tokens, models, ACLs, OIDC providers, audit logs (with CSV export), metrics, and system settings
  • Ollama Integration - for Ollama backends, admins can pull and remove models directly from the dashboard
  • TLS - bring your own certs, automatic Let's Encrypt, self-signed fallback, and HTTP→HTTPS redirect
  • Streaming - full support for SSE and chunked responses from the upstream
  • Security Hardening - configurable security headers (CSP, HSTS, X-Frame-Options, etc.), admin network restrictions (IP/CIDR whitelist), request/response body limits, and CORS controls
  • Docker Ready - multi-stage build, single binary with embedded frontend, runs as a non-root user

📦 Installation


Type  Method              Description
1     Docker Standalone   LM Gate only, point to an existing LLM backend
2     Omnigate (CPU)      All-in-one (Ollama + LM Gate), CPU only
3     Omnigate (NVIDIA)   All-in-one (Ollama + LM Gate) with NVIDIA GPU
4     Omnigate (AMD)      All-in-one (Ollama + LM Gate) with AMD GPU
5     Omnigate (Intel)    All-in-one (Ollama + LM Gate) with Intel iGPU (experimental)
6     Binary              Download and run the binary directly

Note: Omnigate images are for Linux only. macOS users should see the Apple Silicon Guide for the recommended setup using native Ollama with the standalone Docker image or binary. AMD APU users (including Ryzen AI 9 series) should use the AMD variant, as ROCm supports RDNA integrated graphics.

Click on any of the environments above for step-by-step instructions, or see docs/INSTALL.md for step-by-step instructions for all environments in a single doc.
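
For orientation, a standalone Docker launch (type 1) might look roughly like the following - the image name, port, and environment variable are placeholders, so use the exact values from docs/INSTALL.md:

```sh
# Placeholder values throughout - consult docs/INSTALL.md for the real ones.
docker run -d --name lmgate \
  -p 8443:8443 \
  -v "$PWD/config.yaml:/app/config.yaml" \
  -e LMGATE_UPSTREAM_URL="http://host.docker.internal:11434" \
  lmgate:latest
```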

βš™οΈ Configuration


LM Gate uses a single config.yaml file. Every setting can also be overridden with an LMGATE_-prefixed environment variable.

Users and admins should rarely need to touch config.yaml directly, as runtime settings can be edited from the dashboard's settings page. Any custom configuration at launch time should be supplied via environment variables.
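
For instance, assuming the common convention of upper-casing and underscore-joining nested YAML keys (the server.port key itself is hypothetical - see docs/CONFIGS.md for the real key names):

```sh
# config.yaml:
#   server:
#     port: 8443
# would typically be overridden at launch with:
export LMGATE_SERVER_PORT=9443
```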

See docs/CONFIGS.md for the full configuration reference.

LM Gate can gate OAuth2/OIDC logins on membership in a specific user group. This works out of the box for most OAuth2 providers, with the exception of Microsoft and Google. If Microsoft or Google group gating is required, an intermediary auth layer must be set up to facilitate it. See docs/MGGROUPS.md for details.

See docs/FAIL2BAN.md for fail2ban integration instructions.
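
As a rough illustration of what a fail2ban filter for the [SECURITY] stdout stream could look like - the exact log line format is defined in docs/FAIL2BAN.md, so treat this regex as a placeholder:

```ini
# /etc/fail2ban/filter.d/lmgate.conf (hypothetical filter; match it to the
# real log format documented in docs/FAIL2BAN.md before use)
[Definition]
failregex = ^\[SECURITY\].*from <HOST>
```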

πŸ—οΈ Architecture



Under the hood:

  Request                        HOT PATH                         Upstream
  ───────►  Auth ─► RateLimit ─► ModelACL ─► Proxy ──────────────► LLM
             │          │           │          │
             │          │           │          └── token extraction ──┐
             │          │           │              (async goroutine)  │
             │          │           │                                 │
             ▼          ▼           ▼                                 ▼
        ┌──────────────────────────────────────────────────────────────────┐
        │              OFF HOT PATH  (never blocks the request)            │
        │                                                                  │
        │  [SECURITY] stdout ◄── auth failures (401/403)                   │
        │  [SECURITY] stdout ◄── rate-limit hits (429)                     │
        │                                                                  │
        │  Audit Channel (cap 10k) ──► Batch Worker ──► SQLite (WAL)       │
        │    fire-and-forget send        100 rows or flush interval        │
        │    drops if full               per flush                         │
        │                                                                  │
        │  Metrics Collector ──► in-memory hourly buckets ──► SQLite       │
        │    (mutex, no I/O)     flush every 30s via upsert                │
        │                                                                  │
        │  Daily Pruner ── auto-deletes logs past retention                │
        └──────────────────────────────────────────────────────────────────┘

All logging, metrics, and token extraction run in background goroutines, with zero I/O on the hot path. The audit middleware fires a goroutine that sends to a buffered channel and returns immediately; if the channel is full, the entry is dropped rather than blocking. Security events print a [SECURITY]-prefixed line to stdout for fail2ban, then enqueue the DB write asynchronously through the same channel.

Data is stored in a single SQLite database (WAL mode). The admin dashboard is a SvelteKit SPA compiled to static assets and embedded into the Go binary at build time.

Scaling:

LM Gate is designed to scale horizontally behind popular load balancers and has been tested with NGINX.
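
A minimal NGINX sketch for load balancing two LM Gate instances - hostnames, ports, and TLS details are placeholders:

```nginx
upstream lmgate {
    server 10.0.0.11:8443;
    server 10.0.0.12:8443;
}

server {
    listen 443 ssl;
    server_name llm.example.com;

    location / {
        proxy_pass https://lmgate;
        proxy_http_version 1.1;
        # Keep SSE/chunked streaming responses flowing
        proxy_buffering off;
    }
}
```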

🔒 TLS Modes


  1. Certificate files - set tls.cert_file and tls.key_file
  2. Let's Encrypt (HTTP-01) - set tls.auto_cert.domain and tls.auto_cert.email
  3. Let's Encrypt (DNS-01 via Cloudflare) - set tls.auto_cert.dns_provider: cloudflare and tls.auto_cert.cloudflare_api_token (works behind firewalls, supports wildcards)
  4. Disabled - set tls.disabled: true for development or when behind a reverse proxy
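
For example, mode 2 amounts to a config.yaml fragment like this (the key paths come from the list above; the values are placeholders):

```yaml
tls:
  auto_cert:
    domain: llm.example.com
    email: admin@example.com
```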

🛟 Support


If you have a question, have found a bug, or have a feature request, submit an issue here.

If you are a company looking for paid support for this product, contact 3DF via this contact form.

πŸ” Security


See SECURITY.md

πŸ› οΈ Building from Source


See docs/SETUP.md

🏷️ Changelog


See CHANGELOG.md

⚠️ Disclaimer


While LM Gate is designed to be an enterprise-ready solution, it is still early-stage software pending further refinement and third-party audits.

LM Gate was developed with extensive use of Claude AI models.

💖 Sponsorships


If you like this project, please give the repo a star or feel free to buy us a coffee:

"Buy Me A Coffee"

Current Corporate Sponsors:

📋 Terms and Conditions


📄 License


Apache 2.0 - See LICENSE for details.