feat: add NVIDIA NIM (has free tier), Google Gemini (has free tier), local vLLM + embed providers #1

Open
Marinski wants to merge 1 commit into psyb0t:main from Marinski:feat/nvidia-gemini-providers

Marinski commented Jun 1, 2026

Description

Adds four new provider fragments for models accessible outside the existing free-tier flags, wiring them into the build system, tests, docs, and .env.example.

Why

NVIDIA NIM (build.nvidia.com) and Google Gemini (aistudio.google.com) both offer free, rate-limited tiers, similar to the Groq, Cerebras, and OpenRouter tiers that aigate already supports. Together they unlock ~10 more models at no cost, including reasoning (kimi-k2), finance-specialized (palmyra), large vision (llama-3.2-90b), MoE code models (qwen3-coder), and Google's latest Gemini 2.5/3 series.

Local vLLM and Local Embed are optional placeholders for users who self-host their own inference containers. Both are wired with custom_llm_provider: openai, so no LiteLLM code changes are needed.
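
For illustration, a self-hosted entry of this kind could be generated as below. This is a minimal sketch, not the fragment shipped in this PR: the alias `local-gemma`, the served model name `gemma`, and the `LOCAL_VLLM_GEMMA_API_BASE` variable are hypothetical placeholders. Only `custom_llm_provider: openai` and `LOCAL_VLLM_API_KEY` come from the description above, and the `os.environ/NAME` form is LiteLLM's config syntax for reading a value from the environment.

```python
import json


def openai_compatible_entry(alias, served_model, api_base_env, api_key_env):
    """Build one LiteLLM model_list entry for an OpenAI-compatible server.

    All argument values are supplied by the caller; nothing here is read
    from the real provider fragments.
    """
    return {
        "model_name": alias,
        "litellm_params": {
            "model": served_model,
            # Route through LiteLLM's generic OpenAI-compatible client.
            "custom_llm_provider": "openai",
            # LiteLLM resolves "os.environ/NAME" at load time.
            "api_base": f"os.environ/{api_base_env}",
            "api_key": f"os.environ/{api_key_env}",
        },
    }


# Hypothetical alias, model name, and base-URL variable (assumptions).
entry = openai_compatible_entry(
    alias="local-gemma",
    served_model="gemma",
    api_base_env="LOCAL_VLLM_GEMMA_API_BASE",
    api_key_env="LOCAL_VLLM_API_KEY",
)
print(json.dumps(entry, indent=2))
```

Because the entry only points at an OpenAI-compatible endpoint, swapping the container behind it should not require touching the fragment beyond the served model name.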

New providers

| Provider | File | Models | Auth | Tier |
| --- | --- | --- | --- | --- |
| NVIDIA NIM | providers/nvidia.yaml | 7 | NVIDIA_API_KEY + NVIDIA_API_BASE | free-rate-limited |
| Google Gemini | providers/gemini.yaml | 6 | GEMINI_API_KEY | free-rate-limited |
| Local vLLM | providers/vllm-local.yaml | 2 | LOCAL_VLLM_API_KEY + per-model base URLs | self-hosted |
| Local Embed | providers/embed-local.yaml | 1 | EMBED_LOCAL_API_BASE | self-hosted |
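
A quick pre-build check for the Auth column might look like the sketch below. It is not part of this PR: the helper name `missing_credentials`, the set of accepted truthy flag values, and passing the environment in as a dict are all assumptions. The per-model base URL variables for Local vLLM are left out because their names are not listed here.

```python
# Values treated as "enabled" for a provider flag (assumed convention).
TRUTHY = {"1", "true", "yes", "on"}

# Provider flag -> credential variables named in the table above.
REQUIRED_VARS = {
    "NVIDIA": ["NVIDIA_API_KEY", "NVIDIA_API_BASE"],
    "GEMINI": ["GEMINI_API_KEY"],
    "VLLM_LOCAL": ["LOCAL_VLLM_API_KEY"],  # per-model base URLs omitted
    "EMBED_LOCAL": ["EMBED_LOCAL_API_BASE"],
}


def missing_credentials(env):
    """Return {flag: [unset variables]} for every enabled provider."""
    missing = {}
    for flag, names in REQUIRED_VARS.items():
        if env.get(flag, "").strip().lower() not in TRUTHY:
            continue  # provider disabled, nothing to check
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[flag] = absent
    return missing


example_env = {
    "NVIDIA": "1",
    "NVIDIA_API_KEY": "example-key",  # placeholder, not a real key
    "GEMINI": "0",
}
print(missing_credentials(example_env))
```

In practice `env` would be `dict(os.environ)` after loading `.env`; a dict argument keeps the check easy to test.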

Build integration

  • build-config.py - registers nvidia, gemini, vllm-local, embed-local in active_providers()
  • .env.example - adds NVIDIA, GEMINI, VLLM_LOCAL, EMBED_LOCAL flags + credential variables
  • recommend-limits.sh - detects the four new flags for the enabled summary line
  • tests/test_litellm.sh - gated EXPECTED_MODELS blocks for all four providers
  • docs/providers.md - provider documentation tables with model aliases, underlying models, and notes
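
The flag-gated registration described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the code in build-config.py: the real `active_providers()` signature, how it reads flags, and which values count as enabled are not shown in this PR description, so the dict argument, the `TRUTHY` set, and the `fragment_paths` helper are hypothetical. The flag names and fragment file names are taken from the list and table above.

```python
from pathlib import Path

# Values treated as "enabled" (assumed convention).
TRUTHY = {"1", "true", "yes", "on"}

# .env flag -> provider fragment name (providers/<name>.yaml).
NEW_PROVIDER_FLAGS = {
    "NVIDIA": "nvidia",
    "GEMINI": "gemini",
    "VLLM_LOCAL": "vllm-local",
    "EMBED_LOCAL": "embed-local",
}


def active_providers(env):
    """Return fragment names whose flag is enabled, in declaration order."""
    return [
        name
        for flag, name in NEW_PROVIDER_FLAGS.items()
        if env.get(flag, "").strip().lower() in TRUTHY
    ]


def fragment_paths(env, root="providers"):
    """Map the enabled providers to their YAML fragment paths."""
    return [Path(root) / f"{name}.yaml" for name in active_providers(env)]


example_env = {"NVIDIA": "1", "GEMINI": "0", "EMBED_LOCAL": "true"}
print([p.as_posix() for p in fragment_paths(example_env)])
```

Keeping the flag-to-fragment mapping in one table means a new provider needs a single added row plus its YAML file.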

Notes

  • NVIDIA free tier is rate-limited. Set NVIDIA_API_BASE to https://integrate.api.nvidia.com/v1 (the free-tier endpoint). This PR adds only a few of the models; many more can be used within NVIDIA's free tier.
  • nvidia-kimi-k2 uses moonshotai/kimi-k2-thinking which hit EOL 2026-05-12 — included as a placeholder; swap in a replacement model when available.
  • Local providers assume existing Docker containers at documented ports (default API bases use Docker host gateway 172.17.0.1).
  • No changes to Makefile, docker-compose.yml, or existing provider fragments.

Add four new provider fragments and wire them into the build system:

- **NVIDIA NIM** (`nvidia.yaml`): 7 models via api.nvidia.com
  (kimi-k2, palmyra-fin-70b, llama-3.2-90b, qwen3-80b,
  qwen3-coder, deepseek-v3.2, nv-embedqa-e5-v5)
- **Google Gemini** (`gemini.yaml`): 6 models via Gemini API
  (2.5-pro, 2.5-flash, 2.5-flash-lite, 3-flash-preview,
  3.1-flash-lite-preview, embedding-001)
- **Local vLLM** (`vllm-local.yaml`): 2 existing Docker vLLM
  instances (Gemma 4 on :8000, Qwen 3.6 on :8001)
- **Local Embed** (`embed-local.yaml`): Nomic Embed v2 on :8010

Build integration:
- build-config.py: register all 4 providers in active_providers()
- .env.example: add flags (NVIDIA, GEMINI, VLLM_LOCAL, EMBED_LOCAL)
  and credential variables
- recommend-limits.sh: detect new flags for enabled summary
- tests/test_litellm.sh: add gated EXPECTED_MODELS blocks
- docs/providers.md: document all 4 provider sections