feat: add NVIDIA NIM (has free tier), Google Gemini (has free tier), local vLLM + embed providers #1
Open
Marinski wants to merge 1 commit into
Add four new provider fragments and wire them into the build system:

- **NVIDIA NIM** (`nvidia.yaml`): 7 models via api.nvidia.com (kimi-k2, palmyra-fin-70b, llama-3.2-90b, qwen3-80b, qwen3-coder, deepseek-v3.2, nv-embedqa-e5-v5)
- **Google Gemini** (`gemini.yaml`): 6 models via Gemini API (2.5-pro, 2.5-flash, 2.5-flash-lite, 3-flash-preview, 3.1-flash-lite-preview, embedding-001)
- **Local vLLM** (`vllm-local.yaml`): 2 existing Docker vLLM instances (Gemma 4 on :8000, Qwen 3.6 on :8001)
- **Local Embed** (`embed-local.yaml`): Nomic Embed v2 on :8010

Build integration:

- build-config.py: register all 4 providers in active_providers()
- .env.example: add flags (NVIDIA, GEMINI, VLLM_LOCAL, EMBED_LOCAL) and credential variables
- recommend-limits.sh: detect new flags for enabled summary
- tests/test_litellm.sh: add gated EXPECTED_MODELS blocks
- docs/providers.md: document all 4 provider sections
Description
Adds four new provider fragments for models accessible outside the existing free-tier flags, wiring them into the build system, tests, docs, and .env.example.
Why
NVIDIA NIM (build.nvidia.com) and Google Gemini (aistudio.google.com) both offer free, rate-limited tiers, similar to the Groq, Cerebras, and OpenRouter tiers that aigate already supports. Together they unlock ~10 more models at no cost, including a reasoning model (kimi-k2), a finance-specialized model (palmyra), a large vision model (llama-3.2-90b), MoE code models (qwen3-coder), and Google's latest Gemini 2.5/3 series.
Local vLLM and Embed are optional placeholders for users who self-host their own inference containers. They're wired as `custom_llm_provider: openai`, so no LiteLLM code changes are needed.
New providers
- `providers/nvidia.yaml` - `NVIDIA_API_KEY` + `NVIDIA_API_BASE`
- `providers/gemini.yaml` - `GEMINI_API_KEY`
- `providers/vllm-local.yaml` - `LOCAL_VLLM_API_KEY` + per-model base URLs
- `providers/embed-local.yaml` - `EMBED_LOCAL_API_BASE`

Build integration
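As a rough illustration of flag-gated registration, the sketch below maps the four new flags to their fragment names. It is not the code from `build-config.py`: the real `active_providers()` signature is not shown in this PR description, and the set of values that count as "enabled" is an assumption.

```python
import os

# Flag name -> provider fragment name, as listed in this PR description.
FLAG_TO_PROVIDER = {
    "NVIDIA": "nvidia",
    "GEMINI": "gemini",
    "VLLM_LOCAL": "vllm-local",
    "EMBED_LOCAL": "embed-local",
}

# Assumption: a flag is enabled when set to one of these values.
TRUTHY = {"1", "true", "yes", "on"}


def active_providers(env=None):
    """Return the fragment names whose flag is enabled, in declaration order.

    Hypothetical stand-in for the function of the same name in build-config.py.
    """
    env = os.environ if env is None else env
    return [
        provider
        for flag, provider in FLAG_TO_PROVIDER.items()
        if str(env.get(flag, "")).strip().lower() in TRUTHY
    ]


if __name__ == "__main__":
    sample_env = {"NVIDIA": "true", "GEMINI": "false", "EMBED_LOCAL": "1"}
    print(active_providers(sample_env))  # ['nvidia', 'embed-local']
```

Passing the environment as a plain mapping keeps the function easy to test without touching the real process environment.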
- `build-config.py` - registers `nvidia`, `gemini`, `vllm-local`, `embed-local` in `active_providers()`
- `.env.example` - `NVIDIA`, `GEMINI`, `VLLM_LOCAL`, `EMBED_LOCAL` flags + credential variables
- `recommend-limits.sh` - detects the four new flags for the enabled summary line
- `tests/test_litellm.sh` - gated `EXPECTED_MODELS` blocks for all four providers
- `docs/providers.md` - provider documentation tables with model aliases, underlying models, and notes

Notes
- `NVIDIA_API_BASE` is set to `https://integrate.api.nvidia.com/v1` (the free-tier endpoint). The implementation adds only a few models; many more can be used within NVIDIA's free tier.
- `nvidia-kimi-k2` uses `moonshotai/kimi-k2-thinking`, which hit EOL 2026-05-12. It is included as a placeholder; swap in a replacement model when available.
- 172.17.0.1).
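The `custom_llm_provider: openai` wiring for the local vLLM instances can be sketched as LiteLLM `model_list` entries built in Python. This is a minimal sketch, not the contents of `vllm-local.yaml`: the aliases (`vllm-gemma`, `vllm-qwen`), the served model names, and the `host` parameter are hypothetical. Only the ports (:8000, :8001), the `LOCAL_VLLM_API_KEY` variable, and the provider setting come from this PR.

```python
import json

# LiteLLM resolves "os.environ/<NAME>" values from the environment at load time.
KEY_REF = "os.environ/LOCAL_VLLM_API_KEY"


def openai_compatible_entry(alias, served_model, api_base, api_key=KEY_REF):
    """Build one LiteLLM model_list entry for an OpenAI-compatible server."""
    return {
        "model_name": alias,
        "litellm_params": {
            "model": served_model,
            "custom_llm_provider": "openai",
            "api_base": api_base,
            "api_key": api_key,
        },
    }


def local_vllm_entries(host="localhost"):
    """Entries for the two vLLM instances; the host is an assumption."""
    return [
        # Hypothetical alias and served model name for the instance on :8000.
        openai_compatible_entry("vllm-gemma", "gemma", f"http://{host}:8000/v1"),
        # Hypothetical alias and served model name for the instance on :8001.
        openai_compatible_entry("vllm-qwen", "qwen", f"http://{host}:8001/v1"),
    ]


if __name__ == "__main__":
    print(json.dumps({"model_list": local_vllm_entries()}, indent=2))
```

Because each entry only names an OpenAI-compatible base URL, swapping in a different self-hosted server should only require changing `api_base` and the served model name.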