A high-performance Rust-based Model Context Protocol (MCP) server that connects Google Antigravity to local LLMs via LM Studio. This lets a cloud model handle orchestration and review while your local model handles code generation, editing, completion, and optional local explanation.
Optimized for openai/gpt-oss-20b and tested on an RTX 5070 Ti with 64GB of system RAM.
- LM Studio running locally with its Server enabled.
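Before wiring anything else up, you can sanity-check that the server is reachable. LM Studio listens on port 1234 by default (adjust the port if you changed it in the Server tab):

```sh
# Lists the models LM Studio is currently serving; the response should
# include your loaded model's id, e.g. "openai/gpt-oss-20b".
curl http://localhost:1234/v1/models
```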
To achieve maximum performance with gpt-oss-20b, use the following hardware and inference settings in LM Studio:
- Context Length: 32768
- GPU Offload: 24 (ensures the entire model sits in the 5070 Ti's high-bandwidth GDDR7 memory)
- Unified KV Cache: ON (allows system RAM to act as a spillover for large context windows)
- Offload KV Cache to GPU Memory: ON (prioritizes keeping the active context on the GPU for faster inference)
- Number of Experts: 4 (maintains the optimal speed-to-intelligence ratio)
- Evaluation Batch Size: 512 (an optimal balance for the Tensor Cores)
Set these in the right-hand panel of LM Studio to force the model into a strict, deterministic "Worker" mode:
- Temperature: 0.1 or 0.2 (lowers creativity to prevent syntax errors or hallucinations in code logic)
- Top P Sampling: 0.8 (balances precision without getting stuck in loops)
- Min P Sampling: 0.05 (prunes low-probability noise for higher-quality snippets)
- Reasoning Section Parsing: ON (lets you see the worker's internal logic before it generates the code block)
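These sampler values live in the LM Studio UI, but you can verify their effect with a one-off request against the OpenAI-compatible endpoint. `temperature` and `top_p` are standard request fields; Min P and reasoning parsing are applied by LM Studio itself:

```sh
# Quick determinism check: with temperature 0.1, repeated runs of this
# request should produce near-identical completions.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Write a Python one-liner that reverses a string."}],
    "temperature": 0.1,
    "top_p": 0.8
  }'
```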
GPT-OSS is trained on the Harmony Chat Format. By default, lm-bridge's included config.toml injects the necessary "Worker" persona into every code generation prompt automatically. You do not need to configure a custom System Prompt in LM Studio. The bridge handles the architecture delegation natively.
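For reference, Harmony frames each turn in special tokens rather than plain ChatML. A rough sketch of one exchange is below; the persona text is a placeholder, not the bridge's actual template, and LM Studio applies the real formatting for you:

```text
<|start|>developer<|message|># Instructions
You are a strict code-generation worker. Return only code.<|end|>
<|start|>user<|message|>Write a Rust function that sums a slice.<|end|>
<|start|>assistant<|channel|>final<|message|>fn sum(xs: &[i64]) -> i64 { xs.iter().sum() }<|return|>
```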
While this bridge is tuned for openai/gpt-oss-20b out of the box, you can swap in other top-tier local coding models.
If you change models, you must update the following:
- Model Name: Change `model = "..."` inside your `config.toml` to exactly match the new model's badge name in LM Studio.
- Stop Sequences: Update the `stop_sequences` array in `config.toml` to use the new model's stop tokens (e.g., `["<|im_end|>"]` for Qwen or `["<|eot_id|>"]` for Llama-3).
- Prompt Templates: Rewrite the `[prompt_templates]` section of `config.toml`, as the default templates inject a "Worker" persona specific to GPT-OSS's Harmony Chat architecture.
- LM Studio Chat Format: Ensure LM Studio is set to `ChatML`, `Llama3`, or `DeepSeek` in the right-hand panel, as the bridge relies on LM Studio to properly structure the `/v1/chat/completions` request. A sketch of a full swap follows this list.
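Put together, a swap to Qwen2.5-Coder might look roughly like this. The `model` and `stop_sequences` keys come from the generated config; the key names under `[prompt_templates]` are hypothetical, so check your generated `config.toml` for the real ones:

```toml
# config.toml — illustrative sketch for a Qwen2.5-Coder swap
model = "qwen2.5-coder-7b-instruct"   # must exactly match the badge name in LM Studio
stop_sequences = ["<|im_end|>"]       # ChatML stop token used by Qwen

[prompt_templates]
# Key name and template text below are placeholders; replace the GPT-OSS
# "Worker" persona in your generated config with ChatML-friendly wording.
generate = "You are a code-generation worker. Return only code.\n\n{request}"
```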
Recommended Alternative Coding Models:
- Qwen2.5-Coder (7B or 32B): Widely considered the strongest open-source local coder right now, with a large context window and fast generation.
- DeepSeek-Coder-V2-Lite: An incredibly efficient Mixture-of-Experts model tailored for instruction-following and code-fixing.
- Mistral Codestral: Designed purely for developer code-generation and multi-file workflows.
Warning
Do NOT use Reasoning Models (e.g., DeepSeek-R1, QwQ)
Models that natively output <think> blocks or print chain-of-thought reasoning will completely break the MCP tool JSON parser. The cloud coordinator expects raw code and strings back. Stick to standard Instruct or Coder variants!
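To illustrate the failure mode, a reasoning model's raw reply typically looks like the sketch below; the `<think>` preamble ends up inside the string the coordinator tries to parse as a tool result:

```text
<think>The user wants a reversal function, so I should consider slicing...</think>
def reverse(s): return s[::-1]
```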
- Go to the Releases screen on GitHub and download the `.exe` (or macOS/Linux binary) for your operating system.
- Place the binary inside a new folder (e.g., `lm-bridge`).
- Double-click the `.exe`. It will automatically generate your default `config.toml` and your `mcp_registration.json` snippet in the same folder.
- Important Model Note: The auto-generated config is strictly tailored to `openai/gpt-oss-20b` out of the box. To use a different model, you do not need to recompile the bridge; simply open the `config.toml` file in an editor and edit the model name, stop sequences, and prompts. Read the Alternative Models section above for exact instructions.
- Install to Antigravity: Now that your registration snippet has been generated, scroll down to the Antigravity Setup section below to see how to copy it into your IDE configuration.
- Set Privacy Rules: Finally, scroll down to the Agent Behavior & Privacy Rules section to install the custom `local_llm` routing instructions.
Build Requirements:
- Rust & Cargo (edition 2021)
- Windows Users: You MUST have Visual Studio Build Tools installed with the "Desktop development with C++" workload selected. This provides the `link.exe` linker required for compilation.
- macOS Users: You MUST have Xcode Command Line Tools installed. You can install them by running `xcode-select --install` in your terminal.
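A quick way to confirm the toolchain is in place before building:

```sh
# Both commands should print a version. On Windows, a successful
# `cargo build` additionally confirms that link.exe was found.
rustc --version
cargo --version
```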
- Clone/open the `lm-bridge` folder.
- Ensure Antigravity is closed (if you've previously run the server, Windows cannot overwrite the binary while it is running).
- Compile the binary:

```sh
cargo build --release
```

- Generate your Registration Snippet: run the binary once to automatically generate your configuration JSON and exit:

```sh
cargo run --release -- --register
```

This will generate your `mcp_registration.json` file in your project root and quit immediately.
Open LM Studio and look at your loaded model. You will see a small badge (e.g., `openai/gpt-oss-20b`). You must copy this string exactly.
You have two ways to set the model:
- Method A (Easiest): Edit the `model` field in the `config.toml` file that sits next to the server binary.
- Method B (Override): Set the `LM_STUDIO_MODEL` environment variable in the MCP client configuration. This overrides `config.toml`.

```toml
# In config.toml
model = "your-copied-model-name-here"
```

Open the generated `mcp_registration.json` in your project root. It contains the exact absolute path of your `lm-bridge.exe` and your current model name. You can copy this block directly into your Antigravity configuration.
- Open Antigravity and go to the "Manage MCP servers" screen.
- In the top-right corner, click the "View raw config" 📄 icon. This will open your `mcp_config.json` directly in the editor.
- Paste the block from your `mcp_registration.json` into the `"mcpServers"` object.
- Save the file.
- Go back to the MCP screen and click "Refresh" 🔄. Your `local_llm` node should now be green and active!
```json
{
  "mcpServers": {
    "local_llm": {
      "command": "C:\\Path\\To\\Projects\\lm-bridge\\target\\release\\lm-bridge.exe", // macOS: "/Users/Name/Projects/lm-bridge/target/release/lm-bridge"
      "args": [],
      "env": {
        "LM_STUDIO_MODEL": "openai/gpt-oss-20b"
      }
    }
  }
}
```

To make Gemini intelligently use your local model without being prompted every time, add the Google Antigravity Global Rules to your Agent's configuration.
- Click the `+` (or `...`) menu in the top-right of your chat window.
- Select Customization (or Rules).
- Copy and paste the exact rules found in `integrations/google_antigravity/GEMINI.md`.
In an Antigravity chat, type:
"Use the local_generate tool from local_llm to write a Python script that prints 'Hello World'."
If the model is loaded in LM Studio, you will see the request logs appear in the LM Studio server console, and Gemini will present the resulting code. Once the rules are installed, the agent will route to the MCP server even when you don't name the tool in the prompt.
- Orchestrated Generation: Antigravity acts as the Architect (planning and review), while your local LLM acts as the Builder (writing code).
- Tools Exposed (see the example call below):
  - `local_generate`: for new files and modules.
  - `local_edit`: for modifying existing code.
  - `local_complete`: for filling in snippets.
  - `local_explain`: for privacy-focused, local architectural analysis.
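Under the hood, these tools are invoked over MCP's JSON-RPC `tools/call` method. The argument shape shown here (`prompt`) is illustrative only; Antigravity discovers each tool's real schema via `tools/list`:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "local_generate",
    "arguments": { "prompt": "Write a Python script that prints 'Hello World'." }
  }
}
```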
- Make it work with the Codex app
- Make it work with multiple local models