Local autonomous coding agent stack for AMD RX 6700 XT (RDNA2) on Windows. OpenCode + LM Studio (Vulkan) + Gemma 4 E4B Q6_K + oh-my-opencode plugin. Runs at ~30 tok/s on 12GB VRAM. Free, private, GPU-accelerated.
If you have an AMD RDNA2 GPU (RX 6600/6700/6800 series) on Windows and want a Claude Code-style autonomous coding agent for free, locally, with GPU acceleration, you'll hit a wall:
- Ollama Vulkan + Gemma 4 = broken upstream (issues #15328, #15248, #15285). Q4 outputs
ktktktkt..., Q8 outputs garbage. CPU-only works but ~5 tok/s. - vLLM ROCm is unofficial for
gfx1031. Build is hell. - llama.cpp Vulkan direct build works but needs cmake/MSVC/Vulkan SDK toolchain (30-60 min).
- LM Studio (Vulkan) uses a different ggml build → Gemma 4 works perfectly, GPU 100%.
This repo packages the validated config so you skip the 6+ hour debugging session.
| File | Purpose |
|---|---|
configs/opencode.json |
Global OpenCode config — LM Studio + Ollama providers, MCP servers, default model |
configs/AGENTS.md |
OpenCode global guide — Gemma 4 environment hints, camelCase tool params |
configs/oh-my-openagent.json |
omo plugin config (Claude max20 + opencode/gpt-5-nano agent assignments) |
configs/Modelfile.gemma4-fast |
Ollama Modelfile (legacy, unused with LM Studio path) |
INSTALL.md |
Step-by-step install guide |
TROUBLESHOOTING.md |
Vulkan + Gemma 4 issue notes, model selection matrix |
| Test | Result |
|---|---|
| Gemma 4 E4B Q6_K load | 5.0s on RX 6700 XT |
| Coding (Python fib) | clean code + docstring + edge cases |
| Tool calling | native, finish_reason: tool_calls |
| GPU usage | 3.8GB VRAM (Q6_K, 8K context) |
| Inference speed | ~30 tok/s |
┌─────────────────────────────────────┐
│ OpenCode v1.4.11+ (TUI/CLI) │
│ + oh-my-opencode plugin │
└──────────────┬──────────────────────┘
│ OpenAI-compat HTTP
▼
┌─────────────────────────────────────┐
│ LM Studio (port 1234) │
│ Vulkan backend │
│ google/gemma-4-e4b Q6_K (6.33GB) │
└──────────────┬──────────────────────┘
│ Vulkan API
▼
┌─────────────────────────────────────┐
│ AMD RX 6700 XT (RDNA2 / gfx1031) │
│ 12GB VRAM, Windows 11 │
└─────────────────────────────────────┘
| Workload | Model | Provider |
|---|---|---|
| General coding | lmstudio/google/gemma-4-e4b |
LM Studio (Q6_K, GPU) ← default |
| Smarter cloud work | claude-opus-4-6 |
omo + Claude Pro/Max20 |
| Korean-heavy text | exaone-deep:7.8b |
Ollama (CPU/GPU) |
| Larger thinking model | qwen3.5:9b |
Ollama (GPU works for non-Gemma) |
| Multimodal (vision/audio) | Gemma 4 E4B only | LM Studio |
See INSTALL.md for step-by-step.
TL;DR:
# 1. Install OpenCode
bun install -g opencode-ai
# 2. Install LM Studio (winget) — first-time GUI launch required
winget install -e --id ElementLabs.LMStudio
# Open LM Studio once to init ~/.lmstudio
# 3. Download Gemma 4 E4B Q6 in LM Studio (Hub UI or `lms get`)
# 4. Copy configs
mkdir -p ~/.config/opencode
cp configs/opencode.json ~/.config/opencode/
cp configs/AGENTS.md ~/.config/opencode/
# Edit opencode.json: replace ${HOME}/your/project/root with your actual project path
# 5. Install oh-my-opencode plugin
bunx oh-my-opencode install --no-tui --claude=<yes|max20|no> --gemini=no --copilot=no
# 6. Start LM Studio server
lms server start
lms load google/gemma-4-e4b --gpu max --context-length 8192
# 7. Run
opencodeSee TROUBLESHOOTING.md. Most common: garbage output → you're using Ollama Vulkan + Gemma 4. Switch to LM Studio.
MIT — see LICENSE.
- OpenCode by SST
- oh-my-opencode by code-yeongyu
- LM Studio by Element Labs
- Gemma 4 by Google DeepMind