April 2026 Open-Source LLM Coding Benchmark — 40 Models Ranked
desiorac announced in Announcements
Update: Just deployed an interactive model selector for this dataset — filter all 44 open-weight models by license, context window, and provider: https://ark-forge.github.io/genesis/open-llm-selector/ Useful if you're narrowing down candidates for a specific deployment constraint (e.g., Apache-2.0 only, 128k+ context, self-hostable).
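In case it helps anyone scripting against the same kind of catalog, the selector's filtering logic amounts to intersecting a few constraints over model metadata. Here's a minimal sketch in Python; the field names (`license`, `context_window`, `self_hostable`) and the entries themselves are illustrative assumptions, not the selector's actual schema or data.

```python
# Hypothetical model catalog — field names and values are illustrative,
# not taken from the published dataset.
MODELS = [
    {"name": "DeepSeek-Coder-V3",  "license": "MIT",        "context_window": 128_000, "self_hostable": True},
    {"name": "Qwen2.5-Coder-32B",  "license": "Apache-2.0", "context_window": 131_072, "self_hostable": True},
    {"name": "Yi-Coder-9B-Chat",   "license": "Apache-2.0", "context_window": 131_072, "self_hostable": True},
]

def select(models, license=None, min_context=0, self_hostable=None):
    """Return models satisfying every constraint that was actually given."""
    out = []
    for m in models:
        if license is not None and m["license"] != license:
            continue  # wrong license
        if m["context_window"] < min_context:
            continue  # context window too small
        if self_hostable is not None and m["self_hostable"] != self_hostable:
            continue  # deployment constraint not met
        out.append(m)
    return out

# Example: the "Apache-2.0 only, 128k+ context" constraint from the post.
picks = select(MODELS, license="Apache-2.0", min_context=128_000)
print([m["name"] for m in picks])  # → ['Qwen2.5-Coder-32B', 'Yi-Coder-9B-Chat']
```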
April 2026 Open-Source LLM Coding Benchmark — Results Published
After running 2,400+ coding tasks across 40 open-source models, I've published the benchmark results.
TL;DR: DeepSeek-Coder-V3 tops accuracy but Qwen2.5-Coder-32B wins on speed/accuracy tradeoff. Yi-Coder-9B-Chat is the biggest surprise — punches well above its weight class.
Key findings
Links
The full report includes per-language breakdowns (Python, JS/TS, Go, Rust, Java), deployment cost analysis, and recommended stacks for 6 use-case profiles.
Happy to answer questions about methodology or specific model comparisons.
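For readers wondering what a "speed/accuracy tradeoff" winner means concretely: one common way to frame it is Pareto optimality — a model wins a spot on the frontier if no other model beats it on both accuracy and throughput simultaneously. The sketch below is my own illustration of that idea with made-up numbers, not the benchmark's actual scoring method or results.

```python
# Hypothetical results — the numbers are invented for illustration only.
RESULTS = {
    "model-a": {"accuracy": 0.82, "tokens_per_sec": 35},
    "model-b": {"accuracy": 0.78, "tokens_per_sec": 90},
    "model-c": {"accuracy": 0.70, "tokens_per_sec": 60},
}

def pareto_frontier(results):
    """Return models not strictly dominated on both accuracy and speed."""
    frontier = []
    for name, r in results.items():
        dominated = any(
            o["accuracy"] > r["accuracy"] and o["tokens_per_sec"] > r["tokens_per_sec"]
            for other, o in results.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)

# model-c is dominated by model-b (more accurate AND faster), so it drops out.
print(pareto_frontier(RESULTS))  # → ['model-a', 'model-b']
```

A model like Qwen2.5-Coder-32B "winning the tradeoff" would, in this framing, mean it sits on that frontier while cheaper-to-run rivals fall below it.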