Populating the benchmarks with SOTA

For context: Qwen3-Next (a mixed-attention model) has been released, and it is claimed to be as good as Qwen's large MoE models, but it is not yet on the benchmarks. https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d

For comparison, here are some other leaderboards that are kept up to date: https://livebench.ai/ https://www.swebench.com/ https://gorilla.cs.berkeley.edu/leaderboard.html