diff --git a/model-benchmarks/index.html b/model-benchmarks/index.html
index 5a208c9..d7ada1d 100644
--- a/model-benchmarks/index.html
+++ b/model-benchmarks/index.html
@@ -429,7 +429,7 @@

class="text-sm text-of-muted text-center max-w-2xl mx-auto mb-8" data-aos="fade-up">
Numbers tell you what a model can do. Traits tell you who
- it is. Our editorial reads, grounded in 22-dimension
+ it is. Our editorial reads are grounded in 22-dimension
Highest EQ
- $5.62/M
+ $5.63/M

GPT-5.4

- The most emotionally intelligent model tested. Highest correctness and
- depth of insight, with exceptionally low sycophancy.
+ The most emotionally intelligent model tested. Tied for highest depth of
+ insight (15.8), highest correctness, and exceptionally low sycophancy.

GPT-5.4

Claude Opus 4.6

Highest empathy among flagships with deep insight. Leads on demonstrated
- empathy and emotional reasoning. Premium price, premium presence.
+ empathy. Premium price, premium presence.

Empathy 14.9
@@ -559,7 +559,7 @@

MiniMax M2.7

Step 3.5 Flash

- Scores 69.25 on EQ — beating models that cost 30-60x more. At fifteen
+ Scores 69.25 on EQ — beating models that cost 10-40x more. At fifteen
cents per million tokens, the best EQ-per-dollar in the field. No detailed trait breakdown available yet.

@@ -599,9 +599,9 @@

GPT-5.4 Mini

Grok 4.20

- Decent v3 score (68.55) but the lowest Elo ranking (856) by far — humans
- don't enjoy chatting with it. Strong subtext reading, but something gets
- lost in delivery.
+ Decent v3 score (68.55) but lowest EQ-Bench Elo (856) — struggles with
+ emotional nuance tests despite strong Arena Elo (1491, rank 4) showing
+ humans like chatting with it. Strong subtext reading.

Subtext 15.8
@@ -623,6 +623,7 @@

Qwen3.6 Plus

Free is free. But highest sycophancy (6.2) and lowest EQ score (60.45) of the set. Most likely to tell you what you want to hear rather than what you need to hear.
+ (Benchmark data from Qwen3.5-397B predecessor)

Qwen3.6 Plus

- All trait scores are 0–20 from EQ-Bench v3.
+ Individual trait scores (warmth, empathy, etc.) are 0–20 from EQ-Bench v3.
+ EQ scores are 0–100; Elo rankings vary by benchmark.