diff --git a/model-benchmarks/index.html b/model-benchmarks/index.html
index 5a208c9..d7ada1d 100644
--- a/model-benchmarks/index.html
+++ b/model-benchmarks/index.html
@@ -429,7 +429,7 @@
- The most emotionally intelligent model tested. Highest correctness and
- depth of insight, with exceptionally low sycophancy.
+ The most emotionally intelligent model tested. Tied for highest depth of
+ insight (15.8), highest correctness, and exceptionally low sycophancy.
  Highest empathy among flagships with deep insight. Leads on demonstrated
- empathy and emotional reasoning. Premium price, premium presence.
+ empathy. Premium price, premium presence.
- Scores 69.25 on EQ — beating models that cost 30-60x more. At fifteen
+ Scores 69.25 on EQ — beating models that cost 10-40x more. At fifteen
  cents per million tokens, the best EQ-per-dollar in the field. No detailed trait breakdown available yet.
@@ -599,9 +599,9 @@
- Decent v3 score (68.55) but the lowest Elo ranking (856) by far — humans
- don't enjoy chatting with it. Strong subtext reading, but something gets
- lost in delivery.
+ Decent v3 score (68.55) but lowest EQ-Bench Elo (856) — struggles with
+ emotional nuance tests despite strong Arena Elo (1491, rank 4) showing
+ humans like chatting with it. Strong subtext reading.
- All trait scores are 0–20 from EQ-Bench v3.
+ Individual trait scores (warmth, empathy, etc.) are 0–20 from EQ-Bench v3.
+ EQ scores are 0–100; Elo rankings vary by benchmark.