
Add Model Personalities section and compress hero #15

Merged
TechNickAI merged 2 commits into main from model-personalities on Apr 7, 2026

Conversation

@TechNickAI (Owner)

Summary

  • Compressed hero from ~400px to ~150px — data table now visible above the fold on first load
  • Added Model Personalities section with 9 curated editorial insight cards between the table and methodology, grounded in EQ-Bench v3 trait data
  • Relocated data source attribution to footer (removed from hero but preserved with links)
  • Added insight card CSS with 5 tag variants and 3 trait chip variants matching the Organic Flow design system
  • Updated AGENTS.md with maintenance docs for the personality section

Review process

6 parallel review agents (logic, UX/empathy, coding, security, design, architecture) identified and fixed:

  • Data bug: Opus warmth was 12.0, corrected to 13.6 from model-data.json
  • False claim: Sonnet "lowest sycophancy of any Claude" was wrong (Haiku is lower at 2.9)
  • Trust calibration: Softened caution tags (Grok → "The Enigma" neutral, Qwen → "The People-Pleaser" neutral)
  • Editorial framing: Added "Our editorial reads" to personality intro for transparency
  • Methodology contrast: Restored bg-of-cream so white cards remain visible
  • Tag differentiation: Deepened --top green to distinguish from --value sage

Test plan

  • Verify data table visible above fold on desktop
  • Check personality cards render correctly on mobile (1-col stack)
  • Hover EQ cells to confirm tooltips still work
  • Verify footer shows all 5 data source links
  • Check methodology section cards have visible borders on cream background

🤖 Generated with Claude Code

Compress hero from ~400px to ~150px so the benchmark table is visible
on first load. Add editorial "Model Personalities" section with 9
curated insight cards based on EQ-Bench v3 trait data, placed between
the data table and methodology.

Cards highlight personality profiles (Highest EQ, Warmest Flagship,
Most Humanlike, etc.) with trait chips grounded in the 22-dimension
EQ-Bench data. Reviewed by 6 parallel agents (logic, UX/empathy,
coding, security, design, architecture) — fixed data accuracy bugs
(Opus warmth, Sonnet sycophancy claim), softened editorial tone on
caution tags, restored data source attribution to footer, improved
trait footnote clarity, and added maintenance docs to AGENTS.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude

claude Bot commented Apr 7, 2026

PR Review — Add Model Personalities section and compress hero

Overall this is solid work. The hero compression is clean, the card component system is well-structured, and the editorial framing is appropriately humble ("Our editorial reads"). A few issues to address:


Bugs / Issues

1. Dead CSS — insight-tag--caution is defined but never used

styles.css defines .insight-tag--caution, but no card in the HTML uses it. The prior design had caution tags; they were softened to --neutral per the PR description, so the dead rule should be removed.

/* Remove this — never referenced in HTML */
.insight-tag--caution {
    background: rgba(180, 83, 9, 0.1);
    color: #92400e;
}

2. Step 3.5 Flash trait chips mix data types

The trait chips for this card show $0.15/M and 85 t/s — these are pricing and speed metadata, not EQ traits. Every other card uses trait chips for EQ-Bench dimensions. This card lacks full trait breakdown (acknowledged in the copy), but using pricing/speed as chips is inconsistent and will read oddly when users hover or compare. Consider either showing just EQ 69.25 with a note chip like Full traits pending, or using the neutral chip styling with a label that makes the data type clear.


UX / Content Concerns

3. EQ explanation lost above the fold

The hero previously contained: "Emotional intelligence shapes how AI listens, responds to vulnerability, and holds space..." This was removed to compress the hero. First-time visitors now land on a data table with no explanation of what EQ-Bench measures or why it matters before seeing the personalities section. The footnote at the bottom of the personalities section partially covers this, but the motivating context is gone from above the fold. Consider preserving a one-line version in the hero tagline.

4. Sycophancy chip coloring may confuse without footnote context

Green chips labeled Sycophancy 3.6 or Sycophancy 2.7 read as "high sycophancy is good" without the footnote. The footnote ("lower is better — green means less of it") is below the grid and easy to miss. A small ↓ better indicator inline, or a (lower is better) note in the section intro paragraph, would catch users who don't reach the footnote.
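One possible shape for that inline hint, sketched with placeholder markup (the page's real chip classes aren't shown in this review, so every class name below is an assumption):

```html
<!-- Hypothetical chip markup — class names are illustrative only -->
<span class="trait-chip trait-chip--green">
  Sycophancy 3.6 <small>(lower is better)</small>
</span>
```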

5. "Safety First" tag uses --neutral styling for a clearly positive framing

GPT-5.4 Mini's tag reads "Safety First" — the copy is enthusiastic ("firm, principled companion — not a people-pleaser") — but the tag renders in the muted neutral style. This is the only card where the tag sentiment and tag color don't match. Either use --top styling or adjust the editorial copy to match the neutral tone.


Minor

6. Footer attribution links missing underline class

The hero previously had class="underline hover:..." on data source links. The new footer links use class="hover:..." only — no underline in resting state. Inconsistent with footer link convention elsewhere (the "HeartCentered AI" link uses font-medium but no underline, so this might be intentional). Worth a quick check for visual consistency.


What's Good

  • Data bug fixes (Opus warmth 12.0 → 13.6, Sonnet sycophancy correction) are well-documented in the PR summary
  • All external links in footer correctly have target="_blank" rel="noopener noreferrer"
  • CSS component naming follows existing BEM-like conventions cleanly
  • "Our editorial reads" framing is the right call for transparency
  • Footnote explaining inverted scoring ("lower is better — green means less of it") addresses the most likely point of confusion
  • AGENTS.md update gives future maintainers clear guidance on when to revisit the cards

Priority: Items 1 and 2 are concrete bugs. Items 3–5 are UX judgment calls worth considering before merge. Item 6 is cosmetic.

…ag, sycophancy hint

- Remove dead .insight-tag--caution CSS (softened to neutral in PR, never used)
- Step 3.5 Flash: simplify trait chips to EQ 69.25 + "Traits pending"
- GPT-5.4 Mini "Safety First" tag: neutral → top (positive framing deserves positive color)
- Move sycophancy inversion hint into section intro (before chip grid, not just footnote)
- Trim redundant footnote now that intro covers it

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@TechNickAI (Owner, Author)

Fixed (7004d18):

Declined:

  • #3 (EQ context above fold): The hero compression was intentional; the personality section intro ("Traits tell you who it is") and the EQ column tooltip provide sufficient context. Restoring a paragraph defeats the above-fold goal.
  • #6 (footer link underlines): Footer links on this page conventionally use hover-only styling — the HeartCentered AI link and JSON/llms.txt links have no resting underline either.

@claude

claude Bot commented Apr 7, 2026

PR Review — Add Model Personalities section and compress hero

Overall this is a solid, well-executed PR. The hero compression is clearly the right call for UX, and the personality cards are a compelling editorial layer on top of the raw benchmark data. The six-agent review pass caught real issues (the Opus warmth bug, the sycophancy false claim). Here's what I found on a second pass:


Data accuracy

Cross-checked all trait values against model-data.json. Almost everything checks out. Four issues:

1. GPT-5.4 cost rounding — blended price in JSON is 5.625, card shows $5.62. Should round to $5.63. Minor but visible to users comparing to OpenRouter pricing.

2. GPT-5.4 "Highest depth of insight" — tied, not solo — the card claims "Highest correctness and depth of insight" but MiMo-V2-Pro also has depth_of_insight: 15.8 (identical). Either say "tied for highest" or swap in a different differentiator for one of the two cards.

3. Qwen3.6 Plus — benchmark provenance not disclosed — model-data.json has a clear note: "Benchmark run on Qwen3.5-397B (predecessor)" for all EQ data. The card presents sycophancy, EQ score, and warmth as if they're Qwen3.6 Plus measurements. At minimum, a chip like "Data from predecessor" or a tooltip would match the transparency standard set elsewhere on the page.

4. Step 3.5 Flash "30-60x" cost claim — At $0.15/M, 30x is $4.50 (Gemini, EQ 68.95 — yes, beaten). But 60x is $9.00 — only Opus is priced above that, and Opus EQ is 71.85 which Step doesn't beat. The upper bound of the range isn't supported by data in the table.


Design / UX

5. insight-tag--top (green badge) covers three semantically different categories — "Highest EQ", "Sharpest Social Reader", and "Safety First" all share the same green pill. The value/warmth/neutral variants are meaningfully differentiated, but the overloaded --top badge may confuse readers about what "top" means. Consider a --safety or --social variant for the latter two, or relabel the badge text to be more self-explanatory.
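A minimal sketch of what a dedicated variant could look like, following the rgba-background pattern of the existing tag rules (the class name and palette here are invented for illustration, not taken from styles.css):

```css
/* Hypothetical variant — name and colors are placeholders */
.insight-tag--safety {
    background: rgba(37, 99, 235, 0.1);
    color: #1d4ed8;
}
```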

6. AOS removed from hero but still active on personality section — intentional and fine, but the personality data-aos attributes include a data-aos-delay="100" on the grid that fires on scroll. On fast connections the section may already be in view on load; the delay can cause a jarring pop-in. Consider removing the delay or the AOS attribute from the grid entirely since the cards themselves don't have individual delays.


Maintainability

7. AGENTS.md step 6 is good — the maintenance doc is clear. One addition worth making: a comment block in the HTML above the personality grid noting which JSON fields map to which chip values (e.g. <!-- v3_traits.depth_of_insight -->) would make it much easier to keep cards in sync when EQ-Bench reruns without needing to cross-reference the schema doc.
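Such a comment block might look like the following. Only v3_traits.depth_of_insight is named in this review; the other field paths are guesses at the schema and would need verifying against model-data.json:

```html
<!-- Chip → JSON mapping (fields per model in model-data.json).
     Paths other than v3_traits.depth_of_insight are assumed:
       EQ score         → eq_bench_v3.score          (0–100)
       Warmth           → v3_traits.warmth           (0–20)
       Sycophancy       → v3_traits.sycophancy       (0–20, lower is better)
       Depth of insight → v3_traits.depth_of_insight (0–20)
-->
```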


What's working well

  • Hero compression achieves the stated goal cleanly — no layout regressions visible in the diff
  • CSS component naming is consistent with the existing Organic Flow design system
  • Footer attribution with rel="noopener noreferrer" on all links is correct
  • Warmth bug fix (13.6) verified against JSON
  • "Our editorial reads" framing sets appropriate expectations — good judgment call
  • Trait chip color logic (green = good, amber = negative, muted = neutral) is coherent and explained in the section intro

Summary

Items 1–4 are data accuracy issues worth fixing before merge; items 5–7 are lower priority polish. The hardest one to leave unaddressed is #3 (Qwen predecessor data) since it affects the integrity of the editorial framing the PR explicitly invested in.

🤖 Generated with Claude Code

@TechNickAI TechNickAI merged commit 3ee7bc4 into main Apr 7, 2026
2 checks passed
@TechNickAI TechNickAI deleted the model-personalities branch April 7, 2026 13:41

@cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


Reviewed by Cursor Bugbot for commit 7004d18.

>EQ-Bench v3</a>
personality profiles. Trait scores are 0–20; for traits like sycophancy,
green means <em>less</em> of it.


Personality intro has incomplete sentence fragment

Medium Severity

The sentence "Our editorial reads, grounded in 22-dimension EQ-Bench v3 personality profiles." in the "Model Personalities" introductory text is a fragment. It lacks a main verb, making the user-facing description incomplete.



<p class="text-xs text-of-muted text-center mt-6">
All trait scores are 0–20 from EQ-Bench v3.
</p>

Footer "all 0–20" claim contradicts non-trait chip values

Medium Severity

In the "Model Personalities" section, the footnote claims "All trait scores are 0–20 from EQ-Bench v3." This is inaccurate because some displayed trait values, such as EQ 69.25, EQ 60.45, and Elo 856, are on different scales, and "Traits pending" is not a score, which could confuse readers.


<p class="insight-read">
Decent v3 score (68.55) but the lowest Elo ranking (856) by far — humans
don't enjoy chatting with it. Strong subtext reading, but something gets
lost in delivery.
</p>

Grok editorial conflates EQ-Bench Elo with human chat preference

High Severity

The "Model Personalities" section contains editorial claims that misinterpret benchmark data. The Grok 4.20 card incorrectly states humans dislike it based on its EQ-Bench Elo, when its Arena Elo indicates strong human preference. Similarly, the Claude Opus 4.6 card inaccurately claims it leads in emotional reasoning, despite ranking 7th among models.

Additional Locations (1)


TechNickAI pushed a commit that referenced this pull request Apr 9, 2026
Fixed 8 issues flagged by Claude Code Review and Cursor Bugbot:

**Data Accuracy:**
- GPT-5.4 cost: $5.62 → $5.63 (correct rounding of 5.625)
- GPT-5.4 insight claim: "Highest" → "Tied for highest" (shares 15.8 with MiMo-V2-Pro)
- Qwen3.6 Plus: Add disclosure that EQ data from predecessor model
- Step 3.5 Flash cost claim: "30-60x" → "10-40x" (upper bound now supported by data)

**Editorial Accuracy:**
- Grok 4.20: Clarify EQ-Bench Elo (856) vs Arena Elo (1491, rank 4) - low EQ-Bench Elo doesn't mean humans dislike chatting with it
- Opus 4.6: Remove false claim about leading on emotional reasoning (ranks 7th in that trait)

**Content Quality:**
- Fix incomplete sentence fragment: "Our editorial reads, grounded" → "Our editorial reads are grounded"
- Clarify footer: "All trait scores are 0–20" → distinguish individual traits (0–20) from EQ scores (0–100) and Elo rankings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
