Add Model Personalities section and compress hero#15
Compress hero from ~400px to ~150px so the benchmark table is visible on first load. Add an editorial "Model Personalities" section with 9 curated insight cards based on EQ-Bench v3 trait data, placed between the data table and methodology. Cards highlight personality profiles (Highest EQ, Warmest Flagship, Most Humanlike, etc.) with trait chips grounded in the 22-dimension EQ-Bench data.

Reviewed by 6 parallel agents (logic, UX/empathy, coding, security, design, architecture):
- fixed data accuracy bugs (Opus warmth, Sonnet sycophancy claim)
- softened editorial tone on caution tags
- restored data source attribution to footer
- improved trait footnote clarity
- added maintenance docs to AGENTS.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review — Add Model Personalities section and compress hero

Overall this is solid work. The hero compression is clean, the card component system is well-structured, and the editorial framing is appropriately humble ("Our editorial reads"). A few issues to address:

Bugs / Issues

1. Dead CSS —
/* Remove this — never referenced in HTML */
.insight-tag--caution {
background: rgba(180, 83, 9, 0.1);
color: #92400e;
}

2. Step 3.5 Flash trait chips mix data types — The trait chips for this card show

UX / Content Concerns

3. EQ explanation lost above the fold — The hero previously contained: "Emotional intelligence shapes how AI listens, responds to vulnerability, and holds space..." This was removed to compress the hero. First-time visitors now land on a data table with no explanation of what EQ-Bench measures or why it matters before seeing the personalities section. The footnote at the bottom of the personalities section partially covers this, but the motivating context is gone from above the fold. Consider preserving a one-line version in the hero tagline.

4. Sycophancy chip coloring may confuse without footnote context — Green chips labeled

5. "Safety First" tag uses — GPT-5.4 Mini's tag reads "Safety First" — the copy is enthusiastic ("firm, principled companion — not a people-pleaser") — but the tag renders in the muted neutral style. This is the only card where the tag sentiment and tag color don't match. Either use

Minor

6. Footer attribution links missing — The hero previously had

What's Good
Priority: Items 1 and 2 are concrete bugs. Items 3–5 are UX judgment calls worth considering before merge. Item 6 is cosmetic.
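Item 1 above, the dead `.insight-tag--caution` selector, is the kind of thing a small script can catch before review. A minimal sketch, assuming the styles and markup are available as strings; the `unused_css_classes` helper and the sample snippets are hypothetical, and the naive regex will miss classes built at runtime or toggled from JavaScript:

```python
import re

def unused_css_classes(css_text: str, html_text: str) -> set[str]:
    """Report simple class selectors that never appear in any class="..." attribute."""
    # Class names referenced in selectors, e.g. ".insight-tag--caution"
    defined = set(re.findall(r'\.([A-Za-z_][\w-]*)', css_text))
    # Class names actually used in the markup
    used = set()
    for attr in re.findall(r'class="([^"]*)"', html_text):
        used.update(attr.split())
    return defined - used

css = '.insight-tag--caution { color: #92400e; }\n.insight-card { padding: 1rem; }'
html = '<div class="insight-card">...</div>'
print(sorted(unused_css_classes(css, html)))  # ['insight-tag--caution']
```

A check like this in CI would have flagged the selector before it reached review, at the cost of false positives for classes only referenced from scripts.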
…ag, sycophancy hint

- Remove dead .insight-tag--caution CSS (softened to neutral in PR, never used)
- Step 3.5 Flash: simplify trait chips to EQ 69.25 + "Traits pending"
- GPT-5.4 Mini "Safety First" tag: neutral → top (positive framing deserves positive color)
- Move sycophancy inversion hint into section intro (before chip grid, not just footnote)
- Trim redundant footnote now that intro covers it

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixed (7004d18):
Declined:
PR Review — Add Model Personalities section and compress hero

Overall this is a solid, well-executed PR. The hero compression is clearly the right call for UX, and the personality cards are a compelling editorial layer on top of the raw benchmark data. The six-agent review pass caught real issues (the Opus warmth bug, the sycophancy false claim). Here's what I found on a second pass:

Data accuracy

Cross-checked all trait values against

1. GPT-5.4 cost rounding — blended price in JSON is

2. GPT-5.4 "Highest depth of insight" — tied, not solo — the card claims "Highest correctness and depth of insight" but MiMo-V2-Pro also has

3. Qwen3.6 Plus — benchmark provenance not disclosed —

4. Step 3.5 Flash "30-60x" cost claim — At

Design / UX

5.

6. AOS removed from hero but still active on personality section — intentional and fine, but the personality

Maintainability

7. AGENTS.md step 6 is good — the maintenance doc is clear. One addition worth making: a comment block in the HTML above the personality grid noting which JSON fields map to which chip values (e.g.

What's working well
Summary

Items 1–4 are data accuracy issues worth fixing before merge; items 5–7 are lower priority polish. The hardest one to leave unaddressed is #3 (Qwen predecessor data) since it affects the integrity of the editorial framing the PR explicitly invested in.

🤖 Generated with Claude Code
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 7004d18. Configure here.
    >EQ-Bench v3</a
    personality profiles. Trait scores are 0–20; for traits like sycophancy,
    green means <em>less</em> of it.
Personality intro has incomplete sentence fragment
Medium Severity
The sentence "Our editorial reads, grounded in 22-dimension EQ-Bench v3 personality profiles." in the "Model Personalities" introductory text is a fragment. It lacks a main verb, making the user-facing description incomplete.
    <p class="text-xs text-of-muted text-center mt-6">
      All trait scores are 0–20 from EQ-Bench v3.
    </p>
Footer "all 0–20" claim contradicts non-trait chip values
Medium Severity
In the "Model Personalities" section, the footnote claims "All trait scores are 0–20 from EQ-Bench v3." This is inaccurate because some displayed trait values, such as EQ 69.25, EQ 60.45, and Elo 856, are on different scales, and "Traits pending" is not a score, which could confuse readers.
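A build-time check could keep that footnote and the chip values from drifting apart again. A hedged sketch with hypothetical labels and ranges, assuming individual traits run 0 to 20, overall EQ scores 0 to 100, and Elo ratings are effectively unbounded:

```python
def chip_scale(label: str, value: float) -> str:
    """Classify a displayed chip value by the scale it belongs to.

    Hypothetical ranges: individual traits are 0-20, overall EQ scores
    are 0-100, and Elo ratings live well above both.
    """
    name = label.lower()
    if name.startswith("eq"):
        assert 0 <= value <= 100, f"EQ score out of range: {value}"
        return "eq (0-100)"
    if name.startswith("elo"):
        return "elo (unbounded)"
    assert 0 <= value <= 20, f"trait out of 0-20 range: {label}={value}"
    return "trait (0-20)"

print(chip_scale("EQ", 69.25))    # eq (0-100)
print(chip_scale("Elo", 856))     # elo (unbounded)
print(chip_scale("Warmth", 15.8)) # trait (0-20)
```

Running such a check over the card data would make a footnote claim like "all scores are 0–20" fail loudly as soon as an EQ or Elo chip lands in the grid.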
    <p class="insight-read">
      Decent v3 score (68.55) but the lowest Elo ranking (856) by far — humans
      don't enjoy chatting with it. Strong subtext reading, but something gets
      lost in delivery.
Grok editorial conflates EQ-Bench Elo with human chat preference
High Severity
The "Model Personalities" section contains editorial claims that misinterpret benchmark data. The Grok 4.20 card incorrectly states humans dislike it based on its EQ-Bench Elo, when its Arena Elo indicates strong human preference. Similarly, the Claude Opus 4.6 card inaccurately claims it leads in emotional reasoning, despite ranking 7th among models.
Additional Locations (1)
Fixed 8 issues flagged by Claude Code Review and Cursor Bugbot:

**Data Accuracy:**
- GPT-5.4 cost: $5.62 → $5.63 (correct rounding of 5.625)
- GPT-5.4 insight claim: "Highest" → "Tied for highest" (shares 15.8 with MiMo-V2-Pro)
- Qwen3.6 Plus: Add disclosure that EQ data is from the predecessor model
- Step 3.5 Flash cost claim: "30-60x" → "10-40x" (upper bound now supported by data)

**Editorial Accuracy:**
- Grok 4.20: Clarify EQ-Bench Elo (856) vs Arena Elo (1491, rank 4) — a low EQ-Bench Elo doesn't mean humans dislike chatting with it
- Opus 4.6: Remove false claim about leading on emotional reasoning (ranks 7th in that trait)

**Content Quality:**
- Fix incomplete sentence fragment: "Our editorial reads, grounded" → "Our editorial reads are grounded"
- Clarify footer: "All trait scores are 0–20" → distinguish individual traits (0–20) from EQ scores (0–100) and Elo rankings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
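The $5.62 → $5.63 fix is worth pinning down, because it is a classic rounding trap: Python's built-in round() uses round-half-to-even, so round(5.625, 2) gives 5.62, while a displayed price normally wants half-up rounding. A sketch using the decimal module (the blended_price helper name is hypothetical):

```python
from decimal import Decimal, ROUND_HALF_UP

def blended_price(raw: str) -> Decimal:
    """Round a blended $/Mtok price to cents, half-up (5.625 -> 5.63)."""
    # Parse from a string: Decimal("5.625") is exact, float(5.625) round-trips
    # here but many decimals (e.g. 5.615) would not.
    return Decimal(raw).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(round(5.625, 2))         # 5.62 — built-in round() breaks ties toward even
print(blended_price("5.625"))  # 5.63 — half-up matches the displayed price
```

If the price is computed in JSON-generation code with plain round(), ties like 5.625 will silently land on the even cent, which is how a $5.62 display value for a $5.625 blend appears in the first place.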


Summary
Review process
6 parallel review agents (logic, UX/empathy, coding, security, design, architecture) identified and fixed:
- `bg-of-cream` so white cards remain visible
- `--top` green to distinguish from `--value` sage

Test plan
🤖 Generated with Claude Code