fix(confidence-badge): lower threshold 80→75 for actual visibility #169
Merged
PR #168 shipped the "Higher confidence" badge at score >= 80 (measured 70% precision, 2.5% coverage). The user-facing failure: 2.5% coverage across the 4321-finding corpus works out to about 1 badged finding in a typical 40-finding draft, so the badge was effectively invisible on real-world drafts.

Lowered the threshold to 75 (measured 69% precision, 7.2% coverage):
- Precision: 1pp below the 70% bar, disclosed honestly in the title attribute
- Coverage: ~3× wider, so ~1-3 badged findings are visible per typical draft
- Against the 38% absolute baseline precision: still a meaningful lift

Updated the i18n title attribute strings across all 6 locales to reflect the actual measured precision (~69%, baseline 38%). The "Higher confidence" badge text is unchanged; the wording still honestly signals a ranking, not a guarantee.

No walker behavior change. Pure UX threshold + i18n string update.

Tests: pytest 2704 passed, 11 skipped. Frontend build clean.
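The PR does not say how precision and coverage were computed. A minimal sketch, assuming each corpus finding carries an integer walker score and a reviewed true/false label, that the badge shows at `score >= threshold`, and that "baseline" means precision over all findings. The `Finding` type, `precision_and_coverage` helper, and toy data are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    score: int     # walker confidence score (assumed 0-100 integer)
    is_true: bool  # assumed ground-truth label from manual review


def precision_and_coverage(findings: list[Finding], threshold: int) -> tuple[float, float]:
    """Return (precision, coverage) for findings badged at score >= threshold."""
    if not findings:
        raise ValueError("need at least one finding")
    badged = [f for f in findings if f.score >= threshold]
    coverage = len(badged) / len(findings)
    # Precision is undefined when nothing is badged; report 0.0 in that case.
    precision = sum(f.is_true for f in badged) / len(badged) if badged else 0.0
    return precision, coverage


# Made-up toy corpus, NOT the PR's data: it only shows the shape of the trade-off.
TOY = [Finding(s, t) for s, t in [
    (95, True), (85, True), (81, False), (78, True), (76, False),
    (60, False), (50, True), (30, False), (20, False), (10, False),
]]

for threshold in (80, 75, 0):  # threshold 0 badges everything, i.e. the baseline
    p, c = precision_and_coverage(TOY, threshold)
    print(f"score >= {threshold}: precision {p:.0%}, coverage {c:.0%}")
```

On the toy data this prints 67%/30% at 80, 60%/50% at 75, and 40%/100% at the baseline: lowering the cutoff gives up a little precision for much wider coverage, the same direction as the measured 70%→69% and 2.5%→7.2% move.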
Why
PR #168 shipped the badge at score >= 80. It measured 70% precision but only 2.5% coverage (106/4321 findings). In a user test on real drafts the badge was effectively invisible (typically 1 badged finding per 40-finding draft).
Fix
Lower the threshold to 75: measured 69% precision, 7.2% coverage (~3× wider).
Trade-off: precision lands 1pp below the original 70% target. The title attribute discloses this honestly (updated across all 6 locales to show the actual measurement and the baseline). Precision is still 31pp above the 38% absolute baseline.
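The visibility argument is plain arithmetic on the measured numbers. A sketch, assuming a draft's score distribution mirrors the corpus (the `expected_badges` helper and constant names are illustrative, not from the codebase):

```python
CORPUS_SIZE = 4321
BADGED_AT_80 = 106   # findings with score >= 80, as measured in PR #168
COVERAGE_75 = 0.072  # measured coverage at score >= 75
TYPICAL_DRAFT = 40   # findings in a typical real-world draft


def expected_badges(coverage: float, draft_size: int) -> float:
    """Expected badged findings per draft, assuming drafts mirror corpus coverage."""
    return coverage * draft_size


coverage_80 = BADGED_AT_80 / CORPUS_SIZE  # ~0.0245, reported as 2.5%

print(f"at 80: {expected_badges(coverage_80, TYPICAL_DRAFT):.1f} badges/draft")  # 1.0
print(f"at 75: {expected_badges(COVERAGE_75, TYPICAL_DRAFT):.1f} badges/draft")  # 2.9
print(f"coverage ratio: {COVERAGE_75 / coverage_80:.1f}x")  # 2.9
print(f"precision lift over baseline: {69 - 38} pp")  # 31
```

The ~2.9× ratio computed from the raw 106/4321 count is what the PR rounds to "3× wider".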
No walker change
Pure threshold + i18n string update. Walker scoring formula unchanged.
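Because only the display cutoff moves, the change reduces to one constant in a predicate over the walker's existing score. A sketch in Python for illustration only; the PR mentions a frontend build, so the real check may live there, and the constant and function names below are hypothetical:

```python
# Hypothetical names: the PR does not show the real constant or where it is read.
HIGHER_CONFIDENCE_THRESHOLD = 75  # lowered from 80 (PR #168)


def shows_higher_confidence_badge(score: int) -> bool:
    """Display-side gate on an already-computed walker score; scoring is untouched."""
    return score >= HIGHER_CONFIDENCE_THRESHOLD


# The boundary is inclusive, matching the "score >= N" rule.
for score in (74, 75, 80):
    print(score, shows_higher_confidence_badge(score))
```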
Tests
pytest -q → 2704 passed, 11 skipped.