feat(scoring): hybrid v2.0.0 — absolute log index + within-era tier, CPU/GPU/SoC scored #37
Merged
…CPU/GPU/SoC scored
Replaces the Phase-0 placeholder (hand-picked 2025-flagship fixed bounds, smartphones
only) with a benchmark-based hybrid model across smartphones + CPUs + GPUs + SoCs.
- `app/services/scoring/` package (common/config/stats/phones/cpu/gpu/soc/calibrate).
Each compute axis exposes an absolute capability index (0-100, log-calibrated against
pinned dataset p01-p99 reference scales in config/scoring.yaml) AND a within-generation
relative percentile + letter tier (S-F) computed from per-era cohorts (DatasetStats).
- Benchmark-only: performance/compute axes use real benchmarks via priority chains
(e.g. cinebench_r23 -> geekbench -> passmark -> legacy); no benchmark -> null (never 0).
Phone camera/battery/display stay spec-derived (no benchmark exists for them).
- Provenance: each index carries the source benchmark NAME (raw values still hidden, ADR-006).
- New `/v1/{cpus,gpus,socs}/{slug}/score` endpoints + `score` embedded in details; dump
emits score files for all four categories + a `scored` manifest count.
- algorithm_version 1.0.0 -> 2.0.0; weights/scales/chains/eras/tiers in config/scoring.yaml
(pyyaml dep). ADR-012. Scoring unit + integration tests; ruff/mypy strict green.
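
A minimal sketch of how a benchmark priority chain with a null fallback could look. The function name `pick_benchmark`, the `CPU_CHAIN` constant, and the rule that non-positive values count as missing are assumptions made for illustration; the actual chains are defined in config/scoring.yaml and the real implementation may differ.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical chain for illustration; the real chains live in config/scoring.yaml.
CPU_CHAIN: Sequence[str] = ("cinebench_r23", "geekbench", "passmark", "legacy")


def pick_benchmark(
    benchmarks: Mapping[str, Optional[float]],
    chain: Sequence[str] = CPU_CHAIN,
) -> Optional[Tuple[str, float]]:
    """Return (source_name, raw_value) for the first usable benchmark in the chain.

    Returns None when no benchmark in the chain is available, so the caller
    can emit a null score instead of a misleading 0.
    """
    for name in chain:
        value = benchmarks.get(name)
        # Assumption: missing or non-positive values are treated as "no benchmark".
        if value is not None and value > 0:
            return name, value
    return None
```

Returning the source name together with the value lets the caller attach provenance (the benchmark name) to the index while keeping the raw value internal.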
Refs #1
5f824ba to 3ff2cd0
Replaces the Phase-0 placeholder scoring (hand-picked 2025-flagship fixed bounds, smartphones only) with a benchmark-based hybrid model across smartphones + CPUs + GPUs + SoCs.
What
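- `app/services/scoring/` package (common/config/stats/phones/cpu/gpu/soc/calibrate). Each compute axis exposes:
  - an absolute capability index (0-100), log-calibrated against pinned dataset p01-p99 reference scales in `config/scoring.yaml` → comparable across all eras (1996 part low, 2026 flagship high);
  - a within-generation relative percentile + letter tier (S-F) computed from per-era cohorts (`DatasetStats`, process-cached).
- Benchmark-only: performance/compute axes use real benchmarks via priority chains (e.g. `cinebench_r23 → geekbench → passmark → legacy`); no benchmark → `null` (never 0). CPU/GPU/SoC overall is `null` with no benchmark. Phone camera/battery/display stay spec-derived (no benchmark exists).
- New endpoints: `/v1/{cpus,gpus,socs}/{slug}/score` + `score` embedded in details; dump emits score files for all four categories + a `scored` manifest count.
- `algorithm_version` 1.0.0 → 2.0.0; weights/scales/chains/eras/tiers in `config/scoring.yaml` (pyyaml). ADR-012.

Verify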
Sanity (seeded DB): core-i9-14900k 80.6 (single S / multi B), RTX 5090 100 (S), Snapdragon 8 Elite 96.7 (cpu A / system A), an old budget phone 5.8 with null perf (benchmark gate). Scoring unit + integration tests added; ruff + mypy strict green; 201 passed locally (20 pre-existing Windows-only tmp-permission errors unrelated to this change).
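
For readers who want to see the two halves of the hybrid model side by side, here is a rough sketch of an absolute log-calibrated index and a within-era percentile with a letter tier. Everything here is an assumption for illustration: the function names, the clamping to 0-100, the one-decimal rounding, the "share of cohort at or below the value" percentile definition, and the tier cutoffs are hypothetical and are not taken from config/scoring.yaml or the `DatasetStats` code.

```python
import math
from bisect import bisect_right
from typing import Optional, Sequence

# Hypothetical tier cutoffs (minimum percentile, letter); real ones are configured.
TIER_CUTOFFS = ((95.0, "S"), (80.0, "A"), (60.0, "B"), (40.0, "C"), (20.0, "D"))


def log_index(value: Optional[float], p01: float, p99: float) -> Optional[float]:
    """Map a raw benchmark value onto 0-100 on a log scale between p01 and p99.

    Returns None (not 0) when there is no usable benchmark value.
    """
    if value is None or value <= 0:
        return None
    frac = (math.log(value) - math.log(p01)) / (math.log(p99) - math.log(p01))
    # Assumption: values outside the reference scale are clamped to the ends.
    return round(100.0 * min(1.0, max(0.0, frac)), 1)


def era_percentile(value: float, cohort: Sequence[float]) -> Optional[float]:
    """Percentage of the era cohort scoring at or below `value`."""
    if not cohort:
        return None
    ordered = sorted(cohort)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def tier(percentile: float) -> str:
    """Translate a percentile into a letter tier using the cutoffs above."""
    for cutoff, letter in TIER_CUTOFFS:
        if percentile >= cutoff:
            return letter
    return "F"
```

With a reference scale of p01 = 100 and p99 = 10000, a raw value of 1000 sits halfway on the log scale, so `log_index(1000.0, 100.0, 10000.0)` gives 50.0. The two measures are independent, which is how a part can have a high absolute index yet a middling tier within a strong era cohort.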
Refs #1