feat(scoring): hybrid v2.0.0 — absolute log index + within-era tier, CPU/GPU/SoC scored#37

Merged: Seungpyo1007 merged 1 commit into main from feat/scoring-v2 on Jun 24, 2026

@Seungpyo1007

Member

Replaces the Phase-0 placeholder scoring (hand-picked 2025-flagship fixed bounds, smartphones only) with a benchmark-based hybrid model across smartphones + CPUs + GPUs + SoCs.

What

  • `app/services/scoring/` package (common/config/stats/phones/cpu/gpu/soc/calibrate). Each compute axis exposes:
    • an absolute index (0-100), log-calibrated against pinned dataset p01-p99 reference scales in `config/scoring.yaml`, so scores are comparable across all eras (a 1996 part lands low, a 2026 flagship high);
    • a within-generation relative percentile + letter tier (S-F) from per-era cohorts (DatasetStats, process-cached).
  • Benchmark-only perf: performance axes resolve through priority chains (e.g. cinebench_r23 → geekbench → passmark → legacy); no benchmark → null (never 0). CPU/GPU/SoC overall scores are null when no benchmark exists. Phone camera/battery/display stay spec-derived (no benchmark exists for them).
  • Provenance: each index carries the source benchmark name (raw values still hidden, ADR-006).
  • New `/v1/{cpus,gpus,socs}/{slug}/score` endpoints, plus `score` embedded in detail responses; the dump emits score files for all four categories plus a `scored` count in the manifest.
  • `algorithm_version` bumped 1.0.0 → 2.0.0; weights, scales, chains, eras, and tiers live in `config/scoring.yaml` (new pyyaml dependency). See ADR-012.
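A minimal sketch of the log calibration, assuming the absolute index is a linear interpolation in log space between the pinned p01 and p99 reference values, clamped to 0-100. The function name `absolute_index` and its parameters are hypothetical; this PR does not show the actual formula used in `calibrate`.

```python
import math
from typing import Optional


def absolute_index(raw: Optional[float], p01: float, p99: float) -> Optional[float]:
    """Hypothetical log-scale calibration of a raw benchmark score to 0-100.

    p01/p99 stand in for the pinned reference percentiles that the PR
    keeps in config/scoring.yaml.
    """
    if raw is None or raw <= 0:
        # No usable benchmark (or a value log() cannot take): stay null, never 0.
        return None
    lo, hi = math.log(p01), math.log(p99)
    t = (math.log(raw) - lo) / (hi - lo)  # 0.0 at p01, 1.0 at p99
    # Clamp so parts outside the reference range saturate at 0 or 100.
    return round(100.0 * min(max(t, 0.0), 1.0), 1)
```

With p01 = 10 and p99 = 100,000 (four decades), `absolute_index(100, 10, 100_000)` is 25.0 and `absolute_index(1000, 10, 100_000)` is 50.0: every tenfold increase in the raw score adds the same 25 points, which is what lets very old and very new parts share one readable scale.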
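The within-era tier can be sketched as a percentile rank inside the part's era cohort followed by a cut-off lookup. This sketch assumes a mid-rank percentile and invents the cut-offs in `TIERS` (the real thresholds live in `config/scoring.yaml`); `percentile_rank` and `letter_tier` are illustrative names, not the `DatasetStats` API.

```python
from bisect import bisect_left, bisect_right
from collections.abc import Sequence

# Hypothetical cut-offs: minimum within-era percentile required for each letter.
TIERS: list[tuple[float, str]] = [
    (90.0, "S"),
    (75.0, "A"),
    (50.0, "B"),
    (25.0, "C"),
    (10.0, "D"),
    (0.0, "F"),
]


def percentile_rank(value: float, cohort: Sequence[float]) -> float:
    """Mid-rank percentile (0-100) of `value` among its era cohort."""
    if not cohort:
        raise ValueError("cohort is empty")
    ordered = sorted(cohort)
    below = bisect_left(ordered, value)  # number of strictly smaller values
    ties = bisect_right(ordered, value) - below  # number of equal values
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def letter_tier(percentile: float) -> str:
    """Return the first letter whose cut-off the percentile meets."""
    for cutoff, letter in TIERS:
        if percentile >= cutoff:
            return letter
    return TIERS[-1][1]  # only reached for negative input
```

For a cohort of 10, 20, ..., 100, the top part ranks at the 95th percentile (tier S) and the value 50 ranks at the 45th (tier C). Since the PR notes that DatasetStats is process-cached, the real package presumably sorts each era cohort once rather than on every call.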
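The benchmark priority chain amounts to "first benchmark present wins", returning the source name alongside the value so provenance can be reported without exposing the raw number. A sketch with hypothetical names (`resolve_benchmark`, `CPU_MULTI_CHAIN`); only the chain order is taken from this PR.

```python
from collections.abc import Mapping, Sequence
from typing import Optional

# Order taken from the example in this PR; the constant name is hypothetical.
CPU_MULTI_CHAIN: tuple[str, ...] = ("cinebench_r23", "geekbench", "passmark", "legacy")


def resolve_benchmark(
    benchmarks: Mapping[str, Optional[float]],
    chain: Sequence[str],
) -> Optional[tuple[str, float]]:
    """Return (source_name, raw_value) for the first benchmark in `chain` that exists."""
    for name in chain:
        value = benchmarks.get(name)
        if value is not None:
            return name, value
    # Nothing in the chain: report "no benchmark" as None, never as 0.
    return None
```

Returning `None` rather than `0.0` keeps "not benchmarked" distinguishable from "very slow". Because each source measures on a different scale, each chain entry would presumably need its own reference range before the absolute index is computed.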
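One plausible reading of how axes combine into an overall score is a weighted mean over the axes that are not null, with the weights renormalised. Under that assumption, a CPU/GPU/SoC with no benchmark gets a null overall automatically, while a phone with null perf is still scored from its spec-derived axes (like the old budget phone in the sanity check). `overall_score` and `PHONE_WEIGHTS` are hypothetical; the actual weights live in `config/scoring.yaml`.

```python
from collections.abc import Mapping
from typing import Optional

# Hypothetical relative weights; the real ones live in config/scoring.yaml.
PHONE_WEIGHTS: dict[str, float] = {
    "performance": 2.0,
    "camera": 1.0,
    "battery": 1.0,
    "display": 1.0,
}


def overall_score(
    axes: Mapping[str, Optional[float]],
    weights: Mapping[str, float],
) -> Optional[float]:
    """Weighted mean of the non-null axis indices, renormalised over their weights."""
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        value = axes.get(name)
        if value is None:
            continue  # a missing axis is skipped, not counted as 0
        total += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        # No axis available at all (e.g. an unbenchmarked CPU): overall is null.
        return None
    return round(total / weight_sum, 1)
```

Skipping null axes instead of treating them as 0 means a missing benchmark never drags an overall score down; it only narrows what the score is based on.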

Verify

Sanity check (seeded DB): core-i9-14900k 80.6 (single S / multi B), RTX 5090 100 (S), Snapdragon 8 Elite 96.7 (cpu A / system A), and an old budget phone at 5.8 with null perf (benchmark gate). Scoring unit + integration tests added; ruff + mypy strict green; 201 tests passed locally (plus 20 pre-existing, Windows-only tmp-permission errors unrelated to this change).

Refs #1

…CPU/GPU/SoC scored

Replaces the Phase-0 placeholder (hand-picked 2025-flagship fixed bounds, smartphones
only) with a benchmark-based hybrid model across smartphones + CPUs + GPUs + SoCs.

- `app/services/scoring/` package (common/config/stats/phones/cpu/gpu/soc/calibrate).
  Each compute axis exposes an absolute capability index (0-100, log-calibrated against
  pinned dataset p01-p99 reference scales in config/scoring.yaml) AND a within-generation
  relative percentile + letter tier (S-F) computed from per-era cohorts (DatasetStats).
- Benchmark-only: performance/compute axes use real benchmarks via priority chains
  (e.g. cinebench_r23 -> geekbench -> passmark -> legacy); no benchmark -> null (never 0).
  Phone camera/battery/display stay spec-derived (no benchmark exists for them).
- Provenance: each index carries the source benchmark NAME (raw values still hidden, ADR-006).
- New `/v1/{cpus,gpus,socs}/{slug}/score` endpoints + `score` embedded in details; dump
  emits score files for all four categories + a `scored` manifest count.
- algorithm_version 1.0.0 -> 2.0.0; weights/scales/chains/eras/tiers in config/scoring.yaml
  (pyyaml dep). ADR-012. Scoring unit + integration tests; ruff/mypy strict green.

Refs #1
Seungpyo1007 merged commit 3c3627c into main on Jun 24, 2026
1 check passed
Seungpyo1007 deleted the feat/scoring-v2 branch on June 24, 2026 at 08:20