
data(cpu): import Kaggle unverified specs #16

Merged: Seungpyo1007 merged 1 commit into main from data/import-staging on Jun 17, 2026.

@Seungpyo1007 (Member)

Summary

  • Import Kaggle CPU specification records as unverified seed data.
  • Keep benchmark fields null because the Kaggle rank column is an ordering/rank, not a benchmark score.
  • Exclude rows with unknown manufacturer/category or names that duplicate existing curated records.
  • Remove 5 duplicate GPU records introduced by the previous Kaggle GPU import so strict integrity validation can pass.
  • Regenerate the static public dump under site/public.
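
The exclusion rules in the summary can be sketched as follows. This is a minimal illustration, not the repository's importer: the row keys (`name`, `manufacturer`, `category`), the `KNOWN_MANUFACTURERS` and `KNOWN_CATEGORIES` sets, the case-insensitive name matching, and the output record shape are all assumptions.

```python
from typing import Iterable

# Hypothetical allow-lists; the real importer's values are not shown in this PR.
KNOWN_MANUFACTURERS = {"intel", "amd", "via"}
KNOWN_CATEGORIES = {"consumer", "enterprise"}


def normalize_name(name: str) -> str:
    """Case-fold and collapse whitespace so near-identical names compare equal."""
    return " ".join(name.lower().split())


def import_rows(rows: Iterable[dict], curated_names: Iterable[str]) -> list[dict]:
    """Turn Kaggle rows into unverified CPU records, skipping rows that cannot be placed."""
    seen = {normalize_name(n) for n in curated_names}
    records = []
    for row in rows:
        manufacturer = (row.get("manufacturer") or "").lower()
        category = (row.get("category") or "").lower()
        if manufacturer not in KNOWN_MANUFACTURERS or category not in KNOWN_CATEGORIES:
            continue  # unknown manufacturer/category: excluded
        key = normalize_name(row["name"])
        if key in seen:
            continue  # duplicates a curated (or already imported) record
        seen.add(key)
        records.append({
            "name": row["name"],
            "manufacturer": manufacturer,
            "category": category,
            "verified": False,   # every imported record is unverified
            "benchmarks": None,  # Kaggle rank is an ordering, not a score
            "source": "kaggle",  # hypothetical provenance field
        })
    return records
```

In this sketch, a Kaggle row whose name matches a curated record (for example `"Intel Core i7-9700K"`) is skipped, as is any row with an unrecognized manufacturer; repeated names within the same batch also collapse to a single record.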

Source

Validation

  • python -m app.validate passed
  • python ../TechEngine/integrity_check.py data --strict passed

Notes

  • All imported CPU records use verified: false.
  • This PR uses the reusable data/import-staging branch, which will also serve future data imports.

TechEngineBot (Member) commented Jun 17, 2026

TechEngine change review: PASS

| Check | Result |
|---|---|
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

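| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
|---|---|---|---|---|---|---|
| brand | 0 | 0 | 0 | 0 | 0 | 0 |
| soc | 0 | 0 | 0 | 0 | 0 | 0 |
| smartphone | 0 | 0 | 0 | 0 | 0 | 0 |
| gpu | 0 | 0 | 5 | 0 | 0 | 0 |
| cpu | 3001 | 0 | 0 | 0 | 3001 | 3001 |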
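
The per-category counts can be reproduced by diffing the data tree before and after the change. A minimal sketch, assuming each snapshot is a dict from relative path (for example `cpu/amd/2006/enterprise/amd-opteron-885.json`) to the parsed record, that the first path segment is the category, and that records carry hypothetical `verified` and `source` fields. `diff_counts` is illustrative and is not the TechEngine bot's code.

```python
from collections import defaultdict

# Column names mirror the "Changed data" table; the record fields read below
# ("verified", "source") are assumptions about the JSON schema.
COLUMNS = ("added", "modified", "deleted",
           "added_verified", "added_unverified", "added_kaggle")


def category_of(path: str) -> str:
    """The first path segment, e.g. 'cpu' for 'cpu/amd/2006/...'."""
    return path.split("/", 1)[0]


def diff_counts(before: dict[str, dict], after: dict[str, dict]) -> dict[str, dict[str, int]]:
    """Count added, modified, and deleted records per category between two snapshots."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: dict.fromkeys(COLUMNS, 0))
    for path in after.keys() - before.keys():
        row = counts[category_of(path)]
        row["added"] += 1
        record = after[path]
        if record.get("verified") is True:
            row["added_verified"] += 1
        elif record.get("verified") is False:
            row["added_unverified"] += 1
        if record.get("source") == "kaggle":
            row["added_kaggle"] += 1
    for path in before.keys() - after.keys():
        counts[category_of(path)]["deleted"] += 1
    for path in before.keys() & after.keys():
        if before[path] != after[path]:
            counts[category_of(path)]["modified"] += 1
    return dict(counts)
```

Categories with no changes do not appear in the result; a report like the one above would fill them in with zeros.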

Changed record examples

gpu deleted

  • gpu/nvidia/2014/enterprise/tesla-k80.json - Tesla K80
  • gpu/nvidia/2016/enterprise/tesla-p4.json - Tesla P4
  • gpu/nvidia/2016/enterprise/tesla-p40.json - Tesla P40
  • gpu/nvidia/2018/enterprise/tesla-t4.json - Tesla T4
  • gpu/nvidia/2022/consumer/rtx-6000-ada-generation.json - RTX 6000 Ada Generation

cpu added

  • cpu/amd/2006/enterprise/amd-opteron-885.json - AMD Opteron 885
  • cpu/amd/2008/consumer/amd-athlon-64-3700-plus.json - AMD Athlon 64 3700+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-3600-plus.json - AMD Athlon 64 X2 Dual Core 3600+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-3800-plus.json - AMD Athlon 64 X2 Dual Core 3800+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-4000-plus.json - AMD Athlon 64 X2 Dual Core 4000+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-4200-plus.json - AMD Athlon 64 X2 Dual Core 4200+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-4400-plus.json - AMD Athlon 64 X2 Dual Core 4400+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-4600-plus.json - AMD Athlon 64 X2 Dual Core 4600+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-4800-plus.json - AMD Athlon 64 X2 Dual Core 4800+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-5000-plus.json - AMD Athlon 64 X2 Dual Core 5000+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-5200-plus.json - AMD Athlon 64 X2 Dual Core 5200+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-5400-plus.json - AMD Athlon 64 X2 Dual Core 5400+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-5600-plus.json - AMD Athlon 64 X2 Dual Core 5600+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-6000-plus.json - AMD Athlon 64 X2 Dual Core 6000+
  • cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-6400-plus.json - AMD Athlon 64 X2 Dual Core 6400+
  • ... 2986 more
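
The example paths suggest a `<category>/<manufacturer>/<year>/<segment>/<slug>.json` layout in which the slug is the lower-cased model name with `+` spelled out as `plus`. The sketch below reproduces the listed paths; the slug rules are inferred from these examples only, and other characters (trademark symbols, parentheses, and so on) may be handled differently by the real importer. `slugify` and `record_path` are hypothetical names.

```python
import re


def slugify(name: str) -> str:
    """Lower-case a model name, spell out '+' as 'plus', and hyphenate the rest."""
    text = name.lower().replace("+", " plus ")
    text = re.sub(r"[^a-z0-9]+", "-", text)  # collapse any non-alphanumeric run to '-'
    return text.strip("-")


def record_path(category: str, manufacturer: str, year: int, segment: str, name: str) -> str:
    """Build a path in the category/manufacturer/year/segment/slug.json layout."""
    return f"{category}/{manufacturer}/{year}/{segment}/{slugify(name)}.json"
```

For example, `record_path("cpu", "amd", 2008, "consumer", "AMD Athlon 64 X2 Dual Core 3600+")` returns `cpu/amd/2008/consumer/amd-athlon-64-x2-dual-core-3600-plus.json`, matching the second Athlon entry above.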

Heuristic review

  • Added records by manufacturer/brand: intel: 2085, amd: 902, via: 14
  • Added records by source class: kaggle: 3001
  • Heuristic warnings: none found.
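
The per-manufacturer breakdown doubles as a consistency check: 2085 + 902 + 14 = 3001, matching the CPU "Added" count in the change table. A small sketch of such a tally, assuming each added record exposes hypothetical `manufacturer` and `source` fields; the records below are synthetic and only mirror the reported counts.

```python
from collections import Counter


def tally(records: list[dict], field: str) -> tuple[str, int]:
    """Summarize records by one field, most common first, plus the grand total."""
    counts = Counter(str(r.get(field, "unknown")) for r in records)
    summary = ", ".join(f"{key}: {n}" for key, n in counts.most_common())
    return summary, sum(counts.values())


# Synthetic records matching the counts reported above.
added = ([{"manufacturer": "intel", "source": "kaggle"}] * 2085
         + [{"manufacturer": "amd", "source": "kaggle"}] * 902
         + [{"manufacturer": "via", "source": "kaggle"}] * 14)
print(tally(added, "manufacturer"))  # ('intel: 2085, amd: 902, via: 14', 3001)
print(tally(added, "source"))        # ('kaggle: 3001', 3001)
```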

TechEngineBot (Member) commented

TechEngine validation stats: PASS

Data summary

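| Category | Total | Verified | Unverified | Missing verified flag | Verified % (of records with a flag) |
|---|---|---|---|---|---|
| brand | 129 | 0 | 0 | 129 | n/a |
| soc | 123 | 58 | 65 | 0 | 47.2% |
| smartphone | 367 | 184 | 183 | 0 | 50.1% |
| gpu | 1972 | 0 | 1972 | 0 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 24.5% |
| all | 6568 | 1218 | 5221 | 129 | 18.9% |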
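
The Verified % column appears to be computed over records that carry a `verified` flag rather than over the total: for the "all" row, 1218 / (1218 + 5221) ≈ 18.9%, whereas 1218 / 6568 ≈ 18.5%. This also explains why brand, where all 129 records lack the flag, shows n/a. A sketch of that formula, inferred from the table rather than taken from the bot's source:

```python
def verified_pct(verified: int, unverified: int) -> str:
    """Share of verified records among those that carry a verified flag.

    Records missing the flag are excluded from the denominator; with no
    flagged records the percentage is reported as "n/a".
    """
    flagged = verified + unverified
    if flagged == 0:
        return "n/a"
    return f"{100 * verified / flagged:.1f}%"


# (verified, unverified) pairs taken from the Data summary table.
rows = {"brand": (0, 0), "soc": (58, 65), "smartphone": (184, 183),
        "gpu": (0, 1972), "cpu": (976, 3001), "all": (1218, 5221)}
for category, (v, u) in rows.items():
    print(category, verified_pct(v, u))
```

Every printed value matches the table: n/a, 47.2%, 50.1%, 0.0%, 24.5%, and 18.9%.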

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.

Key output:

```text
## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=1972
✅ integrity gate: no hard anomalies.
```
| Integrity section | Flagged lines |
|---|---|
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench; should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |
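
The single>multi row encodes a simple invariant: for the same benchmark, a multi-threaded score should be at least the single-threaded one. A minimal sketch of that check, assuming a hypothetical `benchmarks` field shaped like `{"cinebench": {"single": ..., "multi": ...}}`; the real integrity_check.py schema may differ. Records with null benchmarks, such as the Kaggle CPUs imported here, have nothing to compare and are skipped.

```python
def single_gt_multi(records: list[dict]) -> list[str]:
    """Flag CPUs whose single-thread score exceeds the multi-thread score
    within the same benchmark (multi should be >= single)."""
    flagged = []
    for rec in records:
        benchmarks = rec.get("benchmarks") or {}  # null benchmarks -> nothing to check
        for bench in ("cinebench", "geekbench"):
            scores = benchmarks.get(bench) or {}
            single, multi = scores.get("single"), scores.get("multi")
            if single is not None and multi is not None and single > multi:
                flagged.append(f"{rec['name']} ({bench})")
    return flagged
```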

Seungpyo1007 added the enhancement (New feature or request) label on Jun 17, 2026.
Seungpyo1007 moved this from Todo to Done in TechAPI-Project on Jun 17, 2026.
Seungpyo1007 moved this from Done to In Progress in TechAPI-Project on Jun 17, 2026.
Seungpyo1007 merged commit 928773b into main on Jun 17, 2026 (3 checks passed).
github-project-automation (bot) moved this from In Progress to Done in TechAPI-Project on Jun 17, 2026.

Labels

enhancement (New feature or request)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Massive dataset rebuild: CPU + brand + GPU + smartphone + SoC (1989-2026)

2 participants