data(mobile): import Kaggle smartphone and SoC seeds#22

Merged
Seungpyo1007 merged 22 commits into main from data/import-staging on Jun 18, 2026

@Seungpyo1007 Seungpyo1007 commented Jun 18, 2026

Summary

  • import Kaggle smartphone seed records across Honor, OPPO, vivo, OnePlus, Infinix, Tecno, Motorola, Lenovo, and related brands
  • import Kaggle mobile SoC benchmark seed records for MediaTek, Qualcomm, Huawei/Kirin, Samsung/Exynos, and Unisoc
  • keep all newly imported records verified: false
  • exclude tablet-like rows from the smartphone catalog before publishing
  • regenerate the public mobile/SoC dump so /v1/socs and /v1/smartphones reflect the new records
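
The tablet exclusion and the `verified: false` default could be applied in a single import step. The following is a minimal sketch only: the `name` field, the keyword pattern, and the helper names are assumptions for illustration, since the PR does not show the actual heuristic.

```python
import re

# Hypothetical keyword heuristic; the real rule used by the import is not shown in this PR.
TABLET_PATTERN = re.compile(r"\b(tab|tablet|pad)\b", re.IGNORECASE)


def is_tablet_like(row: dict) -> bool:
    """Return True when the model name looks like a tablet."""
    return bool(TABLET_PATTERN.search(row.get("name", "")))


def prepare_smartphone_seeds(rows: list) -> list:
    """Drop tablet-like rows and mark every remaining record as unverified."""
    return [
        {**row, "verified": False}
        for row in rows
        if not is_tablet_like(row)
    ]


# Illustrative rows, not taken from the Kaggle datasets.
rows = [
    {"name": "Honor 50 Pro"},
    {"name": "Lenovo Tab P11"},
    {"name": "OnePlus Pad"},
]
seeds = prepare_smartphone_seeds(rows)
print([s["name"] for s in seeds])  # ['Honor 50 Pro']
```

A name-only rule like this is easy to audit, but a real import would probably also need a screen-size or form-factor check.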

Dataset impact

  • SoCs: 123 -> 195 (+72)
  • Smartphones: 367 -> 734 (+367)
  • Total public records: 6,626 -> 7,065 (+439)
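
The totals above can be reconciled with the per-category counts reported in the validation stats comment on this PR. A short check, assuming only SoC and smartphone records changed (which the change review table indicates):

```python
# Per-category totals after the import, from the validation stats in this PR.
after = {"brand": 129, "soc": 195, "smartphone": 734, "gpu": 2030, "cpu": 3977}
# Records added by this PR, from the "Dataset impact" list.
added = {"soc": 72, "smartphone": 367}

# Subtract the additions to recover the pre-import counts.
before = {cat: n - added.get(cat, 0) for cat, n in after.items()}

total_before = sum(before.values())
total_after = sum(after.values())
print(total_before, total_after, total_after - total_before)  # 6626 7065 439
```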

Sources

  • Kaggle abdulmalik1518/mobiles-dataset-2025 (Apache-2.0)
  • Kaggle devgondaliya007/smartphone-specifications-dataset (Apache-2.0)
  • Kaggle alanjo/smartphone-processors-ranking (CC0-1.0)
  • Excluded sady36/mobile-phones-specs because the downloaded metadata did not expose a clear license
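
The source selection above amounts to a license allowlist. A minimal sketch of such a gate follows; the function name and the use of `None` for an undetermined license are assumptions, not the project's actual tooling.

```python
from typing import Optional

# Licenses accepted by this PR's source list.
ALLOWED_LICENSES = {"Apache-2.0", "CC0-1.0"}


def is_importable(license_id: Optional[str]) -> bool:
    """Accept a dataset only when its license is known and allowlisted."""
    return license_id in ALLOWED_LICENSES


# Dataset slug -> license as listed in this PR; None marks metadata that
# did not expose a clear license.
candidates = {
    "abdulmalik1518/mobiles-dataset-2025": "Apache-2.0",
    "devgondaliya007/smartphone-specifications-dataset": "Apache-2.0",
    "alanjo/smartphone-processors-ranking": "CC0-1.0",
    "sady36/mobile-phones-specs": None,
}

accepted = sorted(s for s, lic in candidates.items() if is_importable(lic))
rejected = sorted(s for s, lic in candidates.items() if not is_importable(lic))
print(rejected)  # ['sady36/mobile-phones-specs']
```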

Commit structure

  • SoC imports split by manufacturer group
  • Smartphone imports split by brand/group
  • SoC process-node service compatibility fix split separately
  • Public dump regeneration split separately

Verification

  • python -m app.validate
  • python TechEngine\integrity_check.py data --strict
  • git diff --check
  • cd site && npm.cmd run build

Closes #1

github-actions bot added the labels data (Dataset changes) and enhancement (New feature or request) on Jun 18, 2026
Seungpyo1007 moved this from Todo to In Progress in TechAPI-Project on Jun 18, 2026
TechEngineBot commented Jun 18, 2026

TechEngine change review: PASS

| Check | Result |
| --- | --- |
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

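| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 0 | 0 | 0 | 0 | 0 | 0 |
| soc | 72 | 0 | 0 | 0 | 72 | 72 |
| smartphone | 367 | 0 | 0 | 0 | 367 | 367 |
| gpu | 0 | 0 | 0 | 0 | 0 | 0 |
| cpu | 0 | 0 | 0 | 0 | 0 | 0 |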

Changed record examples

soc added

  • soc/huawei/2015/kirin-950.json - Kirin 950
  • soc/huawei/2016/kirin-955.json - Kirin 955
  • soc/huawei/2016/kirin-960.json - Kirin 960
  • soc/huawei/2017/kirin-659.json - Kirin 659
  • soc/huawei/2018/kirin-710.json - Kirin 710
  • soc/huawei/2019/kirin-710f.json - Kirin 710F
  • soc/huawei/2019/kirin-810.json - Kirin 810
  • soc/huawei/2019/kirin-990-4g.json - Kirin 990 (4G)
  • soc/huawei/2020/kirin-710a.json - Kirin 710A
  • soc/huawei/2020/kirin-820.json - Kirin 820
  • soc/huawei/2020/kirin-9000e.json - Kirin 9000E
  • soc/huawei/2020/kirin-985.json - Kirin 985
  • soc/mediatek/2017/helio-p23.json - Helio P23
  • soc/mediatek/2018/helio-p35.json - Helio P35
  • soc/mediatek/2018/helio-p60.json - Helio P60
  • ... 57 more

smartphone added

  • smartphone/google/2019/pixel-3a-xl.json - Pixel 3a XL
  • smartphone/google/2019/pixel-3a.json - Pixel 3a
  • smartphone/honor/2020/10x-lite.json - 10X Lite
  • smartphone/honor/2020/30-pro.json - 30 Pro
  • smartphone/honor/2020/9x-lite.json - 9X Lite
  • smartphone/honor/2020/play-4-pro.json - Play 4 Pro
  • smartphone/honor/2020/play-4.json - Play 4
  • smartphone/honor/2020/x10-max.json - X10 Max
  • smartphone/honor/2021/50-pro.json - 50 Pro
  • smartphone/honor/2021/50-se.json - 50 SE
  • smartphone/honor/2021/50.json - 50
  • smartphone/honor/2021/60-pro.json - 60 Pro
  • smartphone/honor/2021/60-se.json - 60 SE
  • smartphone/honor/2021/60.json - 60
  • smartphone/honor/2021/magic3-pro.json - Magic3 Pro
  • ... 352 more
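
The example paths above follow a `category/maker/year/slug.json` layout. The sketch below reproduces that layout from a display name. The slug rule is inferred from the listed examples and may differ from the project's real generator; both function names are hypothetical.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def record_path(category: str, maker: str, year: int, name: str) -> str:
    """Build a record path in the layout seen in the examples above."""
    return f"{category}/{maker}/{year}/{slugify(name)}.json"


print(record_path("soc", "huawei", 2019, "Kirin 990 (4G)"))
# soc/huawei/2019/kirin-990-4g.json
print(record_path("smartphone", "google", 2019, "Pixel 3a XL"))
# smartphone/google/2019/pixel-3a-xl.json
```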

Heuristic review

  • Added records by manufacturer/brand: honor: 51, oppo: 51, infinix: 41, vivo: 40, oneplus: 33, mediatek: 29, tecno: 29, poco: 28
  • Added records by source class: kaggle: 439
  • Heuristic warnings: none found.

TechEngineBot commented Jun 18, 2026

TechEngine validation stats: PASS

Data summary

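| Category | Total | Verified | Unverified | Missing verified | Verified % |
| --- | --- | --- | --- | --- | --- |
| brand | 129 | 0 | 0 | 129 | n/a |
| soc | 195 | 58 | 137 | 0 | 29.7% |
| smartphone | 734 | 184 | 550 | 0 | 25.1% |
| gpu | 2030 | 0 | 2030 | 0 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 24.5% |
| all | 7065 | 1218 | 5718 | 129 | 17.6% |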

Warning

Verified coverage is below 50% for gpu 0.0% (0/2030), all 17.6% (1218/6936), cpu 24.5% (976/3977), smartphone 25.1% (184/734), soc 29.7% (58/195).
This does not fail validation. Keep imported records verified: false until manual audit, but treat this as follow-up verification work before relying on the affected categories as curated data.
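
The percentages in the warning can be reproduced from the data summary counts. Note that the denominator excludes records with no `verified` field, which is why the "all" row uses 6936 rather than 7065. This sketch is a reconstruction of the arithmetic, not TechEngine's actual code; the function name is hypothetical.

```python
# (verified, unverified) counts per category, from the data summary table.
counts = {
    "brand": (0, 0),
    "soc": (58, 137),
    "smartphone": (184, 550),
    "gpu": (0, 2030),
    "cpu": (976, 3001),
}
# The "all" row is the column-wise sum of the categories.
counts["all"] = tuple(sum(col) for col in zip(*counts.values()))

THRESHOLD = 50.0  # warning threshold stated in the bot comment


def coverage(verified: int, unverified: int):
    """Percent verified among records that carry a verified flag, or None for n/a."""
    flagged = verified + unverified
    if flagged == 0:
        return None
    return round(100.0 * verified / flagged, 1)


pct = {cat: coverage(*vu) for cat, vu in counts.items()}
below = sorted(
    (cat for cat, p in pct.items() if p is not None and p < THRESHOLD),
    key=lambda cat: pct[cat],
)
print(below)  # ['gpu', 'all', 'cpu', 'smartphone', 'soc']
```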

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.

Key output:

## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=2030
✅ integrity gate: no hard anomalies.
| Integrity section | Flagged lines |
| --- | --- |
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench; should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |

Seungpyo1007 force-pushed the data/import-staging branch from 5f371df to 599c33e on June 18, 2026 07:37
Seungpyo1007 merged commit 1cb9b40 into main on Jun 18, 2026
4 checks passed
github-project-automation bot moved this from In Progress to Done in TechAPI-Project on Jun 18, 2026
Development

Successfully merging this pull request may close these issues.

Massive dataset rebuild: CPU + brand + GPU + smartphone + SoC (1989-2026)