data(smartphone): import 5000 PhoneDB raw variants #25

Merged: Seungpyo1007 merged 11 commits into main from data/import-staging on Jun 19, 2026

@Seungpyo1007 (Member)

Summary

  • import 45 curated smartphone seed records from the Mobiles Dataset 2025 Kaggle CSV
  • import 5,000 PhoneDB raw smartphone variant records from the mobile phones specs Kaggle CSV
  • keep all newly imported records as verified: false for later TechEngine/manual audit
  • refresh the published site/public/v1 smartphone indexes and detail JSON

Data sources

Import notes

  • this PR intentionally expands quantity using PhoneDB variant rows, including regional/model-code variants
  • obvious tablet-like rows and validator-out-of-range rows were skipped
  • fractional RAM values from PhoneDB were normalized upward to match TechEngine's integer response schema
  • all imported records remain unverified until manual or TechEngine follow-up review
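
The RAM normalization and the `verified: false` default described above can be sketched as follows. This is a minimal illustration, not the importer used in this PR: the names `normalize_ram_gb` and `to_record`, the field names `name` and `ram_gb`, and the sample row are hypothetical, and reading "normalized upward" as ceiling rounding is an assumption.

```python
import math


def normalize_ram_gb(raw_ram_gb: float) -> int:
    """Round a fractional RAM value up to the next whole gigabyte.

    Assumption: "normalized upward" means ceiling rounding.
    """
    if raw_ram_gb <= 0:
        raise ValueError(f"RAM must be positive, got {raw_ram_gb!r}")
    return math.ceil(raw_ram_gb)


def to_record(raw_row: dict) -> dict:
    """Build an import record from a raw row (hypothetical field names)."""
    return {
        "name": raw_row["name"],
        "ram_gb": normalize_ram_gb(float(raw_row["ram_gb"])),
        # Imported records stay unverified until a later audit.
        "verified": False,
    }


# Hypothetical sample row, not taken from the PhoneDB CSV.
record = to_record({"name": "Example Phone", "ram_gb": "1.5"})
print(record)
```

Rounding up rather than to the nearest integer keeps a 0.5 GB device from collapsing to 0, which an integer schema would presumably reject.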

Verification

  • python -m app.validate PASS
  • python TechEngine\integrity_check.py data --strict PASS
  • cd site && npm.cmd run build PASS
  • git diff --check PASS

Closes #1

@TechEngineBot (Member) commented Jun 19, 2026

TechEngine change review: PASS

| Check | Result |
| --- | --- |
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

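| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 0 | 0 | 0 | 0 | 0 | 0 |
| soc | 0 | 0 | 0 | 0 | 0 | 0 |
| smartphone | 5045 | 0 | 0 | 0 | 5045 | 5045 |
| gpu | 0 | 0 | 0 | 0 | 0 | 0 |
| cpu | 0 | 0 | 0 | 0 | 0 | 0 |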

Changed record examples

smartphone added

  • smartphone/alcatel/2016/one-touch-pop-4-global-dual-sim-lte-5056d-pop-4-plus.json - One Touch Pop 4+ Global Dual SIM LTE 5056D / Pop 4 Plus
  • smartphone/alcatel/2016/one-touch-pop-4-lte-latam-5056a-pop-4-plus.json - One Touch Pop 4+ LTE LATAM 5056A / Pop 4 Plus
  • smartphone/alcatel/2016/one-touch-pop-4-plus-dual-sim-lte-am-5056e-pop-4.json - One Touch Pop 4 Plus Dual SIM LTE AM 5056E / Pop 4+
  • smartphone/alcatel/2016/one-touch-pop-4-plus-lte-am-5056g-pop-4.json - One Touch Pop 4 Plus LTE AM 5056G / Pop 4+
  • smartphone/alcatel/2018/5v-dual-sim-lte-emea.json - 5V Dual SIM LTE EMEA
  • smartphone/alcatel/2018/7-lte-am-6062w.json - 7 LTE AM 6062W
  • smartphone/alcatel/2019/3-2019-dual-sim-lte-emea-32gb-5053d.json - 3 2019 Dual SIM LTE EMEA 32GB 5053D
  • smartphone/alcatel/2019/3-2019-dual-sim-lte-emea-64gb-5053k.json - 3 2019 Dual SIM LTE EMEA 64GB 5053K
  • smartphone/alcatel/2019/3-2019-lte-emea-32gb-5053y.json - 3 2019 LTE EMEA 32GB 5053Y
  • smartphone/alcatel/2019/3x-2019-dual-sim-lte-emea-128gb-5048u.json - 3x 2019 Dual SIM LTE EMEA 128GB 5048U
  • smartphone/alcatel/2019/3x-2019-dual-sim-lte-emea-5048i.json - 3x 2019 Dual SIM LTE EMEA 5048I
  • smartphone/alcatel/2019/3x-2019-dual-sim-lte-emea-5048y.json - 3x 2019 Dual SIM LTE EMEA 5048Y
  • smartphone/alcatel/2019/3x-2019-lte-am-5048a.json - 3x 2019 LTE AM 5048A
  • smartphone/alcatel/2020/1b-2020-global-dual-sim-lte-5002d.json - 1B 2020 Global Dual SIM LTE 5002D
  • smartphone/alcatel/2020/1b-2020-global-lte-5002x.json - 1B 2020 Global LTE 5002X
  • ... 5030 more
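
The listed paths follow a `category/brand/year/slug.json` layout, and the slugs shown are consistent with a simple rule: lowercase the name and join its alphanumeric runs with hyphens. The sketch below reproduces the listed examples under that assumption. The function names `slugify` and `record_path` are hypothetical, and the project's actual slug rule is not shown in this PR, so it may differ for names not listed here.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", name.lower()))


def record_path(category: str, brand: str, year: int, name: str) -> str:
    """Compose a category/brand/year/slug.json path."""
    return f"{category}/{slugify(brand)}/{year}/{slugify(name)}.json"


print(record_path("smartphone", "alcatel", 2018, "5V Dual SIM LTE EMEA"))
# prints: smartphone/alcatel/2018/5v-dual-sim-lte-emea.json
```

Under this rule punctuation such as `+` and `/` is dropped, which is why "Pop 4+" and "Pop 4" produce the same slug fragment.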

Heuristic review

  • Added records by manufacturer/brand: samsung: 893, oppo: 600, xiaomi: 548, vivo: 492, huawei: 462, motorola: 317, lg: 313, apple: 215

  • Added records by source class: kaggle: 5045

  • Heuristic warnings: 3 total; showing first 3.

    • smartphone: smartphone/fujitsu/2019/raku-raku-me-f-01l-lte.json: repeated adjacent word in name
    • smartphone: smartphone/fujitsu/2022/raku-raku-easy-smartphone-td-lte-jp-f-52b.json: repeated adjacent word in name
    • smartphone: smartphone/sharp/2019/aquos-sense-3-wimax-2-shv45-shv45-u.json: repeated adjacent word in name
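
A "repeated adjacent word" heuristic like the one reported above could look roughly like this. It is a sketch, not TechEngine's implementation: the function name `repeated_adjacent_words` and the tokenization (lowercased alphanumeric runs) are assumptions, and because the record names are not shown here, the file slugs stand in for them.

```python
import re


def repeated_adjacent_words(name: str) -> list[str]:
    """Return each word that is immediately followed by itself."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [a for a, b in zip(words, words[1:]) if a == b]


# Slugs taken from the warnings and the added-record list above.
for slug in (
    "raku-raku-me-f-01l-lte",
    "aquos-sense-3-wimax-2-shv45-shv45-u",
    "5v-dual-sim-lte-emea",
):
    print(slug, repeated_adjacent_words(slug))
```

Names such as "Raku-Raku" repeat a word legitimately, which fits the check being reported as a warning rather than a failure.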

@TechEngineBot (Member) commented Jun 19, 2026

TechEngine validation stats: PASS

Data summary

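| Category | Total | Verified | Unverified | Missing verified | Tracked | Verified % of tracked |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 129 | 0 | 0 | 129 | 0 | n/a |
| soc | 216 | 58 | 158 | 0 | 216 | 26.9% |
| smartphone | 6544 | 184 | 6360 | 0 | 6544 | 2.8% |
| gpu | 2030 | 0 | 2030 | 0 | 2030 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 3977 | 24.5% |
| all | 12896 | 1218 | 11549 | 129 | 12767 | 9.5% |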

Warning

Tracked verified coverage is below 50% for gpu 0.0% (0/2030), smartphone 2.8% (184/6544), all 9.5% (1218/12767), cpu 24.5% (976/3977), soc 26.9% (58/216).
Tracked coverage excludes records missing the verified field; see the Missing verified column for those records.
This does not fail validation. Keep imported records verified: false until manual audit, but treat this as follow-up verification work before relying on the affected categories as curated data.
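
The coverage figures in this warning can be reproduced from the data summary: tracked records are the total minus those missing the verified field, and coverage is verified divided by tracked. The sketch below recomputes the per-category values. The names `SUMMARY`, `THRESHOLD`, and `tracked_coverage` are hypothetical and this is not the bot's own code.

```python
from typing import Optional

# (total, verified, missing_verified) per category, copied from the data summary.
SUMMARY = {
    "brand": (129, 0, 129),
    "soc": (216, 58, 0),
    "smartphone": (6544, 184, 0),
    "gpu": (2030, 0, 0),
    "cpu": (3977, 976, 0),
}
THRESHOLD = 50.0  # percent, taken from the warning text


def tracked_coverage(total: int, verified: int, missing: int) -> Optional[float]:
    """Verified share of tracked records in percent; None if nothing is tracked."""
    tracked = total - missing
    if tracked == 0:
        return None  # shown as "n/a" in the table
    return 100.0 * verified / tracked


below = {}
for category, (total, verified, missing) in SUMMARY.items():
    pct = tracked_coverage(total, verified, missing)
    if pct is not None and pct < THRESHOLD:
        below[category] = round(pct, 1)

# List the categories in ascending order of coverage, as the warning does.
for category, pct in sorted(below.items(), key=lambda item: item[1]):
    print(f"{category}: {pct}%")
```

The brand category has no tracked records, so it has no coverage value and is left out of the warning rather than reported as 0%.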

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.

Key output:

## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=2030
✅ integrity gate: no hard anomalies.
| Integrity section | Flagged lines |
| --- | --- |
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench; should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |
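
As an illustration of the "single>multi" section, the sketch below flags records whose single-core score exceeds their multi-core score. It is not the code in `integrity_check.py`: the function name `single_exceeds_multi`, the field names `name`, `single`, and `multi`, and the sample records are all hypothetical.

```python
def single_exceeds_multi(records: list[dict]) -> list[str]:
    """Return names of records whose single-core score is above the multi-core score."""
    flagged = []
    for rec in records:
        single, multi = rec.get("single"), rec.get("multi")
        if single is None or multi is None:
            continue  # nothing to compare
        if single > multi:
            flagged.append(rec["name"])
    return flagged


# Hypothetical sample records, not taken from the dataset.
sample = [
    {"name": "cpu-a", "single": 1800, "multi": 9500},
    {"name": "cpu-b", "single": 2100, "multi": 1900},  # violates multi >= single
    {"name": "cpu-c", "single": 1500, "multi": None},  # skipped: no multi score
]
flagged = single_exceeds_multi(sample)
print(flagged)
```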

@github-actions bot added the data (Dataset changes) and enhancement (New feature or request) labels on Jun 19, 2026
@Seungpyo1007 moved this from Todo to In Progress in TechAPI-Project on Jun 19, 2026
Add automatic tracking-issue comments for data and site PRs so long-running issues show the current PR context, changed areas, priority, and file counts.

Closes #1
@github-actions bot added the ci (CI and workflow changes) label on Jun 19, 2026
@Seungpyo1007 merged commit 7e3ecd8 into main on Jun 19, 2026
4 checks passed
@github-project-automation bot moved this from In Progress to Done in TechAPI-Project on Jun 19, 2026
Labels

  • ci: CI and workflow changes
  • data: Dataset changes
  • enhancement: New feature or request

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Massive dataset rebuild: CPU + brand + GPU + smartphone + SoC (1989-2026)

2 participants