data(smartphone): import PhoneDB smartphone batch #23

Merged
Seungpyo1007 merged 7 commits into main from data/import-staging on Jun 18, 2026

@Seungpyo1007 Seungpyo1007 commented Jun 18, 2026

Summary

  • import 665 unverified smartphone records from the Kaggle PhoneDB-derived mobile phones specs dataset
  • split the import across brand-focused commits instead of one giant data commit
  • refresh the public v1 smartphone dump so homepage/API static data reports 1,399 smartphones and 7,730 total records
  • keep noisy LG carrier-code rows and malformed Samsung/Nokia variants out of this batch

Changed data

Verification

  • python -m app.validate
  • python TechEngine\integrity_check.py data --strict
  • cd site && npm.cmd run build
  • git diff --check
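
The four checks above could be chained so that the first failure stops the run. This is only a sketch: the working directories (repository root, and `site` for the build) are inferred from the commands listed, and `run_checks` and its `runner` parameter are hypothetical names, not part of the repository.

```python
import subprocess

# (command, working directory) pairs copied from the Verification list.
# Assumption: commands run from the repository root unless a cwd is given.
CHECKS = [
    (["python", "-m", "app.validate"], None),
    (["python", "TechEngine\\integrity_check.py", "data", "--strict"], None),
    (["npm.cmd", "run", "build"], "site"),
    (["git", "diff", "--check"], None),
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each check in order; return the first failing command, or None."""
    for cmd, cwd in checks:
        result = runner(cmd, cwd=cwd)
        if result.returncode != 0:
            return cmd
    return None
```

Passing the runner as a parameter keeps the sequence testable without invoking the real tools.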

Refs #1

@Seungpyo1007 Seungpyo1007 added the data Dataset changes label Jun 18, 2026
@github-actions github-actions Bot added the enhancement New feature or request label Jun 18, 2026
@Seungpyo1007 Seungpyo1007 moved this from Todo to In Progress in TechAPI-Project Jun 18, 2026
TechEngineBot commented Jun 18, 2026

TechEngine change review: PASS

| Check | Result |
| --- | --- |
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

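| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 0 | 0 | 0 | 0 | 0 | 0 |
| soc | 0 | 0 | 0 | 0 | 0 | 0 |
| smartphone | 665 | 0 | 0 | 0 | 665 | 665 |
| gpu | 0 | 0 | 0 | 0 | 0 | 0 |
| cpu | 0 | 0 | 0 | 0 | 0 | 0 |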

Changed record examples

smartphone added

  • smartphone/apple/2020/iphone-12-mini-uw.json - iPhone 12 Mini UW
  • smartphone/apple/2020/iphone-12-pro-max-uw.json - iPhone 12 Pro Max UW
  • smartphone/apple/2020/iphone-12-pro-uw.json - iPhone 12 Pro UW
  • smartphone/apple/2020/iphone-12-uw.json - iPhone 12 UW
  • smartphone/apple/2020/iphone-se-2020-2nd-gen.json - iPhone SE 2020 2nd gen
  • smartphone/apple/2021/iphone-13-mini-uw.json - iPhone 13 mini UW
  • smartphone/apple/2021/iphone-13-pro-max-uw.json - iPhone 13 Pro Max UW
  • smartphone/apple/2021/iphone-13-pro-uw.json - iPhone 13 Pro UW
  • smartphone/apple/2021/iphone-13-uw.json - iPhone 13 UW
  • smartphone/apple/2022/iphone-14-plus-uw.json - iPhone 14 Plus UW
  • smartphone/apple/2022/iphone-14-pro-max-uw.json - iPhone 14 Pro Max UW
  • smartphone/apple/2022/iphone-14-pro-uw.json - iPhone 14 Pro UW
  • smartphone/apple/2022/iphone-14-uw.json - iPhone 14 UW
  • smartphone/apple/2022/iphone-se.json - iPhone SE
  • smartphone/asus/2019/rog-phone-ii-strix-edition.json - ROG Phone II Strix Edition
  • ... 650 more

Heuristic review

  • Added records by manufacturer/brand: xiaomi: 173, oppo: 125, realme: 100, motorola: 54, honor: 51, oneplus: 29, huawei: 27, poco: 25
  • Added records by source class: kaggle: 665
  • Heuristic warnings: none found.
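
A per-brand breakdown like the one above can be derived from the record paths alone, given the `category/brand/year/slug.json` layout visible in the changed record examples. The sketch below is illustrative, not the bot's actual code; `count_by_brand` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_brand(paths, category="smartphone"):
    """Count record paths by brand directory, most frequent first."""
    counts = Counter()
    for path in paths:
        parts = PurePosixPath(path).parts
        # Expected layout: category/brand/year/slug.json
        if len(parts) == 4 and parts[0] == category:
            counts[parts[1]] += 1
    return counts.most_common()


sample = [
    "smartphone/apple/2020/iphone-12-mini-uw.json",
    "smartphone/apple/2022/iphone-se.json",
    "smartphone/asus/2019/rog-phone-ii-strix-edition.json",
]
print(count_by_brand(sample))  # [('apple', 2), ('asus', 1)]
```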

TechEngineBot commented Jun 18, 2026

TechEngine validation stats: PASS

Data summary

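| Category | Total | Verified | Unverified | Missing verified | Tracked | Verified % of tracked |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 129 | 0 | 0 | 129 | 0 | n/a |
| soc | 195 | 58 | 137 | 0 | 195 | 29.7% |
| smartphone | 1399 | 184 | 1215 | 0 | 1399 | 13.2% |
| gpu | 2030 | 0 | 2030 | 0 | 2030 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 3977 | 24.5% |
| all | 7730 | 1218 | 6383 | 129 | 7601 | 16.0% |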

Warning

Tracked verified coverage is below 50% for gpu 0.0% (0/2030), smartphone 13.2% (184/1399), all 16.0% (1218/7601), cpu 24.5% (976/3977), soc 29.7% (58/195).
Tracked coverage excludes records missing the verified field; see the Missing verified column for those records.
This does not fail validation. Keep imported records verified: false until manual audit, but treat this as follow-up verification work before relying on the affected categories as curated data.
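
The coverage figures in the warning follow from the data summary: tracked = verified + unverified, and coverage = verified / tracked, with records missing the verified field left out. A minimal sketch that reproduces the numbers, assuming the 50% threshold stated in the warning (the function names are illustrative, not the bot's implementation):

```python
# (verified, unverified, missing verified) per category, from the data summary.
ROWS = {
    "brand": (0, 0, 129),
    "soc": (58, 137, 0),
    "smartphone": (184, 1215, 0),
    "gpu": (0, 2030, 0),
    "cpu": (976, 3001, 0),
}


def tracked_coverage(verified, unverified):
    """Verified share of tracked records, or None when nothing is tracked."""
    tracked = verified + unverified
    return None if tracked == 0 else verified / tracked


def low_coverage(rows, threshold=0.5):
    """Return (name, percent, verified, tracked) below threshold, lowest first."""
    totals = tuple(sum(col) for col in zip(*rows.values()))
    flagged = []
    for name, (verified, unverified, _missing) in {**rows, "all": totals}.items():
        cov = tracked_coverage(verified, unverified)
        if cov is not None and cov < threshold:
            flagged.append((name, round(cov * 100, 1), verified, verified + unverified))
    return sorted(flagged, key=lambda item: item[1])


for name, pct, verified, tracked in low_coverage(ROWS):
    print(f"{name} {pct}% ({verified}/{tracked})")
```

The `brand` category drops out because none of its 129 records carry the verified field, which is why the table shows n/a for it.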

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.

Key output:

## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=2030
✅ integrity gate: no hard anomalies.
| Integrity section | Flagged lines |
| --- | --- |
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench — should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |
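
As an illustration of the "CPU single>multi" section, the check flags a record whose single-core score exceeds its multi-core score in the same benchmark suite. The record shape below (`name`, `benchmarks`, `single`, `multi`) is assumed for the sketch and may not match the real schema or the logic in integrity_check.py.

```python
def single_exceeds_multi(records, suites=("cinebench", "geekbench")):
    """Return (name, suite) pairs where the single-core score beats multi-core."""
    flagged = []
    for rec in records:
        for suite in suites:
            scores = rec.get("benchmarks", {}).get(suite)
            if not scores:
                continue  # suite not recorded for this CPU
            single, multi = scores.get("single"), scores.get("multi")
            # Skip incomplete pairs; flag only a definite inversion.
            if single is not None and multi is not None and single > multi:
                flagged.append((rec["name"], suite))
    return flagged


# Made-up records, purely for demonstration.
demo = [
    {"name": "cpu-a", "benchmarks": {"geekbench": {"single": 1200, "multi": 5400}}},
    {"name": "cpu-b", "benchmarks": {"cinebench": {"single": 900, "multi": 450}}},
    {"name": "cpu-c", "benchmarks": {"geekbench": {"single": 1000}}},
]
print(single_exceeds_multi(demo))  # [('cpu-b', 'cinebench')]
```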

@Seungpyo1007 Seungpyo1007 force-pushed the data/import-staging branch 2 times, most recently from 37fcdb2 to 977c3e7 on June 18, 2026 08:41
@Seungpyo1007 Seungpyo1007 merged commit 62d050c into main Jun 18, 2026
4 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in TechAPI-Project Jun 18, 2026

Labels

data (Dataset changes), enhancement (New feature or request)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Massive dataset rebuild: CPU + brand + GPU + smartphone + SoC (1989-2026)

2 participants