
data(smartphone): import 10000 GSMArena raw devices #26

Merged
Seungpyo1007 merged 50 commits into main from data/import-staging on Jun 19, 2026

Conversation

@Seungpyo1007

Member

Summary

  • import 10,000 unverified smartphone records from Kaggle-backed GSMArena/PhoneDB source data
  • add 60 unverified OEM brand records and 1,344 unverified SoC stub records required by the smartphone batch
  • refresh the static site/public/v1 API dump after the data import
  • split the import into 50 commits: 1 brand/SoC seed, 48 smartphone chunks, 1 public dump refresh
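
The 48 smartphone chunks listed in the commit messages are 209 records each for chunks 01 to 16 and 208 records each afterwards. A minimal sketch of an even split that yields those sizes is shown here; the function name `chunk_sizes` is hypothetical and the actual import tooling is not shown in this PR.

```python
def chunk_sizes(total: int, chunks: int) -> list[int]:
    """Split `total` items into `chunks` near-equal sizes, larger chunks first."""
    base, extra = divmod(total, chunks)
    # The first `extra` chunks absorb one leftover record each.
    return [base + 1 if i < extra else base for i in range(chunks)]


sizes = chunk_sizes(10_000, 48)
print(sizes[0], sizes[15], sizes[16], sizes[-1], sum(sizes))
```

With 10,000 records and 48 chunks, `divmod` gives a base of 208 and a remainder of 16, so 16 chunks carry 209 records and 32 carry 208.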

Sources

Verification

  • python -m app.validate
  • python TechEngine\integrity_check.py data --strict
  • git diff --check
  • cd site && npm.cmd run build

Notes

  • All newly imported records are intentionally verified: false until TechEngine/manual audit can verify them.
  • TechEngine advisory outlier listings still contain existing CPU/GPU warnings, but the strict integrity gate reports no hard anomalies.

Closes #1

Add unverified brand and SoC records required by the 10k GSMArena and PhoneDB smartphone import batch.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 01 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 02 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 03 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 04 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 05 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 06 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 07 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 08 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 09 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 10 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 11 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 12 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 13 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 14 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 15 of 48; records remain verified: false pending manual audit.

Refs #1
Add 209 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 16 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 17 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 18 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 19 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 20 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 21 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 22 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 23 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 24 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 25 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 26 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 27 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 28 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 29 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 33 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 34 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 35 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 36 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 37 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 38 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 39 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 40 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 41 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 42 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 43 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 44 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 45 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 46 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 47 of 48; records remain verified: false pending manual audit.

Refs #1
Add 208 unverified smartphone records from the GSMArena and PhoneDB import batch.

Chunk 48 of 48; records remain verified: false pending manual audit.

Refs #1
Regenerate the static v1 API dump after adding unverified GSMArena and PhoneDB smartphone records.

Refs #1
@Seungpyo1007 added the enhancement (New feature or request) and data (Dataset changes) labels on Jun 19, 2026
@TechEngineBot

Member

TechEngine change review: PASS

| Check | Result |
|---|---|
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

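| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
|---|---|---|---|---|---|---|
| brand | 60 | 0 | 0 | 0 | 60 | 60 |
| soc | 1344 | 0 | 0 | 0 | 1344 | 1344 |
| smartphone | 10000 | 0 | 0 | 0 | 10000 | 10000 |
| gpu | 0 | 0 | 0 | 0 | 0 | 0 |
| cpu | 0 | 0 | 0 | 0 | 0 | 0 |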

Changed record examples

brand added

  • brand/ae/i-mate.json - i-mate
  • brand/ae/thuraya.json - Thuraya
  • brand/at/tel-me.json - Tel.Me.
  • brand/cn/amoi.json - Amoi
  • brand/cn/bird.json - Bird
  • brand/cn/blackview.json - Blackview
  • brand/cn/chea.json - Chea
  • brand/cn/haier.json - Haier
  • brand/cy/prestigio.json - Prestigio
  • brand/de/benq-siemens.json - BenQ-Siemens
  • brand/de/bosch.json - Bosch
  • brand/de/fujitsu-siemens.json - Fujitsu Siemens
  • brand/de/t-mobile.json - T-Mobile
  • brand/fi/benefon.json - Benefon
  • brand/fr/orange.json - Orange
  • ... 45 more

soc added

  • soc/apple/2010/apple-a4.json - Apple A4
  • soc/apple/2012/apple-a5.json - Apple A5
  • soc/apple/2012/apple-a5x.json - Apple A5X
  • soc/apple/2012/apple-a6x.json - Apple A6X
  • soc/apple/2014/apple-a8x.json - Apple A8X
  • soc/apple/2014/apple-s1.json - Apple S1
  • soc/apple/2016/apple-a9x.json - Apple A9X
  • soc/apple/2016/apple-s1p.json - Apple S1P
  • soc/apple/2016/apple-s2.json - Apple S2
  • soc/apple/2017/apple-a10x-fusion.json - Apple A10X Fusion
  • soc/apple/2017/apple-s3.json - Apple S3
  • soc/apple/2018/apple-a12x-bionic.json - Apple A12X Bionic
  • soc/apple/2018/apple-s4.json - Apple S4
  • soc/apple/2019/apple-s5.json - Apple S5
  • soc/apple/2020/apple-a12z-bionic.json - Apple A12Z Bionic
  • ... 1329 more

smartphone added

  • smartphone/acer/2009/betouch-e100.json - beTouch E100
  • smartphone/acer/2009/betouch-e101.json - beTouch E101
  • smartphone/acer/2009/betouch-e200.json - beTouch E200
  • smartphone/acer/2009/dx650.json - DX650
  • smartphone/acer/2009/dx900.json - DX900
  • smartphone/acer/2009/f900.json - F900
  • smartphone/acer/2009/liquid.json - Liquid
  • smartphone/acer/2009/m900.json - M900
  • smartphone/acer/2009/neotouch.json - neoTouch
  • smartphone/acer/2009/x960.json - X960
  • smartphone/acer/2010/betouch-e110.json - beTouch E110
  • smartphone/acer/2010/betouch-e120.json - beTouch E120
  • smartphone/acer/2010/betouch-e130.json - beTouch E130
  • smartphone/acer/2010/betouch-e140.json - beTouch E140
  • smartphone/acer/2010/betouch-e400.json - beTouch E400
  • ... 9985 more

Heuristic review

  • Added records by manufacturer/brand: samsung: 1295, lg: 693, arm: 631, nokia: 502, motorola: 468, alcatel: 443, huawei: 369, blu: 311

  • Added records by source class: kaggle: 11404

  • Heuristic warnings: 19 total; showing first 19.

    • soc: soc/arm/2004/vk-mobile-mobile-platform-2004.json: repeated adjacent word in name
    • soc: soc/arm/2005/vk-mobile-mobile-platform-2005.json: repeated adjacent word in name
    • soc: soc/arm/2006/i-mobile-mobile-platform-2006.json: repeated adjacent word in name
    • soc: soc/arm/2006/t-mobile-mobile-platform-2006.json: repeated adjacent word in name
    • soc: soc/arm/2006/vk-mobile-mobile-platform-2006.json: repeated adjacent word in name
    • soc: soc/arm/2007/i-mobile-mobile-platform-2007.json: repeated adjacent word in name
    • soc: soc/arm/2007/t-mobile-mobile-platform-2007.json: repeated adjacent word in name
    • soc: soc/arm/2007/vk-mobile-mobile-platform-2007.json: repeated adjacent word in name
    • soc: soc/arm/2008/i-mobile-mobile-platform-2008.json: repeated adjacent word in name
    • soc: soc/arm/2008/t-mobile-mobile-platform-2008.json: repeated adjacent word in name
    • soc: soc/arm/2009/i-mobile-mobile-platform-2009.json: repeated adjacent word in name
    • soc: soc/arm/2009/t-mobile-mobile-platform-2009.json: repeated adjacent word in name
    • soc: soc/arm/2010/i-mobile-mobile-platform-2010.json: repeated adjacent word in name
    • soc: soc/arm/2010/t-mobile-mobile-platform-2010.json: repeated adjacent word in name
    • soc: soc/arm/2011/t-mobile-mobile-platform-2011.json: repeated adjacent word in name
    • smartphone: smartphone/oppo/2018/f9-f9-pro.json: repeated adjacent word in name
    • smartphone: smartphone/samsung/2010/smiley.json: unbalanced parentheses in name
    • smartphone: smartphone/samsung/2019/sm-a202j-galaxy-a20-wimax-2-scv46-scv46-u.json: repeated adjacent word in name
    • smartphone: smartphone/vivo/2018/v11-v11-pro.json: repeated adjacent word in name
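
The two warning types above can be approximated with a few lines of Python. This is a sketch under assumptions, not TechEngine's actual implementation: the function names are hypothetical, the file slugs from the warning list stand in for the record names (which are not shown here), and `"Smiley :)"` is an invented name used only to exercise the parenthesis check.

```python
import re


def has_repeated_adjacent_word(name: str) -> bool:
    """True if the same alphanumeric token appears twice in a row."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return any(a == b for a, b in zip(words, words[1:]))


def has_unbalanced_parentheses(name: str) -> bool:
    """True if a ')' closes nothing or a '(' is never closed."""
    depth = 0
    for ch in name:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return True
    return depth != 0


# Slugs taken from the warning list, used as stand-ins for the names.
for slug in ("vk-mobile-mobile-platform-2004", "f9-f9-pro", "betouch-e100"):
    print(slug, has_repeated_adjacent_word(slug))

# Hypothetical name; the real record name is not shown in this PR.
print(has_unbalanced_parentheses("Smiley :)"))
```

Both checks are purely lexical, so they flag legitimate names as well (for example a carrier called "T-Mobile" followed by "Mobile Platform"), which is consistent with them being reported as advisory warnings rather than failures.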

@TechEngineBot

Member

TechEngine validation stats: PASS

Data summary

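| Category | Total | Verified | Unverified | Missing verified | Tracked | Verified % of tracked |
|---|---|---|---|---|---|---|
| brand | 189 | 0 | 60 | 129 | 60 | 0.0% |
| soc | 1560 | 58 | 1502 | 0 | 1560 | 3.7% |
| smartphone | 16544 | 184 | 16360 | 0 | 16544 | 1.1% |
| gpu | 2030 | 0 | 2030 | 0 | 2030 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 3977 | 24.5% |
| all | 24300 | 1218 | 22953 | 129 | 24171 | 5.0% |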

Warning

Tracked verified coverage is below 50% for brand 0.0% (0/60), gpu 0.0% (0/2030), smartphone 1.1% (184/16544), soc 3.7% (58/1560), all 5.0% (1218/24171), cpu 24.5% (976/3977).
Tracked coverage excludes records missing the verified field; see the Missing verified column for those records.
This does not fail validation. Keep imported records verified: false until manual audit, but treat this as follow-up verification work before relying on the affected categories as curated data.
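
The tracked-coverage figures follow directly from the counts in the data summary: tracked is the total minus the records missing the verified field, and the percentage is verified over tracked. The sketch below reproduces them; the function name and the dictionary layout are hypothetical, and the 50% threshold is the one quoted in the warning.

```python
def tracked_coverage(total: int, verified: int, missing_verified: int) -> tuple[int, float]:
    """Return (tracked count, verified percentage of tracked, 1 decimal)."""
    tracked = total - missing_verified
    pct = 100.0 * verified / tracked if tracked else 0.0
    return tracked, round(pct, 1)


# (total, verified, missing verified) per category, from the data summary.
counts = {
    "brand": (189, 0, 129),
    "soc": (1560, 58, 0),
    "smartphone": (16544, 184, 0),
    "gpu": (2030, 0, 0),
    "cpu": (3977, 976, 0),
}
# The "all" row is the column-wise sum of the per-category rows.
counts["all"] = tuple(sum(col) for col in zip(*counts.values()))

coverage = {cat: tracked_coverage(*row) for cat, row in counts.items()}
below_threshold = [cat for cat, (_, pct) in coverage.items() if pct < 50.0]

for cat, (tracked, pct) in coverage.items():
    print(f"{cat}: {pct}% of {tracked} tracked")
```

Every category falls below the 50% threshold, which matches the list in the warning.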

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.

Key output:

## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=2030
✅ integrity gate: no hard anomalies.
| Integrity section | Flagged lines |
|---|---|
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench; should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |
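
As an illustration of the single>multi row, a check of that kind could look like the sketch below. The record shape (`name`, `single`, `multi` keys), the sample values, and the function name are assumptions made for the example; the real `integrity_check.py` logic and schema are not shown in this PR.

```python
def single_exceeds_multi(records: list[dict]) -> list[str]:
    """Return names of records whose single-core score exceeds the multi-core score."""
    flagged = []
    for rec in records:
        single, multi = rec.get("single"), rec.get("multi")
        # Skip records where either score is absent; only compare complete pairs.
        if single is not None and multi is not None and single > multi:
            flagged.append(rec["name"])
    return flagged


# Hypothetical sample data, not taken from the dataset.
sample = [
    {"name": "cpu-a", "single": 1200, "multi": 5400},
    {"name": "cpu-b", "single": 900, "multi": 850},   # suspicious: single > multi
    {"name": "cpu-c", "single": 700, "multi": None},  # incomplete, ignored
]
print(single_exceeds_multi(sample))
```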

@Seungpyo1007 moved this from Todo to In Progress in TechAPI-Project on Jun 19, 2026
@Seungpyo1007 merged commit 33136f5 into main on Jun 19, 2026
4 checks passed
@github-project-automation (bot) moved this from In Progress to Done in TechAPI-Project on Jun 19, 2026

Labels

data (Dataset changes), enhancement (New feature or request)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Massive dataset rebuild: CPU + brand + GPU + smartphone + SoC (1989-2026)

2 participants