
data(brand): backfill founded_year and website for known brands #46

Merged
Seungpyo1007 merged 2 commits into main from data/brand-metadata on Jun 21, 2026

@Seungpyo1007

Member

Summary

Backfill brand metadata for brands whose facts are confidently known:

  • founded_year for 25 brands (Casio 1957, HP 1939, Dell 1984, Amazon 1994, Bosch 1886, Mitsubishi 1921, Sagem 1925, Vodafone 1991, Sony Ericsson 2001, BenQ-Siemens 2005, Fujitsu Siemens 1999, Palm 1992, Thuraya 1997, Haier 1984, O2 2002, Benefon 1988, Sendo 1999, Neonode 2004, MiTAC 1982, Amoi 1997, Ningbo Bird 1992, Sonim 1999, Modu 2007, INQ 2008, Blackview 2013).
  • website for 6 active brands (Thuraya, Haier, Bosch, Orange, Prestigio, Vodafone).

Uncertain founding years and ambiguous or defunct websites were intentionally left blank (no wrong dates, no dead links). The matching site/public/v1/brands dump details were also refreshed.
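The backfill rule above (fill only confidently known facts, leave uncertain ones blank) can be sketched as follows. This is a minimal illustration, not the PR's actual script: it assumes brand records are plain dicts with hypothetical `name`, `founded_year`, and `website` keys, uses `None` in the curated table to mean "not confidently known", and never overwrites a value that is already set. The URLs are placeholders.

```python
from typing import Any

# Hypothetical curated table. None = uncertain, so the field stays blank.
# Years come from the PR summary; the URLs are placeholders, not real data.
CURATED: dict[str, dict[str, Any]] = {
    "Casio": {"founded_year": 1957},
    "Bosch": {"founded_year": 1886, "website": "https://example.com/bosch"},
    "Orange": {"founded_year": None, "website": "https://example.com/orange"},
}


def backfill(brands: list[dict[str, Any]],
             curated: dict[str, dict[str, Any]]) -> dict[str, int]:
    """Fill blank fields from `curated`; return how many values were set per field."""
    filled: dict[str, int] = {}
    for brand in brands:
        facts = curated.get(brand["name"], {})
        for field, value in facts.items():
            if value is None:
                continue  # uncertain fact: leave the field blank
            if brand.get(field) not in (None, ""):
                continue  # never overwrite an existing value
            brand[field] = value
            filled[field] = filled.get(field, 0) + 1
    return filled
```

The returned per-field counts correspond to the "25 brands" / "6 brands" figures a real run would report.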

Verification

  • python -m app.validate: PASS
  • python TechEngine/integrity_check.py data --strict: PASS
  • git diff --check origin/main...HEAD: PASS
  • Local site build: skipped (the worktree has no node_modules). The change is metadata-only and build-safe, and the TechEngine homepage validation builds the site.
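The three command checks in the verification list can be rerun with a small runner that treats exit code 0 as PASS (`git diff --check` exits non-zero when it finds whitespace errors). This is a sketch, not TechEngine's own tooling: the commands are copied from the list, and it assumes it is started from the repository root.

```python
import subprocess
import sys

# Commands copied from the verification list; labels match the PR text.
CHECKS: dict[str, list[str]] = {
    "python -m app.validate": [sys.executable, "-m", "app.validate"],
    "python TechEngine/integrity_check.py data --strict":
        [sys.executable, "TechEngine/integrity_check.py", "data", "--strict"],
    "git diff --check origin/main...HEAD":
        ["git", "diff", "--check", "origin/main...HEAD"],
}


def run_checks(checks: dict[str, list[str]]) -> dict[str, str]:
    """Run each command and map its label to PASS (exit code 0) or FAIL."""
    results: dict[str, str] = {}
    for label, argv in checks.items():
        proc = subprocess.run(argv, capture_output=True, text=True)
        results[label] = "PASS" if proc.returncode == 0 else "FAIL"
    return results
```

Calling `run_checks(CHECKS)` returns one PASS/FAIL entry per line of the list. The local site build is left out because it needs `node_modules`.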

Closes #1

Add factual founded_year (25 brands) and official website (6 brands) where confidently known; uncertain/defunct brands left untouched to avoid wrong dates or dead links.

Refs #1
Seungpyo1007 added the data (Dataset changes) label on Jun 21, 2026
Seungpyo1007 self-assigned this on Jun 21, 2026
github-actions (bot) added the enhancement (New feature or request) label on Jun 21, 2026
Seungpyo1007 moved this from Todo to In Progress in TechAPI-Project
Seungpyo1007 merged commit c47e6bb into main on Jun 21, 2026
4 checks passed
github-project-automation (bot) moved this from In Progress to Done in TechAPI-Project on Jun 21, 2026
@TechEngineBot

Member

TechEngine change review: PASS

| Check | Result |
|---|---|
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

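| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
|---|---|---|---|---|---|---|
| brand | 0 | 0 | 0 | 0 | 0 | 0 |
| soc | 0 | 0 | 0 | 0 | 0 | 0 |
| smartphone | 0 | 0 | 0 | 0 | 0 | 0 |
| tablet | 0 | 0 | 0 | 0 | 0 | 0 |
| watch | 0 | 0 | 0 | 0 | 0 | 0 |
| pda | 0 | 0 | 0 | 0 | 0 | 0 |
| gpu | 0 | 0 | 0 | 0 | 0 | 0 |
| cpu | 0 | 0 | 0 | 0 | 0 | 0 |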
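Per-category counts like these can be sketched as a keyed diff between the base and head snapshots of one category. This is an assumption about how such a review could work, not the bot's code: it assumes each record has a stable `id`, an optional boolean `verified`, and a hypothetical `source` field marking Kaggle imports. None of these names are confirmed by the report.

```python
from typing import Any

Record = dict[str, Any]


def diff_counts(old: list[Record], new: list[Record]) -> dict[str, int]:
    """Count added/modified/deleted records between two snapshots of one category."""
    before = {r["id"]: r for r in old}
    after = {r["id"]: r for r in new}
    added = [after[k] for k in after.keys() - before.keys()]
    shared = after.keys() & before.keys()
    return {
        "added": len(added),
        # Any field difference on a surviving id counts as one modification.
        "modified": sum(1 for k in shared if after[k] != before[k]),
        "deleted": len(before.keys() - after.keys()),
        "added_verified": sum(1 for r in added if r.get("verified") is True),
        "added_unverified": sum(1 for r in added if r.get("verified") is False),
        # Hypothetical provenance field; the real dataset may mark imports differently.
        "added_kaggle": sum(1 for r in added if r.get("source") == "kaggle"),
    }
```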

Changed record examples

  • No data file changes detected.

Heuristic review

  • Heuristic warnings: none found.

@TechEngineBot

Member

TechEngine validation stats: PASS

Data summary

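| Category | Total | Verified | Unverified | Missing verified | Tracked | Verified % of tracked |
|---|---|---|---|---|---|---|
| brand | 189 | 0 | 189 | 0 | 189 | 0.0% |
| soc | 2104 | 58 | 2046 | 0 | 2104 | 2.8% |
| smartphone | 90118 | 184 | 89934 | 0 | 90118 | 0.2% |
| tablet | 3048 | 0 | 3048 | 0 | 3048 | 0.0% |
| watch | 378 | 0 | 378 | 0 | 378 | 0.0% |
| pda | 110 | 0 | 110 | 0 | 110 | 0.0% |
| gpu | 2030 | 0 | 2030 | 0 | 2030 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 3977 | 24.5% |
| all | 101954 | 1218 | 100736 | 0 | 101954 | 1.2% |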
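Judging by the note in the Warning section, Tracked counts records that carry a `verified` field at all, Missing verified counts those that do not, and Verified % of tracked is Verified / Tracked. A minimal sketch that rebuilds one row from raw records, assuming each record is a dict with an optional boolean `verified` key (the key name comes from the warning text; the row layout is illustrative):

```python
from typing import Any


def summarize(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Build one row of the data-summary table for a category."""
    total = len(records)
    tracked = sum(1 for r in records if "verified" in r)  # field present at all
    verified = sum(1 for r in records if r.get("verified") is True)
    unverified = sum(1 for r in records if r.get("verified") is False)
    pct = 100.0 * verified / tracked if tracked else 0.0
    return {
        "total": total,
        "verified": verified,
        "unverified": unverified,
        "missing_verified": total - tracked,
        "tracked": tracked,
        "verified_pct": f"{pct:.1f}%",
    }
```

For example, 976 verified and 3001 unverified records reproduce the cpu row (3977 tracked, 24.5%). The all row would be `summarize` over every category's records concatenated.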

Warning

Tracked verified coverage is below 50% for brand 0.0% (0/189), tablet 0.0% (0/3048), watch 0.0% (0/378), pda 0.0% (0/110), gpu 0.0% (0/2030), smartphone 0.2% (184/90118), all 1.2% (1218/101954), soc 2.8% (58/2104), and 1 more.
Tracked coverage excludes records missing the verified field; see the Missing verified column for those records.
This does not fail validation. Keep imported records at verified: false until they are manually audited, and treat this as follow-up verification work before relying on the affected categories as curated data.
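The warning line can be reproduced from the table: keep categories under the 50% threshold, sort them by coverage (ties keep table order), and list the first few. The cap of eight listed entries is inferred from "and 1 more" in this one report, so treat `max_listed` as an assumption rather than the bot's documented behavior.

```python
from typing import Optional

# (category, verified, tracked) in table order, copied from the summary above.
ROWS = [
    ("brand", 0, 189), ("soc", 58, 2104), ("smartphone", 184, 90118),
    ("tablet", 0, 3048), ("watch", 0, 378), ("pda", 0, 110),
    ("gpu", 0, 2030), ("cpu", 976, 3977), ("all", 1218, 101954),
]


def coverage_warning(rows: list[tuple[str, int, int]],
                     threshold: float = 50.0,
                     max_listed: int = 8) -> Optional[str]:
    """Return the low-coverage warning line, or None if every category passes."""
    low = []
    for name, verified, tracked in rows:
        if tracked == 0:
            continue  # nothing tracked, no percentage to report
        pct = 100.0 * verified / tracked
        if pct < threshold:
            low.append((name, verified, tracked, pct))
    if not low:
        return None
    low.sort(key=lambda row: row[3])  # sort is stable, so ties keep table order
    parts = [f"{n} {p:.1f}% ({v}/{t})" for n, v, t, p in low[:max_listed]]
    extra = len(low) - max_listed
    if extra > 0:
        parts.append(f"and {extra} more")
    return f"Tracked verified coverage is below {threshold:g}% for " + ", ".join(parts) + "."
```

With `ROWS`, the hidden ninth entry is cpu at 24.5%.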

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.
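The two notes above describe a simple reporting rule. A minimal sketch, assuming the reporter has a pass/fail flag and the raw log text; the 40-line excerpt length and the "Log excerpt:" heading are placeholders, not documented settings:

```python
def build_report(passed: bool, log: str, excerpt_lines: int = 40) -> str:
    """Summarize a validation run; attach a log tail only when it failed."""
    status = "PASS" if passed else "FAIL"
    lines = [f"TechEngine validation stats: {status}"]
    if not passed:
        # Successful runs stay short; failures get the end of the log for debugging.
        lines.append("Log excerpt:")
        lines.extend(log.splitlines()[-excerpt_lines:])
    return "\n".join(lines)
```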

Key output:

## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=2030
✅ integrity gate: no hard anomalies.
| Integrity section | Flagged lines |
|---|---|
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench: should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |
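The "single>multi" section flags CPUs whose single-thread score exceeds the multi-thread score from the same benchmark, which should not happen. A minimal sketch of that check, assuming hypothetical score field names for the two benchmark families named in the log:

```python
from typing import Any

# Hypothetical field names; the dataset's real keys may differ.
PAIRS = (("cinebench_single", "cinebench_multi"),
         ("geekbench_single", "geekbench_multi"))


def single_gt_multi(cpus: list[dict[str, Any]],
                    pairs: tuple[tuple[str, str], ...] = PAIRS) -> list[tuple[str, str]]:
    """Return (cpu name, single-score field) wherever single > multi."""
    flagged = []
    for cpu in cpus:
        for single_key, multi_key in pairs:
            single, multi = cpu.get(single_key), cpu.get(multi_key)
            if single is None or multi is None:
                continue  # one score missing: nothing to compare
            if single > multi:
                flagged.append((cpu["name"], single_key))
    return flagged
```

An empty result corresponds to the 0 flagged lines reported above.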

Seungpyo1007 added a commit that referenced this pull request Jun 21, 2026
Fill missing brand logo_url (17), website (23), founded_year (9) from authoritative Wikidata (P154/P856/P571). Description-guard skips wrong-entity name collisions; gap-only so #46's hand-verified founded_years are preserved.

Refs #1
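The Wikidata fill in this commit can be sketched as follows. The sketch assumes entity JSON in the shape returned by Wikidata's `Special:EntityData/<QID>.json` endpoint (statements under `claims`, each with a `mainsnak` and a `rank`). P154 (logo image) holds a Wikimedia Commons file name, turned into a URL here via `Special:FilePath`. P856 is the official website. P571 (inception) has a `time` string that starts with a signed year. The guard keywords are hypothetical, because the commit does not say how its description guard decides.

```python
from typing import Any
from urllib.parse import quote

# Hypothetical guard: accept an entity only if its English description looks corporate.
GUARD_WORDS = ("company", "manufacturer", "corporation", "brand", "operator")


def claim_value(entity: dict[str, Any], prop: str) -> Any:
    """First non-deprecated value of a Wikidata property, or None."""
    for stmt in entity.get("claims", {}).get(prop, []):
        snak = stmt.get("mainsnak", {})
        if stmt.get("rank") != "deprecated" and snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def wikidata_fields(entity: dict[str, Any]) -> dict[str, Any]:
    """Map P154/P856/P571 to logo_url/website/founded_year."""
    fields: dict[str, Any] = {}
    logo = claim_value(entity, "P154")  # Commons file name
    if logo:
        fields["logo_url"] = ("https://commons.wikimedia.org/wiki/Special:FilePath/"
                              + quote(logo.replace(" ", "_")))
    website = claim_value(entity, "P856")
    if website:
        fields["website"] = website
    inception = claim_value(entity, "P571")  # {"time": "+YYYY-MM-DDT00:00:00Z", ...}
    if inception:
        fields["founded_year"] = int(inception["time"][1:5])
    return fields


def fill_gaps(brand: dict[str, Any], entity: dict[str, Any]) -> list[str]:
    """Fill only empty brand fields; return the names of the fields that were set."""
    desc = entity.get("descriptions", {}).get("en", {}).get("value", "").lower()
    if not any(word in desc for word in GUARD_WORDS):
        return []  # probably a different entity that shares the brand's name
    filled = []
    for field, value in wikidata_fields(entity).items():
        if brand.get(field) in (None, ""):  # gap-only: existing values win
            brand[field] = value
            filled.append(field)
    return filled
```

Because filling is gap-only, a founded_year set by hand in #46 is never replaced by the Wikidata inception year.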

Labels

data (Dataset changes), enhancement (New feature or request)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Massive dataset rebuild: CPU + brand + GPU + smartphone + SoC (1989-2026)

2 participants