
feat(verify): tiered data verification layer (Tier 0 offline scoring)#52

Merged
Seungpyo1007 merged 3 commits into main from feat/verify-layer on Jun 22, 2026

Conversation

@Seungpyo1007

Member

What

Adds app/verify/ — an existence/trust verification layer that sits on top of the structural validator (app/validate.py, left untouched). Where validate.py only checks "is this file well-formed?", this answers "does this record describe a real, actually-existing device/part — confidently enough to set verified:true?" — the goal being to lift the dataset's ~1.2% verified ratio.

Tiers

  • Tier 0 — offline, deterministic, all ~102k records (offline.py/signals.py/hosts.py): four sub-scores (completeness + cross-field consistency + source-host trust + provenance) → a green/yellow/red band. Hard contradictions (threads<cores, boost<base, future release) force red. Full scores cached to gitignored data/_verify/state/; the tracked data/_verify/ledger.jsonl is reserved for promotion decisions.
  • Tier 1 — http_check.py: source_urls HTTP liveness (stdlib urllib + ThreadPool, per-host rate limit, resumable TTL cache).
  • Tier 2 — crossref.py: external cross-reference under a strict exact-heading rule — no fuzzy matching (fuzzy serves the wrong SKU ~35% of the time), so ambiguous candidates never auto-promote.
  • Tier 3 — promote.py: hybrid escalation + surgical verified:false→true write-back (only that token, atomic, LF-preserved; never clobbers curated records or reformats inline arrays).
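The Tier 0 banding described above can be sketched roughly as follows. This is a minimal illustration, not the code in `offline.py`: the field names (`cores`, `threads`, `base_clock_mhz`, `boost_clock_mhz`, `release_date`), the plain-mean weighting, and the 0.8/0.5 thresholds are all assumptions. Only the four sub-score names and the three hard contradictions come from the description.

```python
from datetime import date

# Hypothetical thresholds; the real cut-offs are not shown in this PR.
GREEN_MIN = 0.8
YELLOW_MIN = 0.5


def hard_contradictions(record: dict, today: date) -> list[str]:
    """Return the names of hard contradictions found in a record."""
    found = []
    cores, threads = record.get("cores"), record.get("threads")
    if cores is not None and threads is not None and threads < cores:
        found.append("threads<cores")
    base, boost = record.get("base_clock_mhz"), record.get("boost_clock_mhz")
    if base is not None and boost is not None and boost < base:
        found.append("boost<base")
    released = record.get("release_date")  # assumed ISO "YYYY-MM-DD" string
    if released and date.fromisoformat(released) > today:
        found.append("future release")
    return found


def band(sub_scores: dict[str, float], record: dict, today: date) -> str:
    """Combine sub-scores (each in [0, 1]) into a green/yellow/red band."""
    if hard_contradictions(record, today):
        return "red"  # a hard contradiction overrides any score
    score = sum(sub_scores.values()) / len(sub_scores)  # assumed: plain mean
    if score >= GREEN_MIN:
        return "green"
    if score >= YELLOW_MIN:
        return "yellow"
    return "red"
```

Checking contradictions before scoring keeps the override explicit: a record with strong sources still lands red if its own fields disagree.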

CLI

python -m app.verify score | report | check-urls | crossref | promote
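The `|` separators in the line above denote alternative subcommands, not shell pipes. A minimal sketch of how such a dispatcher could be declared with the standard library is shown here; the function name `build_parser` is hypothetical, and the real entry point may take additional options that this PR description does not list.

```python
import argparse

# Subcommand names as listed in the PR description.
SUBCOMMANDS = ("score", "report", "check-urls", "crossref", "promote")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one of the five subcommands."""
    parser = argparse.ArgumentParser(prog="python -m app.verify")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in SUBCOMMANDS:
        sub.add_parser(name)
    return parser
```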

CI

  • Non-blocking verify-offline job in validate-data.yml (continue-on-error, scores changed records + runs unit tests) — never gates a merge.
  • Scheduled/manual verify-network.yml for the network tiers, with a diff-scope guard that fails unless only verified flags + the ledger changed. No TechEngine/submodule ops.
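The diff-scope guard mentioned above could look something like the sketch below. It assumes the changes have already been extracted into `(path, old_line, new_line)` tuples (a real guard would have to parse `git diff` output first), and it assumes the flag is spelled `verified: true|false` with optional quotes around the key. Both are illustrative assumptions; only the ledger path comes from the description.

```python
import re

LEDGER_PATH = "data/_verify/ledger.jsonl"  # tracked ledger named in the PR

# Assumed spelling of the flag; quotes around the key are optional.
_FLAG = re.compile(r'("?verified"?\s*:\s*)(?:true|false)\b')


def change_in_scope(path: str, old_line: str, new_line: str) -> bool:
    """A change is in scope if it touches the ledger or only flips the flag."""
    if path == LEDGER_PATH:
        return True
    # Mask the flag value on both sides; any remaining difference is out of scope.
    return _FLAG.sub(r"\g<1>?", old_line) == _FLAG.sub(r"\g<1>?", new_line)


def diff_in_scope(changes: list[tuple[str, str, str]]) -> bool:
    """Return True only if every changed line pair is in scope."""
    return all(change_in_scope(*change) for change in changes)
```

Masking the flag and comparing the rest means that a line which flips `verified` but also edits another field on the same line is rejected.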

Validation

40 tests pass. The golden-subset test confirms the offline scorer — blind to the verified flag — reproduces the human-curated verified-CPU set (976/976 land green), which is the empirical justification for using the score to drive promotion.

Tuning note: soc_not_after_device is a soft signal, not hard — the dataset's SoC release_date values are largely placeholder YYYY-01-01 that skew late, so a device-vs-SoC mismatch usually means the SoC date is wrong, not the device. Fixing those dates is a separate enrichment task.
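To illustrate the difference between a soft signal and a hard one, here is a minimal sketch of how `soc_not_after_device` might contribute a penalty instead of forcing red. The penalty values (0.1 and 0.3) and the helper `is_placeholder` are hypothetical; the only facts taken from the note are that the signal is soft and that `YYYY-01-01` dates are likely placeholders.

```python
from datetime import date


def is_placeholder(d: date) -> bool:
    """Heuristic: a January 1st date is probably a year-only placeholder."""
    return d.month == 1 and d.day == 1


def soc_not_after_device(soc_release: date, device_release: date) -> float:
    """Return a soft penalty in [0, 1]; 0.0 means the dates are consistent."""
    if soc_release <= device_release:
        return 0.0
    # The SoC appears newer than the device. This never forces red by itself;
    # a placeholder SoC date is treated as weaker evidence (assumed values).
    return 0.1 if is_placeholder(soc_release) else 0.3
```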

Refs #1

…ing)

Adds app/verify/, an existence/trust verification layer that sits above the
structural validator (app/validate.py, untouched). It answers "does this record
describe a real, existing device/part — confidently enough to set verified:true?"
to lift the ~1.2% verified ratio.

- Tier 0 (offline, deterministic, all ~102k records): completeness + cross-field
  consistency (signals.py) + source-host trust (hosts.py) + provenance -> a
  green/yellow/red band. Full scores cached to gitignored data/_verify/state/;
  the tracked data/_verify/ledger.jsonl is reserved for promotion decisions.
- Tier 1 (http_check.py): source_urls HTTP liveness, urllib + ThreadPool,
  per-host rate limit, resumable TTL cache.
- Tier 2 (crossref.py): external cross-reference under a strict exact-heading
  rule (no fuzzy matching; ambiguous candidates never auto-promote).
- Tier 3 (promote.py): hybrid escalation + surgical verified:false->true
  write-back (only that token, atomic, LF-preserved, never clobbers curated data).
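The "surgical" write-back in Tier 3 can be approximated as below. This is a sketch under assumptions, not the contents of `promote.py`: it assumes one record per file, a flag spelled `verified: false` with optional quotes, and the hypothetical function name `promote_file`. It works on bytes so line endings are never translated, and it swaps the file in with `os.replace` so readers never see a half-written file.

```python
import os
import re
import tempfile

# Assumed spelling of the flag; quotes around the key are optional.
_FALSE_FLAG = re.compile(rb'("?verified"?\s*:\s*)false\b')


def promote_file(path: str) -> bool:
    """Flip the first verified:false to true; return True if the file changed."""
    with open(path, "rb") as fh:  # bytes mode: no newline translation
        original = fh.read()
    updated, count = _FALSE_FLAG.subn(rb"\g<1>true", original, count=1)
    if count == 0:
        return False  # already verified, or the flag is missing
    directory = os.path.dirname(os.path.abspath(path))
    # Note: mkstemp creates the file with mode 0600; a real tool may want
    # to copy the original file mode before replacing.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(updated)
        os.replace(tmp_path, path)  # atomic when on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

Because only the matched token is substituted, everything else in the file, including inline arrays and existing line endings, is left byte-for-byte identical.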

CLI: python -m app.verify score|report|check-urls|crossref|promote.
CI: non-blocking verify-offline job in validate-data.yml; scheduled/manual
verify-network.yml for network tiers with a diff-scope guard. Validates that the
offline scorer reproduces the human-curated verified CPU set (40 tests pass).

Refs #1
@github-actions github-actions Bot added the labels app (Validator or application code changes), ci (CI and workflow changes), and enhancement (New feature or request) on Jun 22, 2026
@Seungpyo1007 Seungpyo1007 moved this from Todo to In Progress in TechAPI-Project Jun 22, 2026
Reworks how verification surfaces on PRs so TechEngineBot owns the analysis,
instead of TechAPI running its own (failing) job:

- Remove the self-run verify-offline job from validate-data.yml. It failed
  because the stdlib-only CI image has no pytest, and having TechAPI score its
  own PRs duplicated what the bot should own. validate-data.yml is back to the
  pure structural gate.
- Add verify-report.yml: runs `app.verify score` (changed records + full
  baseline) and has TechEngineBot post the band histogram as a PR comment via
  ENGINE_TOKEN. Dormant if the token is unset; same-repo PRs only; never gates a
  merge; updates one marked comment in place.
- Add app/verify/** to request-engine-pr-validation paths so the engine's PR
  validation (and its TechEngineBot comment) also covers verifier changes.
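The "updates one marked comment in place" behaviour can be sketched as a small decision function. The marker string, the function name `plan_comment`, and the shape of the comment dictionaries are all hypothetical; the actual workflow runs in a github-script step, so this Python version only illustrates the logic.

```python
from typing import Optional

MARKER = "<!-- verify-report -->"  # hypothetical hidden marker


def plan_comment(existing: list[dict], report: str) -> dict:
    """Decide whether to update the marked comment or create a new one."""
    body = MARKER + "\n" + report
    comment_id: Optional[int] = None
    for comment in existing:
        if MARKER in comment.get("body", ""):
            comment_id = comment["id"]  # reuse the first marked comment
            break
    action = "update" if comment_id is not None else "create"
    return {"action": action, "comment_id": comment_id, "body": body}
```

Embedding the marker in every posted body is what lets later runs find and overwrite the same comment instead of adding a new one per push.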

Refs #1
@TechEngineBot

TechEngineBot commented Jun 22, 2026

Member

TechEngine change review: PASS

| Check | Result |
| --- | --- |
| python -m app.validate | PASS |
| python integrity_check.py TechAPI/data --strict | PASS |

Changed data

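| Category | Added | Modified | Deleted | Added verified | Added unverified | Added Kaggle-sourced |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 0 | 0 | 0 | 0 | 0 | 0 |
| soc | 0 | 0 | 0 | 0 | 0 | 0 |
| smartphone | 0 | 0 | 0 | 0 | 0 | 0 |
| tablet | 0 | 0 | 0 | 0 | 0 | 0 |
| watch | 0 | 0 | 0 | 0 | 0 | 0 |
| pda | 0 | 0 | 0 | 0 | 0 | 0 |
| gpu | 0 | 0 | 0 | 0 | 0 | 0 |
| cpu | 0 | 0 | 0 | 0 | 0 | 0 |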

Changed record examples

  • No data file changes detected.

Heuristic review

  • Heuristic warnings: none found.

@TechEngineBot

TechEngineBot commented Jun 22, 2026

Member

TechEngine validation stats: PASS

Data summary

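| Category | Total | Verified | Unverified | Missing verified | Tracked | Verified % of tracked |
| --- | --- | --- | --- | --- | --- | --- |
| brand | 189 | 0 | 189 | 0 | 189 | 0.0% |
| soc | 2104 | 58 | 2046 | 0 | 2104 | 2.8% |
| smartphone | 90118 | 184 | 89934 | 0 | 90118 | 0.2% |
| tablet | 3048 | 0 | 3048 | 0 | 3048 | 0.0% |
| watch | 378 | 0 | 378 | 0 | 378 | 0.0% |
| pda | 110 | 0 | 110 | 0 | 110 | 0.0% |
| gpu | 2030 | 0 | 2030 | 0 | 2030 | 0.0% |
| cpu | 3977 | 976 | 3001 | 0 | 3977 | 24.5% |
| all | 101954 | 1218 | 100736 | 0 | 101954 | 1.2% |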

Warning

Tracked verified coverage is below 50% for brand 0.0% (0/189), tablet 0.0% (0/3048), watch 0.0% (0/378), pda 0.0% (0/110), gpu 0.0% (0/2030), smartphone 0.2% (184/90118), all 1.2% (1218/101954), soc 2.8% (58/2104), and 1 more.
Tracked coverage excludes records missing the verified field; see the Missing verified column for those records.
This does not fail validation. Keep imported records verified: false until manual audit, but treat this as follow-up verification work before relying on the affected categories as curated data.

Validation notes

  • Full advisory outlier listings are suppressed on successful runs because they are dataset-wide and mostly stable between PRs.
  • Failure runs still include a detailed log excerpt for debugging.

Key output:

## app.validate
## integrity_check.py --strict
loaded CPU=3977 GPU=2030
✅ integrity gate: no hard anomalies.
| Integrity section | Flagged lines |
| --- | --- |
| structural | 0 |
| CPU name/tier consistency (desktop mainstream only) | 0 |
| CPU single>multi (cinebench/geekbench - should be multi>=single) | 0 |
| CPU era-vs-score outliers | 8 |
| CPU cross-source ratio outliers (possible wrong-variant) | 152 |
| GPU cross-source ratio outliers + sanity | 18 |

Use TECHENGINEBOT_TOKEN (the bot's PAT) for the github-script step so the Tier 0
analysis comment is authored by TechEngineBot, falling back to ENGINE_TOKEN only
to keep the workflow running if the bot token is absent. Refs #1
@github-project-automation github-project-automation Bot moved this from In Progress to Done in TechAPI-Project Jun 22, 2026
@Seungpyo1007 Seungpyo1007 reopened this Jun 22, 2026
@Seungpyo1007 Seungpyo1007 moved this from Done to In Progress in TechAPI-Project Jun 22, 2026
@Seungpyo1007

Member Author

🔎 Data verification — Tier 0 (offline existence/trust)

Scored by app.verify; posted by TechEngineBot. Informational only —
the structural gate (app.validate) is separate and authoritative for merge.

### Changed records in this PR
Tier 0 offline score — 0 record(s)

category        green   yellow      red    total
------------------------------------------------
------------------------------------------------
ALL                 0        0        0        0

bands: green 0.0%  yellow 0.0%  red 0.0%

### Full-dataset baseline
Tier 0 offline score — 101954 record(s)

category        green   yellow      red    total
------------------------------------------------
brand              10      179        0      189
soc               123      680     1301     2104
smartphone       8453    80547     1118    90118
tablet            174     2846       28     3048
watch              11      357       10      378
pda                27       77        6      110
gpu               245     1785        0     2030
cpu               976     3001        0     3977
------------------------------------------------
ALL             10019    89472     2463   101954

bands: green 9.8%  yellow 87.8%  red 2.4%

green = authoritative source + complete + consistent · yellow = plausible, needs confirmation · red = sparse/weak source or a hard contradiction. Promotion to verified runs in the scheduled verify-network workflow.
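The `bands:` line can be reproduced from the ALL row of the table. The sketch below is an illustrative helper (the name `band_percentages` is hypothetical, and rounding to one decimal place is inferred from the printed output); it also covers the empty case shown for the changed-records table.

```python
def band_percentages(green: int, yellow: int, red: int) -> dict[str, float]:
    """Return each band's share of the total, in percent, to one decimal."""
    total = green + yellow + red
    counts = {"green": green, "yellow": yellow, "red": red}
    if total == 0:
        # No records scored: report 0.0% for every band, as in the empty table.
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Full-dataset baseline from the ALL row: 10019 + 89472 + 2463 = 101954 records.
baseline = band_percentages(10019, 89472, 2463)
```

With the baseline counts this gives green 9.8, yellow 87.8, and red 2.4, matching the reported line.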

@Seungpyo1007 Seungpyo1007 merged commit 222ae80 into main Jun 22, 2026
5 of 6 checks passed
@Seungpyo1007 Seungpyo1007 deleted the feat/verify-layer branch June 22, 2026 05:00
@github-project-automation github-project-automation Bot moved this from In Progress to Done in TechAPI-Project Jun 22, 2026
@TechEngineBot

Member

🔎 Data verification — Tier 0 (on demand)

Requested by @Seungpyo1007 via /verify · scored by app.verify, posted by TechEngineBot. Informational only — the structural gate (app.validate) is separate.

### Changed records in this PR
Tier 0 offline score — 0 record(s)

category        green   yellow      red    total
------------------------------------------------
------------------------------------------------
ALL                 0        0        0        0

bands: green 0.0%  yellow 0.0%  red 0.0%

### Full-dataset baseline
Tier 0 offline score — 101954 record(s)

category        green   yellow      red    total
------------------------------------------------
brand              10      179        0      189
soc               123      680     1301     2104
smartphone       8453    80547     1118    90118
tablet            174     2846       28     3048
watch              11      357       10      378
pda                27       77        6      110
gpu               245     1785        0     2030
cpu               976     3001        0     3977
------------------------------------------------
ALL             10019    89472     2463   101954

bands: green 9.8%  yellow 87.8%  red 2.4%

