Skip to content

Latest commit

 

History

History
104 lines (84 loc) · 4.88 KB

File metadata and controls

104 lines (84 loc) · 4.88 KB

TechEngine

Validation, ingestion, and serving engine for the TechAPI dataset.

test  Code: MIT - Data: lives in TechAPI (CC-BY-SA 4.0)

TechEngine owns everything around the data: schema validation, the FastAPI read API, the static JSON dump generator, the engine's own landing site, automated coverage checks, and the weekly ingestion/refresh crawlers.

The dataset and the public-facing playground site live in TechAPI so each can be versioned, mirrored, and licensed independently. The site shipped in this repo is the engine's own landing: what TechEngine is, what it runs, and links out to docs.

Layout

app/
  validate.py          # schema/range/uniqueness checks
  seed.py              # data/ -> SQLModel database
  dump.py              # API -> static JSON tree
  main.py              # FastAPI entrypoint
  models/              # SQLModel tables
  routers/             # /v1/{brands,socs,smartphones,gpus,cpus}
  schemas/             # Pydantic response models
  services/            # scoring (algorithm_version-tagged)
  coverage/            # upstream-vs-curated diff + Markdown report
  ingest/              # draft new records from upstream pages
tests/                 # unit + integration
site/                  # Astro engine landing (deploys to Pages)
docs/                  # SPEC / DATA_PIPELINE / DEVELOPMENT
TechAPI/               # submodule -> GetTechAPI/TechAPI (clickable @ <sha> link)
.github/workflows/
  validate-data.yml    # workflow_call: PR-time data validation for TechAPI
  weekly-refresh.yml   # cron: live-scrape -> integrity gate -> dump -> PR to TechAPI
  weekly-ingest.yml    # cron: draft new SKUs, open PR against TechAPI
  coverage-report.yml  # cron: gap report, sticky issue (TechEngine + TechAPI)
  refresh-data.yml     # smoke-test: rebuild the dump on engine (app/**) changes
  notify-techapi.yml   # push to main: ping TechAPI to bump its TechEngine submodule
  bump-techapi.yml     # dispatch: advance this repo's TechAPI submodule pointer
  deploy-pages.yml     # build & deploy engine site + dump
  test.yml             # lint + type-check + tests

How The Two Repos Connect

Both repos live in the GetTechAPI org and each includes the other as a git submodule (a clickable @ <sha> pin). Three automations keep them in step:

  • validate-data.yml (workflow_call) - TechAPI's PR-time check calls into TechEngine to validate its data.
  • weekly-refresh.yml - live-scrapes benchmarks, runs the full-dataset integrity gate (app.validate + integrity_check.py --strict), regenerates the static dump, and opens a dated refresh PR against TechAPI.
  • Submodule autosync - every push to TechEngine main fires notify-techapi.yml, which pings TechAPI to bump its TechEngine pointer; conversely bump-techapi.yml advances TechEngine's TechAPI pointer when TechAPI changes. Bumps are loop-guarded, so each real change converges to one.

Every Python entry point reads data from a TechAPI checkout. The location can be overridden via TECHAPI_DATA_DIR; by default TechEngine supports both layouts: TechAPI/TechEngine as a submodule and TechAPI beside TechEngine.

Quickstart

git clone https://github.com/GetTechAPI/TechAPI.git ../TechAPI   # data source
pip install -e ".[dev]"
python -m app.validate          # check data integrity
python -m app.seed              # data/ -> ./techapi.db (SQLite)
uvicorn app.main:app --reload   # serve; curl localhost:8000/v1/cpus/ryzen-9-9950x3d
python -m app.dump              # generate ./dump/v1/... static tree

TECHAPI_DATA_DIR=/path/to/TechAPI/data overrides the data location.

Docker Compose (Postgres)

docker compose up --build

Spins up Postgres 16, seeds from the mounted TechAPI checkout, serves on :8000.

Roadmap

  • Split out from TechAPI; sibling-checkout data pipeline
  • Coverage gap detector - diff curated dataset vs upstream catalogs and surface missing SKUs as a sticky weekly issue (#1)
  • Weekly ingestion crawler - scrape canonical sources and open PRs against TechAPI with new SKUs (requires the TECHAPI_TOKEN secret to push) (#2)
  • Weekly refresh pipeline - live benchmark enrichment -> full-dataset integrity gate -> static dump -> dated refresh PR (weekly-refresh.yml)
  • Bidirectional submodule autosync between TechEngine and TechAPI
  • More sources (Intel ARK, AMD product pages, TechPowerUp DB)

License

Code is licensed under the MIT License. The dataset (in TechAPI) is licensed CC-BY-SA 4.0 separately.