Recommendation: weekly→daily — decouple live-scrape from dump/report

## Context

The weekly automation is moving to a **daily** cadence. The three scheduled workflows differ widely in cost and side effects, so this recommends a split rather than setting the day-of-week field to `*` on all three uniformly.

Current schedules (all Mondays):
- `weekly-refresh.yml` — `0 6 * * 1` — enrich (12 live sources) → validate → integrity gate → dump → **PR to TechAPI**
- `coverage-report.yml` — `23 6 * * 1` — coverage diff → sticky issue (in-place)
- `weekly-ingest.yml` — `29 6 * * 1` — upstream catalog scrape → draft-SKU PR for curator review

## Recommended direction

**1. Cron: keep the minute/hour stagger, only flip day-of-week `1` → `*`** (preserves the refresh→coverage→ingest start order; the stagger alone does not make a later workflow wait for an earlier one to finish):
- refresh `0 6 * * *`, coverage `23 6 * * *`, ingest `29 6 * * *`
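
As a quick sanity check on the schedules above, here is a minimal Python sketch that rewrites only the fifth (day-of-week) cron field and confirms the start order is unchanged. The helper names `to_daily` and `start_minute` and the dict labels are illustrative, not taken from the repo.

```python
def to_daily(cron: str) -> str:
    """Return the cron expression with only the day-of-week field set to '*'."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {cron!r}")
    minute, hour, day_of_month, month, _day_of_week = fields
    return " ".join([minute, hour, day_of_month, month, "*"])


def start_minute(cron: str) -> int:
    """Minutes after midnight at which the job starts (numeric minute/hour only)."""
    minute, hour = cron.split()[:2]
    return int(hour) * 60 + int(minute)


# Current Monday schedules, copied from the Context section; keys are just labels.
WEEKLY = {
    "refresh": "0 6 * * 1",
    "coverage": "23 6 * * 1",
    "ingest": "29 6 * * 1",
}

DAILY = {name: to_daily(expr) for name, expr in WEEKLY.items()}

# Start order after the flip, earliest first.
ORDER = sorted(DAILY, key=lambda name: start_minute(DAILY[name]))
```

Because the minute and hour fields are untouched, `ORDER` is the same before and after the change.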

**2. Don't make all three daily as-is — they have different weight:**
- **coverage-report → daily ✅** (cheap, sticky issue in-place; no downside)
- **dump regeneration → daily ✅, but only after the timestamp fix below**
- **live enrich (12-source scrape) → keep weekly.** Daily scraping of Wikipedia, cpubenchmark, topcpu, etc. raises ToS and rate-limit concerns and adds load on the upstream sites. Decouple "scrape benchmarks" (weekly) from "regenerate dump + report from already-curated `data/`" (daily).
- **weekly-ingest (draft-SKU PRs) → keep weekly** (or switch to a single sticky branch/PR). A new curator-review PR every day will overwhelm review.

**3. Prerequisite for daily dump: make `app.dump` deterministic.**
Today the dump is a stateless rebuild that stamps `created_at`/`updated_at` with the build time on every run, so a daily refresh PR would carry ~1400 files of timestamp churn even with zero data changes, and daily PRs would become unreviewable noise. Fix before going daily: preserve `created_at` and set `updated_at` only on a real change (alternatively, drop build timestamps or pin `SOURCE_DATE_EPOCH`). Daily PRs then contain only real data deltas.
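
One possible shape for the fix, as a hedged Python sketch rather than the actual `app.dump` code: compare a hash of the non-timestamp fields against the previously dumped record and carry the old timestamps over when nothing changed. The function names (`build_time`, `content_hash`, `stamp`) and the record fields other than `created_at`/`updated_at` are assumptions made for illustration.

```python
import hashlib
import json
import os
from datetime import datetime, timezone
from typing import Optional

TIMESTAMP_FIELDS = ("created_at", "updated_at")


def build_time() -> str:
    """Build timestamp as ISO 8601 UTC; uses SOURCE_DATE_EPOCH when it is set."""
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        moment = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    else:
        moment = datetime.now(timezone.utc)
    return moment.isoformat()


def content_hash(record: dict) -> str:
    """Hash every field except the timestamps, with a stable key order."""
    payload = {k: v for k, v in record.items() if k not in TIMESTAMP_FIELDS}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def stamp(new: dict, previous: Optional[dict], now: str) -> dict:
    """Reuse the previous timestamps unless the record is new or its data changed."""
    if previous is None:
        return {**new, "created_at": now, "updated_at": now}
    changed = content_hash(new) != content_hash(previous)
    return {
        **new,
        "created_at": previous["created_at"],
        "updated_at": now if changed else previous["updated_at"],
    }
```

With this approach an unchanged record serializes byte-for-byte the same as in the previous dump, so it drops out of the PR diff. It does require the dump step to read the previous output, which means the rebuild is no longer fully stateless.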

**4. Housekeeping for daily:**
- Rename `weekly-*` files / `concurrency.group` / comments → `daily-*` (or neutral) to avoid misleading names.
- Auto-merge the refresh PR (dated `refresh/<date>` branch) or commit directly, so daily PRs don't backlog.
- Add `concurrency:` groups to coverage-report and weekly-ingest (refresh already has one) so a slow run doesn't overlap the next day's.
- `integrity_check.py --strict` will be exercised daily. Per-source enrich failures already emit `::warning::` and skip, so a flaky upstream just skips that day's PR (harmless), but this is worth monitoring.
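
For the monitoring point in the last bullet, here is a rough Python sketch of the warn-and-skip pattern that also returns which sources were skipped, so a run can report them. This is a guess at the general shape, not the repo's enrich code: `run_enrichers` and the callable-per-source interface are hypothetical. The `::warning::` prefix is the GitHub Actions workflow command for a warning annotation.

```python
from typing import Callable, Dict, List, Tuple


def run_enrichers(
    sources: Dict[str, Callable[[], int]],
) -> Tuple[Dict[str, int], List[str]]:
    """Run each source; a failing source is logged as a warning and skipped."""
    results: Dict[str, int] = {}
    skipped: List[str] = []
    for name, fetch in sources.items():
        try:
            results[name] = fetch()
        except Exception as exc:  # one flaky upstream should not abort the run
            # GitHub Actions turns lines in this format into warning annotations.
            print(f"::warning::enrich source {name} skipped: {exc}")
            skipped.append(name)
    return results, skipped
```

The `skipped` list could be written to the job summary or counted over time, which makes it visible when a source fails on most days rather than occasionally.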

## One-line summary

Split "scrape (weekly)" from "dump + coverage (daily)"; make the dump deterministic first so daily PRs aren't 1400-file timestamp churn; flip only the day-of-week field and keep the stagger.

