Context
Moving the weekly automation to a daily cadence. The three scheduled workflows differ a lot in weight and side effects, so the recommendation is a split rather than uniformly flipping all three to a daily cron (day-of-week *).
Current schedules (all Mondays):
weekly-refresh.yml — 0 6 * * 1 — enrich (12 live sources) → validate → integrity gate → dump → PR to TechAPI
coverage-report.yml — 23 6 * * 1 — coverage diff → sticky issue (in-place)
weekly-ingest.yml — 29 6 * * 1 — upstream catalog scrape → draft-SKU PR for curator review
Recommended direction
1. Cron: keep the minute/hour stagger, only flip day-of-week 1 → * (preserves refresh→coverage→ingest ordering):
- refresh 0 6 * * *, coverage 23 6 * * *, ingest 29 6 * * *
2. Don't make all three daily as-is — they have different weight:
- coverage-report → daily ✅ (cheap, sticky issue in-place; no downside)
- dump regeneration → daily ✅, but only after the timestamp fix below
- live enrich (12-source scrape) → keep weekly. Daily scraping of Wikipedia/cpubenchmark/topcpu etc. risks ToS and rate-limit problems and puts needless load on the upstream sites. Decouple "scrape benchmarks" (weekly) from "regenerate dump + report from already-curated data/" (daily).
- weekly-ingest (draft-SKU PRs) → keep weekly (or switch to a single sticky branch/PR). A new curator-review PR every day will overwhelm review.
3. Prerequisite for daily dump: make app.dump deterministic.
Today the dump is a stateless rebuild that stamps created_at/updated_at = build-time on every run, so a daily refresh PR would carry ~1400-file timestamp churn even with zero data changes — daily PRs become unreviewable noise. Fix before going daily: preserve created_at, set updated_at only on real change (or drop build timestamps / pin SOURCE_DATE_EPOCH). Then daily PRs contain only real data deltas.
4. Housekeeping for daily:
- Rename weekly-* files / concurrency.group / comments → daily-* (or neutral) to avoid misleading names.
- Auto-merge the refresh PR (dated refresh/<date> branch) or commit directly, so daily PRs don't backlog.
- Add concurrency: groups to coverage-report and weekly-ingest (refresh already has one) so a slow run doesn't overlap the next day's.
- integrity_check.py --strict will be exercised daily. Per-source enrich failures already ::warning::-skip, so a flaky upstream just skips that day's PR (harmless), but it is worth monitoring.
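
The deterministic-dump prerequisite in item 3 could be sketched roughly as follows. This is a minimal illustration, not the actual app.dump code: the helper names (content_hash, stamp), the TIMESTAMP_KEYS constant, and the assumption that each dumped record is a flat JSON-serializable dict are all hypothetical. The idea is to compare a hash of the data fields against the previously dumped record, and to touch updated_at only when that hash differs.

```python
import hashlib
import json
from typing import Any, Optional

# Hypothetical: the fields the dump stamps at build time today.
TIMESTAMP_KEYS = ("created_at", "updated_at")


def content_hash(record: dict[str, Any]) -> str:
    """Hash a record's data fields, ignoring the timestamp fields."""
    payload = {k: v for k, v in record.items() if k not in TIMESTAMP_KEYS}
    # sort_keys gives a canonical form, so key order never changes the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def stamp(
    new: dict[str, Any],
    previous: Optional[dict[str, Any]],
    now: str,
) -> dict[str, Any]:
    """Return `new` with timestamps carried over unless the data changed."""
    out = {k: v for k, v in new.items() if k not in TIMESTAMP_KEYS}
    if previous is None:
        # First time this record is dumped: both stamps are the build time.
        out["created_at"] = now
        out["updated_at"] = now
    elif content_hash(previous) == content_hash(new):
        # No real change: keep both stamps, so the file is byte-identical.
        out["created_at"] = previous["created_at"]
        out["updated_at"] = previous["updated_at"]
    else:
        # Real change: preserve created_at, bump only updated_at.
        out["created_at"] = previous["created_at"]
        out["updated_at"] = now
    return out
```

With something like this in place, a run with zero data changes would rewrite every file to identical content, so the daily refresh PR would be empty or contain only real deltas. It does assume the previous dump is readable at build time, which makes the rebuild no longer fully stateless.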
One-line summary
Split "scrape (weekly)" from "dump + coverage (daily)"; make the dump deterministic first so daily PRs aren't 1400-file timestamp churn; flip only the day-of-week field and keep the stagger.
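
As a quick sanity check on the "flip only the day-of-week field" step, the conversion and the preserved ordering can be sketched in a few lines. The helper names (to_daily, minute_of_day) are hypothetical and the sketch only handles plain five-field cron expressions with numeric minute and hour, which is all the three current schedules use.

```python
def to_daily(cron: str) -> str:
    """Replace the day-of-week field (5th) of a five-field cron with '*'."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {cron!r}")
    fields[4] = "*"
    return " ".join(fields)


def minute_of_day(cron: str) -> int:
    """Start time as minutes since midnight (numeric minute/hour only)."""
    minute, hour = cron.split()[:2]
    return int(hour) * 60 + int(minute)


# Current Monday schedules, as listed in the context section.
WEEKLY = {
    "refresh": "0 6 * * 1",
    "coverage": "23 6 * * 1",
    "ingest": "29 6 * * 1",
}

DAILY = {name: to_daily(expr) for name, expr in WEEKLY.items()}

# The stagger is untouched, so the run order stays refresh, coverage, ingest.
ORDER = sorted(DAILY, key=lambda name: minute_of_day(DAILY[name]))
```

Workflows that stay weekly under the split would simply keep their existing expression rather than going through to_daily.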