Parent: none
Follow-up from: #53 (closed), #126 (closed via #127)
Composes with: #47 (freshness detection research — may supersede this when it lands)
Depends on: none (all prerequisites merged)
Decision (locked 2026-04-17)
Add a dedicated cron workflow whose sole job is to touch the per-lib artifact caches via actions/cache/restore@v5 so that their last_accessed_at stays fresh under GH Actions' 7-day inactivity eviction. The workflow must not invoke deadzone scrape, deadzone consolidate, or deadzone dbrelease — keepalive is cache-only.
This preserves docs/research/batch-scrape-actions.md decision #2 (workflow_dispatch only for scrape-pack.yml, no cron rescrape until #47 lands). The keepalive is orthogonal to freshness detection: it prevents eviction, it does not refresh content.
Why
GitHub Actions evicts cache entries not accessed in 7 days (and LRU-evicts past 10 GB/repo, which is not the binding constraint here). scrape-pack.yml is workflow_dispatch-only, so a week without an operator dispatch evicts every artifact-<slug>-<version>-… entry, and the next dispatch pays the full cold-scrape cost. That defeats the freshness-shim property of batch-scrape-actions.md decision #3.
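The Mon + Thu cron in the code skeleton is meant to keep every entry's idle time under that 7-day window. Below is a minimal sketch of the gap arithmetic. It assumes one run per listed weekday at the same time of day, using cron day-of-week numbering (0 = Sunday). The function names are hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: largest gap, in whole days, between consecutive runs
# of a weekly cron schedule. Arguments are cron day-of-week numbers
# (0 = Sunday ... 6 = Saturday); assumes every run fires at the same time of day.
max_cron_gap_days() {
  [ "$#" -gt 0 ] || return 1
  local days first="" prev="" max=0 gap d
  days=$(printf '%s\n' "$@" | sort -n -u)
  for d in $days; do
    if [ -z "$first" ]; then
      first=$d
    else
      gap=$((d - prev))
      if [ "$gap" -gt "$max" ]; then max=$gap; fi
    fi
    prev=$d
  done
  # Wrap-around gap: from the last run of one week to the first run of the next.
  gap=$((first + 7 - prev))
  if [ "$gap" -gt "$max" ]; then max=$gap; fi
  echo "$max"
}

# Succeeds when the worst-case idle gap stays strictly below the
# 7-day cache-inactivity eviction window.
within_eviction_window() {
  [ "$(max_cron_gap_days "$@")" -lt 7 ]
}
```

For '0 4 * * 1,4', `max_cron_gap_days 1 4` prints 4 (the Thu → Mon stretch). A single weekly run (`max_cron_gap_days 1`) prints 7, which leaves no margin for cron lag or runner queueing.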
Code skeleton (sketch — finalize in implementation)
name: cache-keepalive

on:
  schedule:
    - cron: '0 4 * * 1,4' # Mon + Thu 04:00 UTC, max gap 4 days vs 7-day GHA eviction
  workflow_dispatch: {}

permissions:
  contents: read

concurrency:
  group: cache-keepalive
  cancel-in-progress: false

jobs:
  expand-libs:
    runs-on: ubuntu-latest
    outputs:
      libs: ${{ steps.list.outputs.libs }}
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-go@v6
        with: { go-version-file: go.mod }
      - name: Install native deps
        uses: ./.github/actions/install-native-deps
      - id: list
        shell: bash
        run: |
          set -euo pipefail
          libs="$(go run -tags ORT ./cmd/deadzone scrape -list -config libraries_sources.yaml)"
          echo "libs=$libs" >> "$GITHUB_OUTPUT"

  refresh:
    needs: expand-libs
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      max-parallel: 20
      matrix:
        entry: ${{ fromJSON(needs.expand-libs.outputs.libs) }}
    steps:
      - uses: actions/checkout@v6
      - name: Touch artifact cache
        id: touch
        # Mirror of scrape-pack.yml L129-130 — drift breaks the keepalive
        # (different key = different cache entry = no refresh).
        uses: actions/cache/restore@v5
        with:
          path: artifacts/${{ matrix.entry.slug }}
          key: artifact-${{ matrix.entry.slug }}-${{ matrix.entry.version }}-${{ hashFiles('libraries_sources.yaml') }}-${{ hashFiles('internal/embed/hugot.go') }}
      - name: Record hit/miss
        shell: bash
        run: |
          set -euo pipefail
          status="miss"; [ "${{ steps.touch.outputs.cache-hit }}" = "true" ] && status="hit"
          # Each matrix job has its own summary, so each one needs the header
          # rows for its single data row to render as a markdown table.
          {
            echo "| lib | version | status |"
            echo "| --- | --- | --- |"
            echo "| \`${{ matrix.entry.lib_id }}\` | \`${{ matrix.entry.version }}\` | $status |"
          } >> "$GITHUB_STEP_SUMMARY"

  # Totals are rendered by a final `report` job that aggregates per-matrix notices;
  # implementer's choice on pattern (needs.refresh.outputs via matrix isn't trivial;
  # a final job that re-reads the needs.expand-libs.outputs.libs JSON + the action log is fine).
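The keepalive only works while its key expression matches scrape-pack.yml character for character. A drift check can therefore be run by hand or in review without sharing the key between the two files. Below is a minimal sketch. It assumes each workflow has exactly one line of the form `key: artifact-…`, and the helper names are hypothetical. It compares the unevaluated expressions, not the resolved keys.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "key: artifact-..." line from a workflow
# file with leading/trailing whitespace trimmed, so indentation differences
# between the two workflows do not count as drift.
cache_key_line() {
  grep -E '^[[:space:]]*key:[[:space:]]*artifact-' "$1" \
    | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//'
}

# Succeeds only when both workflow files carry the identical key expression;
# otherwise prints both lines to stderr and fails.
check_key_mirror() {
  local a b
  a=$(cache_key_line "$1")
  b=$(cache_key_line "$2")
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    return 0
  fi
  printf 'cache key drift:\n  %s: %s\n  %s: %s\n' "$1" "$a" "$2" "$b" >&2
  return 1
}
```

Usage: `check_key_mirror .github/workflows/scrape-pack.yml .github/workflows/cache-keepalive.yml`. A non-zero exit means the keepalive would be restoring a key that scrape-pack never saves.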
Concrete file pointers
Files to create:
- .github/workflows/cache-keepalive.yml
Files to modify:
- docs/research/batch-scrape-actions.md (§3 sub-section, ≤15 lines)
- CLAUDE.md (1 line in Build & run)
Files to read as reference (do NOT refactor):
- .github/workflows/scrape-pack.yml lines 55–78 (expand-libs pattern), 129–130 (artifact cache key)
- internal/packs/paths.go: packs.Slug is used transitively via the JSON emitted by -list
- cmd/deadzone/scrape.go: the -list flag already ships; no new Go surface needed
Test commands (literal, for agent self-check)
- Build (should be clean, since there are no Go changes):
mise exec -- go build -tags ORT ./...
- Pre-run cache audit (--paginate because the endpoint returns 30 entries per page by default; size_in_bytes is needed for the no-resave check below):
gh api --paginate repos/laradji/deadzone/actions/caches --jq '.actions_caches[] | select(.key|startswith("artifact-")) | {key, size_in_bytes, last_accessed_at}'
- After push, trigger manually:
gh workflow run cache-keepalive.yml --ref <branch>
- Watch:
gh run watch <run-id> --exit-status
- Post-run: re-run the cache audit. last_accessed_at on the artifact-* entries should have advanced past the run's start time, and the summary shows a hit/miss table plus totals.
- Verify that no entry in the caches list has changed content (size and key should match the pre-run audit): keepalive must not resave the cache.
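The pre-run/post-run audit comparison can be scripted. Below is a minimal sketch. It assumes each audit is saved as tab-separated key, size_in_bytes, last_accessed_at lines, for example by ending the --jq filter with `| [.key, .size_in_bytes, .last_accessed_at] | @tsv` instead of the object projection. The function name and the NEW/RESIZED/STALE/GONE labels are hypothetical. It compares last_accessed_at between the two snapshots rather than against the run's start time. Because both snapshots come from the same API in the same ISO-8601 UTC format, plain string comparison orders them correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two cache-audit snapshots (pre-run, post-run),
# each a TSV of: key <TAB> size_in_bytes <TAB> last_accessed_at.
# Prints one line per finding; fails on NEW, RESIZED or STALE:
#   NEW      key appeared post-run (a restore-only workflow should never create one)
#   RESIZED  size changed (the entry was overwritten)
#   STALE    last_accessed_at did not advance (entry was not touched)
#   GONE     key disappeared (evicted meanwhile; reported, not a failure)
compare_cache_audits() {
  awk -F '\t' '
    FILENAME == ARGV[1] { size[$1] = $2; accessed[$1] = $3; next }
    {
      if (!($1 in size)) { print "NEW " $1; bad = 1; next }
      if ($2 != size[$1]) { print "RESIZED " $1; bad = 1 }
      if ($3 <= accessed[$1]) { print "STALE " $1; bad = 1 }
      delete size[$1]
    }
    END {
      for (k in size) print "GONE " k
      exit (bad ? 1 : 0)
    }
  ' "$1" "$2"
}
```

Usage: save pre.tsv before the dispatch and post.tsv after the run completes, then run `compare_cache_audits pre.tsv post.tsv`. Silent success means every pre-existing artifact-* entry was touched and none was resaved.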
Open sub-decisions for the implementer
- Totals reporting shape: aggregating hit/miss counters across a matrix is awkward without upload-artifact (ruled out in the acceptance criteria). Two acceptable patterns:
  - A final report job reads needs.expand-libs.outputs.libs and echoes totals without per-slot data. This keeps keepalive zero-side-effect.
  - Rely on the matrix's continue-on-error: false default and count non-failed slots. This loses the hit/miss distinction but is simpler.
  Implementer picks. The per-slot row in the markdown summary is the hard requirement; the totals line is nice-to-have.
- workflow_dispatch input filter: adding inputs.lib (like scrape-pack.yml) to scope the refresh to a single lib is optional. Skip unless trivial.
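Whichever collection pattern wins, the totals line itself is a small fold over per-slot statuses. Below is a minimal sketch. It assumes the statuses have already been gathered as one "hit" or "miss" word per line; how to gather them is the open decision above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn per-slot status words (one "hit" or "miss"
# per line on stdin) into the "N hit, M miss" totals line.
# Blank lines are ignored; any other word counts as a miss, mirroring the
# skeleton, which treats anything but cache-hit == "true" as a miss.
render_totals() {
  local hit=0 miss=0 status
  while IFS= read -r status || [ -n "$status" ]; do
    [ -z "$status" ] && continue
    if [ "$status" = "hit" ]; then
      hit=$((hit + 1))
    else
      miss=$((miss + 1))
    fi
  done
  printf '%d hit, %d miss\n' "$hit" "$miss"
}
```

Usage: `printf 'hit\nmiss\nhit\n' | render_totals` prints `2 hit, 1 miss`. An all-miss result such as `0 hit, 3 miss` is still a passing run.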
Acceptance criteria
- .github/workflows/cache-keepalive.yml exists with:
  - on.schedule: - cron: '0 4 * * 1,4' (Mon + Thu 04:00 UTC: max gap 4 days, well under the 7-day window, with margin for cron lag and runner queueing)
  - on.workflow_dispatch: {} for manual retriggers
  - permissions: contents: read (no writes; keepalive never publishes)
  - concurrency: { group: cache-keepalive, cancel-in-progress: false }
  - expand-libs reuses the exact pattern from .github/workflows/scrape-pack.yml lines 55–78 (checkout, setup-go, install-native-deps, go run -tags ORT ./cmd/deadzone scrape -list -config libraries_sources.yaml) and emits libs as a job output
  - refresh:
    - needs: expand-libs
    - strategy: { matrix: { entry: ${{ fromJSON(needs.expand-libs.outputs.libs) }} }, fail-fast: false, max-parallel: 20 }
    - actions/checkout@v6 (required so the hashFiles(...) calls in the cache key see the repo files; without a checkout they return an empty string and the key no longer matches)
    - actions/cache/restore@v5 with path + key mirrored verbatim from scrape-pack.yml lines 129–130. Inline the key string with the comment # Mirror of scrape-pack.yml L129-130 — drift breaks the keepalive (different key = different cache entry = no refresh)
    - Record hit/miss bash step that writes hit or miss to $GITHUB_STEP_SUMMARY based on steps.<cache-step-id>.outputs.cache-hit
    - No install-native-deps, no embedder/ORT cache restore, no deadzone invocation, no upload-artifact
- $GITHUB_STEP_SUMMARY with columns lib | version | status (hit/miss) plus a totals line "N hit, M miss". A miss is not a workflow failure: it just means that lib will be fully rescraped on the next operator dispatch.
- docs/research/batch-scrape-actions.md §3 gains a new sub-section Cache keepalive (≤15 lines) documenting:
- CLAUDE.md → Build & run section gains one line:
  Cache keepalive: '.github/workflows/cache-keepalive.yml' refreshes artifact caches Mon+Thu to stay under GHA's 7-day eviction. A miss on cold libs is not a failure — next operator dispatch of 'scrape-pack' recovers.
- A run on main succeeds and the summary reports ≥1 hit (assuming the caches seeded by runs 24560380908 + 24560472614 are still within their 7-day window; if they've aged out, "0 hit, N miss" is still a pass)
Out of scope (fenced)
- scrape-pack.yml: it stays workflow_dispatch-only. Keepalive is a separate workflow file.
- Any deadzone scrape / consolidate / dbrelease invocation: keepalive is cache-access only.
- Sharing the cache key with scrape-pack.yml: mirror, don't extract-and-share. DRY-ing the key into a reusable workflow or composite action is a separate issue if ever wanted; premature here.
- Embedder/ORT caches (hugot-model-…, ort-lib-…): they are kept warm by every ci.yml push; not the keepalive's concern.
- Libs that miss stay cold until the next operator scrape-pack dispatch. This is by design; automated rescrape is the scope of #47 (Research: automated freshness detection and refresh triggers at Context7-scale).
- Go changes: the -list flag already ships via PR #127 (feat(ci): add scrape-pack workflow for batch registry scrapes).