feat(ci): cache-keepalive workflow to prevent 7-day artifact-cache eviction #128

@laradji

Description

Parent: none
Follow-up from: #53 (closed), #126 (closed via #127)
Composes with: #47 (freshness detection research — may supersede this when it lands)
Depends on: none (all prerequisites merged)

Decision (locked 2026-04-17)

Add a dedicated cron workflow whose sole job is to touch the per-lib artifact caches via actions/cache/restore@v5 so that their last_accessed_at stays fresh under GH Actions' 7-day inactivity eviction. The workflow must not invoke deadzone scrape, deadzone consolidate, or deadzone dbrelease — keepalive is cache-only.

This preserves docs/research/batch-scrape-actions.md decision #2 (workflow_dispatch only for scrape-pack.yml, no cron rescrape until #47 lands). The keepalive is orthogonal to freshness detection: it prevents eviction, it does not refresh content.

Why

GitHub Actions evicts cache entries that have not been accessed in 7 days (it also LRU-evicts once a repo exceeds 10 GB, but that is not the binding constraint here). scrape-pack.yml is workflow_dispatch-only, so if no operator dispatches it for a week, every artifact-<slug>-<version>-… entry is evicted and the next dispatch pays the full cold-scrape cost, defeating the freshness-shim property of batch-scrape-actions.md decision #3.
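
The margin behind the Mon + Thu cadence can be checked with plain arithmetic. The bash sketch below is illustrative and not part of the workflow: the function names are hypothetical, timestamps are assumed to be Unix epoch seconds, and it does not model cron lag or runner queueing (the slack it computes is what has to absorb those).

```bash
# past_eviction_window LAST_ACCESSED_EPOCH NOW_EPOCH
# Succeeds when the entry has gone untouched for more than 7 days, the
# inactivity window after which GitHub Actions may evict a cache entry.
past_eviction_window() {
  local last=$1 now=$2
  (( now - last > 7 * 86400 ))
}

# max_gap_days DOW...
# Largest gap, in whole days, between consecutive firings of a weekly cron
# schedule that runs at one fixed time of day on the given days of week
# (0 = Sunday .. 6 = Saturday, as in the cron day-of-week field).
max_gap_days() {
  local -a days
  mapfile -t days < <(printf '%s\n' "$@" | sort -n -u)
  local n=${#days[@]} max=0 gap i
  for (( i = 0; i < n; i++ )); do
    if (( i + 1 < n )); then
      gap=$(( days[i+1] - days[i] ))
    else
      gap=$(( days[0] + 7 - days[i] ))  # wrap around to next week's first firing
    fi
    if (( gap > max )); then max=$gap; fi
  done
  echo "$max"
}

# '0 4 * * 1,4' fires Monday and Thursday at 04:00 UTC.
gap=$(max_gap_days 1 4)
echo "max gap ${gap}d, slack $(( 7 - gap ))d"   # prints: max gap 4d, slack 3d
if past_eviction_window 0 $(( gap * 86400 )); then
  echo "cadence too sparse"
else
  echo "cadence ok"                             # printed for the Mon + Thu schedule
fi
```

A single weekly firing (`max_gap_days 1`) yields a 7-day gap, which leaves no slack at all; that is the case the twice-weekly schedule avoids.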

Acceptance criteria

  • .github/workflows/cache-keepalive.yml exists with:
    • on.schedule: - cron: '0 4 * * 1,4' (Mon + Thu 04:00 UTC — max gap 4 days, well under the 7-day window with margin for cron lag / runner queueing)
    • on.workflow_dispatch: {} for manual retriggers
    • permissions: contents: read (no writes — keepalive never publishes)
    • concurrency: { group: cache-keepalive, cancel-in-progress: false }
  • Job expand-libs reuses the exact pattern from .github/workflows/scrape-pack.yml lines 55–78 (checkout, setup-go, install-native-deps, go run -tags ORT ./cmd/deadzone scrape -list -config libraries_sources.yaml) and emits libs as a job output
  • Job refresh:
    • needs: expand-libs
    • strategy: { matrix: { entry: ${{ fromJSON(needs.expand-libs.outputs.libs) }} }, fail-fast: false, max-parallel: 20 }
    • Each slot runs exactly these steps and nothing else:
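      • actions/checkout@v6 (required: hashFiles() in the cache key reads libraries_sources.yaml and internal/embed/hugot.go from the workspace and returns an empty string when they are absent, which would change the key)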
      • actions/cache/restore@v5 with path + key mirrored verbatim from scrape-pack.yml lines 129–130. Inline the key string with a comment # Mirror of scrape-pack.yml L129-130 — drift breaks the keepalive (different key = different cache entry = no refresh)
      • A Record hit/miss bash step that writes hit or miss to $GITHUB_STEP_SUMMARY based on steps.<cache-step-id>.outputs.cache-hit
    • No install-native-deps, no embedder/ORT cache restore, no deadzone invocation, no upload-artifact
  • A markdown table in $GITHUB_STEP_SUMMARY with columns lib | version | status (hit / miss) plus a totals line N hit, M miss. A miss is not a workflow failure — it just means that lib will be fully rescraped on the next operator dispatch
  • docs/research/batch-scrape-actions.md §3 gains a new sub-section Cache keepalive (≤15 lines) documenting:
  • CLAUDE.md Build & run section gains one line: Cache keepalive: '.github/workflows/cache-keepalive.yml' refreshes artifact caches Mon+Thu to stay under GHA's 7-day eviction. A miss on cold libs is not a failure — next operator dispatch of 'scrape-pack' recovers.
  • First post-merge manual dispatch on main succeeds and the summary reports ≥1 hit (assuming the caches seeded by runs 24560380908 + 24560472614 are still within their 7-day window — if they've aged out, 0 hit, N miss is still a pass)
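
A minimal sketch of the summary shape required above (table plus a totals line), assuming a hypothetical input of one `lib version status` triple per line. How per-slot statuses reach a single renderer without upload-artifact is still the open totals sub-decision; `render_summary` and the lib names in the example are placeholders.

```bash
# render_summary < lines of "LIB VERSION STATUS" (STATUS is hit or miss)
# Prints a GitHub-flavoured markdown table plus an "N hit, M miss" totals line.
render_summary() {
  local lib version status hits=0 misses=0
  printf '| lib | version | status |\n'
  printf '| --- | --- | --- |\n'
  while read -r lib version status; do
    [ -n "$lib" ] || continue               # skip blank lines
    printf '| `%s` | `%s` | %s |\n' "$lib" "$version" "$status"
    if [ "$status" = "hit" ]; then
      hits=$((hits + 1))
    else
      misses=$((misses + 1))                # anything other than "hit" counts as a miss
    fi
  done
  printf '\n%d hit, %d miss\n' "$hits" "$misses"
}

# In a workflow step the output would be appended to the job summary:
#   render_summary < statuses.txt >> "$GITHUB_STEP_SUMMARY"
# Here it goes to stdout; the last line printed is "1 hit, 1 miss".
printf '%s\n' "libA 1.2.0 hit" "libB 3.4.1 miss" | render_summary
```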

Code skeleton (sketch — finalize in implementation)

name: cache-keepalive

on:
  schedule:
    - cron: '0 4 * * 1,4'  # Mon + Thu 04:00 UTC, max gap 4 days vs 7-day GHA eviction
  workflow_dispatch: {}

permissions:
  contents: read

concurrency:
  group: cache-keepalive
  cancel-in-progress: false

jobs:
  expand-libs:
    runs-on: ubuntu-latest
    outputs:
      libs: ${{ steps.list.outputs.libs }}
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-go@v6
        with: { go-version-file: go.mod }
      - name: Install native deps
        uses: ./.github/actions/install-native-deps
      - id: list
        shell: bash
        run: |
          set -euo pipefail
          libs="$(go run -tags ORT ./cmd/deadzone scrape -list -config libraries_sources.yaml)"
          echo "libs=$libs" >> "$GITHUB_OUTPUT"

  refresh:
    needs: expand-libs
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      max-parallel: 20
      matrix:
        entry: ${{ fromJSON(needs.expand-libs.outputs.libs) }}
    steps:
      - uses: actions/checkout@v6
      - name: Touch artifact cache
        id: touch
        # Mirror of scrape-pack.yml L129-130 — drift breaks the keepalive
        # (different key = different cache entry = no refresh).
        uses: actions/cache/restore@v5
        with:
          path: artifacts/${{ matrix.entry.slug }}
          key: artifact-${{ matrix.entry.slug }}-${{ matrix.entry.version }}-${{ hashFiles('libraries_sources.yaml') }}-${{ hashFiles('internal/embed/hugot.go') }}
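      - name: Record hit/miss
        shell: bash
        env:
          CACHE_HIT: ${{ steps.touch.outputs.cache-hit }}
          LIB_ID: ${{ matrix.entry.lib_id }}
          LIB_VERSION: ${{ matrix.entry.version }}
        run: |
          set -euo pipefail
          status="miss"
          if [ "$CACHE_HIT" = "true" ]; then status="hit"; fi
          # Each matrix slot is its own job with its own step summary, so every
          # slot writes the header; a bare row would not render as a table.
          {
            echo "| lib | version | status |"
            echo "| --- | --- | --- |"
            echo "| \`$LIB_ID\` | \`$LIB_VERSION\` | $status |"
          } >> "$GITHUB_STEP_SUMMARY"
          # Totals belong to a final `report` job (implementer's choice of pattern).
          # Matrix job outputs are not merged across slots, so needs.refresh.outputs
          # cannot carry per-slot counts; a final job that re-reads
          # needs.expand-libs.outputs.libs plus the run log is fine.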

Concrete file pointers

Files to create:

  • .github/workflows/cache-keepalive.yml

Files to modify:

  • docs/research/batch-scrape-actions.md (§3 sub-section — ≤15 lines)
  • CLAUDE.md (1 line in Build & run)

Files to read as reference — do NOT refactor:

  • .github/workflows/scrape-pack.yml lines 55–78 (expand-libs pattern), 129–130 (artifact cache key)
  • internal/packs/paths.go: packs.Slug is used transitively via the JSON emitted by -list
  • cmd/deadzone/scrape.go: the -list flag already ships; no new Go surface needed
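
Because the artifact key is duplicated by hand, a small check can catch drift before it silently turns the keepalive into a no-op. This is a hypothetical bash sketch, not an existing repo script: it assumes each workflow carries its artifact key on a single `key: artifact-...` line written verbatim, and it compares only the first such line in each file.

```bash
# artifact_key FILE
# Prints the first "key: artifact-..." value in FILE, leading whitespace stripped.
artifact_key() {
  grep -m1 -E '^[[:space:]]*key:[[:space:]]*artifact-' "$1" \
    | sed -E 's/^[[:space:]]*key:[[:space:]]*//'
}

# check_key_mirror SOURCE_WORKFLOW KEEPALIVE_WORKFLOW
# Returns 1 if either key is missing or the two keys differ.
check_key_mirror() {
  local want got
  want=$(artifact_key "$1")
  got=$(artifact_key "$2")
  if [ -z "$want" ] || [ "$want" != "$got" ]; then
    echo "artifact cache key drift: '$want' vs '$got'" >&2
    return 1
  fi
  echo "artifact cache key mirrored"
}

# Hypothetical usage from the repo root:
#   check_key_mirror .github/workflows/scrape-pack.yml .github/workflows/cache-keepalive.yml
```

A textual comparison is deliberately crude: it flags any edit to either key, including harmless whitespace changes inside the expression, which errs on the side of a false alarm rather than a silent miss.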

Test commands (literal, for agent self-check)

  • mise exec -- go build -tags ORT ./... — builds clean (should, since no Go changes)
  • Pre-run cache audit (the caches endpoint returns 30 entries per page by default, so paginate): gh api --paginate repos/laradji/deadzone/actions/caches --jq '.actions_caches[] | select(.key|startswith("artifact-")) | {key, last_accessed_at}'
  • After push, trigger manually: gh workflow run cache-keepalive.yml --ref <branch>
  • Watch: gh run watch <run-id> --exit-status
  • Post-run: re-run the cache audit. last_accessed_at on every artifact-* entry the run reported as a hit should have advanced past the run's start time (a miss has no matching entry to touch). The summary shows the hit/miss table + totals
  • Verify no entry in the caches list has changed content (size / key should match pre-run) — keepalive must not resave the cache
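
The post-run checks above can be mechanised by diffing two audit snapshots. A sketch, assuming the audit is emitted as TSV via `@tsv` in the `gh --jq` filter and that nothing else saves artifact-* caches during the run; `compare_audits` is a hypothetical helper, and a NEW verdict could also come from an unrelated scrape-pack dispatch landing at the same time.

```bash
# Snapshot the artifact caches as TSV (key, last_accessed_at, size_in_bytes),
# once before the dispatch and once after it finishes, e.g.:
#   gh api --paginate repos/laradji/deadzone/actions/caches \
#     --jq '.actions_caches[] | select(.key|startswith("artifact-"))
#           | [.key, .last_accessed_at, .size_in_bytes] | @tsv' > pre.tsv

# compare_audits PRE_TSV POST_TSV RUN_START
# RUN_START must use the same ISO-8601 UTC format as the API timestamps,
# because they are compared as plain strings.
# Prints one verdict per post-run key:
#   ok   last_accessed_at advanced past RUN_START (expected for hits)
#   cold not touched by the run (expected for misses)
#   SIZE size changed, i.e. the entry was re-saved
#   NEW  key absent from the pre-run audit, i.e. a new entry was saved
# Exits 1 if any SIZE or NEW verdict was printed.
compare_audits() {
  awk -F '\t' -v start="$3" '
    FILENAME == ARGV[1] { size[$1] = $3; next }   # first file: pre-run audit
    !($1 in size)       { print "NEW  " $1; bad = 1; next }
    size[$1] != $3      { print "SIZE " $1; bad = 1; next }
    $2 > start          { print "ok   " $1; next }
                        { print "cold " $1 }
    END                 { exit bad + 0 }
  ' "$1" "$2"
}
```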

Out of scope (fenced)

Open sub-decisions for the implementer

  • Totals reporting shape: aggregating hit/miss counters across a matrix is awkward without upload-artifact (ruled out above). Two acceptable patterns:
    1. A final report job reads needs.expand-libs.outputs.libs and echoes totals without per-slot data — keeps keepalive zero-side-effect
    2. Use the matrix's continue-on-error: false default + count non-failed slots — loses miss/hit distinction but simpler
    • Implementer picks. Per-slot row in the markdown summary is the hard requirement; the totals line is nice-to-have.
  • workflow_dispatch input filter: adding inputs.lib (like scrape-pack.yml) to scope the refresh to a single lib is optional. Skip unless trivial.

Metadata

Assignees

Labels

P2 (Normal: clear value, not urgent) · feature (New feature)

Projects

No projects

Relationships

None yet

Development

No branches or pull requests