
feat(ci): add scrape-pack workflow for batch registry scrapes #127

Merged
laradji merged 2 commits into main from code/feat-feat-githubworkflowsscrape-pack-5zk on Apr 16, 2026

Conversation

@laradji laradji commented Apr 16, 2026

Summary

Adds a workflow_dispatch-only GitHub Actions workflow that scrapes every resolved lib in parallel on GH-hosted runners, consolidates the artifacts into a single deadzone.db, and optionally publishes it to a GitHub Release.

Design decisions are pinned in docs/research/batch-scrape-actions.md and tracked in #126.

Changes

  • .github/workflows/scrape-pack.yml — three-job pipeline:
    • expand-libs — resolves libraries_sources.yaml into a JSON matrix via the new scrape -list flag
    • scrape — matrix job (max-parallel: 20, fail-fast: false) with per-lib artifact cache keyed on libraries_sources.yaml + internal/embed/hugot.go hashes; uses upload-artifact as inter-job scratch transport (Pattern B — Pattern C via REST cache API is not buildable, see research doc §4)
    • consolidate — runs always() on partial matrix failures, fetches staged artifacts, runs deadzone consolidate, fires deadzone dbrelease only when inputs.tag is non-empty, and writes a per-slot status table to $GITHUB_STEP_SUMMARY
  • cmd/deadzone/scrape.go — new -list flag emits the resolved {lib_id, version, slug} matrix as single-line JSON and short-circuits before embedder/agent setup (no model cache or network needed for listing)
  • justfile — scrape recipe now accepts version=X to pin to a single expanded version
  • README.md — documents the gh workflow run scrape-pack.yml -f tag=<tag> entry point
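
The three-job wiring above might look like the following sketch (the job graph, matrix settings, and the -list flag are from the PR description; step details, input wiring, and the exact scrape/consolidate invocations are assumptions):

```yaml
name: scrape-pack
on:
  workflow_dispatch:
    inputs:
      tag:
        description: release tag to attach deadzone.db to (empty skips publishing)
        required: false
        default: ""

jobs:
  expand-libs:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.list.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - id: list
        # scrape -list prints the resolved {lib_id, version, slug} matrix as single-line JSON
        run: echo "matrix=$(go run ./cmd/deadzone scrape -list -config libraries_sources.yaml)" >> "$GITHUB_OUTPUT"

  scrape:
    needs: expand-libs
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      max-parallel: 20
      matrix:
        include: ${{ fromJSON(needs.expand-libs.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - run: go run ./cmd/deadzone scrape -lib "${{ matrix.lib_id }}"   # exact flags assumed

  consolidate:
    needs: scrape
    if: always()   # run even when some matrix slots failed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: go run ./cmd/deadzone consolidate   # exact invocation assumed
```

The per-lib caching and artifact staging steps described in the scrape bullet are elided here; only the job graph and matrix plumbing are shown.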

Concurrency & safety

  • concurrency: scrape-pack queues dispatches serially; parallel runs would contend for the same cache keys, and dbrelease --clobber would overwrite the same release asset names
  • Empty tag input stops at the consolidated-db cache (no side effects on the releases page)
  • permissions: contents: write is scoped only to enable the release-asset write path
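
A sketch of the top-level keys implied by the two workflow-level bullets above (the group name is an assumption):

```yaml
concurrency:
  group: scrape-pack        # all dispatches share one group
  cancel-in-progress: false # queue new runs instead of cancelling the active one

permissions:
  contents: write           # only needed for the release-asset write path
```

One caveat worth knowing: GitHub keeps at most one pending run per concurrency group, so rapid-fire dispatches collapse into the latest pending run rather than forming a long queue.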

Test plan

  • gh workflow run scrape-pack.yml (no tag) completes and produces a consolidated-db cache + summary table
  • gh workflow run scrape-pack.yml -f lib=/hashicorp/terraform restricts the matrix to a single lib
  • gh workflow run scrape-pack.yml -f tag=vX.Y.Z uploads deadzone.db to the existing release vX.Y.Z
  • Induced scrape failure in one matrix slot still lets consolidate run and surfaces the slot as failed in the summary
  • just scrape lib=/org/project version=1.14 runs locally against the new justfile signature
  • go run ./cmd/deadzone scrape -list -config libraries_sources.yaml emits valid single-line JSON

Fixes #126

Nacer Laradji added 2 commits April 16, 2026 21:06
upload-artifact@v4 computes the archive root as the LCA of matched
files. With path: artifacts/<slug>, the LCA collapsed to that dir and
stripped the slug prefix from every entry, so every slot's artifact.db
would have collided at the same root after download-artifact
merge-multiple in consolidate — db.Consolidate would have seen only
the last slot's payload.

Upload/download now use path: artifacts/, and a per-slot
artifacts/.pack-root sentinel pins the LCA explicitly so the anchor
does not rely on artifacts/manifest.yaml happening to be present.
db.Consolidate's <dir>/*/artifact.db glob ignores the sentinel.
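
Under the fix described in the commit message, the upload/download pair might look like this sketch (step names and artifact naming are assumptions; path: artifacts/, the .pack-root sentinel, and merge-multiple are from the commit message):

```yaml
# per-slot scrape job: stage the db under a slug dir, pin the archive root
- name: Stage artifact
  run: |
    mkdir -p "artifacts/${{ matrix.slug }}"
    cp artifact.db "artifacts/${{ matrix.slug }}/artifact.db"
    touch artifacts/.pack-root   # sentinel keeps the archive LCA at artifacts/
- uses: actions/upload-artifact@v4
  with:
    name: scrape-${{ matrix.slug }}
    path: artifacts/

# consolidate job: merge every slot back into one tree
- uses: actions/download-artifact@v4
  with:
    path: artifacts
    merge-multiple: true   # slug dirs stay distinct; db.Consolidate globs */artifact.db
```

Because every upload now matches at least two entries rooted at artifacts/ (the slug dir and the sentinel), the computed archive root can no longer collapse into the slug dir itself.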
@laradji laradji merged commit 0e12bc2 into main Apr 16, 2026
4 checks passed
@laradji laradji deleted the code/feat-feat-githubworkflowsscrape-pack-5zk branch April 16, 2026 20:27
