feat(ci): add scrape-pack workflow for batch registry scrapes #127
Merged
Added 2 commits on April 16, 2026 21:06
`upload-artifact@v4` computes the archive root as the lowest common ancestor (LCA) of the matched files. With `path: artifacts/<slug>`, the LCA collapsed to that directory and stripped the slug prefix from every entry. After `download-artifact` with `merge-multiple` in `consolidate`, every slot's `artifact.db` would therefore have collided at the same root, and `db.Consolidate` would have seen only the last slot's payload. Upload and download now use `path: artifacts/`, and a per-slot `artifacts/.pack-root` sentinel pins the LCA explicitly, so the anchor no longer relies on `artifacts/manifest.yaml` happening to be present. `db.Consolidate`'s `<dir>/*/artifact.db` glob ignores the sentinel.
Summary
Adds a `workflow_dispatch`-only GitHub Actions workflow that scrapes every resolved lib in parallel on GH-hosted runners, consolidates the artifacts into a single `deadzone.db`, and optionally publishes it to a GitHub Release. Design decisions are pinned in `docs/research/batch-scrape-actions.md` and tracked in #126.
Changes
- `.github/workflows/scrape-pack.yml`, a three-job pipeline:
  - `expand-libs`: resolves `libraries_sources.yaml` into a JSON matrix via the new `scrape -list` flag
  - `scrape`: matrix job (`max-parallel: 20`, `fail-fast: false`) with a per-lib artifact cache keyed on the `libraries_sources.yaml` + `internal/embed/hugot.go` hashes; uses `upload-artifact` as inter-job scratch transport (Pattern B; Pattern C, via the REST cache API, is not buildable, see research doc §4)
  - `consolidate`: runs under `always()` so it still executes on partial matrix failures, fetches the staged artifacts, runs `deadzone consolidate`, fires `deadzone dbrelease` only when `inputs.tag` is non-empty, and writes a per-slot status table to `$GITHUB_STEP_SUMMARY`
- `cmd/deadzone/scrape.go`: the new `-list` flag emits the resolved `{lib_id, version, slug}` matrix as single-line JSON and short-circuits before embedder/agent setup (no model cache or network needed for listing)
- `justfile`: the `scrape` recipe now accepts `version=X` to pin to a single expanded version
- `README.md`: documents the `gh workflow run scrape-pack.yml -f tag=<tag>` entry point
Concurrency & safety
- `concurrency: scrape-pack` queues dispatches serially; parallel runs would fight over the same cache keys and `dbrelease --clobber` the same asset names
- Without a `tag` input, the run stops at the consolidated-db cache (no side effects on the releases page)
- `permissions: contents: write` is granted only to enable the release-asset write path
Test plan
- `gh workflow run scrape-pack.yml` (no tag) completes and produces a consolidated-db cache + summary table
- `gh workflow run scrape-pack.yml -f lib=/hashicorp/terraform` restricts the matrix to a single lib
- `gh workflow run scrape-pack.yml -f tag=vX.Y.Z` uploads `deadzone.db` to the existing release `vX.Y.Z`
- … `consolidate` run and surfaces the slot as `failed` in the summary
- `just scrape lib=/org/project version=1.14` runs locally against the new `justfile` signature
- `go run ./cmd/deadzone scrape -list -config libraries_sources.yaml` emits valid single-line JSON
Fixes #126