fix(provisioner): apt-get update before gated auto-installs (AL-37)#25
Merged
Conversation
The locales auto-install branch in 10-agent-user.sh ran `apt-get install -y locales` without first running `apt-get update`. On hosts where /var/lib/apt/lists/ is empty — freshly pulled Ubuntu containers, long-idle real machines — apt has no record of any package availability and the install fails with "Package locales has no installation candidate". Mirrors the canonical pattern at 30-nodejs.sh:33: `apt-get update` then `apt-get install`, both wrapped in DEBIAN_FRONTEND=noninteractive. Both calls remain inside the existing `if ! command -v locale-gen` gate so hosts where the package is already installed skip both steps (steady state for repeat installer runs and non-slim images). No `|| true` masking — strict-mode propagation must surface real apt failures (network, broken sources, GPG expiry) as installer ERR-trap hits with src:line attribution. Refs: AL-37 (this fix), AL-30 (introduced the auto-install pattern), AL-36 (dogfood image that surfaced the bug).
Same bug class as 10-agent-user.sh's locales install — the visudo auto-install gate at 20-sudoers.sh:45 ran `apt-get install -y sudo` without first running `apt-get update`. On hosts with an empty apt cache the install fails with "Package sudo has no installation candidate", aborting the installer before the agent gets passwordless sudo (BHV-07/INST-06). Fix lockstep with the locales fix in 10-agent-user.sh: insert `DEBIAN_FRONTEND=noninteractive apt-get update` before the existing apt-get install, both inside the `if ! command -v visudo` gate. The gate's no-op-when-already-installed property is preserved. The comment block already said "Mirror the pattern used by 10-agent-user.sh's locales install" — updated to reflect the now- matching apt-get update + apt-get install pair. Refs: AL-37 (this fix), AL-30 (introduced the auto-install pattern).
…n coverage
AL-37 fixed plugin/provisioner/10-agent-user.sh and 20-sudoers.sh to run
`apt-get update` before any gated auto-install. With the strategic fix
landed in the installer, the dogfood image can — and should — start with
an empty /var/lib/apt/lists/.
This converts AL-37's failure scenario into permanent test coverage: any
future provisioner that re-introduces an "auto-install missing package
without apt-get update" pattern will fail dogfood retests immediately,
because the dogfood image starts in the exact "empty apt cache" state
that triggers the bug class.
Drops the previous tactical workaround comment block that documented why
we couldn't clean lists yet ("Until that lands, keeping
/var/lib/apt/lists/ populated here is the tactical fix"); that now reads
as an active misstatement of intent. Replaced with a positive-framing
comment explaining why we DO clean lists.
Image-size cost is roughly 25 MB recovered from the previously-retained
package lists; not a goal but a small bonus.
Refs: AL-37 (the installer fix that made this safe), AL-36 (this file
was added as the minimal dogfood substrate).
Quick task artifacts: PLAN.md (target/acceptance/pitfalls), SUMMARY.md (what changed + why + end-to-end Docker validation transcript), and the STATE.md row under Quick Tasks Completed referencing the three feature commits f12c94f, 86d5cbf, 94c4064. Refs: AL-37.
qa-engineer review of AL-37 surfaced two small follow-ups:
1. **Named negative assertion at the bats layer (10-installer.bats).** AL-37
is now structurally exercised by tests/docker/Dockerfile.dogfood's empty-
cache start state, but the bats suite did not have a test that names the
canonical apt failure strings ("Package <name> has no installation
candidate" / "Unable to locate package"). Added one @test that greps the
installer log for either string and __fails with the matched lines if
found. Converts AL-37 from "absence-of-stack-trace" coverage into named
negative assertion — a future regression is visible in the bats output
(and in the behavior-coverage-auditor's grep trail) instead of only in
dogfood log scrollback.
2. **Comment-precision nit (10-agent-user.sh).** Original comment said
"Mirrors the canonical pattern at 30-nodejs.sh:33" but 30-nodejs runs
apt-get update unconditionally (Node always installed) while we are
gated on locale-gen absence. Tightened to "Mirrors the apt-get update→
install ordering at 30-nodejs.sh:33 (gated here on `command -v
locale-gen` absence, unconditional there ...)" — accurate without
adding load-bearing logic.
Both changes are inert on the happy path: the new @test is a refute_output
shape that fails only if the bug regresses; the comment update is text-
only. shellcheck + bash -n still clean.
Refs: AL-37.
Pass-2 review (qa-engineer + behavior-coverage-auditor): - TST-07 phase-close gate greps only the v0.3.0 REQ-ID prefixes (BHV / RT / AGT / CLI / CAT / INST / HRN / TST / DOC). A bare `AL-37:` @test name does not break the gate but is invisible to it — losing the auditor- visibility benefit pass-1 wanted. - Project precedent for citing a Jira tag on a bugfix-followup @test is to LEAD with the closest REQ-ID and ride the AL-XX as a parenthetical regression tag. See `tests/bats/60-curl-installer.bats:206`: `INST-03: resolve_version reads first-hop redirect (AL-31 regression guard)`. - INST-05 ("installer log contains no EACCES or 'permission denied'") is the natural REQ-ID parent: both bug classes manifest as installer-log lines the entrypoint did NOT trip on, both assert "log has no <bad- string>". Renamed @test + updated `__fail` ID arg + added comment block citing the 60-curl-installer precedent so future maintainers see the convention. Refs: AL-37, INST-05.
Inline ai-deslop review (the project subagent landed in master via b69c211 but isn't yet active in my harness — ran the rubric inline). The comment block I added in 617c894 was 10 lines for a 1-line code addition; the final sentence ("it also reproduces the real-world starting state of a freshly pulled Ubuntu container where /var/lib/apt/lists/ is empty until something runs apt-get update") restated information already conveyed earlier in the block. Trimmed to 7 lines, no information loss. Refs: AL-37.
…mment ai-deslop pass-3 (now with the registered subagent rather than my inline rubric run): the trailing parenthetical "(gated here on `command -v locale-gen` absence, unconditional there because nodejs is always installed by the AgentLinux installer)" is over-explanation. A reader following the `30-nodejs.sh:33` cross-reference sees the gated-vs- unconditional distinction in two seconds. This walks back part of the qa-engineer pass-1 nit in c70d297, which asked for the precision because "canonical pattern" overstated the relationship. The new "Mirrors the pattern at 30-nodejs.sh:33." (one line) is honest about the cross-reference without claiming canonical- pattern equivalence and without inlining the difference. Refs: AL-37.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes AL-37 — provisioner installer aborted with
Package locales has no installation candidate(andPackage sudo …symmetrically) on hosts with empty/var/lib/apt/lists/(freshly pulled Ubuntu containers, long-idle hosts).plugin/provisioner/10-agent-user.shand20-sudoers.sh— addedDEBIAN_FRONTEND=noninteractive apt-get updateinside the existingif ! command -v <X>; thenauto-install gates, before the existingapt-get install. Mirrors the apt-get update→install ordering at30-nodejs.sh:33(gated here on prereq absence; unconditional there because nodejs is always installed by the AgentLinux installer).tests/docker/Dockerfile.dogfood— appended&& rm -rf /var/lib/apt/lists/*to the apt-get install RUN; replaced the previous AL-37 tactical-workaround comment block with a positive-framing comment. Converts AL-37 into permanent regression coverage: any future provisioner that re-introduces anauto-install without apt-get updatepattern fails dogfood retests immediately.tests/bats/10-installer.bats— added one named negative-assertion@test(lead:INST-05;(AL-37 regression guard)parenthetical, matching the60-curl-installer.bats:206precedent forINST-03 + AL-31). Greps the installer log forhas no installation candidate | Unable to locate packageand__fails with the matched lines if present.Strict-mode contract preserved: no
|| truemasking on the newapt-get updatelines — real apt failures (network, broken sources, GPG expiry) trip the entrypoint's ERR trap with src:line attribution. Sourced-fragment contract preserved: no new strict-mode flags inside the provisioners (they inherit from the entrypoint per their file headers).Why now
Bug class introduced under AL-30 (the four-bug fix bundle that landed
command -vauto-install gates), surfaced by AL-36 (the minimal-prereqsDockerfile.dogfoodthat runs the curl-pipe-bash installer against a clean substrate). Earlier dogfood retests masked the bug because the manual setup recipe ranapt-get updatebefore installing curl, side-effecting a populated cache that the AgentLinux installer's laterapt-get installrode on.Test plan
End-to-end on the regression scenario. Built
agentlinux-dogfood-al37:24.04from the modifiedDockerfile.dogfood. Confirmed image starting state:0files in/var/lib/apt/lists/, nolocale-gen, novisudo— both auto-install gates will fire. Booted privileged systemd container (--privileged --cgroupns=host -e container=docker -v /sys/fs/cgroup:/sys/fs/cgroup:rw --tmpfs /run --tmpfs /tmp -v "$PWD":/workspace:ro), mounted the worktree, copied to/opt/agentlinux-src, and ran the LOCAL worktree installer. Provisioners 10 / 20 / 30 / 40 all reacheddonebanners. Captured installer log assertions:has no installation candidatematchesapt-get updateran (Get / Hit lines after each WARN)Setting up localesreachedlocale C.UTF-8 enforcedreachedSetting up sudoreachedwrote /etc/sudoers.d/agentlinuxreacheddoneStatic gates.
bash -nclean on both edited provisioner scripts;shellcheckclean on both;docker buildofDockerfile.dogfoodsucceeds againstubuntu:24.04;pre-commit run --files <changed>clean (shellcheck + shfmt + secrets + JSON / YAML lint all green) on both pre-rebase and post-rebase.Review feedback loop, three passes.
bash-engineer+security-engineer+qa-engineer. bash + security: clean, no findings. qa-engineer: two actionable items — (1) add a named negative-assertion@testfor the canonical apt failure strings; (2) tighten an imprecise comment in10-agent-user.sh("canonical pattern at 30-nodejs.sh:33" → "apt-get update→install ordering at 30-nodejs.sh:33 (gated here, unconditional there)"). Both applied inc70d297.qa-engineer+behavior-coverage-auditoron the new bats@test. Auditor: TST-07 phase-close gate greps only v0.3.0 REQ-IDs (BHV / RT / AGT / CLI / CAT / INST / HRN / TST / DOC);AL-XXis Jira-only namespace. Project precedent (60-curl-installer.bats:206) leads with the closest REQ-ID and rides(AL-XX regression guard)as a parenthetical. Renamed ina01cb0eto lead withINST-05(sibling of the existing no-EACCES gate; both assert "log has no "), AL-37 as parenthetical. Loop converged.ai-deslopsubagent landed in master viab69c211(AL-35) but isn't yet active in my harness. Ran the rubric inline against the diff; one finding:Dockerfile.dogfoodcomment block was 10 lines for a 1-line code addition with the last sentence restating information already conveyed earlier. Trimmed to 7 lines in3c69db0.CI. Will exercise the existing matrix (
tests/docker/run.shforubuntu-{22.04,24.04,26.04}+ nightlytests/qemu/). The newINST-05 (AL-37 regression guard)@testruns as part oftests/bats/10-installer.batsin everytests/docker/run.shinvocation.Refs
Dockerfile.dogfoodthat surfaced AL-37 with a clean substrate20-sudoers.sh)🤖 Generated with Claude Code