
Enable user lingering so the systemd user bus survives sessions #73

Open
robobryce wants to merge 1 commit into brycelelbach:main from robobryce:add/enable-user-linger


@robobryce

Contributor

What

Enable user lingering (loginctl enable-linger <user>) as a default host-setup step, in a new enable_user_linger function called from main() right after install_base_deps.

Why

AAB sets up hosts where an agent runs unattended across SSH sessions that come and go. Without lingering, the per-user systemd instance — and its bus at $XDG_RUNTIME_DIR/bus — is torn down when the login session ends, so the next session finds no user bus.

Workloads that wrap commands in systemd-run --user --scope need that bus. autocuda run slice caps each build's CPU/memory in a per-worker scope; with no user bus it exits 2 with "user systemd bus not found at /run/user/<uid>/bus" and points the operator at sudo loginctl enable-linger $USER. As a result, every autocuda host needs the same manual fix on first use. Doing it in the bootstrap removes that step: slice-mode builds work out of the box, and exclusive-mode commands (which don't need the bus) are unaffected.
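
For illustration, the failure mode can be sketched as a small wrapper that checks for the bus socket before calling systemd-run. This is not autocuda's code: the function names `user_bus_path` and `run_in_scope_sketch`, the hint text, and the CPUQuota/MemoryMax values are all assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Sketch only; this is not autocuda's code. Names and limits are hypothetical.

# Print the expected user-bus socket path, falling back to /run/user/<uid>
# when XDG_RUNTIME_DIR is unset.
user_bus_path() {
  printf '%s/bus\n' "${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
}

# Run a command in a resource-capped transient scope, or return 2 when the
# user bus socket is missing.
run_in_scope_sketch() {
  local bus
  bus="$(user_bus_path)"
  if [ ! -S "$bus" ]; then
    echo "user systemd bus not found at ${bus}" >&2
    echo "hint: sudo loginctl enable-linger \$USER" >&2
    return 2
  fi
  # CPUQuota= and MemoryMax= are standard systemd resource-control
  # properties; the values here are placeholders.
  systemd-run --user --scope -p CPUQuota=200% -p MemoryMax=4G -- "$@"
}
```

With lingering enabled the socket check succeeds in a fresh SSH session, so a call such as `run_in_scope_sketch make -j4` would reach systemd-run instead of returning 2.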

How

enable_user_linger is idempotent and degrades gracefully — it mirrors the sudo-awareness of install_base_deps / update_etc_environment:

  • no loginctl (bare container, no systemd): logs and skips.
  • already lingering: logs and returns without invoking sudo (quiet re-runs).
  • non-root without passwordless sudo: warns with the exact manual command, skips.
  • enable succeeds / fails: logs success, or warns with the manual command on failure.

When run as root or with passwordless sudo (the documented AAB requirement), it simply enables lingering.
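
The branches above could be sketched roughly as follows. This is an illustration, not the function in this PR: the `log`, `warn`, and `run_privileged` helpers and the `_sketch` function name are hypothetical. The first two log messages match the test output shown in this PR; the remaining messages are invented. The already-lingering check via `loginctl show-user` is one possible approach.

```shell
#!/usr/bin/env bash
# Sketch only; not the PR's actual enable_user_linger. Helper names and most
# messages are illustrative.

log()  { printf '[bootstrap] %s\n' "$*"; }
warn() { printf '[bootstrap] WARNING: %s\n' "$*" >&2; }

# Run a command directly as root, otherwise through non-interactive sudo.
run_privileged() {
  if [ "$(id -u)" -eq 0 ]; then "$@"; else sudo -n "$@"; fi
}

enable_user_linger_sketch() {
  local user manual_cmd
  user="${USER:-$(id -un)}"
  manual_cmd="sudo loginctl enable-linger ${user}"

  # Branch 1: no loginctl (for example a bare container without systemd).
  if ! command -v loginctl >/dev/null 2>&1; then
    log "loginctl not available (no systemd); skipping user-linger setup."
    return 0
  fi

  # Branch 2: already lingering, so return before touching sudo.
  if [ "$(loginctl show-user "$user" --property=Linger 2>/dev/null)" = "Linger=yes" ]; then
    log "User lingering already enabled for ${user}."
    return 0
  fi

  # Branch 3: a non-root user needs passwordless sudo; `sudo -n` fails
  # instead of prompting when a password would be required.
  if [ "$(id -u)" -ne 0 ] && ! sudo -n true 2>/dev/null; then
    warn "Passwordless sudo unavailable; run manually: ${manual_cmd}"
    return 0
  fi

  # Branch 4: attempt the enable and report either outcome.
  if run_privileged loginctl enable-linger "$user"; then
    log "Enabled user lingering for ${user}."
  else
    warn "Could not enable lingering; run manually: ${manual_cmd}"
  fi
}
```

Every skip path returns 0, so a caller running under `set -e` would not abort the rest of the bootstrap when lingering cannot be enabled.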

Test plan

  • ./test.bash (lint + unit) — shellcheck clean; 113/113 bats pass, including 5 new enable_user_linger tests (one per branch above):
    === lint ===
    === unit (bats) ===
    1..113
    ok 56 enable_user_linger skips cleanly when loginctl is unavailable
    ok 57 enable_user_linger is a no-op when lingering is already enabled
    ok 58 enable_user_linger enables lingering when it is off
    ok 59 enable_user_linger skips and warns when passwordless sudo is unavailable
    ok 60 enable_user_linger warns when enable-linger fails
    ...
    ok 113 main() runs load_config_file only when given a positional arg (unset env vars populated)
    
  • ./test.bash --docker: full bootstrap end-to-end in a fresh ubuntu:22.04 (no systemd), run + re-run for idempotency. The skip path fires on both runs and the new assertion #14 passes both times:
    [bootstrap] loginctl not available (no systemd); skipping user-linger setup.
    ...
    PASS: No systemd user manager; user-linger setup correctly skipped.
    All e2e assertions passed.
    === e2e passed ===
    === docker e2e passed ===
    
  • ./test.bash --secrets (gitleaks v8.18.4) — both history and working-tree passes clean:
    66 commits scanned.
    no leaks found
    ...
    no leaks found
    
  • Live systemd-host check: sourced bootstrap.bash and ran enable_user_linger directly on a host where lingering is already on (the no-op branch), plus assertion #14's positive branch:
    [bootstrap] User lingering already enabled for shadeform.
    PASS: User lingering enabled for shadeform.
    
    The off→on transition on a systemd host is covered hermetically by bats test #58; exercising it live would mean destructively toggling the test host's linger state.
  • ./test.bash --smoke — N/A: no inference-path or launcher change; this PR only adds a host-setup step and its tests.
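
An e2e assertion of the kind exercised in the Docker and live-host checks above might look roughly like this. It is a sketch, not the repository's e2e script: the function name, the `LINGER_DIR` override, the passwordless-sudo PASS message, and the FAIL message are assumptions. The other two PASS messages and the /var/lib/systemd/linger/<user> path come from this PR.

```shell
#!/usr/bin/env bash
# Sketch only; not the repository's e2e script. LINGER_DIR is a hypothetical
# override so the check can be pointed at a scratch directory.

assert_user_linger_sketch() {
  local user linger_dir
  user="${USER:-$(id -un)}"
  linger_dir="${LINGER_DIR:-/var/lib/systemd/linger}"

  # Skip condition 1: no loginctl, so the bootstrap step skipped as well.
  if ! command -v loginctl >/dev/null 2>&1; then
    echo "PASS: No systemd user manager; user-linger setup correctly skipped."
    return 0
  fi

  # Positive branch: lingering is recorded as a per-user marker file.
  if [ -e "${linger_dir}/${user}" ]; then
    echo "PASS: User lingering enabled for ${user}."
    return 0
  fi

  # Skip condition 2: a non-root user without passwordless sudo could not
  # have enabled lingering, so the bootstrap would have warned and skipped.
  if [ "$(id -u)" -ne 0 ] && ! sudo -n true 2>/dev/null; then
    echo "PASS: No passwordless sudo; user-linger setup correctly skipped."
    return 0
  fi

  echo "FAIL: User lingering not enabled for ${user}." >&2
  return 1
}
```

Checking the skip conditions in the same order as the setup function keeps the assertion from failing on hosts where the bootstrap legitimately skipped the step.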

Docs

README.md updated in the same change: a new item in What It Sets Up (with the bare-container caveat) and a /var/lib/systemd/linger/<user> row in What the Script Touches.

Run `loginctl enable-linger` for the bootstrapping user as part of host
setup. Without it the per-user systemd instance — and its bus at
$XDG_RUNTIME_DIR/bus — is torn down when the login session ends, so an
agent reconnecting over a fresh SSH session finds no user bus.

Unattended workloads that wrap commands in `systemd-run --user --scope`
depend on that bus. autocuda's `run slice`, which caps each build's
CPU/memory in a per-worker scope, exits 2 with "user systemd bus not
found" and points the operator at `loginctl enable-linger` — making
every autocuda host re-run the same manual fix. Doing it here removes
that step.

enable_user_linger skips cleanly where the setup cannot or need not run:
no loginctl (bare container), lingering already on (idempotent re-run),
or a non-root user without passwordless sudo (warns with the manual
command). Covered by bats unit tests for every branch and an e2e
assertion whose skip conditions mirror the function's.