Problem
There is currently no way to take a backend host offline for a kernel/OS
patch without hard-stopping every tenant container running on it.
The existing containarium capacity withdraw --drain command gracefully stops
workloads within a bounded window, but it is scoped only to BYOC-advertised
spare capacity, not "every tenant on this host." A comment in
internal/cmd/pool_leave.go already acknowledges that workload drain/migration
for pool-leave is an unbuilt follow-up, and there is no container
live-migration primitive anywhere in this codebase.
Net effect: patching a host kernel that needs a reboot means either
hard-stopping every tenant on that host or skipping the patch. Auto-upgrade
timers are deliberately disabled fleet-wide ("manual patching only" in
terraform/gce/scripts/startup-sentinel.sh and startup-spot.sh), so this
dilemma is not hypothetical: manual patching is the only path today.
Proposal
A general-purpose "drain a host for maintenance" primitive, independent of
the BYOC capacity-advertisement feature:
- Mark a backend as draining: stop scheduling new containers onto it.
- For each running container, either attempt a graceful stop (respecting
in-flight work within a bounded window, similar to the existing
--drain-window knob) or relocate it via the existing cross-backend
move_container path where the workload supports it.
- Report per-container outcome (drained / relocated / force-stopped /
failed) so an operator knows the blast radius before rebooting.
- Only after the host reports empty does the maintenance/reboot proceed.
Why this matters
This is the structural gap behind the "host-kernel LPE reaches every tenant"
caveat in docs/security/SECURITY-FAQ.md — today there's no way to respond
to a kernel CVE without either accepting downtime for every tenant on a host
or leaving the vulnerability unpatched.
Related
- docs/security/SECURITY-FAQ.md
- docs/security/KERNEL-PATCH-RUNBOOK.md (companion runbook; references this
issue as the blocking gap)
- internal/cmd/capacity.go (existing bounded-drain primitive to generalize)
- internal/cmd/pool_leave.go (prior TODO acknowledging the same gap)