Host drain-and-relocate workflow before kernel/OS patch reboot #889

@hsinatfootprintai

Problem

There is currently no way to take a backend host offline for a kernel/OS
patch without hard-stopping every tenant container running on it.

containarium capacity withdraw --drain gracefully stops workloads within a
bounded window, but it is scoped to BYOC-advertised spare capacity only, not
"every tenant on this host." internal/cmd/pool_leave.go already has a
comment admitting workload drain/migration for pool-leave is an unbuilt
follow-up. There is no container live-migration primitive in this codebase.

Net effect: patching a host kernel that needs a reboot means either
hard-stopping every tenant on that host, or skipping the patch. Auto-upgrade
timers are deliberately disabled fleet-wide ("manual patching only", per
terraform/gce/scripts/startup-sentinel.sh, startup-spot.sh), so this
isn't hypothetical: manual patching, with this trade-off, is the only path
today.

Proposal

A general-purpose "drain a host for maintenance" primitive, independent of
the BYOC capacity-advertisement feature:

  1. Mark a backend draining — stop scheduling new containers onto it.
  2. For each running container: attempt a graceful stop (respecting
    in-flight work within a bounded window, similar to the existing
    --drain-window knob), or relocate it via the existing cross-backend
    move_container path where the workload supports it.
  3. Report per-container outcome (drained / relocated / force-stopped /
    failed) so an operator knows the blast radius before rebooting.
  4. Only after the host reports empty does the maintenance/reboot proceed.

Why this matters

This is the structural gap behind the "host-kernel LPE reaches every tenant"
caveat in docs/security/SECURITY-FAQ.md — today there's no way to respond
to a kernel CVE without either accepting downtime for every tenant on a host
or leaving the vulnerability unpatched.

Related

  • docs/security/SECURITY-FAQ.md
  • docs/security/KERNEL-PATCH-RUNBOOK.md (companion runbook, references this
    issue as the blocking gap)
  • internal/cmd/capacity.go (existing bounded-drain primitive to generalize)
  • internal/cmd/pool_leave.go (prior TODO admitting the same gap)

Labels: enhancement (New feature or request), security (Security hardening / defensive features)
