Firewall sibling drift gate misses image/binary/config changes

## Problem

`firewall.Stack.ensureContainer` (`internal/controlplane/firewall/stack.go:791-821`) decides whether to recreate Envoy/CoreDNS sibling containers based on **only two** drift labels (`driftLabels`, lines 759-764):

- `dev.clawker.firewall.infra_certs_ready`
- `dev.clawker.firewall.otel_infra_port`

If both match the desired spec and the container is running, `ensureContainer` returns "already running" and no restart/recreate happens.
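As a rough illustration of the short-circuit described above, the gate can be modeled as a label-subset comparison. This is a minimal sketch, not the code in `stack.go`: `containerState`, `labelsMatch`, and `alreadyRunning` are hypothetical names, and the real inspect result carries far more fields.

```go
package main

// containerState is a hypothetical, simplified view of an inspected sibling
// container. It is an assumption for illustration, not a type from the repo.
type containerState struct {
	Running bool
	Labels  map[string]string
	ImageID string // present on the container, but never consulted by the gate
}

// labelsMatch reports whether every desired drift label exists on the
// container with the same value. Labels outside `desired` are ignored.
func labelsMatch(desired, actual map[string]string) bool {
	for key, want := range desired {
		if got, ok := actual[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// alreadyRunning models the narrow gate: a running container whose drift
// labels match is left alone, whatever image, cmd, mounts, or env it has.
func alreadyRunning(c containerState, desired map[string]string) bool {
	return c.Running && labelsMatch(desired, c.Labels)
}
```

In this model a container created from an old image still passes the gate as long as the two labels are unchanged, which is the gap this issue describes.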

This is the same class of bug as the CP container drift gate (fixed in #300 via `LabelCPBinarySHA`), but the sibling gate is much narrower than the CP one.

## What drift labels DO NOT cover

A clawker CLI upgrade that changes any of the following will leave the **running** Envoy/CoreDNS containers serving stale state until something else forces a recreate:

1. **Envoy image digest**: the `envoyImage` const at `stack.go:36` (pinned `envoyproxy/envoy:distroless-v1.37.1@sha256:...`). Bumping the SHA updates `ensureEnvoyImage` so the new image is pulled, but `ensureContainer` doesn't compare image digests against the running container → the existing container keeps the old Envoy binary.
2. **Embedded CoreDNS binary**: `embed_coredns.go` ships `coredns-clawker` as a `//go:embed` asset. `ensureCorednsImage` rebuilds `coredns-clawker:latest` locally and the image ID changes, but the existing container is not recreated.
3. **`envoy_config.go` / `coredns_config.go` template changes**: `ensureConfigs` rewrites `envoy.yaml` / `Corefile` to disk on every `EnsureRunning`, but Envoy and CoreDNS read their config at startup. Without a recreate (or `Reload`), the new files sit on disk while the processes serve the old in-memory config until the next rule mutation triggers `Stack.Reload`.
4. **`containerSpec.cmd` / `mounts` / `env` shape changes**: Docker preserves these from create time, and they are not part of the drift comparison.

## Why this is security-relevant

Firewall sibling containers ARE the egress enforcement plane. The CP drift gate exists precisely so a security fix to the CP can land via an ordinary `clawker run` after a CLI upgrade. The same expectation must hold for Envoy/CoreDNS. Today, however, an Envoy CVE bump (image digest), a CoreDNS dnsbpf plugin fix (embedded binary), or a deny-chain hardening (config template) won't reach users until one of these occurs:

- a rule mutation happens (this calls `Stack.Reload` → `reloadContainer`, which reloads config but consults the same drift labels, so image and binary changes hit the same gap), or
- the user manually `docker rm`s the sibling containers, or
- one of the two existing drift labels (`infra_certs_ready`, `otel_infra_port`) happens to flip.

The CP container itself is correctly replaced on every binary change. On boot it eventually calls `Stack.EnsureRunning` (via `FirewallInit`), which then short-circuits on the narrow label match. **Net effect: CP upgrades cleanly; the security-critical proxy + DNS resolver it manages do not.**

## Fix shape

Stamp `cpboot.cpBinaryHash()` (or a firewall-subset hash) as a third drift label on both siblings:

```go
const labelStackBuildSHA = "dev.clawker.firewall.stack_build_sha"

func (s *Stack) driftLabels() map[string]string {
    return map[string]string{
        labelInfraCertsReady: strconv.FormatBool(s.infraCertsReady),
        labelOtelInfraPort:   strconv.Itoa(...),
        labelStackBuildSHA:   stackBuildSHA,  // injected at NewStack
    }
}
```

Any change to the embedded CP binary → siblings recreate. Trade-off: every CP rebuild churns Envoy+CoreDNS even when only unrelated CP code changed. Acceptable — siblings are stateless, recreate is sub-second.

A tighter scope (hash only `firewall/` package + `embed_coredns.go` output + envoyImage const) would minimize churn but needs build-time tooling to compute. Start with the broad hash; narrow later if churn matters.
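If the narrower scope is pursued later, one possible shape is a deterministic fingerprint over the firewall-relevant inputs. The sketch below is an assumption about how that could look: `stackFingerprint` and `writeField` are hypothetical helpers, and the choice of input keys (image reference, embedded binary bytes, config templates) is left to the caller.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"hash"
	"sort"
)

// writeField writes a length-prefixed field so that adjacent fields cannot
// be confused with one another ("ab"+"c" hashes differently from "a"+"bc").
func writeField(h hash.Hash, b []byte) {
	var n [8]byte
	binary.BigEndian.PutUint64(n[:], uint64(len(b)))
	h.Write(n[:])
	h.Write(b)
}

// stackFingerprint (hypothetical) hashes named inputs in sorted key order,
// so the result does not depend on Go's randomized map iteration.
func stackFingerprint(inputs map[string][]byte) string {
	keys := make([]string, 0, len(inputs))
	for k := range inputs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		writeField(h, []byte(k))
		writeField(h, inputs[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

The resulting hex string could be stamped as the `labelStackBuildSHA` value in place of the broad CP-binary hash. Computing it at runtime from embedded assets would avoid the build-time tooling, at the cost of hashing those bytes on startup.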

## Acceptance

- [ ] `driftLabels()` returns a third label keyed on a CP-binary-derived SHA (or narrower firewall-subset SHA)
- [ ] `ensureContainer` / `reloadContainer` emit `event=firewall_container_spec_drift` with the new label fields when the SHA mismatches
- [ ] Unit test: simulate SHA change between `EnsureRunning` calls, assert sibling containers are recreated
- [ ] No change to existing `infra_certs_ready` / `otel_infra_port` semantics
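The unit-test criterion above could be exercised against a fake along these lines. This is only a sketch under stated assumptions: `fakeStack`, `fakeContainer`, `newFakeStack`, and `ensureRunning` are hypothetical stand-ins, and the real test would drive `Stack.EnsureRunning` through whatever Docker test double the package already uses.

```go
package main

// labelStackBuildSHA matches the label key proposed in the fix above.
const labelStackBuildSHA = "dev.clawker.firewall.stack_build_sha"

// fakeContainer is a hypothetical stand-in for a sibling container.
type fakeContainer struct {
	running bool
	labels  map[string]string
}

// fakeStack is a hypothetical stand-in for firewall.Stack that records
// every create or recreate it performs.
type fakeStack struct {
	buildSHA   string
	containers map[string]*fakeContainer
	recreated  []string
}

func newFakeStack(sha string) *fakeStack {
	return &fakeStack{buildSHA: sha, containers: map[string]*fakeContainer{}}
}

// driftLabels returns only the new label; the two existing labels are
// omitted to keep the sketch short.
func (s *fakeStack) driftLabels() map[string]string {
	return map[string]string{labelStackBuildSHA: s.buildSHA}
}

// ensureRunning recreates a sibling when it is missing, stopped, or
// carries a drift label that differs from the desired value.
func (s *fakeStack) ensureRunning() {
	for _, name := range []string{"envoy", "coredns"} {
		c, ok := s.containers[name]
		if ok && c.running && c.labels[labelStackBuildSHA] == s.buildSHA {
			continue // no drift: leave the container alone
		}
		s.containers[name] = &fakeContainer{running: true, labels: s.driftLabels()}
		s.recreated = append(s.recreated, name)
	}
}
```

A test would call `ensureRunning` twice with the same SHA and expect no extra recreates, then change `buildSHA` and expect both siblings to be recreated.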

## Context

Surfaced during follow-up review of #300 (CP drift gate). Same conceptual gap, separate code path. See the resilience contract in `internal/controlplane/CLAUDE.md` — security-critical infrastructure must propagate updates via the same `clawker run` path users already trigger naturally.
