Skip to content

Fix shutdown data loss, bound untrusted metadata_size, harden Docker build#5

Merged
iksnerd merged 1 commit into
mainfrom
fix/review-2026-06-13
Jun 13, 2026
Merged

Fix shutdown data loss, bound untrusted metadata_size, harden Docker build#5
iksnerd merged 1 commit into
mainfrom
fix/review-2026-06-13

Conversation

@iksnerd

@iksnerd iksnerd commented Jun 13, 2026

Copy link
Copy Markdown
Owner

Three confirmed findings from the 2026-06-13 code review, verified by reading the code (not just agent output).

Changes

Shutdown data losscmd/tracker/main.go
The periodic flusher runs FlushToDB() + FlushUsers() + DrainBacklog() every tick, but the SIGTERM handler called only FlushToDB(). On Cloud Run (SIGTERM is the normal stop signal) the final usage deltas and any queued backlog were silently dropped when HUB_URL is configured. The flush also ran before srv.Shutdown(), so announces served during the 15s drain window mutated state after the final flush. Now: drain HTTP first, then mirror the full tick.

Unbounded allocation from untrusted metadata_sizeinternal/client/metadata.go
FetchMetadata did make([]byte, p.MetadataSize) where MetadataSize comes straight from a peer's BEP-10 extended handshake, guarded only by == 0. A malicious peer advertising a huge size triggers a memory-exhaustion DoS. Now capped at 16 MiB (far larger than any real info dict) and rejects <= 0. Regression test added.

ARG TARGETARCH empty under non-BuildKitDockerfile
Without BuildKit, TARGETARCH expands empty → litestream-v0.3.13-linux-.tar.gzcurl -f 404 → broken build on the documented docker build path. Defaults to amd64. Also folds in the in-progress multi-arch curl switch and the run.sh wait-loop change, both validated in review.

Verification

  • go build ./... clean, go vet ./... clean, gofmt clean
  • go test ./... — all packages pass, including the new TestFetchMetadataRejectsBadSize

Not included (tracked separately)

IPv6 peer parsing, wire.go cap comment, constant-time admin-key compare, wl get hex-decode error, and the v2 Merkle padding question (inconclusive — needs a Transmission 4.x golden vector before touching the padding logic).

🤖 Generated with Claude Code

…build

Three fixes from the 2026-06-13 code review:

- cmd/tracker: graceful shutdown only called FlushToDB(), dropping the
  FlushUsers() + DrainBacklog() steps the periodic flusher runs every tick,
  so final usage deltas and queued backlog were lost on SIGTERM (the normal
  Cloud Run stop signal). Now mirrors the tick, and flushes *after*
  srv.Shutdown() so announces served during the drain window are captured.

- internal/client: FetchMetadata allocated make([]byte, MetadataSize) from a
  peer-supplied (untrusted) value with only a == 0 check — a memory-exhaustion
  DoS. Cap at 16 MiB and reject <= 0. Adds a regression test.

- Dockerfile: ARG TARGETARCH had no default, so a non-BuildKit `docker build`
  expanded an empty arch into the litestream URL and 404'd. Default to amd64.
  (Includes the in-progress multi-arch curl + run.sh wait-loop changes the
  review validated.)
@iksnerd iksnerd merged commit 661d154 into main Jun 13, 2026
1 check passed
@iksnerd iksnerd deleted the fix/review-2026-06-13 branch June 13, 2026 10:51
iksnerd added a commit that referenced this pull request Jun 21, 2026
Fix shutdown data loss, bound untrusted metadata_size, harden Docker build
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant