dcoppa commented Dec 30, 2025

Description

This PR implements graceful shutdown support for chproxy, allowing the proxy to properly handle SIGTERM and SIGINT signals and wait for active connections to complete before terminating.

Motivation: When chproxy is deployed in orchestrated environments like Kubernetes, abrupt termination can kill in-flight queries, leading to failed client requests and incomplete data operations. This implementation ensures clean shutdowns by waiting for active connections to complete (or for the timeout to expire) before the process terminates.

Key changes:

  • Signal handling using signal.NotifyContext for SIGTERM/SIGINT
  • Connection tracking via http.Server.ConnState callback with atomic counter
  • Configurable graceful_shutdown_timeout (default: 25s, which fits within Kubernetes' default 30s termination grace period)
  • Concurrent shutdown of HTTP and HTTPS servers
  • Resource cleanup (caches, heartbeat goroutines, idle connections)
  • Debug logging for connection lifecycle
  • Test coverage for shutdown scenarios

Example shutdown flow:

```text
INFO: Shutdown signal received
INFO: Starting graceful shutdown with 3 open connections
INFO: Shutting down HTTP server...
DEBUG: Connection closed from 127.0.0.1:41826 (active: 2)
DEBUG: Connection closed from 127.0.0.1:41832 (active: 1)
DEBUG: Connection closed from 127.0.0.1:41836 (active: 0)
INFO: HTTP server stopped
INFO: Closing proxy resources...
INFO: Graceful shutdown completed successfully (all connections closed)
```

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Checklist

  • Linter passes correctly
  • Add tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary

Does this introduce a breaking change?

  • Yes
  • No

Further comments

Implementation Highlights:

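  1. Connection Tracking: Uses the http.Server.ConnState callback, incrementing an atomic counter when a connection opens and decrementing it when the connection closes. This tracks TCP connections rather than requests, so the count stays correct under HTTP keep-alive, where one connection can carry many requests.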

  2. Concurrent Server Shutdown: HTTP and HTTPS servers shut down in parallel using sync.WaitGroup, so total shutdown time is set by the slower server rather than the sum of both.

  3. Thread Safety:

    • Atomic operations for connection counter
    • Consistent lock ordering (configLock -> lock) to prevent deadlocks
    • Safe channel close pattern for reloadSignal
  4. Error Handling: Errors collected via buffered channel and aggregated with errors.Join

  5. Testing Strategy: Uses the subprocess pattern, running chproxy in a child process so the tests can send it real signals without terminating the test runner.

Files Changed:

  • main.go: Signal handling, connection tracking, graceful shutdown logic
  • proxy.go: Resource cleanup (caches, heartbeat goroutines, transport)
  • config/config.go: Added GracefulShutdownTimeout field
  • main_test.go: Two test cases (normal shutdown and shutdown timeout)
  • Documentation: English, Chinese, and configuration reference

Testing:

```shell
# Run tests
go test -v -run TestGracefulShutdown

# Manual test (ClickHouse caps sleep() at 3 seconds)
curl -u default:password http://localhost:9090 -d "SELECT sleep(3)" &
sleep 1              # give the query time to reach chproxy
pkill -TERM chproxy  # Observe it waits for query completion
```

Add graceful shutdown support to properly handle SIGTERM and SIGINT
signals, allowing active connections to complete before termination.

Key changes:
- Track active HTTP connections via ConnState callback
- Implement configurable graceful_shutdown_timeout (default: 25s)
- Shut down HTTP and HTTPS servers concurrently
- Wait for in-flight requests to complete or timeout
- Clean up proxy resources (caches, heartbeat goroutines)
- Add test coverage for shutdown scenarios
- Add debug logging for connection lifecycle events

The 25s default timeout fits within Kubernetes' default 30s termination
grace period, leaving about 5s of headroom before the kubelet sends
SIGKILL, so queries that finish within the timeout are not killed.