This setup includes a lightweight Python alert_watcher service that tails Nginx logs to provide real-time Slack alerts for failovers and high error rates.
- docker-compose.yml: Defines and runs all four services: app_blue, app_green, nginx_proxy, and alert_watcher. It also sets up the network and the new nginx-logs shared volume.
- nginx.conf.template: The Nginx configuration. It defines a custom JSON log format (json_logs) that captures detailed upstream data and writes it to the shared log file.
- nginx-init.sh: The Nginx startup script. It deletes the default log stream and creates a real access.log file, which the Python script needs in order to "tail" it.
- watcher.py: The heart of the project. A Python "sidecar" script that runs in its own container, continuously reads the access.log file, and tracks the state of the system (current pool, error rate) to send alerts to Slack.
- requirements.txt: Lists the single Python dependency (requests) that watcher.py needs to send HTTP POST requests to the Slack webhook.
- .env.example: A template file that lists all required environment variables, including SLACK_WEBHOOK_URL, ERROR_RATE_THRESHOLD, and other watcher settings.
- runbook.md: An operator's guide. It explains what each Slack alert means and provides clear, actionable steps for an engineer to take when a failover or error rate alert is received.
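
To make the "tail" idea behind watcher.py concrete, here is a minimal sketch of how a script might pick up newly appended log lines and parse them as JSON. It is not the project's actual code: the function names `read_new_lines` and `parse_entry` are hypothetical, and the `status` field used in the usage comment is an assumed key in the JSON log format.

```python
import json


def read_new_lines(handle):
    """Return the complete lines appended since the last call.

    A trailing partial line (no newline yet) is left unread by seeking
    back, so it is returned whole on a later call.
    """
    lines = []
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            break  # reached the current end of the file
        if not line.endswith("\n"):
            handle.seek(position)  # partial write: retry next time
            break
        lines.append(line.rstrip("\n"))
    return lines


def parse_entry(line):
    """Parse one JSON log line; return None if it is not valid JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None


# Hypothetical polling loop (path taken from the verification step below):
#
#   import time
#   with open("/var/log/nginx/access.log") as handle:
#       while True:
#           for line in read_new_lines(handle):
#               entry = parse_entry(line)
#               if entry is not None:
#                   print(entry.get("status"))  # "status" is an assumed field
#           time.sleep(0.5)
```

This only works on a regular file, which is why the init script replaces the default log stream with a real access.log.
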
These instructions assume you are running on a remote cloud server (e.g., AWS EC2).
- SSH into the Cloud Server and Clone the Repository

      ssh -i ~/.ssh/[your-key-pair] username@[YOUR-IP-ADDRESS/HOSTNAME]
      git clone https://github.com/cf-cloud89/nginx-blue-green-observability.git
      cd nginx-blue-green-observability
- Install Git, Docker, and Docker Compose: You must have Docker and Docker Compose installed. On modern Linux systems, Compose is usually installed as a Docker plugin.
  - Follow the official Docker install instructions for your Linux distribution.
  - Ensure you install the docker-compose-plugin (or docker compose).
  - Add your user to the docker group to avoid using sudo for every command:

        sudo usermod -aG docker $USER

  - Important: You must log out and log back in for this change to take effect.
- Get a Slack Webhook: Follow the official Slack guide to create an "Incoming Webhook" URL.
- Create .env: Copy .env.example to .env.
- Edit .env: Paste your Slack Webhook URL into SLACK_WEBHOOK_URL="...".
- Make Init Script Executable: The nginx-init.sh script must have execute permissions to run inside the container.

      chmod +x nginx-init.sh
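
As an illustration of how the values placed in .env might be consumed, the sketch below reads SLACK_WEBHOOK_URL and ERROR_RATE_THRESHOLD from the environment and builds a Slack webhook request. It is an assumption-laden example, not the code in watcher.py: the function names are hypothetical, the fallback threshold of 2.0 is a placeholder, and it uses the standard library's urllib in place of requests so that it runs without extra dependencies.

```python
import json
import os
import urllib.request


def load_settings(env=os.environ):
    """Read watcher settings from environment variables."""
    webhook = env.get("SLACK_WEBHOOK_URL", "").strip()
    if not webhook:
        raise ValueError("SLACK_WEBHOOK_URL is not set")
    # The fallback of 2.0 is a placeholder, not the project's real default.
    threshold = float(env.get("ERROR_RATE_THRESHOLD", "2.0"))
    return {"webhook_url": webhook, "error_rate_threshold": threshold}


def build_request(webhook_url, text):
    """Build (but do not send) a JSON POST for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending the alert would then look like:
#   settings = load_settings()
#   request = build_request(settings["webhook_url"], "FAILOVER DETECTED")
#   urllib.request.urlopen(request, timeout=5)
```

Failing fast on a missing webhook URL makes a misconfigured .env visible at startup rather than at the first alert.
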
- Start the System:
  - This starts 4 containers: blue, green, nginx, and the watcher.
  - Run Docker Compose in detached (-d) mode. (Use sudo if you didn't add your user to the docker group in the install step above.)

        sudo docker compose up -d
- Firewall Prerequisite: Before you can test, you must open ports in your cloud provider's firewall (e.g., AWS Security Group, GCP Firewall). You need to allow inbound TCP traffic on the following ports:
  - 8080 (for the Nginx proxy)
  - 8081 (for the Blue app's chaos endpoint)
  - 8082 (for the Green app's chaos endpoint)
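
One way to confirm the firewall rules took effect is to attempt a TCP connection to each port from your local machine. The following is a small sketch using only the standard library; the helper names `port_is_open` and `check_ports` are hypothetical, and the ports are the three listed above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=(8080, 8081, 8082)):
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}


# Example (replace the placeholder with your server's address):
#   print(check_ports("YOUR_SERVER_IP"))
```

A closed port here can mean either a missing firewall rule or a container that is not running, so check both before testing further.
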
- Verify Nginx Logs:
  - First, send a test request:

        curl http://[YOUR_SERVER_IP]:8080/version

  - Now, check the Nginx logs. You should see the new JSON format.

        sudo docker logs nginx_proxy

  - Look for the access_log line at the very bottom. It will be a long JSON string. (You can also cat the file inside the watcher: sudo docker exec alert_watcher cat /var/log/nginx/access.log.)
- Test 1: Failover Alert
  - Induce Chaos:

        curl -X POST "http://[YOUR_SERVER_IP]:8081/chaos/start?mode=error"

  - Send a request:

        curl http://[YOUR_SERVER_IP]:8080/version

  - Check Slack: Within seconds, you should receive a "FAILOVER DETECTED" alert.
  - Test Recovery:

        curl -X POST http://[YOUR_SERVER_IP]:8081/chaos/stop

  - Wait about 5 seconds (fail_timeout plus a buffer), then send another request:

        curl http://[YOUR_SERVER_IP]:8080/version

  - Check Slack: You should receive a "RECOVERY" alert.
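
The failover and recovery alerts in Test 1 both come down to noticing that the pool serving traffic has changed. Below is a minimal sketch of that state tracking. It is not the watcher's actual logic: the class name `FailoverDetector` is hypothetical, and it assumes each log entry exposes the serving pool as a string such as "blue" or "green", with blue as the primary.

```python
class FailoverDetector:
    """Track which pool serves traffic and report changes."""

    def __init__(self, primary="blue"):
        # Assumption: traffic starts on the primary pool.
        self.primary = primary
        self.current = primary

    def observe(self, pool):
        """Return 'failover', 'recovery', or None for one log entry."""
        if not pool or pool == self.current:
            return None  # missing pool field, or no change
        self.current = pool
        return "recovery" if pool == self.primary else "failover"


# Usage, mirroring the steps in Test 1:
detector = FailoverDetector()
events = [detector.observe(p) for p in ["blue", "green", "green", "blue"]]
# events == [None, "failover", None, "recovery"]
```

Because only changes produce an event, repeated requests served by the backup pool do not trigger repeated alerts.
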
- Test 2: High Error Rate Alert
  - This alert requires filling the 200-request window.
  - Induce Chaos (again):

        curl -X POST "http://[YOUR_SERVER_IP]:8081/chaos/start?mode=error"

  - Run this loop from your terminal to quickly send more than 200 requests. (This will take a few seconds.)

        for i in {1..250}; do curl -s -o /dev/null http://[YOUR_SERVER_IP]:8080/; done

  - Check Slack: The watcher's log window is now full of 5xx errors. You should receive a "High Error Rate" alert.
  - Clean up:

        curl -X POST http://[YOUR_SERVER_IP]:8081/chaos/stop
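
The reason Test 2 needs more than 200 requests can be seen in a sliding-window sketch like the one below, which only evaluates the error rate once the window is full. This is an illustrative approximation rather than the watcher's real implementation: the class name `ErrorRateWindow` is hypothetical, the 2.0 percent threshold is a placeholder for ERROR_RATE_THRESHOLD, and it assumes the rate is the percentage of 5xx responses in the window.

```python
from collections import deque


class ErrorRateWindow:
    """Fixed-size window over the most recent response status codes."""

    def __init__(self, size=200, threshold_percent=2.0):
        # deque(maxlen=...) drops the oldest status automatically.
        self.statuses = deque(maxlen=size)
        self.threshold_percent = threshold_percent

    def add(self, status):
        self.statuses.append(int(status))

    def error_rate(self):
        """Percentage of 5xx responses currently in the window."""
        if not self.statuses:
            return 0.0
        errors = sum(1 for s in self.statuses if 500 <= s <= 599)
        return 100.0 * errors / len(self.statuses)

    def should_alert(self):
        """Alert only when the window is full and the rate is too high."""
        full = len(self.statuses) == self.statuses.maxlen
        return full and self.error_rate() > self.threshold_percent


# Usage: feed each parsed log entry's status code into the window.
window = ErrorRateWindow(size=200, threshold_percent=2.0)
for _ in range(250):
    window.add(502)
```

After the loop, the window holds 200 error responses, so `window.should_alert()` is true. Once chaos mode is stopped, successful responses gradually push the errors out of the window.
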