This document outlines the security posture, known gaps, and remediation plan for the N8N deployment infrastructure. It covers VPS access control, secrets management, container security, network exposure, and operational hardening.
Last audited: 2026-04-02 (all findings resolved or formally risk-accepted)
| # | Finding | Status |
|---|---|---|
| C1 | All CI/CD and manual access uses root SSH — no privilege separation | Resolved — deployer + automat users created, root SSH disabled |
| C2 | Hardcoded Cloudflare tunnel ID in source-controlled files | Resolved — switched to token-based auth |
| C3 | .env contains live Cloudflare credentials (token, tunnel secret, account tag) | Resolved — old credentials rotated, new tunnel created |
| # | Finding | Status |
|---|---|---|
| H1 | SSH private key written to CI runner disk without cleanup | Resolved — `if: always()` cleanup step added to all workflows |
| H2 | CloudFlared metrics bound to 0.0.0.0:2000 (all interfaces) | Resolved — container has no host ports; only reachable within Docker network |
| H3 | CloudFlared running with `--loglevel debug` in production | Resolved — removed debug flag |
| H4 | Ollama CORS set to `*` (accepts any origin) | Resolved — restricted to internal consumers |
| H5 | Traefik dashboard exposed without authentication | Resolved — basic auth middleware added (defense-in-depth with Cloudflare Access) |
| H6 | Grafana admin password defaults to admin if env var unset | Resolved — strong password set via env var |
| H7 | Docker socket mounted directly to Traefik container | Resolved — replaced with tecnativa/docker-socket-proxy |
| # | Finding | Status |
|---|---|---|
| M1 | cAdvisor runs with SYS_ADMIN capability and `apparmor:unconfined` | Accepted risk — required by cAdvisor for cgroup/filesystem metrics |
| M2 | No resource limits on Ollama, Traefik, CloudFlared, Grafana, Prometheus | Resolved — limits added to all containers |
| M3 | Metrics basic auth hash hardcoded in compose file | Resolved — moved to `${METRICS_AUTH}` env var |
| M4 | PostgreSQL passwords passed as env vars (visible via docker inspect) | Accepted risk — mitigated with chmod 600 on compose files; Docker secrets adds complexity for single-host |
| M5 | No rate limiting configured on any Traefik route | Resolved — global rate-limit middleware (100 req/s, 200 burst) with IP strategy |
| M6 | No fail2ban or SSH brute-force protection on VPS | Resolved — fail2ban installed with sshd + recidive jails |
| M7 | No centralized log aggregation | Resolved — Loki + Promtail added to monitoring stack |
| M8 | Backup files created without explicit file permissions | Resolved — explicit chmod 600/700 after backup creation |
| M9 | Host cert mounted from /root/.cloudflared/cert.pem | Resolved — eliminated by token-based auth migration |
Separate automated deployment access from interactive admin access. Never use root directly over SSH.
Purpose: Automated deployments via GitHub Actions. Restricted to only the commands needed to deploy and manage Docker services.
```bash
# Create the user
useradd -m -s /bin/bash deployer
mkdir -p /home/deployer/.ssh
chmod 700 /home/deployer/.ssh

# Add the CI/CD public key (generate a NEW keypair for this user)
echo "<deployer-public-key>" > /home/deployer/.ssh/authorized_keys
chown -R deployer:deployer /home/deployer/.ssh
chmod 600 /home/deployer/.ssh/authorized_keys

# Grant scoped sudo permissions
cat > /etc/sudoers.d/deployer << 'SUDOEOF'
# Docker operations
deployer ALL=(root) NOPASSWD: /usr/bin/docker
deployer ALL=(root) NOPASSWD: /usr/bin/docker compose *

# Deployment directory management
deployer ALL=(root) NOPASSWD: /bin/mkdir -p /opt/n8n-*
deployer ALL=(root) NOPASSWD: /bin/ln -sfn /opt/n8n-*/releases/* /opt/n8n-*/current
deployer ALL=(root) NOPASSWD: /bin/tar -xzf /tmp/deployment.tar.gz *
deployer ALL=(root) NOPASSWD: /bin/chown -R deployer\:deployer /opt/n8n-*
deployer ALL=(root) NOPASSWD: /bin/rm -rf /opt/n8n-*/releases/*

# Service management
deployer ALL=(root) NOPASSWD: /bin/systemctl restart docker
deployer ALL=(root) NOPASSWD: /usr/bin/ufw status
SUDOEOF
chmod 440 /etc/sudoers.d/deployer

# Transfer ownership of deployment directories
chown -R deployer:deployer /opt/n8n-v2
chown -R deployer:deployer /opt/n8n-production 2>/dev/null || true
```

Purpose: Manual administration, troubleshooting, general-purpose tasks. Has full sudo privileges but requires your personal SSH key and password for sudo.
```bash
# Create your personal admin user
useradd -m -s /bin/bash -G sudo <your-username>
passwd <your-username>

# Add your personal SSH public key
mkdir -p /home/<your-username>/.ssh
echo "<your-personal-public-key>" > /home/<your-username>/.ssh/authorized_keys
chown -R <your-username>:<your-username> /home/<your-username>/.ssh
chmod 700 /home/<your-username>/.ssh
chmod 600 /home/<your-username>/.ssh/authorized_keys
```

With this account you can run `sudo -i` to get a root shell when needed. The difference from logging in as root directly:
- SSH logs show your username, not just "root"
- If your key is compromised, the attacker still needs your sudo password
- You can disable this account independently without affecting deployments
After both accounts are verified working:
```bash
# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
```

```bash
systemctl restart ssh
```

Important: Test SSH access with your personal account in a separate terminal BEFORE closing your current root session.
Note: With `PasswordAuthentication no`, SSH brute-force attacks are immediately rejected without allowing password attempts. This eliminates the password-guessing attack surface entirely — attackers need a valid private key to even begin authentication.
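To confirm the lockout behaves as expected, you can force a password-only attempt from any client (`<your-username>` and `<vps-host>` are placeholders):

```bash
# Should fail immediately with "Permission denied (publickey)"
# and never show a password prompt
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no <your-username>@<vps-host>
```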
| Secret | Purpose | Format | Rotation |
|---|---|---|---|
| `VPS_SSH_KEY` | SSH private key for deployer user (NOT root) | PEM key | Quarterly or on suspicion |
| `PRODUCTION_VPS_HOST` | Production VPS hostname/IP | Hostname/IP | On infrastructure change |
| `STAGING_VPS_HOST` | Staging VPS hostname/IP (optional) | Hostname/IP | On infrastructure change |
| `CLOUDFLARE_TUNNEL_TOKEN` | Tunnel token (replaces credentials JSON + tunnel ID) | JWT token | Quarterly or on suspicion |
| `CLOUDFLARE_API_TOKEN` | API token with DNS:Edit permissions only | Token string | Quarterly or on suspicion |
| `DOMAIN_NAME` | Primary domain name | Domain string | On domain change |
| `TRAEFIK_DASHBOARD_AUTH` | Traefik dashboard basic auth | Raw bcrypt hash (single `$`) | Quarterly |
| `METRICS_AUTH` | Traefik metrics endpoint basic auth | Raw bcrypt hash (single `$`) | Quarterly |
| `SLACK_WEBHOOK_URL` | Notification webhook (optional) | URL | On rotation |
| `DISCORD_WEBHOOK_URL` | Notification webhook (optional) | URL | On rotation |
Important: TRAEFIK_DASHBOARD_AUTH and METRICS_AUTH must be stored with single $ signs (raw htpasswd -nB output). The deploy workflow automatically doubles them for Docker Compose. Do NOT store pre-doubled $$ values.
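For reference, a hash in the expected format can be generated with `htpasswd` from apache2-utils (the username `admin` and the password here are placeholders):

```bash
# Generate a bcrypt basic-auth credential; store the output as-is (single $ signs)
htpasswd -nbB admin '<strong-password>'
# Output shape: admin:$2y$05$...
```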
- Navigate to your GitHub repository
- Go to Settings > Secrets and variables > Actions
- Click "New repository secret"
- Add each required secret with the appropriate value
Rotate credentials immediately if:
- Credentials are accidentally exposed in code, logs, or conversation context
- A team member with access leaves the organization
- Suspicious activity is detected
- As part of regular security maintenance (quarterly recommended)
Tunnel token (replaces old credentials JSON + tunnel ID + tunnel secret):

- Log into the Cloudflare Zero Trust dashboard: https://one.dash.cloudflare.com/
- Go to Networks > Tunnels
- Delete the existing tunnel (or create a new one alongside for zero-downtime rotation)
- Create a new tunnel > name it > choose "cloudflared"
- Copy the tunnel token from the provided docker command
- Update the `CLOUDFLARE_TUNNEL_TOKEN` GitHub secret
- Update local `.env` with the new token (never commit this file)
- Deploy to verify the new tunnel connects
- Delete the old tunnel if it still exists
API token (for DNS management scripts):

- Go to https://dash.cloudflare.com/profile/api-tokens
- Create Token > use "Edit zone DNS" template (scope to your zone only)
- Update the `CLOUDFLARE_API_TOKEN` GitHub secret
- Test DNS scripts to verify functionality
- Delete the old token from the same page
Note: ACCOUNT_TAG is your Cloudflare account identifier (not a secret, not rotatable). It is safe to keep in configuration but should not be in source-controlled files.
- Generate a new keypair: `ssh-keygen -t ed25519 -C "deployer@github-actions"` (see the sketch below)
- Add the new public key to `deployer`'s `authorized_keys` on the VPS
- Update the `VPS_SSH_KEY` GitHub secret with the new private key
- Test a deployment
- Remove the old public key from `authorized_keys`
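A sketch of the hands-on steps, assuming the new key is written to `~/.ssh/deployer_ed25519` and `<vps-host>` is a placeholder:

```bash
# Generate the replacement keypair locally (empty passphrase for CI use)
ssh-keygen -t ed25519 -C "deployer@github-actions" -f ~/.ssh/deployer_ed25519 -N ""

# Append the new public key on the VPS; keep the old key until a deployment succeeds
ssh deployer@<vps-host> 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/deployer_ed25519.pub

# After a successful test deployment, delete the old key's line from
# ~/.ssh/authorized_keys on the VPS (identify it by its comment field)
```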
After any rotation, trigger a deployment (staging first) and verify:
- CloudFlare tunnel connects successfully
- DNS records resolve correctly
- All services pass health checks
- Monitoring stack reports healthy
The following files are excluded from git tracking via .gitignore:

- `edge/cloudflared/*.json` — Tunnel credential files
- `.env` and `*.env` — Environment files with secrets
- `*.pem`, `*.key`, `*.crt` — Certificate and key files
- `*.sqlite`, `*.sqlite3` — Database files
NEVER commit the following to the repository:
- API tokens or keys
- SSH private keys
- Database passwords
- SSL/TLS certificates or private keys
- Tunnel credential JSON files
- Any file containing sensitive credentials
Periodically verify no secrets have entered git history:

```bash
# Check if .env was ever committed
git log --all -p -- .env

# Search for known secret patterns
git log --all -S "CLOUDFLARE_TOKEN" --oneline
git log --all -S "TUNNEL_SECRET" --oneline
```

If secrets are found in history, use `git filter-repo` to purge them and force-push.
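A minimal purge sketch with `git filter-repo` (this rewrites history; coordinate with all collaborators before force-pushing, and note `<repo-url>` is a placeholder):

```bash
# Remove .env from every commit in history
git filter-repo --invert-paths --path .env

# filter-repo strips the origin remote as a safety measure; re-add it, then
# force-push the rewritten history (existing clones must be re-cloned)
git remote add origin <repo-url>
git push --force --all
git push --force --tags
```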
| Network | Purpose | External Access |
|---|---|---|
| `edge` | Traefik + CloudFlared reverse proxy | Via Cloudflare tunnel only |
| `ai-internal` | Ollama, Qdrant, PostgreSQL | None (internal only) |
| `monitoring` | Prometheus, Grafana, AlertManager | Via Cloudflare tunnel only |
All external traffic is routed through the Cloudflare tunnel. No ports are exposed directly on the VPS host.
- CloudFlared: Metrics on `0.0.0.0:2000` — safe (no host ports, only reachable within Docker network; spot-checks below)
- CloudFlared: Set `--loglevel info` (not `debug`)
- Ollama: Restrict `OLLAMA_ORIGINS` to known consumers (not `*`)
- Traefik: Add authentication middleware to dashboard route
- Traefik: Use a Docker socket proxy instead of direct socket mount
- Grafana: Remove default password fallback (`:-admin`)
- All services: Add `deploy.resources.limits` for memory and CPU
- Compose: Switched to token-based auth (no more credentials JSON or tunnel ID in compose)
All containers have explicit resource limits:
| Container | Memory | CPUs |
|---|---|---|
| Traefik | 256M | 0.5 |
| CloudFlared | 256M | 0.3 |
| Docker Socket Proxy | 128M | 0.2 |
| Prometheus | 1G | 0.5 |
| Grafana | 512M | 0.5 |
| AlertManager | 128M | 0.2 |
| Loki | 512M | 0.5 |
| Promtail | 256M | 0.3 |
| Node Exporter | 128M | 0.2 |
| cAdvisor | 256M | 0.3 |
| N8N (template) | 1G | 1.0 |
| PostgreSQL (template) | 512M | 0.5 |
| Ollama (template) | 8G | 4.0 |
| Qdrant (template) | 2G | 1.0 |
| Generic App (template) | 512M | 0.5 |
Two layers of protection: subnet-level blocks for known abuse networks (dropped at kernel level), and SSH-only access for everything else.
```bash
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw enable
```

All HTTP/HTTPS traffic reaches services exclusively via the Cloudflare tunnel (which originates outbound connections from the VPS), so ports 80/443 do not need to be open.
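To verify that nothing but SSH is listening on external interfaces:

```bash
# Only sshd on port 22 should appear bound to a non-loopback address
sudo ss -tlnp | grep -v '127.0.0.1\|\[::1\]'
```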
Known bulletproof hosting providers and scanning networks are permanently blocked at the firewall. These rules are inserted before the SSH allow rule so packets are dropped before reaching fail2ban or SSH.
| Subnet | Operator | Reason | Added |
|---|---|---|---|
| `92.118.39.0/24` | DMZHOST (NL) | Persistent SSH scanning, 5000+ attempts | 2026-04-06 |
| `2.57.122.0/24` | DMZHOST / TECHOFF SRV (NL) | Persistent SSH scanning, coordinated with above | 2026-04-06 |
| `195.178.110.0/24` | TECHOFF SRV LIMITED (GB/AD) | Repeat offender, same abuse contact | 2026-04-06 |
| `45.148.10.0/24` | DMZHOST (AD) | Same operator, same abuse pattern | 2026-04-06 |
When you observe persistent attackers in the Grafana security dashboard or fail2ban logs, follow this process to evaluate and block them:
Step 1 — Identify repeat offenders
```bash
# Find IPs that have been banned multiple times
sudo grep "Ban" /var/log/fail2ban.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn | head -15

# Check currently banned IPs
sudo fail2ban-client status sshd
```

Step 2 — Investigate the source network
```bash
# Look up the IP's network owner
whois <IP> | grep -iE "org|net|descr|country|abuse"

# For RIPE-managed IPs (European/Middle East), query RIPE directly
whois -h whois.ripe.net <IP> | grep -iE "org|net|descr|country|abuse|inetnum"
```

Red flags that indicate a block-worthy network:
- Gmail or disposable email for abuse contact
- "Bulletproof" hosting providers (DMZHOST, TECHOFF, etc.)
- Multiple IPs from the same /24 appearing in your ban list
- Organization registered in one country, operating in another
- Network description is just a URL or empty
- Same `mnt-ref` or `org` across multiple attacking subnets
Step 3 — Verify the subnet scope
```bash
# Check the inetnum range — only block what they own
whois -h whois.ripe.net <IP> | grep "inetnum"
# Example output: inetnum: 92.118.39.0 - 92.118.39.255 → block 92.118.39.0/24
```

Step 4 — Apply the block
```bash
# Insert before the SSH allow rule (position 1)
sudo ufw insert 1 deny from <SUBNET>/24 to any comment "<OPERATOR> - abuse network"

# Verify rule order — deny rules must come before allow rules
sudo ufw status numbered
```

Step 5 — Document the block
Update this table (above) with the subnet, operator, reason, and date.
To remove a block later:

```bash
# List rules with numbers
sudo ufw status numbered

# Delete by number
sudo ufw delete <number>
```

Periodic firewall review:

```bash
# 1. Review current firewall rules
sudo ufw status numbered

# 2. Check fail2ban effectiveness
sudo fail2ban-client status sshd
sudo fail2ban-client status recidive
sudo fail2ban-client status recidive-permanent

# 3. Find new repeat offenders not yet blocked at firewall level
sudo grep "Ban" /var/log/fail2ban.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn | head -15

# 4. Check if any blocked subnets are no longer attacking (optional cleanup)
sudo grep -c "BLOCK" /var/log/ufw.log   # Overall block count

# 5. Review Grafana security dashboard for trends
#    - SSH Attack Activity panel: are failures declining?
#    - Banned IPs by Jail: is recidive catching repeat offenders?
```

Three-tier ban escalation: initial 24h ban, 30-day ban for repeat offenders, permanent ban for persistent attackers.
```bash
apt install fail2ban
systemctl enable fail2ban

# /etc/fail2ban/jail.local
cat > /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 5
bantime = 86400    # 24 hours
findtime = 600     # 10 minute window

[recidive]
enabled = true
backend = auto
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
# 30-day ban if banned 3+ times within 7 days
maxretry = 3
bantime = 2592000   # 30 days
findtime = 604800   # 7 day window

[recidive-permanent]
enabled = true
filter = recidive[_jailname=recidive-permanent]
backend = auto
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
# Permanent ban if banned 5+ times within 30 days
maxretry = 5
bantime = -1        # permanent
findtime = 2592000  # 30 day window
EOF

systemctl restart fail2ban
```

Important configuration notes:
- `backend = auto` is required for recidive jails — without it, fail2ban uses systemd journal matching (`PRIORITY=5`), which doesn't capture ban events from the log file
- `filter = recidive[_jailname=recidive-permanent]` is required for custom jail names — it references the built-in recidive filter and sets the jail name to prevent self-matching loops
Monitor with:

```bash
sudo fail2ban-client status sshd
sudo fail2ban-client status recidive
sudo fail2ban-client status recidive-permanent
```

For fully automated handling of repeat offenders, the VPS runs a periodic script that promotes IPs and subnets from the fail2ban log to a persistent kernel-level blocklist. This complements fail2ban (which holds bans in memory with TTLs) by providing a permanent, high-scale blocklist that survives reboots and fail2ban restarts.
```
/var/log/fail2ban.log
        ↓
update-blocklist.sh (hourly via systemd timer)
        ↓
   ┌────┴────┐
   ↓         ↓
ipset:abuse-ips   ipset:abuse-subnets
   ↓         ↓
iptables INPUT DROP (inserted before UFW rules)
        ↓
Metrics → node_exporter textfile collector
        ↓
Grafana dashboard + Prometheus alerts
```
Why ipset over more UFW rules:
- O(1) kernel hash lookup vs O(n) linear rule traversal
- Scales to millions of entries without performance degradation
- Survives reboots via a dedicated restore service
- Survives `ufw reload` via rules integrated into `/etc/ufw/before.rules` (core mechanics sketched below)
- Entries auto-expire after 24 days (ipset TTL) unless refreshed by the updater
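A sketch of the core mechanics, assuming the set names used throughout this document (the deployed logic lives in `setup-blocklist.sh`):

```bash
# Create the two sets with a 24-day entry timeout (24 * 86400 = 2073600 s)
ipset create abuse-ips hash:ip timeout 2073600 -exist
ipset create abuse-subnets hash:net timeout 2073600 -exist

# Drop matching packets before any UFW rule sees them
iptables -I INPUT 1 -m set --match-set abuse-ips src -j DROP
iptables -I INPUT 2 -m set --match-set abuse-subnets src -j DROP

# Re-adding an entry with -exist refreshes its TTL instead of erroring
ipset add abuse-ips 198.51.100.7 -exist
```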
Logic:
- Parse `/var/log/fail2ban.log` (including rotated `.1`, `.2.gz`, etc.)
- IPs with ≥3 bans → add to `abuse-ips` ipset (24-day TTL) — the promotion step is sketched below
- /24 subnets with ≥3 distinct attacking IPs → add to `abuse-subnets` ipset
- Whitelist entries (localhost, private ranges, trusted IPs) are never blocked
- TTLs refresh automatically on re-encounter (stale entries age out)
- State is persisted to disk and restored at boot before UFW starts
- `flock` prevents concurrent runs from racing on ipset state
- Metrics exported to Prometheus for visualization
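A minimal sketch of that promotion step — not the deployed script — assuming fail2ban's standard `Ban <ip>` log lines and the default threshold:

```bash
# Count bans per IP across current and rotated logs, then promote repeat
# offenders into the abuse-ips set with a refreshed 24-day TTL
zgrep -h "Ban " /var/log/fail2ban.log* \
  | grep -oP 'Ban \K[\d.]+' \
  | sort | uniq -c \
  | awk -v t="${IP_BAN_THRESHOLD:-3}" '$1 >= t {print $2}' \
  | while read -r ip; do
      ipset add abuse-ips "$ip" timeout 2073600 -exist
    done
```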
UFW integration:
Rules are added to `/etc/ufw/before.rules` inside the `*filter` section with marker comments for idempotent management. This means:

- Rules survive `ufw reload` (iptables flush/reapply)
- Rules survive `ufw disable`/`enable` cycles
- Setup reloads UFW automatically after adding rules
- Removing the marker block and running `ufw reload` cleanly removes the blocklist
The scripts live in scripts/blocklist/ and are deployed with the rest of the infrastructure. Run the one-time setup on the VPS as root:
```bash
# Run from the current release directory on the VPS
cd /opt/n8n-v2/current/scripts/blocklist
sudo ./setup-blocklist.sh
```

This installs ipset, creates the two ipsets, adds iptables DROP rules, sets up the whitelist at `/etc/blocklist/whitelist.conf`, installs `blocklist-restore.service` (restores state at boot), and enables `blocklist.timer` (runs the updater hourly).
After setup, edit the whitelist to add any trusted IPs:
```bash
sudo nano /etc/blocklist/whitelist.conf
# Add any IP or CIDR range you want to protect from auto-blocking, e.g.:
# 203.0.113.42      # home
# 198.51.100.0/24   # office
```

Then run the first update manually:

```bash
sudo /usr/local/bin/update-blocklist.sh
```

The update script accepts environment variables:
| Variable | Default | Description |
|---|---|---|
| `IP_BAN_THRESHOLD` | 3 | Minimum ban count before an IP is added to `abuse-ips` |
| `SUBNET_IP_THRESHOLD` | 3 | Minimum distinct attacking IPs in a /24 before the subnet is added |
| `FAIL2BAN_LOG` | `/var/log/fail2ban.log` | Path to fail2ban log |
| `WHITELIST_FILE` | `/etc/blocklist/whitelist.conf` | Whitelist path |
Threshold rationale: fail2ban's sshd jail bans after 5 failed attempts. Requiring 3 fail2ban bans before promotion means an IP triggered roughly 15 failed attempts across multiple ban cycles — a clear pattern of persistent abuse, not transient noise.
To override, edit `/etc/systemd/system/blocklist.service` and add `Environment=...` lines, then `systemctl daemon-reload`.
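For example, to raise the IP threshold (the value 5 here is purely illustrative):

```bash
# In the [Service] section of the unit, add:
#   Environment=IP_BAN_THRESHOLD=5
sudo nano /etc/systemd/system/blocklist.service
sudo systemctl daemon-reload
```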
View the current blocklist:
```bash
# IPs
sudo ipset list abuse-ips | head -20

# Subnets
sudo ipset list abuse-subnets

# Counts only
sudo ipset list abuse-ips -terse
sudo ipset list abuse-subnets -terse
```

Check the timer and recent runs:
```bash
systemctl status blocklist.timer
systemctl list-timers blocklist.timer
journalctl -u blocklist.service -n 50
tail -50 /var/log/blocklist-updates.log
```

Manually remove a false positive:
```bash
# Remove a specific IP
sudo ipset del abuse-ips 1.2.3.4

# Remove a subnet
sudo ipset del abuse-subnets 1.2.3.0/24

# Persist the change (save ipset state to /var/lib/blocklist/; tee is needed
# because a plain > redirect would run without root privileges)
sudo ipset save abuse-ips | sudo tee /var/lib/blocklist/abuse-ips.save > /dev/null
sudo ipset save abuse-subnets | sudo tee /var/lib/blocklist/abuse-subnets.save > /dev/null

# Then add the IP/subnet to the whitelist so it won't be re-added
sudo nano /etc/blocklist/whitelist.conf
```

Flush and reset (if something goes wrong):
```bash
sudo ipset flush abuse-ips
sudo ipset flush abuse-subnets
sudo ipset save abuse-ips | sudo tee /var/lib/blocklist/abuse-ips.save > /dev/null
sudo ipset save abuse-subnets | sudo tee /var/lib/blocklist/abuse-subnets.save > /dev/null
sudo systemctl start blocklist.service   # rebuild from logs
```

Uninstall (remove blocklist entirely):
```bash
# 1. Stop and disable services
sudo systemctl disable --now blocklist.timer
sudo systemctl disable blocklist-restore.service

# 2. Remove UFW rules (edit /etc/ufw/before.rules and delete the marker block)
sudo sed -i '/^# BEGIN blocklist/,/^# END blocklist/d' /etc/ufw/before.rules
sudo ufw reload

# 3. Destroy ipsets
sudo ipset destroy abuse-ips
sudo ipset destroy abuse-subnets

# 4. Remove state and scripts
sudo rm -rf /var/lib/blocklist /etc/blocklist
sudo rm -f /usr/local/bin/update-blocklist.sh /usr/local/bin/restore-blocklist.sh
sudo rm -f /etc/systemd/system/blocklist.service /etc/systemd/system/blocklist.timer /etc/systemd/system/blocklist-restore.service
sudo systemctl daemon-reload
```

Metrics exported to Prometheus:
| Metric | Type | Description |
|---|---|---|
| `blocklist_ips_total` | gauge | Current number of blocked IPs |
| `blocklist_subnets_total` | gauge | Current number of blocked subnets |
| `blocklist_last_update_timestamp` | gauge | Unix timestamp of last run |
| `blocklist_ips_added_last_run` | gauge | New IPs added in the most recent update |
| `blocklist_subnets_added_last_run` | gauge | New subnets added in the most recent update |
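The updater publishes these through the node_exporter textfile collector; the file location below is an assumption — check node_exporter's `--collector.textfile.directory` flag for the actual path:

```bash
# Inspect the exported metrics on the VPS (path is an assumption)
cat /var/lib/node_exporter/textfile_collector/blocklist.prom
```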
Grafana dashboard: The Security & Intrusion Detection dashboard includes blocklist panels showing total blocked IPs/subnets, last update time, and growth over time.
Prometheus alerts:
- `BlocklistUpdateStalled` — no update in 2+ hours (timer may have failed)
- `BlocklistGrowthSpike` — 10+ new IPs added in a single run (possible attack surge)
The UFW-level subnet blocks (documented above) are maintained separately for well-known bulletproof hosting providers. Those rules are permanent and hand-curated. The automated blocklist handles the long tail of individual offenders and emerging /24 patterns.
The VPS receives continuous automated SSH scanning from botnets and bulletproof hosting providers. This is normal for any internet-facing server — it is not targeted.
Typical attack patterns observed:
- Volume: 50-200 failed SSH attempts per hour from 10-20 unique IPs
- Usernames tried: `root` (90%+), `admin`, `ubuntu`, `sol`/`solana`/`validator` (crypto botnets)
- Source networks: Bulletproof hosting (DMZHOST, TECHOFF), compromised VPS instances, residential IoT botnets
- Behavior: Automated credential stuffing, abandoned after the first rejection (password auth disabled)
Why these attacks are not a threat:
- `PasswordAuthentication no` — attackers can't even attempt passwords
- `PermitRootLogin no` — the most-targeted username is rejected instantly
- Key-only auth — brute-forcing an ed25519 key is computationally infeasible
- fail2ban escalation — persistent IPs are permanently banned
- UFW subnet blocks — known abuse networks are dropped at kernel level
Attack flow:
```
Attacker → UFW (subnet blocked? → DROP)
         → SSH (key auth only → immediate reject)
         → fail2ban sshd (5 rejects → 24h ban)
         → fail2ban recidive (3 bans in 7d → 30-day ban)
         → fail2ban recidive-permanent (5 bans in 30d → permanent ban)
```
Automatic security updates:

```bash
apt install unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades
```

Audit logging:

```bash
apt install auditd
auditctl -w /opt/n8n-v2 -p wa -k n8n-deploy
auditctl -w /etc/ssh/sshd_config -p wa -k ssh-config
auditctl -w /etc/sudoers.d/ -p wa -k sudoers-change
# Note: auditctl rules are runtime-only — mirror them in /etc/audit/rules.d/
# to persist across reboots
```

The deployment workflow:
- Creates `.env` from GitHub secrets using python3 (avoids bash `$` expansion of bcrypt hashes)
- Never stores credentials in the repository
- Uses secure environment variable passing via SSH heredocs
- Sets `umask 077` during deployment to restrict file permissions
- Cleans up SSH key material after deployment (`if: always()`)
- Fixes monitoring config permissions (`chmod -R o+rX`) before container start
- Doubles `$` signs in auth hashes automatically for Docker Compose compatibility (illustrated below)
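The `$`-doubling step matters because Docker Compose treats `$` as interpolation. A bash equivalent of what the workflow does (the variable name is just an example):

```bash
# Escape bcrypt's $2y$05$... so Compose reads the hash literally
escaped=$(printf '%s' "$TRAEFIK_DASHBOARD_AUTH" | sed 's/\$/$$/g')
echo "TRAEFIK_DASHBOARD_AUTH=${escaped}" >> .env
```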
All workflows must include a cleanup step:
```yaml
- name: Cleanup SSH key
  if: always()
  run: |
    rm -f ~/.ssh/id_rsa
    rm -f ~/.ssh/known_hosts
```

Additional pipeline safeguards:

- Concurrency groups prevent duplicate deployments
- Pre-deployment validation (Docker Compose config check, shellcheck)
- Automatic rollback on deployment failure
- Health checks after deployment
- Backup creation before every deployment (7-day retention)
When running scripts locally, use environment variables or the .env file:
```bash
# Use .env file
cp env.example .env
# Edit .env with your actual values — NEVER commit this file
```

Automated backups are created before each deployment:
- Service configurations
- PostgreSQL database dumps (`pg_dump` per database)
- Docker volume archives
- Retention: last 7 backups
Backup location on VPS: /opt/n8n-v2/shared/backups/{configs,databases,volumes}
- Prometheus — Metrics collection (30-day retention), scrapes via Docker socket proxy
- Grafana — Dashboard visualization (Prometheus + Loki datasources)
- AlertManager — Alert routing → N8N webhook → Telegram
- Loki — Log aggregation (7-day retention)
- Promtail — Log shipper (Docker container logs + `/var/log/auth.log`)
- cAdvisor — Container resource metrics
- Node Exporter — System-level metrics + fail2ban textfile collector
- Hourly monitoring workflow — Automated health checks via GitHub Actions
- Daily security digest — Cron job at 08:00 UTC → N8N webhook → Telegram
| Group | Alert | Threshold |
|---|---|---|
| Security | SSHBruteForceSpike | >200 failures/hour |
| Security | SSHDistributedAttack | >15 unique IPs/hour |
| Security | HighBanCount | >20 banned IPs |
| System | ContainerDown | Any container down >1min |
| System | HighCPUUsage | >85% for 5min |
| System | DiskSpaceCritical | >95% |
| Services | CloudflareTunnelDown | Tunnel metrics unreachable |
| Services | ContainerHighMemory | >90% of limit |
- Cross-compose-project Prometheus targets use container names (e.g., `edge-traefik-1`), not service names
- Loki image does not include `wget` — healthcheck uses `curl`
- Monitoring config files require `chmod -R o+rX` after deployment (automated in workflow)
- Traefik access log analysis for anomaly detection
- Cloudflare audit log monitoring via API
- Grafana alerting dashboards for Loki log patterns
If credentials are compromised:
- Immediate: Rotate all potentially affected credentials (see rotation procedures above)
- Isolate: If VPS compromise is suspected, restrict firewall to your IP only (see the lockdown sketch below)
- Assess: Review `auth.log`, `audit.log`, Docker logs, and Cloudflare audit logs
- Update: Change all related passwords, tokens, and SSH keys
- Document: Record the incident timeline and lessons learned
- Harden: Update security procedures based on findings
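A lockdown sketch for the "Isolate" step (`<your-ip>` is a placeholder for your current public address — verify it before applying):

```bash
# Allow SSH only from your own IP, then remove the open SSH rule
sudo ufw insert 1 allow from <your-ip> to any port 22 proto tcp
sudo ufw delete allow 22/tcp
sudo ufw status numbered   # verify before closing your session
```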
- Rotate all Cloudflare credentials (token, tunnel secret, tunnel ID)
- Verify `.env` has never been committed to git history
- Replace credentials JSON auth with token-based tunnel auth (`CLOUDFLARE_TUNNEL_TOKEN`)
- Fix CloudFlared metrics binding (`127.0.0.1:2000`) and log level (info)
- Create `deployer` user on VPS with scoped sudo
- Upgrade `automat` user to admin with sudo group
- Update `VPS_SSH_KEY` GitHub secret with deployer's ed25519 private key
- Update all workflow files to use `deployer@` instead of `root@`
- Disable root SSH login (`PermitRootLogin no`)
- Add SSH key cleanup step to all workflows
- Add authentication to Traefik dashboard
- Set strong Grafana admin password, remove `:-admin` default
- Install and configure fail2ban on VPS (7 IPs already banned)
- Configure UFW — reset to SSH-only, all service ports closed
- Install unattended-upgrades for automatic security patches
- Set up audit logging on VPS (auditd with rules for deploy dir, sshd_config, sudoers)
- Disable SSH password authentication (`PasswordAuthentication no`)
- Replace direct Docker socket mount with socket proxy for Traefik
- Add resource limits to all containers
- Restrict Ollama CORS origins
- PostgreSQL passwords — accepted risk with file permission hardening (`chmod 600`)
- Enable rate limiting on Traefik routes
- Set up centralized logging (Loki + Promtail)
- Use environment variables for all sensitive configuration
- Never hardcode credentials in source code
- Use separate credentials for development and production
- Regularly update dependencies and base images
- Use least-privilege access for all services and users
- Enable audit logging where possible
- Regularly review and rotate credentials (quarterly)
- Use secure communication channels (HTTPS, SSH with key auth only)
- Limit access to production credentials to essential personnel only
- Use individual accounts rather than shared credentials
- Implement proper offboarding procedures
- Regular access reviews and cleanups
Remember: Security is everyone's responsibility. When in doubt, ask for guidance rather than compromising security.