SECURITY.md (warlock016/N8N)

Security Guidelines

Overview

This document outlines the security posture, known gaps, and remediation plan for the N8N deployment infrastructure. It covers VPS access control, secrets management, container security, network exposure, and operational hardening.

Last audited: 2026-04-02 (all findings resolved or formally accepted as documented risks)


Security Audit Summary

Critical Findings

| # | Finding | Status |
|---|---------|--------|
| C1 | All CI/CD and manual access uses root SSH — no privilege separation | Resolved — deployer + automat users created, root SSH disabled |
| C2 | Hardcoded Cloudflare tunnel ID in source-controlled files | Resolved — switched to token-based auth |
| C3 | .env contains live Cloudflare credentials (token, tunnel secret, account tag) | Resolved — old credentials rotated, new tunnel created |

High Findings

| # | Finding | Status |
|---|---------|--------|
| H1 | SSH private key written to CI runner disk without cleanup | Resolved — if: always() cleanup step added to all workflows |
| H2 | CloudFlared metrics bound to 0.0.0.0:2000 (all interfaces) | Resolved — container has no host ports; only reachable within Docker network |
| H3 | CloudFlared running with --loglevel debug in production | Resolved — removed debug flag |
| H4 | Ollama CORS set to * (accepts any origin) | Resolved — restricted to internal consumers |
| H5 | Traefik dashboard exposed without authentication | Resolved — basic auth middleware added (defense-in-depth with Cloudflare Access) |
| H6 | Grafana admin password defaults to admin if env var unset | Resolved — strong password set via env var |
| H7 | Docker socket mounted directly to Traefik container | Resolved — replaced with tecnativa/docker-socket-proxy |

Medium Findings

| # | Finding | Status |
|---|---------|--------|
| M1 | cAdvisor runs with SYS_ADMIN capability and apparmor:unconfined | Accepted risk — required by cAdvisor for cgroup/filesystem metrics |
| M2 | No resource limits on Ollama, Traefik, CloudFlared, Grafana, Prometheus | Resolved — limits added to all containers |
| M3 | Metrics basic auth hash hardcoded in compose file | Resolved — moved to ${METRICS_AUTH} env var |
| M4 | PostgreSQL passwords passed as env vars (visible via docker inspect) | Accepted risk — mitigated with chmod 600 on compose files; Docker secrets adds complexity for single-host |
| M5 | No rate limiting configured on any Traefik route | Resolved — global rate-limit middleware (100 req/s, 200 burst) with IP strategy |
| M6 | No fail2ban or SSH brute-force protection on VPS | Resolved — fail2ban installed with sshd + recidive jails |
| M7 | No centralized log aggregation | Resolved — Loki + Promtail added to monitoring stack |
| M8 | Backup files created without explicit file permissions | Resolved — explicit chmod 600/700 after backup creation |
| M9 | Host cert mounted from /root/.cloudflared/cert.pem | Resolved — eliminated by token-based auth migration |

VPS Access Model

Principle

Separate automated deployment access from interactive admin access. Never use root directly over SSH.

User Accounts

deployer — CI/CD service account

Purpose: Automated deployments via GitHub Actions. Restricted to only the commands needed to deploy and manage Docker services.

# Create the user
useradd -m -s /bin/bash deployer
mkdir -p /home/deployer/.ssh
chmod 700 /home/deployer/.ssh

# Add the CI/CD public key (generate a NEW keypair for this user)
echo "<deployer-public-key>" > /home/deployer/.ssh/authorized_keys
chown -R deployer:deployer /home/deployer/.ssh
chmod 600 /home/deployer/.ssh/authorized_keys

# Grant scoped sudo permissions
cat > /etc/sudoers.d/deployer << 'SUDOEOF'
# Docker operations
deployer ALL=(root) NOPASSWD: /usr/bin/docker
deployer ALL=(root) NOPASSWD: /usr/bin/docker compose *

# Deployment directory management
deployer ALL=(root) NOPASSWD: /bin/mkdir -p /opt/n8n-*
deployer ALL=(root) NOPASSWD: /bin/ln -sfn /opt/n8n-*/releases/* /opt/n8n-*/current
deployer ALL=(root) NOPASSWD: /bin/tar -xzf /tmp/deployment.tar.gz *
deployer ALL=(root) NOPASSWD: /bin/chown -R deployer\:deployer /opt/n8n-*
deployer ALL=(root) NOPASSWD: /bin/rm -rf /opt/n8n-*/releases/*

# Service management
deployer ALL=(root) NOPASSWD: /bin/systemctl restart docker
deployer ALL=(root) NOPASSWD: /usr/bin/ufw status
SUDOEOF
chmod 440 /etc/sudoers.d/deployer

# Transfer ownership of deployment directories
chown -R deployer:deployer /opt/n8n-v2
chown -R deployer:deployer /opt/n8n-production 2>/dev/null || true

Personal admin account — interactive SSH access

Purpose: Manual administration, troubleshooting, general-purpose tasks. Has full sudo privileges but requires your personal SSH key and password for sudo.

# Create your personal admin user
useradd -m -s /bin/bash -G sudo <your-username>
passwd <your-username>

# Add your personal SSH public key
mkdir -p /home/<your-username>/.ssh
echo "<your-personal-public-key>" > /home/<your-username>/.ssh/authorized_keys
chown -R <your-username>:<your-username> /home/<your-username>/.ssh
chmod 700 /home/<your-username>/.ssh
chmod 600 /home/<your-username>/.ssh/authorized_keys

With this account you can run sudo -i to get a root shell when needed. The difference from logging in as root directly:

  • SSH logs show your username, not just "root"
  • If your key is compromised, the attacker still needs your sudo password
  • You can disable this account independently without affecting deployments

Disable root SSH and password authentication

After both accounts are verified working:

# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no

systemctl restart ssh

Important: Test SSH access with your personal account in a separate terminal BEFORE closing your current root session.

Note: With PasswordAuthentication no, SSH brute-force attacks are immediately rejected without allowing password attempts. This eliminates the password-guessing attack surface entirely — attackers need a valid private key to even begin authentication.


GitHub Secrets Management

Required Secrets

| Secret | Purpose | Format | Rotation |
|--------|---------|--------|----------|
| VPS_SSH_KEY | SSH private key for deployer user (NOT root) | PEM key | Quarterly or on suspicion |
| PRODUCTION_VPS_HOST | Production VPS hostname/IP | Hostname/IP | On infrastructure change |
| STAGING_VPS_HOST | Staging VPS hostname/IP (optional) | Hostname/IP | On infrastructure change |
| CLOUDFLARE_TUNNEL_TOKEN | Tunnel token (replaces credentials JSON + tunnel ID) | JWT token | Quarterly or on suspicion |
| CLOUDFLARE_API_TOKEN | API token with DNS:Edit permissions only | Token string | Quarterly or on suspicion |
| DOMAIN_NAME | Primary domain name | Domain string | On domain change |
| TRAEFIK_DASHBOARD_AUTH | Traefik dashboard basic auth | Raw bcrypt hash (single $) | Quarterly |
| METRICS_AUTH | Traefik metrics endpoint basic auth | Raw bcrypt hash (single $) | Quarterly |
| SLACK_WEBHOOK_URL | Notification webhook (optional) | URL | On rotation |
| DISCORD_WEBHOOK_URL | Notification webhook (optional) | URL | On rotation |

Important: TRAEFIK_DASHBOARD_AUTH and METRICS_AUTH must be stored with single $ signs (raw htpasswd -nB output). The deploy workflow automatically doubles them for Docker Compose. Do NOT store pre-doubled $$ values.
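As a hypothetical illustration of that doubling step (the hash below is a fake placeholder; the workflow's actual implementation may differ):

```shell
# Double every $ so Docker Compose treats the bcrypt hash literally
# instead of attempting variable interpolation. Placeholder hash only.
HASH='admin:$2y$05$abcdefghijklmnopqrstuv'   # raw htpasswd -nB style output (fake)
DOUBLED=$(printf '%s' "$HASH" | sed 's/\$/$$/g')
echo "$DOUBLED"   # → admin:$$2y$$05$$abcdefghijklmnopqrstuv
```

Storing the raw single-$ value in GitHub keeps one canonical form; the doubling is a compose-specific rendering concern.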

Setting up Secrets

  1. Navigate to your GitHub repository
  2. Go to Settings > Secrets and variables > Actions
  3. Click "New repository secret"
  4. Add each required secret with the appropriate value

Credential Rotation

When to Rotate

Rotate credentials immediately if:

  • Credentials are accidentally exposed in code, logs, or conversation context
  • A team member with access leaves the organization
  • Suspicious activity is detected
  • As part of regular security maintenance (quarterly recommended)

Cloudflare Credential Rotation

Tunnel token (replaces old credentials JSON + tunnel ID + tunnel secret):

  1. Log into Cloudflare Zero Trust dashboard: https://one.dash.cloudflare.com/
  2. Go to Networks > Tunnels
  3. Delete the existing tunnel (or create a new one alongside for zero-downtime rotation)
  4. Create a new tunnel > name it > choose "cloudflared"
  5. Copy the tunnel token from the provided docker command
  6. Update GitHub secret: CLOUDFLARE_TUNNEL_TOKEN
  7. Update local .env with the new token (never commit this file)
  8. Deploy to verify the new tunnel connects
  9. Delete the old tunnel if it still exists

API token (for DNS management scripts):

  1. Go to https://dash.cloudflare.com/profile/api-tokens
  2. Create Token > use "Edit zone DNS" template (scope to your zone only)
  3. Update GitHub secret: CLOUDFLARE_API_TOKEN
  4. Test DNS scripts to verify functionality
  5. Delete the old token from the same page

Note: ACCOUNT_TAG is your Cloudflare account identifier (not a secret, not rotatable). It is safe to keep in configuration but should not be in source-controlled files.

SSH Key Rotation

  1. Generate a new keypair: ssh-keygen -t ed25519 -C "deployer@github-actions"
  2. Add the new public key to deployer's authorized_keys on VPS
  3. Update the VPS_SSH_KEY GitHub secret with the new private key
  4. Test a deployment
  5. Remove the old public key from authorized_keys

Post-Rotation Verification

After any rotation, trigger a deployment (staging first) and verify:

  • Cloudflare tunnel connects successfully
  • DNS records resolve correctly
  • All services pass health checks
  • Monitoring stack reports healthy

File Security

Excluded Files

The following files are excluded from git tracking via .gitignore:

  • edge/cloudflared/*.json — Tunnel credential files
  • .env and *.env — Environment files with secrets
  • *.pem, *.key, *.crt — Certificate and key files
  • *.sqlite, *.sqlite3 — Database files

Never Commit

NEVER commit the following to the repository:

  • API tokens or keys
  • SSH private keys
  • Database passwords
  • SSL/TLS certificates or private keys
  • Tunnel credential JSON files
  • Any file containing sensitive credentials

Verifying Git History

Periodically verify no secrets have entered git history:

# Check if .env was ever committed
git log --all -p -- .env

# Search for known secret patterns
git log --all -S "CLOUDFLARE_TOKEN" --oneline
git log --all -S "TUNNEL_SECRET" --oneline

If secrets are found in history, use git filter-repo to purge them and force-push.
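A self-contained sketch of why deleting the file is not enough: in a throwaway repo, a removed secret still shows up with git log -S (all names and values below are fabricated for the demo):

```shell
# Demo: commit a fake secret, delete it, then find it in history anyway.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q
echo 'CLOUDFLARE_TOKEN=fake-demo-value' > .env
git add .env
git -c user.name=demo -c user.email=demo@example.com commit -qm 'add config'
git rm -q .env
git -c user.name=demo -c user.email=demo@example.com commit -qm 'remove secret'
# The file is gone from the working tree, but -S still finds both commits:
hits=$(git log --all -S 'CLOUDFLARE_TOKEN' --oneline | wc -l)
echo "commits touching the pattern: $hits"   # → 2
```

Only history rewriting (git filter-repo) followed by a force-push and credential rotation actually removes the exposure.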


Container & Service Security

Network Architecture

| Network | Purpose | External Access |
|---------|---------|-----------------|
| edge | Traefik + CloudFlared reverse proxy | Via Cloudflare tunnel only |
| ai-internal | Ollama, Qdrant, PostgreSQL | None (internal only) |
| monitoring | Prometheus, Grafana, AlertManager | Via Cloudflare tunnel only |

All external traffic is routed through the Cloudflare tunnel. No ports are exposed directly on the VPS host.

Container Hardening Checklist

  • CloudFlared: Metrics on 0.0.0.0:2000 — safe (no host ports, only reachable within Docker network)
  • CloudFlared: Set --loglevel info (not debug)
  • Ollama: Restrict OLLAMA_ORIGINS to known consumers (not *)
  • Traefik: Add authentication middleware to dashboard route
  • Traefik: Use a Docker socket proxy instead of direct socket mount
  • Grafana: Remove default password fallback (:-admin)
  • All services: Add deploy.resources.limits for memory and CPU
  • Compose: Switched to token-based auth (no more credentials JSON or tunnel ID in compose)

Resource Limits

All containers have explicit resource limits:

| Container | Memory | CPUs |
|-----------|--------|------|
| Traefik | 256M | 0.5 |
| CloudFlared | 256M | 0.3 |
| Docker Socket Proxy | 128M | 0.2 |
| Prometheus | 1G | 0.5 |
| Grafana | 512M | 0.5 |
| AlertManager | 128M | 0.2 |
| Loki | 512M | 0.5 |
| Promtail | 256M | 0.3 |
| Node Exporter | 128M | 0.2 |
| cAdvisor | 256M | 0.3 |
| N8N (template) | 1G | 1.0 |
| PostgreSQL (template) | 512M | 0.5 |
| Ollama (template) | 8G | 4.0 |
| Qdrant (template) | 2G | 1.0 |
| Generic App (template) | 512M | 0.5 |
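These limits use Compose's deploy.resources.limits syntax; a hypothetical fragment for one service (values from the table above):

```yaml
services:
  traefik:
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: "0.5"
```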

VPS Hardening

Firewall (UFW)

Two layers of protection: subnet-level blocks for known abuse networks (dropped at kernel level), and SSH-only access for everything else.

ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw enable

All HTTP/HTTPS traffic reaches services exclusively via the Cloudflare tunnel (which originates outbound connections from the VPS), so ports 80/443 do not need to be open.

Blocked Abuse Networks

Known bulletproof hosting providers and scanning networks are permanently blocked at the firewall. These rules are inserted before the SSH allow rule so packets are dropped before reaching fail2ban or SSH.

| Subnet | Operator | Reason | Added |
|--------|----------|--------|-------|
| 92.118.39.0/24 | DMZHOST (NL) | Persistent SSH scanning, 5000+ attempts | 2026-04-06 |
| 2.57.122.0/24 | DMZHOST / TECHOFF SRV (NL) | Persistent SSH scanning, coordinated with above | 2026-04-06 |
| 195.178.110.0/24 | TECHOFF SRV LIMITED (GB/AD) | Repeat offender, same abuse contact | 2026-04-06 |
| 45.148.10.0/24 | DMZHOST (AD) | Same operator, same abuse pattern | 2026-04-06 |

Adding New Abuse Networks

When you observe persistent attackers in the Grafana security dashboard or fail2ban logs, follow this process to evaluate and block them:

Step 1 — Identify repeat offenders

# Find IPs that have been banned multiple times
sudo grep "Ban" /var/log/fail2ban.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn | head -15

# Check currently banned IPs
sudo fail2ban-client status sshd
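The ban-count pipeline can be dry-run against a fabricated log excerpt (the IPs and timestamps below are made up):

```shell
# Count bans per IP from a fake fail2ban log excerpt; print the top offender.
log='2026-04-06 10:00:01 fail2ban.actions [sshd] Ban 92.118.39.5
2026-04-06 11:00:01 fail2ban.actions [sshd] Ban 92.118.39.5
2026-04-06 12:00:01 fail2ban.actions [sshd] Ban 2.57.122.9'
top=$(printf '%s\n' "$log" | grep "Ban" | grep -oP '\d+\.\d+\.\d+\.\d+' \
      | sort | uniq -c | sort -rn | head -1)
echo "$top"   # most-banned IP, prefixed by its ban count
```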

Step 2 — Investigate the source network

# Look up the IP's network owner
whois <IP> | grep -iE "org|net|descr|country|abuse"

# For RIPE-managed IPs (European/Middle East), query RIPE directly
whois -h whois.ripe.net <IP> | grep -iE "org|net|descr|country|abuse|inetnum"

Red flags that indicate a block-worthy network:

  • Gmail or disposable email for abuse contact
  • "Bulletproof" hosting providers (DMZHOST, TECHOFF, etc.)
  • Multiple IPs from the same /24 appearing in your ban list
  • Organization registered in one country, operating in another
  • Network description is just a URL or empty
  • Same mnt-ref or org across multiple attacking subnets

Step 3 — Verify the subnet scope

# Check the inetnum range — only block what they own
whois -h whois.ripe.net <IP> | grep "inetnum"
# Example output: inetnum: 92.118.39.0 - 92.118.39.255 → block 92.118.39.0/24
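A tiny hypothetical helper for the same derivation (IPv4 only; simply drops the last octet):

```shell
# Derive the covering /24 for an IPv4 address by truncating the final octet.
ip_to_slash24() {
  printf '%s.0/24\n' "${1%.*}"
}
ip_to_slash24 92.118.39.17   # → 92.118.39.0/24
```

Only use this shortcut when the whois inetnum confirms the operator actually owns the full /24; otherwise block the exact range whois reports.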

Step 4 — Apply the block

# Insert before the SSH allow rule (position 1)
sudo ufw insert 1 deny from <SUBNET>/24 to any comment "<OPERATOR> - abuse network"

# Verify rule order — deny rules must come before allow rules
sudo ufw status numbered

Step 5 — Document the block

Update this table (above) with the subnet, operator, reason, and date.

Removing a Block

# List rules with numbers
sudo ufw status numbered

# Delete by number
sudo ufw delete <number>

Periodic Review Checklist (monthly)

# 1. Review current firewall rules
sudo ufw status numbered

# 2. Check fail2ban effectiveness
sudo fail2ban-client status sshd
sudo fail2ban-client status recidive
sudo fail2ban-client status recidive-permanent

# 3. Find new repeat offenders not yet blocked at firewall level
sudo grep "Ban" /var/log/fail2ban.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn | head -15

# 4. Check if any blocked subnets are no longer attacking (optional cleanup)
sudo grep -c "BLOCK" /var/log/ufw.log  # Overall block count

# 5. Review Grafana security dashboard for trends
# - SSH Attack Activity panel: are failures declining?
# - Banned IPs by Jail: is recidive catching repeat offenders?

fail2ban

Three-tier ban escalation: initial 24h ban, 30-day ban for repeat offenders, permanent ban for persistent attackers.

apt install fail2ban
systemctl enable fail2ban

# /etc/fail2ban/jail.local
cat > /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 5
bantime = 86400       # 24 hours
findtime = 600        # 10 minute window

[recidive]
enabled = true
backend = auto
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
# 30-day ban if banned 3+ times within 7 days
maxretry = 3
bantime = 2592000     # 30 days
findtime = 604800     # 7 day window

[recidive-permanent]
enabled = true
filter = recidive[_jailname=recidive-permanent]
backend = auto
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
# Permanent ban if banned 5+ times within 30 days
maxretry = 5
bantime = -1          # permanent
findtime = 2592000    # 30 day window
EOF

systemctl restart fail2ban

Important configuration notes:

  • backend = auto is required for recidive jails — without it, fail2ban uses systemd journal matching (PRIORITY=5) which doesn't capture ban events from the log file
  • filter = recidive[_jailname=recidive-permanent] is required for custom jail names — it references the built-in recidive filter and sets the jail name to prevent self-matching loops

Monitor with:

sudo fail2ban-client status sshd
sudo fail2ban-client status recidive
sudo fail2ban-client status recidive-permanent

Automated Blocklist (ipset + systemd timer)

For fully automated handling of repeat offenders, the VPS runs a periodic script that promotes IPs and subnets from the fail2ban log to a persistent kernel-level blocklist. This complements fail2ban (which holds bans in memory with TTLs) by providing a permanent, high-scale blocklist that survives reboots and fail2ban restarts.

Architecture

/var/log/fail2ban.log
         ↓
  update-blocklist.sh (hourly via systemd timer)
         ↓
    ┌────┴────┐
    ↓         ↓
ipset:abuse-ips    ipset:abuse-subnets
    ↓         ↓
 iptables INPUT DROP (inserted before UFW rules)
    ↓
 Metrics → node_exporter textfile collector
    ↓
 Grafana dashboard + Prometheus alerts

Why ipset over more UFW rules:

  • O(1) kernel hash lookup vs O(n) linear rule traversal
  • Scales to millions of entries without performance degradation
  • Survives reboots via a dedicated restore service
  • Survives ufw reload via rules integrated into /etc/ufw/before.rules
  • Entries auto-expire after 24 days (ipset TTL) unless refreshed by the updater

Logic:

  1. Parse /var/log/fail2ban.log (including rotated .1, .2.gz etc.)
  2. IPs with ≥3 bans → add to abuse-ips ipset (24-day TTL)
  3. /24 subnets with ≥3 distinct attacking IPs → add to abuse-subnets ipset
  4. Whitelist entries (localhost, private ranges, trusted IPs) are never blocked
  5. TTLs refresh automatically on re-encounter (stale entries age out)
  6. State is persisted to disk and restored at boot before UFW starts
  7. flock prevents concurrent runs from racing on ipset state
  8. Metrics exported to Prometheus for visualization
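Steps 2 and 3 can be sketched in miniature with awk (fabricated log lines; the real update-blocklist.sh may differ):

```shell
# Promote IPs with >=3 bans, and /24s with >=3 distinct attacking IPs.
log='Ban 92.118.39.5
Ban 92.118.39.5
Ban 92.118.39.5
Ban 92.118.39.7
Ban 92.118.39.9
Ban 2.57.122.9'
ips=$(printf '%s\n' "$log" | awk '/Ban/ {n[$2]++} END {for (ip in n) if (n[ip] >= 3) print ip}')
nets=$(printf '%s\n' "$log" | awk '/Ban/ {seen[$2]=1} END {
  for (ip in seen) { sub(/\.[0-9]+$/, "", ip); s[ip]++ }
  for (net in s) if (s[net] >= 3) print net ".0/24" }')
echo "promote IPs:     $ips"
echo "promote subnets: $nets"
```

Here 92.118.39.5 crosses the per-IP threshold, and the 92.118.39.0/24 subnet crosses the distinct-IP threshold even though two of its IPs were banned only once.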

UFW integration: Rules are added to /etc/ufw/before.rules inside the *filter section with marker comments for idempotent management. This means:

  • Rules survive ufw reload (iptables flush/reapply)
  • Rules survive ufw disable/enable cycles
  • Setup reloads UFW automatically after adding rules
  • Removing the marker block and running ufw reload cleanly removes the blocklist

Installation

The scripts live in scripts/blocklist/ and are deployed with the rest of the infrastructure. Run the one-time setup on the VPS as root:

# Run from the current release directory on the VPS
cd /opt/n8n-v2/current/scripts/blocklist
sudo ./setup-blocklist.sh

This installs ipset, creates the two ipsets, adds iptables DROP rules, sets up the whitelist at /etc/blocklist/whitelist.conf, installs blocklist-restore.service (restores state at boot), and enables blocklist.timer (runs the updater hourly).

After setup, edit the whitelist to add any trusted IPs:

sudo nano /etc/blocklist/whitelist.conf
# Add any IP or CIDR range you want to protect from auto-blocking, e.g.:
# 203.0.113.42       # home
# 198.51.100.0/24    # office

Then run the first update manually:

sudo /usr/local/bin/update-blocklist.sh

Configuration

The update script accepts environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| IP_BAN_THRESHOLD | 3 | Minimum ban count before an IP is added to abuse-ips |
| SUBNET_IP_THRESHOLD | 3 | Minimum distinct attacking IPs in a /24 before the subnet is added |
| FAIL2BAN_LOG | /var/log/fail2ban.log | Path to fail2ban log |
| WHITELIST_FILE | /etc/blocklist/whitelist.conf | Whitelist path |

Threshold rationale: fail2ban's sshd jail bans after 5 failed attempts. Requiring 3 fail2ban bans before promotion means an IP triggered roughly 15 failed attempts across multiple ban cycles — a clear pattern of persistent abuse, not transient noise.

To override, edit /etc/systemd/system/blocklist.service and add Environment=... lines, then systemctl daemon-reload.
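A systemd drop-in is a hypothetical alternative that keeps the unit file itself untouched (survives redeployments of the unit):

```ini
# /etc/systemd/system/blocklist.service.d/override.conf
[Service]
Environment=IP_BAN_THRESHOLD=5
Environment=SUBNET_IP_THRESHOLD=5
```

After creating the drop-in, run sudo systemctl daemon-reload; the next timer run picks up the new thresholds.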

Operations

View the current blocklist:

# IPs
sudo ipset list abuse-ips | head -20

# Subnets
sudo ipset list abuse-subnets

# Counts only
sudo ipset -t list abuse-ips
sudo ipset -t list abuse-subnets

Check the timer and recent runs:

systemctl status blocklist.timer
systemctl list-timers blocklist.timer
journalctl -u blocklist.service -n 50
tail -50 /var/log/blocklist-updates.log

Manually remove a false positive:

# Remove a specific IP
sudo ipset del abuse-ips 1.2.3.4

# Remove a subnet
sudo ipset del abuse-subnets 1.2.3.0/24

# Persist the change (save ipset state to /var/lib/blocklist/)
sudo sh -c 'ipset save abuse-ips > /var/lib/blocklist/abuse-ips.save'
sudo sh -c 'ipset save abuse-subnets > /var/lib/blocklist/abuse-subnets.save'

# Then add the IP/subnet to the whitelist so it won't be re-added
sudo nano /etc/blocklist/whitelist.conf

Flush and reset (if something goes wrong):

sudo ipset flush abuse-ips
sudo ipset flush abuse-subnets
sudo sh -c 'ipset save abuse-ips > /var/lib/blocklist/abuse-ips.save'
sudo sh -c 'ipset save abuse-subnets > /var/lib/blocklist/abuse-subnets.save'
sudo systemctl start blocklist.service  # rebuild from logs

Uninstall (remove blocklist entirely):

# 1. Stop and disable services
sudo systemctl disable --now blocklist.timer
sudo systemctl disable blocklist-restore.service

# 2. Remove UFW rules (edit /etc/ufw/before.rules and delete the marker block)
sudo sed -i '/^# BEGIN blocklist/,/^# END blocklist/d' /etc/ufw/before.rules
sudo ufw reload

# 3. Destroy ipsets
sudo ipset destroy abuse-ips
sudo ipset destroy abuse-subnets

# 4. Remove state and scripts
sudo rm -rf /var/lib/blocklist /etc/blocklist
sudo rm -f /usr/local/bin/update-blocklist.sh /usr/local/bin/restore-blocklist.sh
sudo rm -f /etc/systemd/system/blocklist.service /etc/systemd/system/blocklist.timer /etc/systemd/system/blocklist-restore.service
sudo systemctl daemon-reload

Metrics exported to Prometheus:

| Metric | Type | Description |
|--------|------|-------------|
| blocklist_ips_total | gauge | Current number of blocked IPs |
| blocklist_subnets_total | gauge | Current number of blocked subnets |
| blocklist_last_update_timestamp | gauge | Unix timestamp of last run |
| blocklist_ips_added_last_run | gauge | New IPs added in the most recent update |
| blocklist_subnets_added_last_run | gauge | New subnets added in the most recent update |
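With node_exporter's textfile collector, the updater can expose these by writing a .prom file into the collector directory; a sketch with illustrative values (path assumed, not confirmed by the source):

```
# e.g. /var/lib/node_exporter/textfile/blocklist.prom
blocklist_ips_total 1432
blocklist_subnets_total 27
blocklist_last_update_timestamp 1775000000
blocklist_ips_added_last_run 4
blocklist_subnets_added_last_run 0
```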

Grafana dashboard: The Security & Intrusion Detection dashboard includes blocklist panels showing total blocked IPs/subnets, last update time, and growth over time.

Prometheus alerts:

  • BlocklistUpdateStalled — no update in 2+ hours (timer may have failed)
  • BlocklistGrowthSpike — 10+ new IPs added in a single run (possible attack surge)

Manual UFW subnet blocks

The UFW-level subnet blocks (documented above) are maintained separately for well-known bulletproof hosting providers. Those rules are permanent and hand-curated. The automated blocklist handles the long tail of individual offenders and emerging /24 patterns.

Threat Landscape

The VPS receives continuous automated SSH scanning from botnets and bulletproof hosting providers. This is normal for any internet-facing server — it is not targeted.

Typical attack patterns observed:

  • Volume: 50-200 failed SSH attempts per hour from 10-20 unique IPs
  • Usernames tried: root (90%+), admin, ubuntu, sol/solana/validator (crypto botnet)
  • Source networks: Bulletproof hosting (DMZHOST, TECHOFF), compromised VPS instances, residential IoT botnets
  • Behavior: Automated credential stuffing, abandon after first rejection (password auth disabled)

Why these attacks are not a threat:

  • PasswordAuthentication no — attackers can't even attempt passwords
  • PermitRootLogin no — the most-targeted username is rejected instantly
  • Key-only auth — brute-forcing an ed25519 key is computationally infeasible
  • fail2ban escalation — persistent IPs are permanently banned
  • UFW subnet blocks — known abuse networks are dropped at kernel level

Attack flow:

Attacker → UFW (subnet blocked? → DROP)
         → SSH (key auth only → immediate reject)
         → fail2ban sshd (5 rejects → 24h ban)
         → fail2ban recidive (3 bans in 7d → 30-day ban)
         → fail2ban recidive-permanent (5 bans in 30d → permanent ban)

Automatic Security Updates

apt install unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades

Audit Logging

apt install auditd
auditctl -w /opt/n8n-v2 -p wa -k n8n-deploy
auditctl -w /etc/ssh/sshd_config -p wa -k ssh-config
auditctl -w /etc/sudoers.d/ -p wa -k sudoers-change

CI/CD Security

GitHub Actions

The deployment workflow:

  1. Creates .env from GitHub secrets using python3 (avoids bash $ expansion of bcrypt hashes)
  2. Never stores credentials in the repository
  3. Uses secure environment variable passing via SSH heredocs
  4. Sets umask 077 during deployment to restrict file permissions
  5. Cleans up SSH key material after deployment (if: always())
  6. Fixes monitoring config permissions (chmod -R o+rX) before container start
  7. Doubles $ signs in auth hashes automatically for Docker Compose compatibility

SSH Key Cleanup

All workflows must include a cleanup step:

- name: Cleanup SSH key
  if: always()
  run: |
    rm -f ~/.ssh/id_rsa
    rm -f ~/.ssh/known_hosts

Workflow Protections

  • Concurrency groups prevent duplicate deployments
  • Pre-deployment validation (Docker Compose config check, shellcheck)
  • Automatic rollback on deployment failure
  • Health checks after deployment
  • Backup creation before every deployment (7-day retention)

Deployment Security

Environment Variables

When running scripts locally, use environment variables or the .env file:

# Use .env file
cp env.example .env
# Edit .env with your actual values — NEVER commit this file

Backup & Recovery

Automated backups are created before each deployment:

  • Service configurations
  • PostgreSQL database dumps (pg_dump per database)
  • Docker volume archives
  • Retention: last 7 backups

Backup location on VPS: /opt/n8n-v2/shared/backups/{configs,databases,volumes}
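The M8 permission fix in isolation (a hypothetical sketch; the paths are throwaway temp locations):

```shell
# Create backup artifacts with restrictive permissions from the start:
# umask 077 makes new files 600 and new directories 700.
umask 077
backup_dir=$(mktemp -d)
printf 'demo archive' > "$backup_dir/configs.tar.gz"
chmod 700 "$backup_dir"     # explicit, even though umask already applied
mode=$(stat -c '%a' "$backup_dir/configs.tar.gz")
echo "$mode"   # → 600
```

Setting the umask before creation avoids the window where a backup briefly exists world-readable and is then tightened by a later chmod.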


Monitoring and Alerts

Current Stack

  • Prometheus — Metrics collection (30-day retention), scrapes via Docker socket proxy
  • Grafana — Dashboard visualization (Prometheus + Loki datasources)
  • AlertManager — Alert routing → N8N webhook → Telegram
  • Loki — Log aggregation (7-day retention)
  • Promtail — Log shipper (Docker container logs + /var/log/auth.log)
  • cAdvisor — Container resource metrics
  • Node Exporter — System-level metrics + fail2ban textfile collector
  • Hourly monitoring workflow — Automated health checks via GitHub Actions
  • Daily security digest — Cron job at 08:00 UTC → N8N webhook → Telegram

Alert Rules

| Group | Alert | Threshold |
|-------|-------|-----------|
| Security | SSHBruteForceSpike | >200 failures/hour |
| Security | SSHDistributedAttack | >15 unique IPs/hour |
| Security | HighBanCount | >20 banned IPs |
| System | ContainerDown | Any container down >1min |
| System | HighCPUUsage | >85% for 5min |
| System | DiskSpaceCritical | >95% |
| Services | CloudflareTunnelDown | Tunnel metrics unreachable |
| Services | ContainerHighMemory | >90% of limit |

Operational Notes

  • Cross-compose-project Prometheus targets use container names (e.g., edge-traefik-1), not service names
  • Loki image does not include wget — healthcheck uses curl
  • Monitoring config files require chmod -R o+rX after deployment (automated in workflow)

Future Improvements

  • Traefik access log analysis for anomaly detection
  • Cloudflare audit log monitoring via API
  • Grafana alerting dashboards for Loki log patterns

Incident Response

If credentials are compromised:

  1. Immediate: Rotate all potentially affected credentials (see rotation procedures above)
  2. Isolate: If VPS compromise is suspected, restrict firewall to your IP only
  3. Assess: Review auth.log, audit.log, Docker logs, and Cloudflare audit logs
  4. Update: Change all related passwords, tokens, and SSH keys
  5. Document: Record the incident timeline and lessons learned
  6. Harden: Update security procedures based on findings

Remediation Action Items

Phase 1 — Immediate (before next deployment)

  • Rotate all Cloudflare credentials (token, tunnel secret, tunnel ID)
  • Verify .env has never been committed to git history
  • Replace credentials JSON auth with token-based tunnel auth (CLOUDFLARE_TUNNEL_TOKEN)
  • Fix CloudFlared metrics binding (127.0.0.1:2000) and log level (info)

Phase 2 — This week

  • Create deployer user on VPS with scoped sudo
  • Upgrade automat user to admin with sudo group
  • Update VPS_SSH_KEY GitHub secret with deployer's ed25519 private key
  • Update all workflow files to use deployer@ instead of root@
  • Disable root SSH login (PermitRootLogin no)
  • Add SSH key cleanup step to all workflows
  • Add authentication to Traefik dashboard
  • Set strong Grafana admin password, remove :-admin default

Phase 3 — This month

  • Install and configure fail2ban on VPS (7 IPs already banned)
  • Configure UFW — reset to SSH-only, all service ports closed
  • Install unattended-upgrades for automatic security patches
  • Set up audit logging on VPS (auditd with rules for deploy dir, sshd_config, sudoers)
  • Disable SSH password authentication (PasswordAuthentication no)
  • Replace direct Docker socket mount with socket proxy for Traefik
  • Add resource limits to all containers
  • Restrict Ollama CORS origins
  • PostgreSQL passwords — accepted risk with file permission hardening (chmod 600)
  • Enable rate limiting on Traefik routes
  • Set up centralized logging (Loki + Promtail)

Best Practices

Development

  • Use environment variables for all sensitive configuration
  • Never hardcode credentials in source code
  • Use separate credentials for development and production
  • Regularly update dependencies and base images

Deployment

  • Use least-privilege access for all services and users
  • Enable audit logging where possible
  • Regularly review and rotate credentials (quarterly)
  • Use secure communication channels (HTTPS, SSH with key auth only)

Team Access

  • Limit access to production credentials to essential personnel only
  • Use individual accounts rather than shared credentials
  • Implement proper offboarding procedures
  • Regular access reviews and cleanups

Remember: Security is everyone's responsibility. When in doubt, ask for guidance rather than compromising security.
