This document outlines the security posture, known gaps, and remediation plan for the N8N deployment infrastructure. It covers VPS access control, secrets management, container security, network exposure, and operational hardening.
Last audited: 2026-04-02 (all findings resolved or formally risk-accepted)
| # | Finding | Status |
|---|---|---|
| C1 | All CI/CD and manual access uses root SSH — no privilege separation | Resolved — deployer + automat users created, root SSH disabled |
| C2 | Hardcoded Cloudflare tunnel ID in source-controlled files | Resolved — switched to token-based auth |
| C3 | .env contains live Cloudflare credentials (token, tunnel secret, account tag) | Resolved — old credentials rotated, new tunnel created |
| # | Finding | Status |
|---|---|---|
| H1 | SSH private key written to CI runner disk without cleanup | Resolved — `if: always()` cleanup step added to all workflows |
| H2 | CloudFlared metrics bound to 0.0.0.0:2000 (all interfaces) | Resolved — container has no host ports; only reachable within Docker network |
| H3 | CloudFlared running with `--loglevel debug` in production | Resolved — removed debug flag |
| H4 | Ollama CORS set to `*` (accepts any origin) | Resolved — restricted to internal consumers |
| H5 | Traefik dashboard exposed without authentication | Resolved — basic auth middleware added (defense-in-depth with Cloudflare Access) |
| H6 | Grafana admin password defaults to admin if env var unset | Resolved — strong password set via env var |
| H7 | Docker socket mounted directly to Traefik container | Resolved — replaced with tecnativa/docker-socket-proxy |
| # | Finding | Status |
|---|---|---|
| M1 | cAdvisor runs with SYS_ADMIN capability and `apparmor:unconfined` | Accepted risk — required by cAdvisor for cgroup/filesystem metrics |
| M2 | No resource limits on Ollama, Traefik, CloudFlared, Grafana, Prometheus | Resolved — limits added to all containers |
| M3 | Metrics basic auth hash hardcoded in compose file | Resolved — moved to `${METRICS_AUTH}` env var |
| M4 | PostgreSQL passwords passed as env vars (visible via docker inspect) | Accepted risk — mitigated with chmod 600 on compose files; Docker secrets adds complexity for single-host |
| M5 | No rate limiting configured on any Traefik route | Resolved — global rate-limit middleware (100 req/s, 200 burst) with IP strategy |
| M6 | No fail2ban or SSH brute-force protection on VPS | Resolved — fail2ban installed with sshd + recidive jails |
| M7 | No centralized log aggregation | Resolved — Loki + Promtail added to monitoring stack |
| M8 | Backup files created without explicit file permissions | Resolved — explicit chmod 600/700 after backup creation |
| M9 | Host cert mounted from /root/.cloudflared/cert.pem | Resolved — eliminated by token-based auth migration |
Separate automated deployment access from interactive admin access. Never use root directly over SSH.
Purpose: Automated deployments via GitHub Actions. Restricted to only the commands needed to deploy and manage Docker services.
```bash
# Create the user
useradd -m -s /bin/bash deployer
mkdir -p /home/deployer/.ssh
chmod 700 /home/deployer/.ssh

# Add the CI/CD public key (generate a NEW keypair for this user)
echo "<deployer-public-key>" > /home/deployer/.ssh/authorized_keys
chown -R deployer:deployer /home/deployer/.ssh
chmod 600 /home/deployer/.ssh/authorized_keys

# Grant scoped sudo permissions
cat > /etc/sudoers.d/deployer << 'SUDOEOF'
# Docker operations
deployer ALL=(root) NOPASSWD: /usr/bin/docker
deployer ALL=(root) NOPASSWD: /usr/bin/docker compose *

# Deployment directory management
deployer ALL=(root) NOPASSWD: /bin/mkdir -p /opt/n8n-*
deployer ALL=(root) NOPASSWD: /bin/ln -sfn /opt/n8n-*/releases/* /opt/n8n-*/current
deployer ALL=(root) NOPASSWD: /bin/tar -xzf /tmp/deployment.tar.gz *
deployer ALL=(root) NOPASSWD: /bin/chown -R deployer\:deployer /opt/n8n-*
deployer ALL=(root) NOPASSWD: /bin/rm -rf /opt/n8n-*/releases/*

# Service management
deployer ALL=(root) NOPASSWD: /bin/systemctl restart docker
deployer ALL=(root) NOPASSWD: /usr/bin/ufw status
SUDOEOF
chmod 440 /etc/sudoers.d/deployer

# Transfer ownership of deployment directories
chown -R deployer:deployer /opt/n8n-v2
chown -R deployer:deployer /opt/n8n-production 2>/dev/null || true
```

Purpose: Manual administration, troubleshooting, general-purpose tasks. Has full sudo privileges but requires your personal SSH key and password for sudo.
```bash
# Create your personal admin user
useradd -m -s /bin/bash -G sudo <your-username>
passwd <your-username>

# Add your personal SSH public key
mkdir -p /home/<your-username>/.ssh
echo "<your-personal-public-key>" > /home/<your-username>/.ssh/authorized_keys
chown -R <your-username>:<your-username> /home/<your-username>/.ssh
chmod 700 /home/<your-username>/.ssh
chmod 600 /home/<your-username>/.ssh/authorized_keys
```

With this account you can run `sudo -i` to get a root shell when needed. The difference from logging in as root directly:
- SSH logs show your username, not just "root"
- If your key is compromised, the attacker still needs your sudo password
- You can disable this account independently without affecting deployments
After both accounts are verified working:
```bash
# /etc/ssh/sshd_config
PermitRootLogin no
PasswordAuthentication no
KbdInteractiveAuthentication no
```

```bash
systemctl restart ssh
```

Important: Test SSH access with your personal account in a separate terminal BEFORE closing your current root session.
Note: With `PasswordAuthentication no`, SSH brute-force attacks are immediately rejected without allowing password attempts. This eliminates the password-guessing attack surface entirely — attackers need a valid private key to even begin authentication.
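To confirm the lockout behaves as expected, you can force a password-only attempt from any client (`<your-username>` and `<vps-host>` are placeholders):

```bash
# Should fail immediately with "Permission denied (publickey)"
# and never show a password prompt
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no <your-username>@<vps-host>
```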
| Secret | Purpose | Format | Rotation |
|---|---|---|---|
| `VPS_SSH_KEY` | SSH private key for deployer user (NOT root) | PEM key | Quarterly or on suspicion |
| `PRODUCTION_VPS_HOST` | Production VPS hostname/IP | Hostname/IP | On infrastructure change |
| `STAGING_VPS_HOST` | Staging VPS hostname/IP (optional) | Hostname/IP | On infrastructure change |
| `CLOUDFLARE_TUNNEL_TOKEN` | Tunnel token (replaces credentials JSON + tunnel ID) | JWT token | Quarterly or on suspicion |
| `CLOUDFLARE_API_TOKEN` | API token with DNS:Edit permissions only | Token string | Quarterly or on suspicion |
| `DOMAIN_NAME` | Primary domain name | Domain string | On domain change |
| `TRAEFIK_DASHBOARD_AUTH` | Traefik dashboard basic auth | Raw bcrypt hash (single `$`) | Quarterly |
| `METRICS_AUTH` | Traefik metrics endpoint basic auth | Raw bcrypt hash (single `$`) | Quarterly |
| `SLACK_WEBHOOK_URL` | Notification webhook (optional) | URL | On rotation |
| `DISCORD_WEBHOOK_URL` | Notification webhook (optional) | URL | On rotation |
Important: TRAEFIK_DASHBOARD_AUTH and METRICS_AUTH must be stored with single $ signs (raw htpasswd -nB output). The deploy workflow automatically doubles them for Docker Compose. Do NOT store pre-doubled $$ values.
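For reference, a hash in the expected format can be generated with `htpasswd` from apache2-utils (the username `admin` and the password here are placeholders):

```bash
# Generate a bcrypt basic-auth credential; store the output as-is (single $ signs)
htpasswd -nbB admin '<strong-password>'
# Output shape: admin:$2y$05$...
```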
- Navigate to your GitHub repository
- Go to Settings > Secrets and variables > Actions
- Click "New repository secret"
- Add each required secret with the appropriate value
Rotate credentials immediately if:
- Credentials are accidentally exposed in code, logs, or conversation context
- A team member with access leaves the organization
- Suspicious activity is detected
- As part of regular security maintenance (quarterly recommended)
Tunnel token (replaces old credentials JSON + tunnel ID + tunnel secret):

- Log into the Cloudflare Zero Trust dashboard: https://one.dash.cloudflare.com/
- Go to Networks > Tunnels
- Delete the existing tunnel (or create a new one alongside for zero-downtime rotation)
- Create a new tunnel > name it > choose "cloudflared"
- Copy the tunnel token from the provided docker command
- Update the `CLOUDFLARE_TUNNEL_TOKEN` GitHub secret
- Update local `.env` with the new token (never commit this file)
- Deploy to verify the new tunnel connects
- Delete the old tunnel if it still exists
API token (for DNS management scripts):

- Go to https://dash.cloudflare.com/profile/api-tokens
- Create Token > use "Edit zone DNS" template (scope to your zone only)
- Update the `CLOUDFLARE_API_TOKEN` GitHub secret
- Test DNS scripts to verify functionality
- Delete the old token from the same page
Note: ACCOUNT_TAG is your Cloudflare account identifier (not a secret, not rotatable). It is safe to keep in configuration but should not be in source-controlled files.
- Generate a new keypair: `ssh-keygen -t ed25519 -C "deployer@github-actions"` (see the sketch below)
- Add the new public key to `deployer`'s `authorized_keys` on the VPS
- Update the `VPS_SSH_KEY` GitHub secret with the new private key
- Test a deployment
- Remove the old public key from `authorized_keys`
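A sketch of the hands-on steps, assuming the new key is written to `~/.ssh/deployer_ed25519` and `<vps-host>` is a placeholder:

```bash
# Generate the replacement keypair locally (empty passphrase for CI use)
ssh-keygen -t ed25519 -C "deployer@github-actions" -f ~/.ssh/deployer_ed25519 -N ""

# Append the new public key on the VPS; keep the old key until a deployment succeeds
ssh deployer@<vps-host> 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/deployer_ed25519.pub

# After a successful test deployment, delete the old key's line from
# ~/.ssh/authorized_keys on the VPS (identify it by its comment field)
```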
After any rotation, trigger a deployment (staging first) and verify:
- CloudFlare tunnel connects successfully
- DNS records resolve correctly
- All services pass health checks
- Monitoring stack reports healthy
The following files are excluded from git tracking via .gitignore:

- `edge/cloudflared/*.json` — Tunnel credential files
- `.env` and `*.env` — Environment files with secrets
- `*.pem`, `*.key`, `*.crt` — Certificate and key files
- `*.sqlite`, `*.sqlite3` — Database files
NEVER commit the following to the repository:
- API tokens or keys
- SSH private keys
- Database passwords
- SSL/TLS certificates or private keys
- Tunnel credential JSON files
- Any file containing sensitive credentials
Periodically verify no secrets have entered git history:

```bash
# Check if .env was ever committed
git log --all -p -- .env

# Search for known secret patterns
git log --all -S "CLOUDFLARE_TOKEN" --oneline
git log --all -S "TUNNEL_SECRET" --oneline
```

If secrets are found in history, use `git filter-repo` to purge them and force-push.
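A minimal purge sketch with `git filter-repo` (this rewrites history; coordinate with all collaborators before force-pushing, and note `<repo-url>` is a placeholder):

```bash
# Remove .env from every commit in history
git filter-repo --invert-paths --path .env

# filter-repo strips the origin remote as a safety measure; re-add it, then
# force-push the rewritten history (existing clones must be re-cloned)
git remote add origin <repo-url>
git push --force --all
git push --force --tags
```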
| Network | Purpose | External Access |
|---|---|---|
| `edge` | Traefik + CloudFlared reverse proxy | Via Cloudflare tunnel only |
| `ai-internal` | Ollama, Qdrant, PostgreSQL | None (internal only) |
| `monitoring` | Prometheus, Grafana, AlertManager | Via Cloudflare tunnel only |
All external traffic is routed through the Cloudflare tunnel. No ports are exposed directly on the VPS host.
- CloudFlared: Metrics on `0.0.0.0:2000` — safe (no host ports, only reachable within Docker network; spot-checks below)
- CloudFlared: Set `--loglevel info` (not `debug`)
- Ollama: Restrict `OLLAMA_ORIGINS` to known consumers (not `*`)
- Traefik: Add authentication middleware to dashboard route
- Traefik: Use a Docker socket proxy instead of direct socket mount
- Grafana: Remove default password fallback (`:-admin`)
- All services: Add `deploy.resources.limits` for memory and CPU
- Compose: Switched to token-based auth (no more credentials JSON or tunnel ID in compose)
All containers have explicit resource limits:
| Container | Memory | CPUs |
|---|---|---|
| Traefik | 256M | 0.5 |
| CloudFlared | 256M | 0.3 |
| Docker Socket Proxy | 128M | 0.2 |
| Prometheus | 1G | 0.5 |
| Grafana | 512M | 0.5 |
| AlertManager | 128M | 0.2 |
| Loki | 512M | 0.5 |
| Promtail | 256M | 0.3 |
| Node Exporter | 128M | 0.2 |
| cAdvisor | 256M | 0.3 |
| N8N (template) | 1G | 1.0 |
| PostgreSQL (template) | 512M | 0.5 |
| Ollama (template) | 8G | 4.0 |
| Qdrant (template) | 2G | 1.0 |
| Generic App (template) | 512M | 0.5 |
Two layers of protection: subnet-level blocks for known abuse networks (dropped at kernel level), and SSH-only access for everything else.
```bash
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw enable
```

All HTTP/HTTPS traffic reaches services exclusively via the Cloudflare tunnel (which originates outbound connections from the VPS), so ports 80/443 do not need to be open.
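To verify that nothing but SSH is listening on external interfaces:

```bash
# Only sshd on port 22 should appear bound to a non-loopback address
sudo ss -tlnp | grep -v '127.0.0.1\|\[::1\]'
```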
Known bulletproof hosting providers and scanning networks are permanently blocked at the firewall. These rules are inserted before the SSH allow rule so packets are dropped before reaching fail2ban or SSH.
| Subnet | Operator | Reason | Added |
|---|---|---|---|
| `92.118.39.0/24` | DMZHOST (NL) | Persistent SSH scanning, 5000+ attempts | 2026-04-06 |
| `2.57.122.0/24` | DMZHOST / TECHOFF SRV (NL) | Persistent SSH scanning, coordinated with above | 2026-04-06 |
| `195.178.110.0/24` | TECHOFF SRV LIMITED (GB/AD) | Repeat offender, same abuse contact | 2026-04-06 |
| `45.148.10.0/24` | DMZHOST (AD) | Same operator, same abuse pattern | 2026-04-06 |
When you observe persistent attackers in the Grafana security dashboard or fail2ban logs, follow this process to evaluate and block them:
Step 1 — Identify repeat offenders
```bash
# Find IPs that have been banned multiple times
sudo grep "Ban" /var/log/fail2ban.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn | head -15

# Check currently banned IPs
sudo fail2ban-client status sshd
```

Step 2 — Investigate the source network
```bash
# Look up the IP's network owner
whois <IP> | grep -iE "org|net|descr|country|abuse"

# For RIPE-managed IPs (European/Middle East), query RIPE directly
whois -h whois.ripe.net <IP> | grep -iE "org|net|descr|country|abuse|inetnum"
```

Red flags that indicate a block-worthy network:
- Gmail or disposable email for abuse contact
- "Bulletproof" hosting providers (DMZHOST, TECHOFF, etc.)
- Multiple IPs from the same /24 appearing in your ban list
- Organization registered in one country, operating in another
- Network description is just a URL or empty
- Same `mnt-ref` or `org` across multiple attacking subnets
Step 3 — Verify the subnet scope
```bash
# Check the inetnum range — only block what they own
whois -h whois.ripe.net <IP> | grep "inetnum"
# Example output: inetnum: 92.118.39.0 - 92.118.39.255 → block 92.118.39.0/24
```

Step 4 — Apply the block
```bash
# Insert before the SSH allow rule (position 1)
sudo ufw insert 1 deny from <SUBNET>/24 to any comment "<OPERATOR> - abuse network"

# Verify rule order — deny rules must come before allow rules
sudo ufw status numbered
```

Step 5 — Document the block
Update this table (above) with the subnet, operator, reason, and date.
To remove a block later:

```bash
# List rules with numbers
sudo ufw status numbered

# Delete by number
sudo ufw delete <number>
```

Periodic firewall review:

```bash
# 1. Review current firewall rules
sudo ufw status numbered

# 2. Check fail2ban effectiveness
sudo fail2ban-client status sshd
sudo fail2ban-client status recidive
sudo fail2ban-client status recidive-permanent

# 3. Find new repeat offenders not yet blocked at firewall level
sudo grep "Ban" /var/log/fail2ban.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn | head -15

# 4. Check if any blocked subnets are no longer attacking (optional cleanup)
sudo grep -c "BLOCK" /var/log/ufw.log   # Overall block count

# 5. Review Grafana security dashboard for trends
#    - SSH Attack Activity panel: are failures declining?
#    - Banned IPs by Jail: is recidive catching repeat offenders?
```

Three-tier ban escalation: initial 24h ban, 30-day ban for repeat offenders, permanent ban for persistent attackers.
```bash
apt install fail2ban
systemctl enable fail2ban

# /etc/fail2ban/jail.local
cat > /etc/fail2ban/jail.local << 'EOF'
[sshd]
enabled = true
port = ssh
filter = sshd
logpath = /var/log/auth.log
maxretry = 5
bantime = 86400    # 24 hours
findtime = 600     # 10 minute window

[recidive]
enabled = true
backend = auto
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
# 30-day ban if banned 3+ times within 7 days
maxretry = 3
bantime = 2592000   # 30 days
findtime = 604800   # 7 day window

[recidive-permanent]
enabled = true
filter = recidive[_jailname=recidive-permanent]
backend = auto
logpath = /var/log/fail2ban.log
banaction = %(banaction_allports)s
# Permanent ban if banned 5+ times within 30 days
maxretry = 5
bantime = -1        # permanent
findtime = 2592000  # 30 day window
EOF

systemctl restart fail2ban
```

Important configuration notes:
- `backend = auto` is required for recidive jails — without it, fail2ban uses systemd journal matching (`PRIORITY=5`), which doesn't capture ban events from the log file
- `filter = recidive[_jailname=recidive-permanent]` is required for custom jail names — it references the built-in recidive filter and sets the jail name to prevent self-matching loops
Monitor with:

```bash
sudo fail2ban-client status sshd
sudo fail2ban-client status recidive
sudo fail2ban-client status recidive-permanent
```

For fully automated handling of repeat offenders, the VPS runs a periodic script that promotes IPs and subnets from the fail2ban log to a persistent kernel-level blocklist. This complements fail2ban (which holds bans in memory with TTLs) by providing a permanent, high-scale blocklist that survives reboots and fail2ban restarts.
```
/var/log/fail2ban.log
        ↓
update-blocklist.sh (hourly via systemd timer)
        ↓
   ┌────┴────┐
   ↓         ↓
ipset:abuse-ips   ipset:abuse-subnets
   ↓         ↓
iptables INPUT DROP (inserted before UFW rules)
        ↓
Metrics → node_exporter textfile collector
        ↓
Grafana dashboard + Prometheus alerts
```
Why ipset over more UFW rules:
- O(1) kernel hash lookup vs O(n) linear rule traversal
- Scales to millions of entries without performance degradation
- Survives reboots via a dedicated restore service
- Survives `ufw reload` via rules integrated into `/etc/ufw/before.rules` (core mechanics sketched below)
- Entries auto-expire after 24 days (ipset TTL) unless refreshed by the updater
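A sketch of the core mechanics, assuming the set names used throughout this document (the deployed logic lives in `setup-blocklist.sh`):

```bash
# Create the two sets with a 24-day entry timeout (24 * 86400 = 2073600 s)
ipset create abuse-ips hash:ip timeout 2073600 -exist
ipset create abuse-subnets hash:net timeout 2073600 -exist

# Drop matching packets before any UFW rule sees them
iptables -I INPUT 1 -m set --match-set abuse-ips src -j DROP
iptables -I INPUT 2 -m set --match-set abuse-subnets src -j DROP

# Re-adding an entry with -exist refreshes its TTL instead of erroring
ipset add abuse-ips 198.51.100.7 -exist
```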
Logic:
- Parse `/var/log/fail2ban.log` (including rotated `.1`, `.2.gz`, etc.)
- IPs with ≥3 bans → add to `abuse-ips` ipset (24-day TTL) — the promotion step is sketched below
- /24 subnets with ≥3 distinct attacking IPs → add to `abuse-subnets` ipset
- Whitelist entries (localhost, private ranges, trusted IPs) are never blocked
- TTLs refresh automatically on re-encounter (stale entries age out)
- State is persisted to disk and restored at boot before UFW starts
- `flock` prevents concurrent runs from racing on ipset state
- Metrics exported to Prometheus for visualization
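A minimal sketch of that promotion step — not the deployed script — assuming fail2ban's standard `Ban <ip>` log lines and the default threshold:

```bash
# Count bans per IP across current and rotated logs, then promote repeat
# offenders into the abuse-ips set with a refreshed 24-day TTL
zgrep -h "Ban " /var/log/fail2ban.log* \
  | grep -oP 'Ban \K[\d.]+' \
  | sort | uniq -c \
  | awk -v t="${IP_BAN_THRESHOLD:-3}" '$1 >= t {print $2}' \
  | while read -r ip; do
      ipset add abuse-ips "$ip" timeout 2073600 -exist
    done
```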
UFW integration:
Rules are added to `/etc/ufw/before.rules` inside the `*filter` section with marker comments for idempotent management. This means:

- Rules survive `ufw reload` (iptables flush/reapply)
- Rules survive `ufw disable`/`enable` cycles
- Setup reloads UFW automatically after adding rules
- Removing the marker block and running `ufw reload` cleanly removes the blocklist
The scripts live in scripts/blocklist/ and are deployed with the rest of the infrastructure. Run the one-time setup on the VPS as root:
```bash
# Run from the current release directory on the VPS
cd /opt/n8n-v2/current/scripts/blocklist
sudo ./setup-blocklist.sh
```

This installs ipset, creates the two ipsets, adds iptables DROP rules, sets up the whitelist at `/etc/blocklist/whitelist.conf`, installs `blocklist-restore.service` (restores state at boot), and enables `blocklist.timer` (runs the updater hourly).
After setup, edit the whitelist to add any trusted IPs:
```bash
sudo nano /etc/blocklist/whitelist.conf
# Add any IP or CIDR range you want to protect from auto-blocking, e.g.:
# 203.0.113.42      # home
# 198.51.100.0/24   # office
```

Then run the first update manually:

```bash
sudo /usr/local/bin/update-blocklist.sh
```

The update script accepts environment variables:
| Variable | Default | Description |
|---|---|---|
| `IP_BAN_THRESHOLD` | 3 | Minimum ban count before an IP is added to `abuse-ips` |
| `SUBNET_IP_THRESHOLD` | 3 | Minimum distinct attacking IPs in a /24 before the subnet is added |
| `FAIL2BAN_LOG` | `/var/log/fail2ban.log` | Path to fail2ban log |
| `WHITELIST_FILE` | `/etc/blocklist/whitelist.conf` | Whitelist path |
Threshold rationale: fail2ban's sshd jail bans after 5 failed attempts. Requiring 3 fail2ban bans before promotion means an IP triggered roughly 15 failed attempts across multiple ban cycles — a clear pattern of persistent abuse, not transient noise.
To override, edit `/etc/systemd/system/blocklist.service` and add `Environment=...` lines, then `systemctl daemon-reload`.
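For example, to raise the IP threshold (the value 5 here is purely illustrative):

```bash
# In the [Service] section of the unit, add:
#   Environment=IP_BAN_THRESHOLD=5
sudo nano /etc/systemd/system/blocklist.service
sudo systemctl daemon-reload
```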
View the current blocklist:
```bash
# IPs
sudo ipset list abuse-ips | head -20

# Subnets
sudo ipset list abuse-subnets

# Counts only
sudo ipset list abuse-ips -terse
sudo ipset list abuse-subnets -terse
```

Check the timer and recent runs:
```bash
systemctl status blocklist.timer
systemctl list-timers blocklist.timer
journalctl -u blocklist.service -n 50
tail -50 /var/log/blocklist-updates.log
```

Manually remove a false positive:
```bash
# Remove a specific IP
sudo ipset del abuse-ips 1.2.3.4

# Remove a subnet
sudo ipset del abuse-subnets 1.2.3.0/24

# Persist the change (save ipset state to /var/lib/blocklist/; tee is needed
# because a plain > redirect would run without root privileges)
sudo ipset save abuse-ips | sudo tee /var/lib/blocklist/abuse-ips.save > /dev/null
sudo ipset save abuse-subnets | sudo tee /var/lib/blocklist/abuse-subnets.save > /dev/null

# Then add the IP/subnet to the whitelist so it won't be re-added
sudo nano /etc/blocklist/whitelist.conf
```

Flush and reset (if something goes wrong):
```bash
sudo ipset flush abuse-ips
sudo ipset flush abuse-subnets
sudo ipset save abuse-ips | sudo tee /var/lib/blocklist/abuse-ips.save > /dev/null
sudo ipset save abuse-subnets | sudo tee /var/lib/blocklist/abuse-subnets.save > /dev/null
sudo systemctl start blocklist.service   # rebuild from logs
```

Uninstall (remove blocklist entirely):
```bash
# 1. Stop and disable services
sudo systemctl disable --now blocklist.timer
sudo systemctl disable blocklist-restore.service

# 2. Remove UFW rules (edit /etc/ufw/before.rules and delete the marker block)
sudo sed -i '/^# BEGIN blocklist/,/^# END blocklist/d' /etc/ufw/before.rules
sudo ufw reload

# 3. Destroy ipsets
sudo ipset destroy abuse-ips
sudo ipset destroy abuse-subnets

# 4. Remove state and scripts
sudo rm -rf /var/lib/blocklist /etc/blocklist
sudo rm -f /usr/local/bin/update-blocklist.sh /usr/local/bin/restore-blocklist.sh
sudo rm -f /etc/systemd/system/blocklist.service /etc/systemd/system/blocklist.timer /etc/systemd/system/blocklist-restore.service
sudo systemctl daemon-reload
```

Metrics exported to Prometheus:
| Metric | Type | Description |
|---|---|---|
| `blocklist_ips_total` | gauge | Current number of blocked IPs |
| `blocklist_subnets_total` | gauge | Current number of blocked subnets |
| `blocklist_last_update_timestamp` | gauge | Unix timestamp of last run |
| `blocklist_ips_added_last_run` | gauge | New IPs added in the most recent update |
| `blocklist_subnets_added_last_run` | gauge | New subnets added in the most recent update |
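The updater publishes these through the node_exporter textfile collector; the file location below is an assumption — check node_exporter's `--collector.textfile.directory` flag for the actual path:

```bash
# Inspect the exported metrics on the VPS (path is an assumption)
cat /var/lib/node_exporter/textfile_collector/blocklist.prom
```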
Grafana dashboard: The Security & Intrusion Detection dashboard includes blocklist panels showing total blocked IPs/subnets, last update time, and growth over time.
Prometheus alerts:
- `BlocklistUpdateStalled` — no update in 2+ hours (timer may have failed)
- `BlocklistGrowthSpike` — 10+ new IPs added in a single run (possible attack surge)
The UFW-level subnet blocks (documented above) are maintained separately for well-known bulletproof hosting providers. Those rules are permanent and hand-curated. The automated blocklist handles the long tail of individual offenders and emerging /24 patterns.
The VPS receives continuous automated SSH scanning from botnets and bulletproof hosting providers. This is normal for any internet-facing server — it is not targeted.
Typical attack patterns observed:
- Volume: 50-200 failed SSH attempts per hour from 10-20 unique IPs
- Usernames tried: `root` (90%+), `admin`, `ubuntu`, `sol`/`solana`/`validator` (crypto botnets)
- Source networks: Bulletproof hosting (DMZHOST, TECHOFF), compromised VPS instances, residential IoT botnets
- Behavior: Automated credential stuffing, abandoned after the first rejection (password auth disabled)
Why these attacks are not a threat:
- `PasswordAuthentication no` — attackers can't even attempt passwords
- `PermitRootLogin no` — the most-targeted username is rejected instantly
- Key-only auth — brute-forcing an ed25519 key is computationally infeasible
- fail2ban escalation — persistent IPs are permanently banned
- UFW subnet blocks — known abuse networks are dropped at kernel level
Attack flow:
```
Attacker → UFW (subnet blocked? → DROP)
         → SSH (key auth only → immediate reject)
         → fail2ban sshd (5 rejects → 24h ban)
         → fail2ban recidive (3 bans in 7d → 30-day ban)
         → fail2ban recidive-permanent (5 bans in 30d → permanent ban)
```
Automatic security updates:

```bash
apt install unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades
```

Audit logging:

```bash
apt install auditd
auditctl -w /opt/n8n-v2 -p wa -k n8n-deploy
auditctl -w /etc/ssh/sshd_config -p wa -k ssh-config
auditctl -w /etc/sudoers.d/ -p wa -k sudoers-change
# Note: auditctl rules are runtime-only — mirror them in /etc/audit/rules.d/
# to persist across reboots
```

The deployment workflow:
- Creates `.env` from GitHub secrets using python3 (avoids bash `$` expansion of bcrypt hashes)
- Never stores credentials in the repository
- Uses secure environment variable passing via SSH heredocs
- Sets `umask 077` during deployment to restrict file permissions
- Cleans up SSH key material after deployment (`if: always()`)
- Fixes monitoring config permissions (`chmod -R o+rX`) before container start
- Doubles `$` signs in auth hashes automatically for Docker Compose compatibility (illustrated below)
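The `$`-doubling step matters because Docker Compose treats `$` as interpolation. A bash equivalent of what the workflow does (the variable name is just an example):

```bash
# Escape bcrypt's $2y$05$... so Compose reads the hash literally
escaped=$(printf '%s' "$TRAEFIK_DASHBOARD_AUTH" | sed 's/\$/$$/g')
echo "TRAEFIK_DASHBOARD_AUTH=${escaped}" >> .env
```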
All workflows must include a cleanup step:
```yaml
- name: Cleanup SSH key
  if: always()
  run: |
    rm -f ~/.ssh/id_rsa
    rm -f ~/.ssh/known_hosts
```

Additional pipeline safeguards:

- Concurrency groups prevent duplicate deployments
- Pre-deployment validation (Docker Compose config check, shellcheck)
- Automatic rollback on deployment failure
- Health checks after deployment
- Backup creation before every deployment (7-day retention)
When running scripts locally, use environment variables or the .env file:
```bash
# Use .env file
cp env.example .env
# Edit .env with your actual values — NEVER commit this file
```

Automated backups are created before each deployment:
- Service configurations
- PostgreSQL database dumps (`pg_dump` per database)
- Docker volume archives
- Retention: last 7 backups
Backup location on VPS: /opt/n8n-v2/shared/backups/{configs,databases,volumes}
- Prometheus — Metrics collection (30-day retention), scrapes via Docker socket proxy
- Grafana — Dashboard visualization (Prometheus + Loki datasources)
- AlertManager — Alert routing → N8N webhook → Telegram
- Loki — Log aggregation (7-day retention)
- Promtail — Log shipper (Docker container logs + `/var/log/auth.log`)
- cAdvisor — Container resource metrics
- Node Exporter — System-level metrics + fail2ban textfile collector
- Hourly monitoring workflow — Automated health checks via GitHub Actions
- Daily security digest — Cron job at 08:00 UTC → N8N webhook → Telegram
| Group | Alert | Threshold |
|---|---|---|
| Security | SSHBruteForceSpike | >200 failures/hour |
| Security | SSHDistributedAttack | >15 unique IPs/hour |
| Security | HighBanCount | >20 banned IPs |
| System | ContainerDown | Any container down >1min |
| System | HighCPUUsage | >85% for 5min |
| System | DiskSpaceCritical | >95% |
| Services | CloudflareTunnelDown | Tunnel metrics unreachable |
| Services | ContainerHighMemory | >90% of limit |
- Cross-compose-project Prometheus targets use container names (e.g., `edge-traefik-1`), not service names
- Loki image does not include `wget` — healthcheck uses `curl`
- Monitoring config files require `chmod -R o+rX` after deployment (automated in workflow)
- Traefik access log analysis for anomaly detection
- Cloudflare audit log monitoring via API
- Grafana alerting dashboards for Loki log patterns
If credentials are compromised:
- Immediate: Rotate all potentially affected credentials (see rotation procedures above)
- Isolate: If VPS compromise is suspected, restrict firewall to your IP only (see the lockdown sketch below)
- Assess: Review `auth.log`, `audit.log`, Docker logs, and Cloudflare audit logs
- Update: Change all related passwords, tokens, and SSH keys
- Document: Record the incident timeline and lessons learned
- Harden: Update security procedures based on findings
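A lockdown sketch for the "Isolate" step (`<your-ip>` is a placeholder for your current public address — verify it before applying):

```bash
# Allow SSH only from your own IP, then remove the open SSH rule
sudo ufw insert 1 allow from <your-ip> to any port 22 proto tcp
sudo ufw delete allow 22/tcp
sudo ufw status numbered   # verify before closing your session
```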
- Rotate all Cloudflare credentials (token, tunnel secret, tunnel ID)
- Verify `.env` has never been committed to git history
- Replace credentials JSON auth with token-based tunnel auth (`CLOUDFLARE_TUNNEL_TOKEN`)
- Fix CloudFlared metrics binding (`127.0.0.1:2000`) and log level (info)
- Create `deployer` user on VPS with scoped sudo
- Upgrade `automat` user to admin with sudo group
- Update `VPS_SSH_KEY` GitHub secret with deployer's ed25519 private key
- Update all workflow files to use `deployer@` instead of `root@`
- Disable root SSH login (`PermitRootLogin no`)
- Add SSH key cleanup step to all workflows
- Add authentication to Traefik dashboard
- Set strong Grafana admin password, remove `:-admin` default
- Install and configure fail2ban on VPS (7 IPs already banned)
- Configure UFW — reset to SSH-only, all service ports closed
- Install unattended-upgrades for automatic security patches
- Set up audit logging on VPS (auditd with rules for deploy dir, sshd_config, sudoers)
- Disable SSH password authentication (`PasswordAuthentication no`)
- Replace direct Docker socket mount with socket proxy for Traefik
- Add resource limits to all containers
- Restrict Ollama CORS origins
- PostgreSQL passwords — accepted risk with file permission hardening (`chmod 600`)
- Enable rate limiting on Traefik routes
- Set up centralized logging (Loki + Promtail)
- Use environment variables for all sensitive configuration
- Never hardcode credentials in source code
- Use separate credentials for development and production
- Regularly update dependencies and base images
- Use least-privilege access for all services and users
- Enable audit logging where possible
- Regularly review and rotate credentials (quarterly)
- Use secure communication channels (HTTPS, SSH with key auth only)
- Limit access to production credentials to essential personnel only
- Use individual accounts rather than shared credentials
- Implement proper offboarding procedures
- Regular access reviews and cleanups
Remember: Security is everyone's responsibility. When in doubt, ask for guidance rather than compromising security.