Description
A physical worker node remains permanently offline after a brief power outage. Running the apply-vpn-routes command manually fixes the VPN configuration and restores the connection.
Steps to reproduce
- Physical worker and leader nodes are on the same network, without a UPS
- Leader node has a Samba DC, bound to its LAN IP address
- A power outage occurs
Expected behavior
When power comes back, VPN connectivity should be restored automatically after boot, and both nodes must be fully functional.
Actual behavior
When power comes back, the worker node remains offline because the WireGuard VPN is misconfigured.
The journal shows that apply-vpn-routes is executed early at boot time, when the network interface is still unconfigured. The script logic calculates a wrong IP route for the leader node, pushing it inside the VPN.
The command is executed by the redis.service unit, which is ordered after "network-online.target".
Leader node and Samba DC IP: 192.168.1.3
Jan 09 09:30:46 ns8worker redis[2089]: ip route replace 192.168.1.3 nexthop dev wg0
...
Jan 09 09:32:03 ns8n6 kernel: igb 0000:01:00.0 enp1s0: igb: enp1s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
...
Jan 09 09:32:03 ns8n6 NetworkManager[1557]: <info> [1767947523.4155] policy: set 'enp1s0' (enp1s0) as default for IPv4 routing and DNS
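The failure mode in the journal excerpt can be illustrated with a minimal sketch. This is not the actual apply-vpn-routes code: the function name `route_via_vpn` is hypothetical, and the LAN prefix 192.168.1.0/24 is an assumption (the report only gives the address 192.168.1.3). The sketch assumes the decision is "route through wg0 unless the address belongs to a directly connected network", which matches the behavior described here.

```python
import ipaddress

def route_via_vpn(target_ip, connected_networks):
    """Hypothetical helper: True if target_ip is outside every connected network."""
    addr = ipaddress.ip_address(target_ip)
    return not any(addr in ipaddress.ip_network(net) for net in connected_networks)

# Early boot: enp1s0 has no address yet, so no connected network is known
# and the leader IP is wrongly pushed into the VPN.
early_boot = route_via_vpn("192.168.1.3", [])

# After NetworkManager configures enp1s0 (prefix assumed to be 192.168.1.0/24),
# the same check keeps the leader IP on the LAN.
after_link_up = route_via_vpn("192.168.1.3", ["192.168.1.0/24"])

print(early_boot, after_link_up)
```

Under these assumptions the outcome depends only on when the check runs, which is consistent with the manual rerun of apply-vpn-routes fixing the configuration.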
Contents of wg0.conf. The IP 192.168.1.3 must not be among AllowedIPs. However, because it did not match a link-local interface route when the script ran, apply-vpn-routes pushed it into the VPN.
[root@ns8worker ~]# cat /etc/wireguard/wg0.conf
[Interface]
Address = 10.5.4.6/32
ListenPort = 55820
PrivateKey = ***
[Peer]
PublicKey = ***
AllowedIPs = 10.5.4.1/32, 10.5.4.2/32, 10.5.4.3/32, 10.5.4.4/32, 10.5.4.5/32, 192.168.1.3/32
Endpoint = 192.168.1.3:55820
PersistentKeepalive = 25
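A quick way to spot this condition on an affected node is to check whether a peer's Endpoint address is also covered by its own AllowedIPs. The following is a diagnostic sketch only, not part of the core: the function name `endpoints_inside_allowed_ips` is hypothetical, and the parser handles only IPv4 `address:port` endpoints (a hostname or a bracketed IPv6 endpoint would raise `ValueError`).

```python
import ipaddress

def endpoints_inside_allowed_ips(conf_text):
    """Return endpoint IPs that are also covered by the same peer's AllowedIPs."""
    peers, current = [], None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Track only [Peer] sections; [Interface] keys are ignored.
            current = {"allowed": [], "endpoint": None} if line == "[Peer]" else None
            if current is not None:
                peers.append(current)
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "AllowedIPs":
                current["allowed"] = [
                    ipaddress.ip_network(item.strip(), strict=False)
                    for item in value.split(",")
                ]
            elif key == "Endpoint":
                # IPv4 "address:port" form only (see the assumptions above).
                current["endpoint"] = ipaddress.ip_address(value.rsplit(":", 1)[0])
    return [
        str(peer["endpoint"])
        for peer in peers
        if peer["endpoint"] is not None
        and any(peer["endpoint"] in net for net in peer["allowed"])
    ]

# Peer section as reported above, with the key redacted.
sample = """[Peer]
PublicKey = ***
AllowedIPs = 10.5.4.1/32, 10.5.4.2/32, 192.168.1.3/32
Endpoint = 192.168.1.3:55820
"""
print(endpoints_inside_allowed_ips(sample))  # ['192.168.1.3']
```

On a real node the text would be read from /etc/wireguard/wg0.conf; a non-empty result means the tunnel endpoint is being routed through the tunnel itself.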
Components
Core 3.16.0
See also