accurately detect default gateways #153
@aojea: GitHub didn't allow me to assign the following users: dkennetzoracle, tamilmani1989. Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. |
dkennetzoracle
left a comment
Some nits / questions, but I think it's a great add and a correctness fix. I'm not 100% sure it fixes the NonUplinkChecker situation on #138 on OKE, I'd need to double check. RA injects IPv6 defaults for us. If they share a metric (which I think they do) I still need something like NonUplinkChecker. If they don't I can drop NonUplinkChecker.
This definitely addresses one of the sub-issues in that PR, and is correct. LGTM!
@dkennetzoracle I'm very much interested in hearing about this after you've had the opportunity to try this out |
|
@gauravkghildiyal - I will let you know! It's very hard for me to get access to GB200+ so I need to make the most of my time on them haha. I'm pretty sure each NIC gets a default route for IPv6 and these use the kernel's RA priority (so 1024), so we wouldn't be able to differentiate the 8 RDMA NICs from each other by metric. They all look like peers from the kernel's POV. So they'd all still get excluded here, haha. But it should work for the Azure case |
|
I just verified this PR on AWS p4d.24xlarge: the primary NIC with the default gateway/route (e.g. ens32) is excluded, and the 4 EFA/ENA devices are properly included in the resource slice:
|
|
Verified this PR on Azure GB300: the primary NIC (eth0) with the default gateway is excluded, while the Azure accelerated networking VF (e.g. enP8051s1) and the other mlx5_* NICs are included.
/lgtm |
|
Holding to have time to go through this and allow resolution of a few comments. /hold |
From a Kubernetes architectural and security perspective, we like to have certain guardrails so users cannot shoot themselves in the foot. Any logic that allows an interface with a default route to be unmounted or modified based on an opportunistic RA injection is very risky.
I still think we need a better mechanism for improving filtering, so I encourage you to check @gauravkghildiyal's proposal in #152, which I think will be able to accommodate the |
|
@aojea aligned with @gauravkghildiyal's issue, it seems like a good fix and a generally better way to approach the problem! |
tamilmani1989
left a comment
changes lgtm. left a comment, i will leave to your discretion
|
|
pushed addressing latest comments, great reviews, caught some flaws:
|
Address several critical edge cases in default interface detection and introduce a robust, rootless networking test framework.

1. Point-to-Point Interfaces: Removed the `r.Gw == nil` check. Previously, this caused the agent to completely ignore active VPNs and tunnels (like WireGuard or tun/tap) because P2P links route directly out of the device without a gateway IP.
2. Route Metrics: The kernel relies on metrics (Priority) to determine the active route in a multi-WAN setup. The old code ignored this and returned every interface that had a default route. It now keeps only the lowest-metric default routes, for IPv4 and IPv6 independently, while preserving ECMP support.
3. Multipath Link Lookup: Fixed a bug where multipath routes were looked up using the parent route's link index (which is often 0) rather than each nexthop's link index.

Signed-off-by: Antonio Ojea <aojea@google.com>
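The three rules in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the `Route`, `NextHop`, and `Family` types, the `defaultLinkIndexes` function, and the sample data are all hypothetical stand-ins that assume a route entry exposes a gateway, a priority (metric), a link index, and a list of multipath nexthops.

```go
package main

import (
	"fmt"
	"sort"
)

// Family separates IPv4 from IPv6 (hypothetical type for this sketch).
type Family int

const (
	V4 Family = iota
	V6
)

// NextHop is one leg of a multipath (ECMP) route.
type NextHop struct{ LinkIndex int }

// Route is a simplified, hypothetical stand-in for a kernel route entry.
type Route struct {
	Family    Family
	Default   bool   // destination is 0.0.0.0/0 or ::/0
	Gw        string // empty on point-to-point links; deliberately not checked
	LinkIndex int    // often 0 when MultiPath is set
	Priority  int    // route metric; lower wins
	MultiPath []NextHop
}

// defaultLinkIndexes returns the sorted link indexes that carry the
// lowest-metric default route, evaluated per address family.
func defaultLinkIndexes(routes []Route) []int {
	best := map[Family]int{}
	links := map[Family]map[int]bool{}
	for _, r := range routes {
		if !r.Default {
			continue
		}
		cur, seen := best[r.Family]
		if seen && r.Priority > cur {
			continue // a better (lower) metric already exists for this family
		}
		if !seen || r.Priority < cur {
			best[r.Family] = r.Priority
			links[r.Family] = map[int]bool{} // discard higher-metric links
		}
		if len(r.MultiPath) > 0 {
			// Use each nexthop's link index, not the parent route's.
			for _, nh := range r.MultiPath {
				links[r.Family][nh.LinkIndex] = true
			}
			continue
		}
		links[r.Family][r.LinkIndex] = true
	}
	out := []int{}
	seenLink := map[int]bool{}
	for _, perFamily := range links {
		for idx := range perFamily {
			if !seenLink[idx] {
				seenLink[idx] = true
				out = append(out, idx)
			}
		}
	}
	sort.Ints(out)
	return out
}

// sampleRoutes builds made-up data covering the three cases.
func sampleRoutes() []Route {
	return []Route{
		{Family: V4, Default: true, Gw: "10.0.0.1", LinkIndex: 2, Priority: 100},
		{Family: V4, Default: true, Gw: "10.1.0.1", LinkIndex: 3, Priority: 200}, // backup WAN
		{Family: V4, Default: true, LinkIndex: 7, Priority: 100},                 // P2P, no gateway
		{Family: V6, Default: true, Priority: 1024, MultiPath: []NextHop{{4}, {5}}},
		{Family: V6, Default: true, Gw: "fe80::1", LinkIndex: 6, Priority: 2048},
		{Family: V4, Gw: "10.0.0.1", LinkIndex: 9, Priority: 0}, // not a default route
	}
}

func main() {
	fmt.Println(defaultLinkIndexes(sampleRoutes())) // prints [2 4 5 7]
}
```

In this made-up data, link 3 drops out because its IPv4 metric is higher, link 7 stays even though it has no gateway, and the IPv6 multipath route contributes links 4 and 5 rather than its parent index 0. Routes that share the lowest metric are all kept, which matches the concern raised earlier in the thread about RA-injected IPv6 defaults that all carry metric 1024.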
|
@aojea @gauravkghildiyal I can confirm this fixed our issues (besides NIC orphaning) on OCI. I pulled in our OKE changes to this branch locally and deployed, and didn't need any additional changes. We made a few additional changes on our side, and fixed our dual ipv6 stack deployment. With these changes, the only thing we are missing is the cri upstream changes. |
|
/hold cancel based on the feedback from OKE and Azure, @gauravkghildiyal for final review |
|
Perfect, thanks for validating @anson627 and @dkennetzoracle Thanks for the change @aojea /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: anson627, aojea, dkennetzoracle, gauravkghildiyal, tamilmani1989