accurately detect default gateways#153

Merged
k8s-ci-robot merged 1 commit into
kubernetes-sigs:mainfrom
aojea:detect_gw
Apr 20, 2026

Conversation

@aojea
Contributor

@aojea aojea commented Apr 16, 2026

Addresses several critical edge cases in default interface detection and introduces a robust, rootless networking test framework.

  1. Point-to-Point Interfaces: Removed the r.Gw == nil check. Previously, this caused the agent to completely ignore active VPNs and tunnels (like Wireguard or tun/tap) because P2P links route directly out of the device without a Gateway IP.
  2. Route Metrics: The kernel relies on metrics (Priority) to determine the active route in a multi-WAN setup. The old code ignored this, returning all interfaces. It now correctly isolates the lowest metric for IPv4 and IPv6 independently, while preserving ECMP support.
  3. Multipath Link Lookup: Fixed a bug where multipath routes were queried using the parent route's link index (which is often 0) rather than the nexthop's link index.
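
The three fixes above can be sketched in one selection routine. This is a minimal illustration, not the code in pkg/inventory/net.go: the `route` and `nextHop` types and the `defaultLinkIndexes` function are hypothetical stand-ins for the netlink data, and the input is assumed to contain only default routes.

```go
package main

import (
	"fmt"
	"sort"
)

// nextHop and route are hypothetical, simplified stand-ins for netlink route
// data; they are not the types used in pkg/inventory/net.go.
type nextHop struct{ LinkIndex int }

type route struct {
	Gw        string    // empty on point-to-point links (VPNs, tunnels)
	LinkIndex int       // often 0 when MultiPath is populated
	Priority  int       // route metric; the lowest value wins
	IPv6      bool      // address family, so v4 and v6 are ranked separately
	MultiPath []nextHop // ECMP nexthops, each with its own link index
}

// defaultLinkIndexes assumes every input route is already a default route.
// It returns the sorted, de-duplicated link indexes that carry the
// lowest-metric default route of each address family.
func defaultLinkIndexes(routes []route) []int {
	best := map[bool]int{}    // family -> lowest metric seen so far
	links := map[bool][]int{} // family -> link indexes at that metric
	for _, r := range routes {
		// No gateway check here: a route with an empty Gw still counts.
		lowest, seen := best[r.IPv6]
		if seen && r.Priority > lowest {
			continue // a better metric already exists for this family
		}
		if !seen || r.Priority < lowest {
			best[r.IPv6] = r.Priority
			links[r.IPv6] = nil // drop links collected at a worse metric
		}
		if len(r.MultiPath) == 0 {
			links[r.IPv6] = append(links[r.IPv6], r.LinkIndex)
			continue
		}
		for _, nh := range r.MultiPath {
			// Use the nexthop's link index, not the parent route's.
			links[r.IPv6] = append(links[r.IPv6], nh.LinkIndex)
		}
	}
	seen := map[int]bool{}
	out := []int{}
	for _, idxs := range links {
		for _, i := range idxs {
			if !seen[i] {
				seen[i] = true
				out = append(out, i)
			}
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	routes := []route{
		{Gw: "192.168.1.1", LinkIndex: 2, Priority: 100},
		{Gw: "10.0.0.1", LinkIndex: 3, Priority: 600}, // backup WAN, loses on metric
		{LinkIndex: 7, Priority: 50, IPv6: true},      // tunnel without a gateway
		{Priority: 50, IPv6: true, MultiPath: []nextHop{{LinkIndex: 8}, {LinkIndex: 9}}},
	}
	fmt.Println(defaultLinkIndexes(routes)) // [2 7 8 9]
}
```

In this sketch the IPv4 backup route on link 3 is dropped because its metric is higher, the gateway-less tunnel on link 7 is kept, and the multipath route contributes links 8 and 9 rather than its own (zero) link index.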

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Apr 16, 2026
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 16, 2026
@aojea
Contributor Author

aojea commented Apr 16, 2026

@k8s-ci-robot
Contributor

@aojea: GitHub didn't allow me to assign the following users: dkennetzoracle, tamilmani1989.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

Details

In response to this:

/assign @gauravkghildiyal @dkennetzoracle @tamilmani1989

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/inventory/net.go
Comment thread internal/testutils/userns.go
Comment thread pkg/inventory/net_test.go
Comment thread pkg/inventory/net_test.go
Comment thread internal/testutils/userns.go
Comment thread internal/testutils/userns.go
Contributor

@dkennetzoracle dkennetzoracle left a comment

Some nits / questions, but I think it's a great add and a correctness fix. I'm not 100% sure it fixes the NonUplinkChecker situation on #138 on OKE, I'd need to double check. RA injects IPv6 defaults for us. If they share a metric (which I think they do) I still need something like NonUplinkChecker. If they don't I can drop NonUplinkChecker.

This definitely addresses one of the sub-issues in that PR, and is correct. LGTM!

@gauravkghildiyal
Member

I'm not 100% sure it fixes the NonUplinkChecker situation on #138 on OKE, I'd need to double check.

@dkennetzoracle I'm very much interested in hearing about this after you've had the opportunity to try this out

@dkennetzoracle
Contributor

@gauravkghildiyal - I will let you know! It's very hard for me to get access to GB200+ so I need to make the most of my time on them haha. I'm pretty sure each NIC gets a default route for IPv6, and these use the kernel's RA priority (so 1024), so we wouldn't be able to differentiate the 8 RDMA NICs from each other by metric. They all look like peers from the kernel's point of view.

So they'd all get excluded still here, haha. But it should work for the Azure case
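
A small sketch of why equal metrics defeat metric-based selection. The interface names and the `lowestMetricIfaces` helper are hypothetical; the value 1024 is the RA metric mentioned in the comment above, and 100 is an arbitrary lower metric chosen for contrast.

```go
package main

import (
	"fmt"
	"sort"
)

// lowestMetricIfaces returns, sorted by name, every interface whose default
// route carries the minimum metric found in the input.
func lowestMetricIfaces(metricByIface map[string]int) []string {
	lowest, first := 0, true
	for _, m := range metricByIface {
		if first || m < lowest {
			lowest, first = m, false
		}
	}
	out := []string{}
	for name, m := range metricByIface {
		if m == lowest {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Every NIC received an RA default at the same metric: all of them tie.
	tied := map[string]int{"eth0": 1024, "rdma0": 1024, "rdma1": 1024, "rdma2": 1024}
	fmt.Println(lowestMetricIfaces(tied)) // [eth0 rdma0 rdma1 rdma2]

	// The primary NIC has a strictly lower metric: only it is selected.
	distinct := map[string]int{"eth0": 100, "rdma0": 1024, "rdma1": 1024}
	fmt.Println(lowestMetricIfaces(distinct)) // [eth0]
}
```

When every default route shares one metric, the lowest-metric set is the whole set, so metric alone cannot separate an uplink from the RDMA NICs.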

@anson627
Contributor

anson627 commented Apr 16, 2026

I just verified this PR on AWS p4d.24xlarge: the primary NIC with the default gateway/route (e.g. ens32) is excluded, and 4 EFA/ENA devices are properly included in the resource slice:

Device            Type  Interface  PCI Address   NUMA  IP
pci-0000-10-01-0  ENA   ens33      0000:10:01.0  0     192.168.157.168/19
pci-0000-10-1b-0  EFA              0000:10:1b.0  0     rdmap16s27
pci-0000-20-01-0  ENA   ens65      0000:20:01.0  0     192.168.155.210/19
pci-0000-20-1b-0  EFA              0000:20:1b.0  0     rdmap32s27
pci-0000-90-01-0  ENA   ens129     0000:90:01.0  1     192.168.145.129/19
pci-0000-90-1b-0  EFA              0000:90:1b.0  1     rdmap144s27
pci-0000-a0-01-0  ENA   ens161     0000:a0:01.0  1     192.168.132.181/19
pci-0000-a0-1b-0  EFA              0000:a0:1b.0  1     rdmap160s27

@anson627
Contributor

anson627 commented Apr 16, 2026

Verified this PR on Azure GB300: the primary NIC (eth0) with the default gateway is excluded, while the Azure accelerated networking VF (e.g. enP8051s1) and the other mlx5_* NICs are included.

#  Name              RDMA Device  PCI Address   NUMA  Vendor    RDMA
1  pci-0101-00-00-0  mlx5_0       0101:00:00.0  0     Mellanox  true
2  pci-0102-00-00-0  mlx5_1       0102:00:00.0  0     Mellanox  true
3  pci-0103-00-00-0  mlx5_2       0103:00:00.0  1     Mellanox  true
4  pci-0104-00-00-0  mlx5_3       0104:00:00.0  1     Mellanox  true
5  pci-1f73-00-02-0               1f73:00:02.0  0     Mellanox  true

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 16, 2026
@gauravkghildiyal
Member

Holding to have time to go through this and allow resolution of a few comments.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2026
@aojea
Contributor Author

aojea commented Apr 17, 2026

RA injects IPv6 defaults for us. If they share a metric (which I think they do) I still need something like NonUplinkChecker. If they don't I can drop NonUplinkChecker.

From a Kubernetes architectural and security perspective, we like to have certain guardrails so users cannot shoot themselves in the foot. Any logic that allows an interface with a default route to be unmounted or modified based on an opportunistic RA injection is very risky:

  • In the Kubernetes networking model, there is typically no difference between control plane and dataplane traffic. If an interface intended for the dataplane receives a default route via RA, it will be 'hijacking' control plane traffic, so the kubelet and pods trying to connect to the apiserver will fail (unless this secondary interface has access too, but that looks like a very convoluted setup with routing loops). This will effectively disconnect the entire node from the cluster.

  • We were already hit by security issues "abusing" RA, so RA processing is now disabled by default on all container interfaces. The same attack can be performed at the node level.

A vulnerability was found in all versions of containernetworking/plugins before version 0.8.6, that allows malicious containers in Kubernetes clusters to perform man-in-the-middle (MitM) attacks. A malicious container can exploit this flaw by sending rogue IPv6 router advertisements to the host or other containers, to redirect traffic to the malicious container. (CVE-2020-10749)

I still think we need a better mechanism for improving filtering, so I encourage you to check @gauravkghildiyal's proposal in #152, which I think will be able to accommodate the NonUplinkChecker functionality perfectly.

@dkennetzoracle
Contributor

@aojea aligned with @gauravkghildiyal 's issue, it seems like a good fix and a generally better way to approach the problem!

Contributor

@tamilmani1989 tamilmani1989 left a comment

changes lgtm. left a comment, i will leave to your discretion

Comment thread pkg/inventory/net_test.go
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 17, 2026
@netlify

netlify Bot commented Apr 17, 2026

Deploy Preview for dranet canceled.

Name Link
🔨 Latest commit 52aa19a
🔍 Latest deploy log https://app.netlify.com/projects/dranet/deploys/69e2958e6384bc0008a17d96

@aojea
Contributor Author

aojea commented Apr 17, 2026

pushed addressing latest comments, great reviews, caught some flaws:

  • clarify the function's scope by stating explicitly that it queries only the unspecified destinations (0.0.0.0/0, ::/0, or nil), which are the kernel's default routes
  • adding new unit tests
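
The scope described in the first bullet can be illustrated with a small predicate. This is a sketch using only the standard library; `isDefaultDst` and `mustCIDR` are hypothetical names and not the function changed in this PR.

```go
package main

import (
	"fmt"
	"net"
)

// isDefaultDst reports whether dst denotes a default-route destination:
// nil, 0.0.0.0/0, or ::/0.
func isDefaultDst(dst *net.IPNet) bool {
	if dst == nil {
		return true
	}
	// bits is 0 when the mask is not in canonical form; reject that case.
	ones, bits := dst.Mask.Size()
	return bits != 0 && ones == 0 && dst.IP.IsUnspecified()
}

// mustCIDR parses a CIDR string and panics on malformed input.
func mustCIDR(s string) *net.IPNet {
	_, n, err := net.ParseCIDR(s)
	if err != nil {
		panic(err)
	}
	return n
}

func main() {
	for _, s := range []string{"0.0.0.0/0", "::/0", "10.0.0.0/8", "fe80::/64"} {
		fmt.Printf("%-12s default=%v\n", s, isDefaultDst(mustCIDR(s)))
	}
	fmt.Printf("%-12s default=%v\n", "nil", isDefaultDst(nil))
}
```

Checking both the prefix length and the unspecified address keeps prefixes such as 0.0.0.0/8 from being mistaken for a default route.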

Address several critical edge cases in default interface
detection and introduces a robust, rootless networking test framework.

1. Point-to-Point Interfaces: Removed the `r.Gw == nil` check.
   Previously, this caused the agent to completely ignore active VPNs
   and tunnels (like Wireguard or tun/tap) because P2P links route
   directly out of the device without a Gateway IP.
2. Route Metrics: The kernel relies on metrics (Priority) to determine
   the active route in a multi-WAN setup. The old code ignored this,
   returning all interfaces. It now correctly isolates the lowest metric
   for IPv4 and IPv6 independently, while preserving ECMP support.
3. Multipath Link Lookup: Fixed a bug where multipath routes were queried
   using the parent route's link index (which is often 0) rather than
   the nexthop's link index.

Signed-off-by: Antonio Ojea <aojea@google.com>
@dkennetzoracle
Contributor

@aojea @gauravkghildiyal I can confirm this fixed our issues (besides NIC orphaning) on OCI. I pulled our OKE changes into this branch locally and deployed, and didn't need any additional changes. We made a few additional changes on our side and fixed our dual IPv6 stack deployment. With these changes, the only thing we are missing is the CRI upstream changes.

@aojea
Contributor Author

aojea commented Apr 20, 2026

/hold cancel

based on the feedback from OKE and Azure, @gauravkghildiyal for final review

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 20, 2026
@gauravkghildiyal
Member

Perfect, thanks for validating @anson627 and @dkennetzoracle

Thanks for the change @aojea

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 20, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anson627, aojea, dkennetzoracle, gauravkghildiyal, tamilmani1989

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [aojea,gauravkghildiyal]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit a6ac1b5 into kubernetes-sigs:main Apr 20, 2026
13 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants