
[Draft]Add antreanodeconfig crd agent #119

Open
luolanzone wants to merge 13 commits into main from add-antreanodeconfig-crd-agent

Conversation

@luolanzone
Owner

@luolanzone luolanzone commented Mar 10, 2026

Note

High Risk
Introduces a new beta feature with CRD changes, controller logic, and integration with secondary networks. The module path change to v2 affects the entire codebase. Changes to core agent initialization and secondary network bridge configuration could impact existing deployments.

Overview
Adds the AntreaNodeConfig CRD (beta feature, enabled by default) to enable per-Node configuration of Antrea agent settings via nodeSelector-based policies. This allows cluster administrators to apply different secondary network OVS bridge configurations to different Node pools.

Key changes:

  • New CRD AntreaNodeConfig with nodeSelector matching and secondary network bridge configuration (bridge name, physical interfaces, VLAN filtering, multicast snooping).
  • New controller in pkg/agent/antreanodeconfig/ that watches AntreaNodeConfig resources and publishes immutable snapshots to subscribers (e.g., secondary network controller) when the effective configuration for a Node changes.
  • Secondary network integration updated to consume AntreaNodeConfig snapshots, with support for waiting on initial configuration before bridge initialization.
  • Module path bumped to v2 (antrea.io/antrea/v2) across all imports for the major version release.
  • Flow Aggregator multi-arch support added to build workflows (amd64, arm64, arm) with manifest list publishing.
  • Test infrastructure enhanced with VLAN-aware bridge support for Kind clusters and new e2e tests for AntreaNodeConfig with label selectors.
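
The "publish immutable snapshots to subscribers when the effective configuration changes" behaviour described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the controller from this PR: `BridgeSnapshot`, `Publisher`, `Subscribe`, and `Publish` are hypothetical names, and the real snapshot type carries more fields (VLAN filtering, multicast snooping).

```go
package main

import (
	"fmt"
	"sync"
)

// BridgeSnapshot is a hypothetical, simplified stand-in for the effective
// secondary-network bridge configuration of one Node.
type BridgeSnapshot struct {
	BridgeName         string
	PhysicalInterfaces []string
}

// clone returns a deep copy so that no subscriber can mutate shared state.
func (s BridgeSnapshot) clone() BridgeSnapshot {
	return BridgeSnapshot{
		BridgeName:         s.BridgeName,
		PhysicalInterfaces: append([]string(nil), s.PhysicalInterfaces...),
	}
}

// equal reports whether two snapshots describe the same configuration.
func (s BridgeSnapshot) equal(o BridgeSnapshot) bool {
	if s.BridgeName != o.BridgeName || len(s.PhysicalInterfaces) != len(o.PhysicalInterfaces) {
		return false
	}
	for i := range s.PhysicalInterfaces {
		if s.PhysicalInterfaces[i] != o.PhysicalInterfaces[i] {
			return false
		}
	}
	return true
}

// Publisher notifies subscribers only when the effective snapshot changes.
type Publisher struct {
	mu          sync.Mutex
	current     *BridgeSnapshot
	subscribers []func(BridgeSnapshot)
}

// Subscribe registers a callback that receives its own copy of each new snapshot.
func (p *Publisher) Subscribe(fn func(BridgeSnapshot)) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.subscribers = append(p.subscribers, fn)
}

// Publish stores the snapshot and reports whether subscribers were notified.
func (p *Publisher) Publish(s BridgeSnapshot) bool {
	p.mu.Lock()
	if p.current != nil && p.current.equal(s) {
		p.mu.Unlock()
		return false
	}
	stored := s.clone()
	p.current = &stored
	subs := append([]func(BridgeSnapshot){}, p.subscribers...)
	p.mu.Unlock()

	// Callbacks run outside the lock so a subscriber may call back into the publisher.
	for _, fn := range subs {
		fn(stored.clone())
	}
	return true
}

func main() {
	p := &Publisher{}
	p.Subscribe(func(s BridgeSnapshot) { fmt.Println("new effective bridge:", s.BridgeName) })
	p.Publish(BridgeSnapshot{BridgeName: "br-secondary"})
	p.Publish(BridgeSnapshot{BridgeName: "br-secondary"}) // unchanged, no notification
	p.Publish(BridgeSnapshot{BridgeName: "br-vlan"})
}
```

Handing each subscriber a copy is one simple way to get the "immutable" property; the PR may achieve it differently.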

Reviewed by Cursor Bugbot for commit a8f0913. Bugbot is set up for automated code reviews on this repo.

@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from a5bbaef to 95720de Compare March 10, 2026 02:54
Comment thread cmd/antrea-agent/agent.go Outdated
Comment thread pkg/agent/secondarynetwork/init_linux.go
Comment thread cmd/antrea-agent/agent.go
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 95720de to db6dcaa Compare March 10, 2026 10:10
Comment thread build/charts/antrea/crds/antreanodeconfig.yaml Outdated
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from db6dcaa to cd65b32 Compare March 11, 2026 09:19
Comment thread pkg/agent/secondarynetwork/podwatch/controller.go
Comment thread pkg/agent/secondarynetwork/podwatch/controller.go
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from cd65b32 to 178d747 Compare March 11, 2026 09:58
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 178d747 to c642a08 Compare March 31, 2026 04:02
Comment thread pkg/agent/agent_linux.go
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from c642a08 to ab802dd Compare March 31, 2026 06:33
Comment thread pkg/agent/secondarynetwork/init.go Outdated
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch 2 times, most recently from 862ac4f to 9fe8413 Compare April 1, 2026 08:17
Comment thread pkg/agent/secondarynetwork/init_linux.go Outdated
Comment thread cmd/antrea-agent/agent.go
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 9fe8413 to 5aaaba7 Compare April 8, 2026 09:09
Comment thread go.mod
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch 3 times, most recently from bc8ae5c to b868b1a Compare April 10, 2026 02:32
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch 3 times, most recently from d054735 to 1e626ed Compare April 21, 2026 09:43
Comment thread cmd/antrea-agent/agent.go Outdated
// wait for the first ANC snapshot before Initialize so the effective bridge is known.
if features.DefaultFeatureGate.Enabled(features.SecondaryNetwork) {
	if err := secondaryNetworkController.WaitForInitialANCSnapshotAndEnsureBridge(stopCh); err != nil {
		return fmt.Errorf("failed to wait for AntreaNodeConfig snapshot for secondary network: %w", err)

Potential deadlock with AntreaNodeConfig initialization order

High Severity

When both SecondaryNetwork and AntreaNodeConfig features are enabled, WaitForInitialANCSnapshotAndEnsureBridge is called at line 995 to wait for the first AntreaNodeConfig snapshot. However, this blocking wait occurs during the agent's initialization sequence, before many controllers have started their work. The antreaNodeConfigController.Run is started at line 882, but the subscription callback that closes ancFirstSnapshotCh (set up at lines 159-172 in secondarynetwork/init.go) requires the controller to process events and publish snapshots. If the controller's event processing is delayed or depends on other initialization steps that happen after line 995, this could cause a deadlock or extended startup delay, especially since the wait is blocking the main initialization flow.

Reviewed by Cursor Bugbot for commit 1e626ed.

@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 1e626ed to 0f4b2b2 Compare April 21, 2026 10:16
Comment thread ci/kind/test-secondary-network-kind.sh Outdated
fi

-trap "quit" INT EXIT
+# trap "quit" INT EXIT

Commented out trap disables cleanup on exit

High Severity

The trap "quit" INT EXIT statement is commented out, preventing cleanup from running when the script exits normally or is interrupted. This means the quit function won't execute, leaving stale Kind clusters and Docker networks that should be cleaned up. The cleanup logic is defined but never triggered.

Reviewed by Cursor Bugbot for commit ec6c54f.

Comment thread ci/kind/kind-setup.sh
-docker network connect --driver-opt=com.docker.network.endpoint.ifname=$ifname $network $node
+docker network connect "${extra_gw_priority[@]}" --driver-opt=com.docker.network.endpoint.ifname=$ifname "$network" "$node"
echo "connected worker $node to network $network"
i=$((i+1))

Nested loop counter placement causes incorrect interface assignment

High Severity

The counter i is initialized to 1 before the inner loop but incremented inside it, causing the interface number to continue incrementing across all networks for each node instead of resetting. For example, if there are 2 networks, node1 gets eth1 and eth2, but node2 would incorrectly get eth3 and eth4 instead of eth1 and eth2. This breaks network connectivity between nodes.
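
The numbering this comment expects (every node gets eth1, eth2, ... for its extra networks) can be modelled in a few lines. The script under review is shell; the Go function below is only a hypothetical model of the intended per-node reset, and `assignInterfaces` is not a name from the PR.

```go
package main

import "fmt"

// assignInterfaces returns, for each node, the interface name used for each
// extra network. The counter is reset per node, so numbering always restarts
// at eth1.
func assignInterfaces(nodes, networks []string) map[string][]string {
	out := make(map[string][]string, len(nodes))
	for _, node := range nodes {
		i := 1 // reset for every node
		for range networks {
			out[node] = append(out[node], fmt.Sprintf("eth%d", i))
			i++
		}
	}
	return out
}

func main() {
	got := assignInterfaces([]string{"node1", "node2"}, []string{"net-a", "net-b"})
	fmt.Println(got["node1"], got["node2"])
}
```

If the counter were instead initialized once outside both loops, the second node would continue from where the first stopped, which is the behaviour the comment describes.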

Reviewed by Cursor Bugbot for commit ec6c54f.

@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from ec6c54f to a8f0913 Compare April 29, 2026 10:13

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit a8f0913.

{{- end }}
nodeSelector:
kubernetes.io/os: linux
-kubernetes.io/arch: amd64

Flow aggregator removed architecture-specific node selector

Low Severity

The deployment removes the kubernetes.io/arch: amd64 node selector while keeping kubernetes.io/os: linux. This allows the flow-aggregator to be scheduled on non-amd64 architectures (arm64, arm), which is consistent with the workflow changes that build multi-arch images. However, unless all dependencies and features are verified to work on non-amd64 platforms, this could lead to runtime failures.

Reviewed by Cursor Bugbot for commit a8f0913.

@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch 2 times, most recently from 28505be to ee65d01 Compare May 26, 2026 08:45
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 32cb1e4 to 7a1a8bc Compare June 2, 2026 15:24
renovate Bot and others added 10 commits June 3, 2026 11:29
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Add the Multi-cluster tests to GitHub Actions and update related scripts.

Signed-off-by: Shuyang Xin <xin_shuyang@hotmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Introduce pkg/agent/antreanodeconfig:

- Match AntreaNodeConfig objects to a Node via nodeSelector; pick the
  oldest match (creationTimestamp, name as tiebreaker) and apply the
  first SecondaryNetwork winner only (no field-level merge).
- Resolve the effective secondary-network OVS bridge from CRD list +
  static agent config (EffectiveSecondaryOVSBridge, EffectiveSnapshot).
- Add a controller that watches AntreaNodeConfig and the local Node,
  recomputes the snapshot when labels or CRDs change, and notifies
  channel.Notifier subscribers (with periodic ANC resync).

Add agent-facing SecondaryNetwork types under pkg/agent/types. Set
the AntreaNodeConfig feature gate to Beta (default on). Refresh the
agent chart, bundled install YAMLs, and feature-gate tests.

Signed-off-by: Lan Luo <lan.luo@broadcom.com>
When the AntreaNodeConfig feature gate is enabled, antrea-agent starts the
antreanodeconfig controller plus a SubscribableChannel and passes an
effective-bridge callback and channel subscriber into the secondary network
controller. The secondary network controller creates the initial OVS bridge
from that callback, subscribes to ANC snapshot notifications to enqueue
rate-limited bridge reconciliation work, and replaces the podwatch OVS client
when the effective bridge changes.

- Reconcile bridge name, physical interfaces, and trunk AllowedVLANs on Linux
  (including clearing stale trunks and tearing down stale host-connection port
  pairs when moving from single- to multi-interface uplink configs).
- Add OVS client support for trunk ports (CreateTrunkPort, SetPortTrunks) and
  trunk parsing in port listings; extend mocks and tests.
- Make podwatch PodController bridge access concurrency-safe and add
  UpdateOVSBridge for dynamic bridge swaps.
- Add OVSBridgeConfig helpers in pkg/agent/types; log uplink restore errors
  in agent_linux.

Signed-off-by: Lan Luo <lan.luo@broadcom.com>
Signed-off-by: Lan Luo <lan.luo@broadcom.com>
Signed-off-by: Lan Luo <lan.luo@broadcom.com>
Signed-off-by: Lan Luo <lan.luo@broadcom.com>
Signed-off-by: Lan Luo <lan.luo@broadcom.com>
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 7a1a8bc to 73db071 Compare June 5, 2026 09:05
Signed-off-by: Lan Luo <lan.luo@broadcom.com>
@luolanzone luolanzone force-pushed the add-antreanodeconfig-crd-agent branch from 73db071 to 94815d4 Compare June 5, 2026 14:44

2 participants