Skip to content

e2e: EVPN with disruptive actions#3161

Open
jechen0648 wants to merge 7 commits intoopenshift:masterfrom
jechen0648:evpn_disruptive_case
Open

e2e: EVPN with disruptive actions#3161
jechen0648 wants to merge 7 commits intoopenshift:masterfrom
jechen0648:evpn_disruptive_case

Conversation

@jechen0648
Copy link
Copy Markdown
Contributor

@jechen0648 jechen0648 commented Apr 28, 2026

📑 Description

Summary
Adds an OpenShift-only EVPN disruptive e2e test suite (openshift/test/evpn.go) that runs three EVPN topologies (L3 IP-VRF, L2 MAC-VRF, L2 MAC-VRF+IP-VRF) with random VTEP subnets, checks connectivity and isolation, then validates recovery after node reboot, OVN-K restart, FRR-K8s restart, and external FRR (“spine”) failure/restart. Core logic lives in reusable helpers under test/e2e/evpn.go; the OpenShift test harness is moved from openshift/test/ to openshift/pkg/ so openshift/test/ can stay thin test entrypoints.

What changed
test/e2e
ClusterProvider.RebootNode — implements node reboot
DeploymentConfig.FRRK8sDaemonSetName — Lets tests wait on the correct FRR-K8s DaemonSet after restarts when it is KIND vs OpenShift cluster
Exports for OpenShift tests — Small wrappers (IPs, annotations, subnet filtering, ovnkube-node restart, node/GW checks) so OpenShift tests do not duplicate e2e internals.
CUDN_ADVERTISED_EVPNShared — RouteAdvertisement / network type support so multiple EVPN namespaces can share one FRRConfiguration (per-network advertise labels, shared network: testName selector on FRR).
EVPN helper expansion — Idempotent FRR setup, shared FRRConfiguration for multiple sessions, EVPNDisruptiveState, destroy/reapply kernel state on external FRR, RestartExternalFRRDaemons (daemon restart inside the spine container, not Docker-level restart), BGP + VNI checks using show evpn vni JSON (L2: numRemoteVteps; L3: numArpNd), exported SetupEVPNNetworkWithServers / EVPNExternalSetupIDs (real spine VNIs/subnets for anti-flake checks), network factories, reachability helpers.
updates on route_advertisements.go, network_segmentation*.go, util.go, e2e.go adjusted for new APIs and shared RA network type.

openshift/
Harness move — Shared registration, annotations, deploymentconfig/infraprovider shims, and generated files move openshift/test/ → openshift/pkg/; annotate, go generate, and ovn-kubernetes-tests-ext imports updated.

openshift/test/evpn.go — setup EVPN sessions, then do baseline connectivity check(pod↔external, isolation between VPNs, node↔pod negative checks, KAPI healthz), then four It blocks do disruptive actions:1)node restart (including VTEP loopback restore and pod recreate), 2)OVN-K restart, 3)FRR-K8s restart, 4) spine restart (destroy kernel state, restart FRR daemons, wait for process, re-apply kernel state), then verify connectivity again after disruptive action.

Why
Exercise EVPN data and control paths under realistic failures on OpenShift while keeping upstream test/e2e helpers reusable for both KIND and OpenShift extended tests.

Testing
OpenShift extended EVPN suite
KIND EVPN coverage remains compatible via updated runEVPNNetworkAndServers / RA wiring where touched.

Fixes #

Additional Information for reviewers

✅ Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Summary by CodeRabbit

  • New Features

    • Added comprehensive EVPN disruptive e2e tests for recovery after node, OVN-K, FRR-K8s, and external spine restarts.
    • Added reboot capability to infrastructure providers (KinD/OpenShift) and new public test helpers for restart workflows and network verification.
  • Tests

    • Improved EVPN cleanup robustness, FRR configuration persistence, and VTEP loopback idempotency.
    • Exposed additional test utilities for subnet matching, pod annotations, and connectivity assertions.
  • Chores

    • Centralized and reorganized shared test packages and updated module/config generation targets.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jechen0648
Once this PR has been reviewed and has the lgtm label, please assign jcaamano for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 28, 2026

Warning

Rate limit exceeded

@jechen0648 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 42 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 1192a985-9eba-401a-a4ee-f7f4352e34ad

📥 Commits

Reviewing files that changed from the base of the PR and between cb362e6 and ff5558e.

⛔ Files ignored due to path filters (1)
  • openshift/pkg/generated/zz_generated.annotations.go is excluded by !**/generated/**
📒 Files selected for processing (16)
  • openshift/cmd/annotate/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/provider.go
  • openshift/go.mod
  • openshift/hack/update-tests-annotation.sh
  • openshift/pkg/annotate/annotate.go
  • openshift/pkg/annotate/rules.go
  • openshift/pkg/deploymentconfig/openshift.go
  • openshift/pkg/e2e_test.go
  • openshift/pkg/infraprovider/baremetal.go
  • openshift/pkg/infraprovider/openshift.go
  • openshift/pkg/namespace.go
  • openshift/pkg/tests.go
  • openshift/test/evpn.go
  • test/e2e/evpn.go
  • test/e2e/route_advertisements.go

Walkthrough

Repoints various internal OpenShift test imports from openshift/test to openshift/pkg, adds comprehensive EVPN disruptive e2e tests (multiple disruption scenarios and helpers), introduces a provider-level RebootNode API + implementations, adds FRR-K8s DaemonSet naming APIs, and exports several previously internal test helpers.

Changes

Cohort / File(s) Summary
Import Path & CLI Adjustments
openshift/cmd/annotate/main.go, openshift/cmd/ovn-kubernetes-tests-ext/main.go, openshift/cmd/ovn-kubernetes-tests-ext/provider.go, openshift/hack/update-tests-annotation.sh
Switched imports and script references from openshift/test/... to openshift/pkg/...; added a blank import for legacy test-side effects where needed.
Go Module
openshift/go.mod
Moved k8s.io/utils to primary require (direct dependency) and added explanatory comments about k8s staging replace directives.
Annotate & Test Labels
openshift/pkg/annotate/rules.go, openshift/pkg/tests.go
Blank-imported OpenShift Ginkgo specs; updated label-to-group mapping to recognize "[Serial]" and added four EVPN disruptive Serial test entries in informing tests.
DeploymentConfig API & Impl
openshift/pkg/deploymentconfig/openshift.go, test/e2e/deploymentconfig/api/api.go, test/e2e/deploymentconfig/configs/kind/kind.go
Added FRRK8sDaemonSetName() string to the deployment config interface and provided OpenShift/Kind implementations.
InfraProvider Reboot API & Impl
test/e2e/infraprovider/api/api.go, test/e2e/infraprovider/providers/kind/kind.go, openshift/pkg/infraprovider/openshift.go
Added RebootNode(nodeName string) error to provider API; implemented Kind reboot (stop/start container) and OpenShift provider reboot (exec true then async systemctl reboot).
EVPN Tests & Helpers (OpenShift)
openshift/test/evpn.go
New OpenShift Ginkgo EVPN disruptive suite (serial) creating 3 EVPN networks, randomized VTEPs, pinned pods, multiple disruption scenarios, and comprehensive verification routines. Adds exported helper familyStr.
EVPN Core Utilities & Exported Helpers
test/e2e/evpn.go
Large set of changes: idempotent external FRR teardown, persist FRR config via write memory, idempotent VTEP loopback handling, exported FRR/config/helpers, disruptive-state operations, readiness/convergence waiters, and connectivity/isolation assertions.
E2E API Exports & Utility Wrappers
test/e2e/e2e.go, test/e2e/util.go, test/e2e/network_segmentation.go, test/e2e/network_segmentation_utils.go
Exposed multiple previously internal helpers: OVN pod restart wrappers, IP-family/subnet matchers, pod-annotation IP retrieval, node readiness wait, framework constructor, pod deletion-with-wait, and other utility forwarders.
Route Advertisement Updates
test/e2e/route_advertisements.go
Added cudnAdvertisedEVPNShared handling, adjusted FRR selector behavior to use testName, and refactored EVPN helper invocation to capture returns explicitly.
Miscellaneous Test Adjustments
test/e2e/evpn.go (existing e2e package)
Made external FRR kernel ops idempotent and extended exported disruptive helpers/state and setup wrappers (merged with EVPN core changes).

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant FRR as External FRR (Spine)
    participant OVN as OVN-K / FRR-K8s (DaemonSets)
    participant Node as Worker Node
    participant Pod as EVPN Pods

    Test->>FRR: Create VNIs/VRFs, BGP config, VTEPs
    FRR-->>Test: BGP/EVPN ready

    Test->>OVN: Deploy pinned EVPN pods
    OVN-->>Test: Pods Running

    Test->>Test: Verify connectivity & isolation

    loop For each disruption scenario
        alt Node reboot
            Test->>Node: Reboot (provider RebootNode)
            Node-->>Test: Node back
            Test->>OVN: Wait daemonset rollout / recreate pods
            Test->>FRR: Re-add VTEP loopback
        else OVN-K restart
            Test->>OVN: Restart ovnkube-node daemonset
            OVN-->>Test: Rollout complete
        else FRR-K8s restart
            Test->>OVN: Restart FRR-K8s daemonset
            OVN-->>Test: Rollout complete
        else Spine restart
            Test->>FRR: Destroy/recreate kernel state + restart daemons
            FRR-->>Test: Ready
        end
        Test->>Test: Wait BGP/EVPN convergence
        Test->>Test: Re-verify connectivity & isolation
    end
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~40 minutes

Poem

🐇🥕 I hopped from test to pkg today,
Moved imports, helpers, and tests in play,
EVPNs bustle, reboots in a dance,
Rollouts and BGP get another chance,
Carrot-coded helpers sing — hop, test, hooray!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'e2e: EVPN with disruptive actions' clearly and concisely summarizes the main change: adding EVPN end-to-end tests that exercise disruptive failure scenarios.
Docstring Coverage ✅ Passed Docstring coverage is 89.66% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
openshift/go.mod (2)

234-254: Maintainability: replace overrides many k8s.io/* versions.

The require block pins e.g. k8s.io/api v0.35.1, but the replace block overrides it to github.com/openshift/kubernetes/staging/... at a different pseudo-version. This is probably intended for OpenShift compatibility, but it’s easy for future maintainers to misread.

Consider adding a short comment (if not already elsewhere) clarifying that the require versions are largely placeholders/constraints and the replace block is the real source of truth for the staging modules.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openshift/go.mod` around lines 234 - 254, Add a brief clarifying comment near
the go.mod replace block explaining that the top-level require pins (e.g.
k8s.io/api v0.35.1) are nominal and that the replace directives for k8s.io/*
modules (e.g. replace k8s.io/api =>
github.com/openshift/kubernetes/staging/src/k8s.io/api ...) are the
authoritative source used for OpenShift compatibility; place the comment
adjacent to the existing replace block and mention the specific modules
(k8s.io/api, k8s.io/apimachinery, k8s.io/client-go, etc.) so future maintainers
understand the intent.

14-14: Check the placeholder test/e2e pseudo-version is intentional.

Requiring github.com/ovn-kubernetes/ovn-kubernetes/test/e2e v0.0.0-00010101000000-000000000000 is often used alongside a local replace (here: => ../test/e2e). If this pattern is intentional, it’s fine; otherwise, consider tightening it to avoid confusion and to reduce surprises when running go mod tidy/go list.

Also applies to: 224-230

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openshift/go.mod` at line 14, The go.mod entry using the pseudo-version for
github.com/ovn-kubernetes/ovn-kubernetes/test/e2e looks like an intentional
local override pattern; verify whether you meant to depend on a local module via
a replace or on a published tag. If you intend a local module, add or keep a
matching replace directive pointing to ../test/e2e (or the correct relative
path) so the pseudo-version is explicit and tidy won't replace it; if you
intended to track a released version, replace the pseudo-version with the
correct module tag/commit SHA. Also apply the same verification/fix for the
other occurrences mentioned (lines 224-230).
test/e2e/util.go (1)

2118-2118: Docstring overstates behavior (“force-deletes”).

Line 2118 says force-deletes, but this path performs delete + wait, not force deletion semantics. Consider tightening wording to avoid confusion.

Suggested wording fix
-// DeletePodWithWaitByName force-deletes the named pod in the given namespace and waits for it to disappear.
+// DeletePodWithWaitByName deletes the named pod in the given namespace and waits for it to disappear.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/util.go` at line 2118, The docstring for DeletePodWithWaitByName
incorrectly says it "force-deletes" the pod; update the comment to accurately
reflect the behavior (it performs a normal delete and waits for the pod to
disappear) — e.g., change "force-deletes the named pod" to "deletes the named
pod and waits for it to disappear" or similar wording to remove the implication
of force-delete semantics.
test/e2e/infraprovider/providers/kind/kind.go (1)

158-163: Wrap start failure with reboot context.

Line 162 returns the raw StartContainer error; adding node/action context will improve failure diagnosis.

Suggested diff
 func (k *kind) RebootNode(nodeName string) error {
 	if err := k.engine.StopContainer(nodeName); err != nil {
 		return fmt.Errorf("failed to stop KinD node %q during reboot: %w", nodeName, err)
 	}
-	return k.engine.StartContainer(nodeName)
+	if err := k.engine.StartContainer(nodeName); err != nil {
+		return fmt.Errorf("failed to start KinD node %q during reboot: %w", nodeName, err)
+	}
+	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/infraprovider/providers/kind/kind.go` around lines 158 - 163, The
RebootNode function currently returns the raw error from
k.engine.StartContainer(nodeName); instead, wrap that error with reboot context
to improve diagnostics: after calling k.engine.StartContainer(nodeName) capture
its error and, if non-nil, return fmt.Errorf("failed to start KinD node %q
during reboot: %w", nodeName, err). Keep the existing StopContainer error
handling as-is (it already wraps), and update the return in RebootNode to wrap
StartContainer failures using the same pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openshift/pkg/infraprovider/openshift.go`:
- Around line 174-179: RebootNode currently discards any error from the reboot
initiation; change it to surface initiation failures by invoking
ExecK8NodeCommand in a goroutine that sends its error on a channel, then select
on that channel (with a short timeout) to return any non-nil error to the
caller; keep the actual reboot fire-and-forget semantics if the command succeeds
or if you purposely ignore a late failure, but do not always return nil.
Reference the RebootNode method and the ExecK8NodeCommand call when making this
change.

In `@openshift/test/evpn.go`:
- Around line 171-174: The test currently falls back by assigning
otherWorkerNode = targetWorkerNode which breaks topology; instead ensure two
distinct worker nodes are selected and fail the test if a second distinct worker
isn’t available: when discovering nodes, replace the fallback assignment with a
check that otherWorkerNode is non-empty and not equal to targetWorkerNode (use
gomega.Expect(otherWorkerNode).NotTo(gomega.BeEmpty(), ...) and
gomega.Expect(otherWorkerNode).NotTo(gomega.Equal(targetWorkerNode), ...));
ensure TestPod and otherPod are left targeted to those distinct nodes so the
node-restart path recreates both pods correctly.

In `@test/e2e/evpn.go`:
- Around line 909-918: The code is hardcoding "/32" when adding VTEP loopback
addresses via infraprovider.Get().ExecK8NodeCommand which breaks IPv6; replace
the fixed "/32" with a family-specific host prefix (e.g., use "/32" for IPv4 and
"/128" for IPv6) by checking the IP family (ip.To4() or equivalent) and append
the correct suffix when building the addr string passed to ExecK8NodeCommand;
also apply the same hostAddr/prefix selection logic to the host-cidrs polling
and cleanup paths so adding, polling and removal use the same family-specific
prefix consistently (update the helper that constructs the loopback addr and any
code paths that currently assume "/32").

---

Nitpick comments:
In `@openshift/go.mod`:
- Around line 234-254: Add a brief clarifying comment near the go.mod replace
block explaining that the top-level require pins (e.g. k8s.io/api v0.35.1) are
nominal and that the replace directives for k8s.io/* modules (e.g. replace
k8s.io/api => github.com/openshift/kubernetes/staging/src/k8s.io/api ...) are
the authoritative source used for OpenShift compatibility; place the comment
adjacent to the existing replace block and mention the specific modules
(k8s.io/api, k8s.io/apimachinery, k8s.io/client-go, etc.) so future maintainers
understand the intent.
- Line 14: The go.mod entry using the pseudo-version for
github.com/ovn-kubernetes/ovn-kubernetes/test/e2e looks like an intentional
local override pattern; verify whether you meant to depend on a local module via
a replace or on a published tag. If you intend a local module, add or keep a
matching replace directive pointing to ../test/e2e (or the correct relative
path) so the pseudo-version is explicit and tidy won't replace it; if you
intended to track a released version, replace the pseudo-version with the
correct module tag/commit SHA. Also apply the same verification/fix for the
other occurrences mentioned (lines 224-230).

In `@test/e2e/infraprovider/providers/kind/kind.go`:
- Around line 158-163: The RebootNode function currently returns the raw error
from k.engine.StartContainer(nodeName); instead, wrap that error with reboot
context to improve diagnostics: after calling k.engine.StartContainer(nodeName)
capture its error and, if non-nil, return fmt.Errorf("failed to start KinD node
%q during reboot: %w", nodeName, err). Keep the existing StopContainer error
handling as-is (it already wraps), and update the return in RebootNode to wrap
StartContainer failures using the same pattern.

In `@test/e2e/util.go`:
- Line 2118: The docstring for DeletePodWithWaitByName incorrectly says it
"force-deletes" the pod; update the comment to accurately reflect the behavior
(it performs a normal delete and waits for the pod to disappear) — e.g., change
"force-deletes the named pod" to "deletes the named pod and waits for it to
disappear" or similar wording to remove the implication of force-delete
semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 5d2e258b-4a45-4861-b49b-d72213144640

📥 Commits

Reviewing files that changed from the base of the PR and between ab5bb67 and 476eacd.

⛔ Files ignored due to path filters (1)
  • openshift/pkg/generated/zz_generated.annotations.go is excluded by !**/generated/**
📒 Files selected for processing (24)
  • openshift/cmd/annotate/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/provider.go
  • openshift/go.mod
  • openshift/hack/update-tests-annotation.sh
  • openshift/pkg/annotate/annotate.go
  • openshift/pkg/annotate/rules.go
  • openshift/pkg/deploymentconfig/openshift.go
  • openshift/pkg/e2e_test.go
  • openshift/pkg/infraprovider/baremetal.go
  • openshift/pkg/infraprovider/openshift.go
  • openshift/pkg/namespace.go
  • openshift/pkg/tests.go
  • openshift/test/evpn.go
  • test/e2e/deploymentconfig/api/api.go
  • test/e2e/deploymentconfig/configs/kind/kind.go
  • test/e2e/e2e.go
  • test/e2e/evpn.go
  • test/e2e/infraprovider/api/api.go
  • test/e2e/infraprovider/providers/kind/kind.go
  • test/e2e/network_segmentation.go
  • test/e2e/network_segmentation_utils.go
  • test/e2e/route_advertisements.go
  • test/e2e/util.go

Comment thread openshift/pkg/infraprovider/openshift.go
Comment thread openshift/test/evpn.go
Comment thread test/e2e/evpn.go Outdated
Clarify in docs that ShutdownNode keeps the node off until StartNode, and add
RebootNode for a full stop+start cycle without a separate StartNode.

KIND implements RebootNode by stopping and starting the node container.

Signed-off-by: Jean Chen <jechen@redhat.com>
Expose the frr-k8s DaemonSet name (e.g. frr-k8s-daemon on KIND) so tests can
wait for rollout after FRR-K8s pod restarts.

Signed-off-by: Jean Chen <jechen@redhat.com>
Export small wrappers used by the OpenShift EVPN work: IP/annotation helpers,
subnet filters, ovnkube-node restart functions, and node readiness/ GW checks,
without duplicating e2e internals in another module.

Signed-off-by: Jean Chen <jechen@redhat.com>
@jechen0648 jechen0648 force-pushed the evpn_disruptive_case branch from 476eacd to cb362e6 Compare April 28, 2026 11:36
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
openshift/test/evpn.go (1)

336-340: Avoid unconditional 10-minute sleep on BGP failure.

This can heavily inflate serial-lane runtime on failures; consider gating it behind a debug flag.

💡 Suggested refactor
 import (
 	"context"
 	"fmt"
 	"net"
+	"os"
 	"time"
@@
-			if bgpErr != nil {
+			if bgpErr != nil {
 				ginkgo.GinkgoLogr.Info("BGP EVPN sessions not established — pausing 10 minutes for debugging before cleanup")
 				ginkgo.GinkgoLogr.Info("Inspect with: oc get frrconfiguration -n openshift-frr-k8s -o yaml")
 				ginkgo.GinkgoLogr.Info("Inspect with: podman exec frr vtysh -c 'show bgp l2vpn evpn summary'")
-				time.Sleep(10 * time.Minute)
+				if os.Getenv("EVPN_DEBUG_PAUSE_ON_FAILURE") == "true" {
+					time.Sleep(10 * time.Minute)
+				}
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openshift/test/evpn.go` around lines 336 - 340, The unconditional 10-minute
time.Sleep after logging BGP failure should be gated behind a debug flag;
replace the direct time.Sleep(10 * time.Minute) in the evpn failure path with a
conditional that checks a boolean (e.g., debugPauseOnBGP or an env var like
DEBUG_BGP_PAUSE) before sleeping, log that the pause is enabled and its
duration, and ensure the flag defaults to false so CI/serial lanes are not
delayed; update references around ginkgo.GinkgoLogr.Info and the time.Sleep call
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@openshift/test/evpn.go`:
- Around line 191-193: The three calls to e2e.RandomVTEPSubnets() assign the
error return to blank and thus discard failures; capture the error for each call
(e.g., use l3VtepV4, err := e2e.RandomVTEPSubnets()) and handle it immediately —
in this test function return or fail the test (e.g., t.Fatalf or t.Fatal) with a
clear message including the error and which VTEP (l3VtepV4, l2VtepV4,
l2l3VtepV4) failed, so invalid/empty CIDRs are not propagated later from
RandomVTEPSubnets.

In `@test/e2e/route_advertisements.go`:
- Around line 3024-3027: The shared EVPN mode case (cudnAdvertisedEVPNShared)
sets targetVRF and networkLabels but selects FRRConfiguration via
frrConfigurationLabels = map[string]string{"network": testName}, which won't
match the shared FRR config; update the EVPN infra setup code that creates
FRRConfigurations to handle the cudnAdvertisedEVPNShared branch and pass/label
the shared FRR config name, and change the selector here
(frrConfigurationLabels) to reference the shared FRR configuration key (e.g.,
use the shared FRR config name or label like {"network": networkName} or
{"shared-frr": testName}) so RouteAdvertisements can match the shared
FRRConfiguration; ensure the symbol cudnAdvertisedEVPNShared and
frrConfigurationLabels (and any FRR config creation function in the EVPN infra
path) are updated consistently.

---

Nitpick comments:
In `@openshift/test/evpn.go`:
- Around line 336-340: The unconditional 10-minute time.Sleep after logging BGP
failure should be gated behind a debug flag; replace the direct time.Sleep(10 *
time.Minute) in the evpn failure path with a conditional that checks a boolean
(e.g., debugPauseOnBGP or an env var like DEBUG_BGP_PAUSE) before sleeping, log
that the pause is enabled and its duration, and ensure the flag defaults to
false so CI/serial lanes are not delayed; update references around
ginkgo.GinkgoLogr.Info and the time.Sleep call accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b17c4e7-0a22-4423-a019-be7b1028a2e7

📥 Commits

Reviewing files that changed from the base of the PR and between 476eacd and cb362e6.

⛔ Files ignored due to path filters (1)
  • openshift/pkg/generated/zz_generated.annotations.go is excluded by !**/generated/**
📒 Files selected for processing (24)
  • openshift/cmd/annotate/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/provider.go
  • openshift/go.mod
  • openshift/hack/update-tests-annotation.sh
  • openshift/pkg/annotate/annotate.go
  • openshift/pkg/annotate/rules.go
  • openshift/pkg/deploymentconfig/openshift.go
  • openshift/pkg/e2e_test.go
  • openshift/pkg/infraprovider/baremetal.go
  • openshift/pkg/infraprovider/openshift.go
  • openshift/pkg/namespace.go
  • openshift/pkg/tests.go
  • openshift/test/evpn.go
  • test/e2e/deploymentconfig/api/api.go
  • test/e2e/deploymentconfig/configs/kind/kind.go
  • test/e2e/e2e.go
  • test/e2e/evpn.go
  • test/e2e/infraprovider/api/api.go
  • test/e2e/infraprovider/providers/kind/kind.go
  • test/e2e/network_segmentation.go
  • test/e2e/network_segmentation_utils.go
  • test/e2e/route_advertisements.go
  • test/e2e/util.go
✅ Files skipped from review due to trivial changes (3)
  • openshift/hack/update-tests-annotation.sh
  • openshift/pkg/annotate/rules.go
  • test/e2e/network_segmentation_utils.go
🚧 Files skipped from review as they are similar to previous changes (11)
  • test/e2e/deploymentconfig/api/api.go
  • test/e2e/deploymentconfig/configs/kind/kind.go
  • openshift/pkg/deploymentconfig/openshift.go
  • openshift/cmd/annotate/main.go
  • openshift/cmd/ovn-kubernetes-tests-ext/provider.go
  • openshift/pkg/infraprovider/openshift.go
  • test/e2e/e2e.go
  • test/e2e/infraprovider/api/api.go
  • test/e2e/network_segmentation.go
  • test/e2e/util.go
  • test/e2e/evpn.go

Comment thread openshift/test/evpn.go
Comment thread test/e2e/route_advertisements.go
Support multiple EVPN namespaces sharing one FRRConfiguration: RouteAdvertisements
selectors use a per-network advertise label while FRRConfiguration uses {network: testName}.

Signed-off-by: Jean Chen <jechen@redhat.com>
- Make FRR setup idempotent: apply FRRConfiguration, write memory, tolerate
  cleanup errors, idempotent VTEP loopback addresses
- Use shared FRRConfiguration for multiple EVPN sessions
- Disruptive flow: EVPNDisruptiveState, destroy/reapply kernel state on external
  FRR, FRR daemon restart, FRR-k8s and ovnkube-node/DS checks, BGP and VNI
  verification (parse show evpn vni json; L2 numRemoteVteps, L3 numArpNd)
- Export SetupEVPNNetworkWithServers, network spec factories, and pod reachability
  helpers; rename to RestartExternalFRRDaemons

Signed-off-by: Jean Chen <jechen@redhat.com>
Relocate shared OpenShift test registration (lists, generated annotations, annotate
rules, deploymentconfig and infraprovider shims) from openshift/test/ to
openshift/pkg/ and update annotate, go generate, and ovn-kubernetes-tests-ext imports.

Add openshift/test/evpn.go: multi-VPN EVPN coverage with node, OVN-K, FRR-K8s, and
external FRR (spine) disruption using the shared helpers in test/e2e.

Signed-off-by: Jean Chen <jechen@redhat.com>
@jechen0648 jechen0648 force-pushed the evpn_disruptive_case branch from cb362e6 to ff5558e Compare April 28, 2026 11:59
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 29, 2026

@jechen0648: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade 7813f24 link true /test 4.22-upgrade-from-stable-4.21-e2e-gcp-ovn-rt-upgrade
ci/prow/e2e-aws-ovn-upgrade-local-gateway 7813f24 link true /test e2e-aws-ovn-upgrade-local-gateway
ci/prow/e2e-azure-ovn-upgrade 7813f24 link true /test e2e-azure-ovn-upgrade
ci/prow/security 7813f24 link false /test security
ci/prow/e2e-aws-ovn-rhcos10-techpreview 7813f24 link false /test e2e-aws-ovn-rhcos10-techpreview
ci/prow/e2e-aws-ovn-upgrade 7813f24 link true /test e2e-aws-ovn-upgrade
ci/prow/e2e-gcp-ovn-techpreview 7813f24 link true /test e2e-gcp-ovn-techpreview

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant