Network traffic stops after Retina DaemonSet restart on AKS with Cilium #1804

@jan-machacek-kosik

Description

cilium.log
retina.log

We are running an AKS cluster with Cilium installed and would like to use Retina for network observability.

However, after restarting the Retina DaemonSet (e.g., due to a configuration update), all network traffic in the AKS cluster stops. The only way to restore connectivity is by restarting the Cilium agent on the affected nodes.

Additionally, after Retina restarts, the following warning repeatedly appears in the Cilium logs:

level=warning msg="Detected unexpected endpoint BPF program removal. Consider investigating whether other software running on this machine is removing Cilium's endpoint BPF programs. If endpoint BPF programs are removed, the associated pods will lose connectivity and only reinstating the programs will restore connectivity." count=12 subsys=daemon
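To confirm the symptom across nodes, the warning can be picked out of a saved Cilium agent log programmatically. The following is a minimal sketch, assuming the log lines are in the logfmt-style `key=value` form shown above; the helper names `parse_logfmt` and `removal_counts` are hypothetical and not part of Cilium or Retina.

```python
import shlex

# Substring taken from the warning quoted in this issue.
REMOVAL_MARKER = "Detected unexpected endpoint BPF program removal"


def parse_logfmt(line):
    """Split a logfmt-style line into a dict of key -> value strings."""
    fields = {}
    # shlex keeps msg="..." together and strips the surrounding quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def removal_counts(log_text):
    """Return the `count` value of every BPF-removal warning in the log."""
    counts = []
    for line in log_text.splitlines():
        if REMOVAL_MARKER not in line:
            continue
        try:
            fields = parse_logfmt(line)
        except ValueError:
            # Unbalanced quotes or other malformed lines are skipped.
            continue
        count = fields.get("count", "")
        if count.isdigit():
            counts.append(int(count))
    return counts
```

Running `removal_counts(open("cilium.log").read())` against the attached log should return one entry per occurrence of the warning, which makes it easy to see whether the warnings begin right after the Retina restart.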

This makes Retina a no-go for us in production.

Steps to reproduce:

  • Deploy Retina in an AKS cluster with Cilium.
  • Restart the Retina DaemonSet (e.g., kubectl rollout restart daemonset retina).
  • Observe that network traffic stops.
  • Check Cilium logs for warnings about BPF program removal.
  • Restart the Cilium agent on the affected nodes to restore connectivity.
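The steps above can be sketched as a small script that builds the corresponding `kubectl` invocations. This is only an illustration under stated assumptions: the `kube-system` namespace, the DaemonSet names `retina` and `cilium`, and the `k8s-app=cilium` label selector are guesses about the cluster and may need adjusting. The function names are hypothetical.

```python
import subprocess


def build_repro_commands(namespace="kube-system",
                         retina_ds="retina",
                         cilium_selector="k8s-app=cilium"):
    """Build kubectl argv lists for the restart and log-check steps.

    All default values are assumptions about the cluster layout.
    """
    return [
        # Restart the Retina DaemonSet and wait for the rollout.
        ["kubectl", "-n", namespace, "rollout", "restart",
         f"daemonset/{retina_ds}"],
        ["kubectl", "-n", namespace, "rollout", "status",
         f"daemonset/{retina_ds}"],
        # Fetch recent Cilium agent logs to look for the removal warning.
        ["kubectl", "-n", namespace, "logs", "-l", cilium_selector,
         "--since=10m", "--tail=-1"],
    ]


def build_recovery_command(namespace="kube-system", cilium_ds="cilium"):
    """Build the kubectl argv that restarts the Cilium agent DaemonSet."""
    return ["kubectl", "-n", namespace, "rollout", "restart",
            f"daemonset/{cilium_ds}"]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling `run(build_repro_commands())` only prints the commands; passing `dry_run=False` executes them against the current kubectl context, so it should only be used on a test cluster.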

Expected behavior:
Restarting Retina should not interfere with Cilium’s BPF programs or disrupt network traffic.

Environment:

  • AKS
  • Cilium version: 1.17.4
  • Retina version: v0.0.36
  • Kubernetes version: 1.32.3
