
network policies and metrics tweaks #1013

Open
rphillips wants to merge 1 commit into openshift:main from rphillips:bug/93479


rphillips (Contributor) commented Nov 13, 2025

This PR creates the necessary network policies at operator startup. It also creates a separate metrics Kubernetes Service just for the operator, so operator metrics are available without creating the Kueue Custom Resource. Once the Kueue Custom Resource is created, everything else is deployed, including the operand metrics ServiceMonitor.
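A rough sketch of the gating described above: operator-level resources are always ensured, while operand resources wait for the Kueue Custom Resource. The resourcesToEnsure helper and its labels are hypothetical, not the operator's actual code.

```go
package main

// resourcesToEnsure is a hypothetical helper: it lists, in apply order, what a
// reconcile pass would ensure. The labels are illustrative, not real object names.
func resourcesToEnsure(kueueCRExists bool) []string {
	// Always ensured at operator startup, independent of the Kueue CR.
	plan := []string{
		"operator network policies",
		"operator metrics Service",
	}
	// Operand resources are only deployed once the Kueue CR exists.
	if kueueCRExists {
		plan = append(plan,
			"operand network policies",
			"operand deployment",
			"operand metrics ServiceMonitor",
		)
	}
	return plan
}
```

With no Kueue CR only the two operator entries are returned, which is what lets operator metrics be scraped before any operand exists.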

Network Policy Reorganization

  • Reorganized network policies into operator/ and operand/ subdirectories under bindata/assets/kueue-operator/networkpolicy/
  • Added runlevel prefixes to ensure proper ordering (01-05 for allow policies, 99 for deny-all)
  • Operator policies:
    • 01-allow-egress-api.yaml - egress to kube-apiserver
    • 02-allow-egress-cluster-dns.yaml - egress to cluster DNS
    • 03-allow-ingress-metrics.yaml - ingress from Prometheus
    • 99-deny-all.yaml - default deny
  • Operand policies:
    • 01-allow-egress-api.yaml - egress to kube-apiserver
    • 02-allow-egress-cluster-dns.yaml - egress to cluster DNS
    • 03-allow-ingress-webhook.yaml - ingress for webhooks
    • 04-allow-ingress-visibility.yaml - ingress for visibility
    • 05-allow-ingress-metrics.yaml - ingress for metrics
    • 99-deny-all.yaml - default deny
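Kubernetes NetworkPolicies have no precedence: once a pod is selected, the traffic it accepts is the union of what all selecting policies allow. The numeric prefixes can therefore only matter for the order in which files are applied, where applying the allow policies before 99-deny-all avoids a brief window in which the pod is fully isolated. A minimal sketch of that ordering, assuming assets are applied in the returned order (runlevel and orderedAssets are hypothetical helpers):

```go
package main

import (
	"path"
	"sort"
	"strconv"
	"strings"
)

// runlevel extracts the numeric prefix from an asset name such as
// "operator/03-allow-ingress-metrics.yaml". Names without one sort last.
func runlevel(asset string) int {
	prefix, _, found := strings.Cut(path.Base(asset), "-")
	if !found {
		return 1 << 30
	}
	n, err := strconv.Atoi(prefix)
	if err != nil {
		return 1 << 30
	}
	return n
}

// orderedAssets returns a copy sorted by runlevel, then by name, so the
// 01-05 allow policies come before the 99 deny-all policy.
func orderedAssets(assets []string) []string {
	out := append([]string(nil), assets...)
	sort.SliceStable(out, func(i, j int) bool {
		ri, rj := runlevel(out[i]), runlevel(out[j])
		if ri != rj {
			return ri < rj
		}
		return out[i] < out[j]
	})
	return out
}
```

With two-digit zero-padded prefixes a plain lexical sort gives the same order; parsing the number just makes the intent explicit.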

Operator Metrics & Monitoring

  • Added TLS-secured metrics endpoint on port 8443
  • Added ServiceMonitor for Prometheus Operator integration
  • Added cert-manager Certificate and Issuer resources for automatic TLS certificate management
  • Added metrics Service exposing port 60000 (targeting container port 8443)
  • Added Prometheus RBAC resources for metrics secret access

RBAC Enhancements

  • Added subjectaccessreviews.authorization.k8s.io create permission
  • Added tokenreviews.authentication.k8s.io create permission
  • Both are required so the metrics endpoint can authenticate scrapers (TokenReview) and authorize them (SubjectAccessReview)

Deployment Changes

  • Updated TLS certificate volume mount path from /etc/metrics-tls to /var/run/secrets/serving-cert
  • Changed metrics containerPort from 60000 to 8443
  • Leverages library-go's built-in DynamicServingCertificateController for automatic certificate reloading

Code Changes

  • Added ensureOperatorNetworkPolicies() - applies operator network policies during controller initialization
  • Added ensureOperatorServiceMonitor() - creates ServiceMonitor if CRD is available
  • Added ensurePrometheusRBAC() - creates RBAC resources for Prometheus metrics access
  • Enhanced e2e tests for network policy and metrics endpoint validation

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2025

openshift-ci bot commented Nov 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 13, 2025
@rphillips rphillips changed the title WIP: network policies and metrics tweaks network policies and metrics tweaks Nov 13, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 13, 2025
@rphillips rphillips changed the title network policies and metrics tweaks OCPBUGS-93479: network policies and metrics tweaks Nov 14, 2025

@rphillips: No Jira issue with key OCPBUGS-93479 exists in the tracker at https://issues.redhat.com/.
Once a valid jira issue is referenced in the title of this pull request, request a refresh with /jira refresh.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@rphillips rphillips changed the title OCPBUGS-93479: network policies and metrics tweaks network policies and metrics tweaks Nov 14, 2025
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 9, 2025
@@ -1,4 +1,10 @@
<<<<<<<< HEAD:bindata/assets/kueue-operator/networkpolicy/10-allow-ingress-egress-metrics.yaml
Looks like there are some unresolved merge conflicts here.


openshift-ci bot commented Jan 28, 2026

@rphillips: The following tests failed; say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/test-e2e-4-20 | e772e32 | true | /test test-e2e-4-20 |
| ci/prow/test-e2e-4-19 | e92b05d | true | /test test-e2e-4-19 |
| ci/prow/test-e2e-downstream-4-20 | e92b05d | true | /test test-e2e-downstream-4-20 |
| ci/prow/test-e2e-4-18 | e92b05d | true | /test test-e2e-4-18 |
| ci/prow/test-e2e-upstream-4-20 | e92b05d | true | /test test-e2e-upstream-4-20 |
| ci/prow/test-e2e-downstream-4-21 | e92b05d | true | /test test-e2e-downstream-4-21 |
| ci/prow/test-e2e-upstream-4-21 | e92b05d | true | /test test-e2e-upstream-4-21 |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.


4 participants