Controller to deploy KSM per HCP #5383
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: venkateshsredhat. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
Pull request overview
Adds a new mgmt-agent controller that watches HyperShift HostedControlPlane objects and deploys kube-state-metrics plus a ServiceMonitor into each HCP namespace, enabling node-health metrics scraping via the HCP kube-apiserver.
Changes:
- Introduces `ksmhcp` controller code to reconcile `Deployment`, `Service`, and `ServiceMonitor` per HCP namespace.
- Wires the new controller into the mgmt-agent binary with a `--ksm-image` flag and Helm chart plumbing for the image.
- Expands RBAC and updates Go dependencies to include HyperShift clients and Prometheus Operator monitoring types.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| mgmt-agent/values.yaml | Adds Helm values for the kube-state-metrics image (registry/repo/digest). |
| mgmt-agent/deploy/templates/deployment.yaml | Passes --ksm-image=... into the controller container args. |
| mgmt-agent/deploy/templates/clusterrole.yaml | Adds RBAC for HostedControlPlanes + managing Deployments/Services/ServiceMonitors. |
| mgmt-agent/cmd/options.go | Creates HyperShift + dynamic clients, instantiates and runs the new controller, adds --ksm-image flag. |
| mgmt-agent/pkg/controller/ksmhcp/controller.go | New controller reconcile loop + ensure helpers for Deployment/Service/ServiceMonitor. |
| mgmt-agent/pkg/controller/ksmhcp/resources.go | Builders for Deployment/Service/ServiceMonitor resources. |
| mgmt-agent/go.mod | Adds/adjusts module dependencies (HyperShift + Prometheus Operator APIs) and replace directives. |
| mgmt-agent/go.sum | Dependency checksum updates. |
```go
func (c *KSMHCPController) ensureServiceMonitor(ctx context.Context, desired *unstructured.Unstructured) error {
	client := c.dynamicClient.Resource(serviceMonitorGVR).Namespace(desired.GetNamespace())

	existing, err := client.Get(ctx, desired.GetName(), metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.Create(ctx, desired, metav1.CreateOptions{FieldManager: fieldManager})
		return err
	}
	if err != nil {
		return err
	}

	// An Update must carry the live object's resourceVersion; without it the
	// API server rejects the request ("must be specified for an update").
	desired.SetResourceVersion(existing.GetResourceVersion())
	_, err = client.Update(ctx, desired, metav1.UpdateOptions{FieldManager: fieldManager})
	return err
}
```
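Kubernetes uses `metadata.resourceVersion` for optimistic concurrency. An Update must carry the resourceVersion of the object currently stored, or the API server rejects it. Because `desired` is built fresh on each reconcile, it has no resourceVersion of its own. One way to handle this is to copy the value from the object returned by `Get` before calling `Update`.

The sketch below simulates that rule with a hypothetical in-memory store. The names `fakeStore`, `object`, and `createOrUpdate` are illustrative only and are not client-go APIs.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

var errNotFound = errors.New("not found")

// object is a minimal stand-in for an unstructured Kubernetes object.
type object struct {
	Name            string
	ResourceVersion string
	Spec            map[string]string
}

// fakeStore mimics the API server rule that an update must carry the
// resourceVersion of the stored object.
type fakeStore struct {
	items map[string]object
	rv    int
}

func newFakeStore() *fakeStore { return &fakeStore{items: map[string]object{}} }

func (s *fakeStore) Get(name string) (object, error) {
	o, ok := s.items[name]
	if !ok {
		return object{}, errNotFound
	}
	return o, nil
}

func (s *fakeStore) Create(o object) error {
	if _, ok := s.items[o.Name]; ok {
		return fmt.Errorf("%s already exists", o.Name)
	}
	s.rv++
	o.ResourceVersion = strconv.Itoa(s.rv)
	s.items[o.Name] = o
	return nil
}

func (s *fakeStore) Update(o object) error {
	cur, ok := s.items[o.Name]
	if !ok {
		return errNotFound
	}
	if o.ResourceVersion == "" {
		return errors.New("metadata.resourceVersion must be specified for an update")
	}
	if o.ResourceVersion != cur.ResourceVersion {
		return errors.New("conflict: object has been modified")
	}
	s.rv++
	o.ResourceVersion = strconv.Itoa(s.rv)
	s.items[o.Name] = o
	return nil
}

// createOrUpdate follows the Get/Create/Update pattern, copying the live
// resourceVersion onto the desired object before updating.
func createOrUpdate(s *fakeStore, desired object) error {
	existing, err := s.Get(desired.Name)
	if errors.Is(err, errNotFound) {
		return s.Create(desired)
	}
	if err != nil {
		return err
	}
	desired.ResourceVersion = existing.ResourceVersion
	return s.Update(desired)
}
```

Server-side apply could be an alternative: a Patch with `types.ApplyPatchType` and the same `fieldManager` avoids the Get/Update round trip entirely.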
```go
	// Note: the conversion error is discarded here, so a failure goes unreported.
	u, _ := toUnstructured(sm)
	return u
```
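The `toUnstructured(sm)` call above discards its error. The sketch below performs the same kind of conversion but propagates the error. It assumes a JSON round trip is acceptable; apimachinery's `runtime.DefaultUnstructuredConverter.ToUnstructured` performs this conversion without going through JSON.

The ServiceMonitor fields follow the Prometheus Operator `monitoring.coreos.com/v1` API. The selector label `app: kube-state-metrics` and the port name `http-metrics` are assumptions, not values taken from the PR.

```go
package main

import "encoding/json"

// Minimal typed view of a ServiceMonitor; only the fields used here are modeled.
type serviceMonitor struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Metadata   objectMeta         `json:"metadata"`
	Spec       serviceMonitorSpec `json:"spec"`
}

type objectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type serviceMonitorSpec struct {
	Selector  labelSelector `json:"selector"`
	Endpoints []endpoint    `json:"endpoints"`
}

type labelSelector struct {
	MatchLabels map[string]string `json:"matchLabels"`
}

type endpoint struct {
	Port     string `json:"port"`
	Interval string `json:"interval,omitempty"`
}

// toUnstructuredMap converts a JSON-serializable object into the
// map[string]interface{} shape that an unstructured object wraps,
// returning the error instead of discarding it.
func toUnstructuredMap(v interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	out := map[string]interface{}{}
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// buildServiceMonitor is a hypothetical builder for one HCP namespace.
func buildServiceMonitor(namespace string) (map[string]interface{}, error) {
	sm := serviceMonitor{
		APIVersion: "monitoring.coreos.com/v1",
		Kind:       "ServiceMonitor",
		Metadata:   objectMeta{Name: "kube-state-metrics", Namespace: namespace},
		Spec: serviceMonitorSpec{
			Selector:  labelSelector{MatchLabels: map[string]string{"app": "kube-state-metrics"}},
			Endpoints: []endpoint{{Port: "http-metrics"}},
		},
	}
	return toUnstructuredMap(sm)
}
```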
```go
cmd.Flags().StringVar(&o.HealthAddress, "health-address", o.HealthAddress, "The bind address for the health check server (e.g., ':8080')")
cmd.Flags().StringVar(&o.Kubeconfig, "kubeconfig", "", "Path to a kubeconfig. Optional.")
cmd.Flags().StringVar(&o.Namespace, "namespace", os.Getenv("POD_NAMESPACE"), "The namespace where the mgmt-agent controller is deployed.")
cmd.Flags().IntVar(&o.Workers, "workers", o.Workers, "Number of reconcile workers to run")
cmd.Flags().StringVar(&o.KSMImage, "ksm-image", o.KSMImage, "Container image for kube-state-metrics deployed per HCP namespace")
```
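The sketch below validates the new flag at startup. It uses the standard library `flag` package in place of the cobra flag set the PR uses. The `options` struct, the default of 2 workers, and the validation rules are hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"strings"
)

// options is a hypothetical subset of the mgmt-agent options.
type options struct {
	Workers  int
	KSMImage string
}

// validate rejects configurations the KSM controller cannot run with.
func (o *options) validate() error {
	if strings.TrimSpace(o.KSMImage) == "" {
		return errors.New("--ksm-image is required")
	}
	if o.Workers < 1 {
		return errors.New("--workers must be at least 1")
	}
	return nil
}

// parseOptions parses args and validates the result.
func parseOptions(args []string) (*options, error) {
	o := &options{Workers: 2} // assumed default, for illustration only
	fs := flag.NewFlagSet("mgmt-agent", flag.ContinueOnError)
	fs.IntVar(&o.Workers, "workers", o.Workers, "Number of reconcile workers to run")
	fs.StringVar(&o.KSMImage, "ksm-image", o.KSMImage, "Container image for kube-state-metrics deployed per HCP namespace")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if err := o.validate(); err != nil {
		return nil, err
	}
	return o, nil
}
```

Failing fast on an empty `--ksm-image` surfaces a missing Helm value when the binary starts, rather than later during reconciliation of each HCP namespace.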
```go
// ksm-hcp controller with leader election
g.Go(func() error {
	logger.Info("Starting KSM HCP controller")
	if err := controller.RunWithLeaderElection(ctx, "ksm-hcp", o.leaderElectionCfg, func(leaderCtx context.Context) error {
		return o.ksmCtrl.Run(leaderCtx, o.workers)
```
```
replace (
	github.com/aws/karpenter-provider-aws => github.com/aws/karpenter-provider-aws v1.0.0
	github.com/openshift/hypershift/api => github.com/openshift/hypershift/api v0.0.0-20260226113135-8ab86680f975
	sigs.k8s.io/karpenter => sigs.k8s.io/karpenter v1.0.0
)
```
```yaml
- create
- update
- patch
- delete
```

```yaml
- create
- update
- patch
- delete
```

```yaml
- create
- update
- patch
- delete
```
```go
	Kind: "HostedControlPlane",
	Name: hcp.Name,
	UID:  hcp.UID,
}
```
This needs to be checked and changed.
Garbage collection currently ties the KSM resources we create to HyperShift's owner-reference-based cleanup. We should have our own deletion reconciler to handle the cleanup.
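A dedicated deletion reconciler could decide what to remove in three steps:

1. List the namespaces that currently have a HostedControlPlane.
2. List the namespaces that hold KSM resources created by this controller. This assumes those resources can be found, for example via a controller-owned label.
3. Delete the resources in namespaces that appear only in the second list.

The sketch below shows the comparison in step 3. `staleKSMNamespaces` is a hypothetical helper.

```go
package main

import "sort"

// staleKSMNamespaces returns the namespaces that still hold KSM resources
// created by this controller but no longer have a HostedControlPlane.
func staleKSMNamespaces(hcpNamespaces, ksmNamespaces []string) []string {
	live := make(map[string]struct{}, len(hcpNamespaces))
	for _, ns := range hcpNamespaces {
		live[ns] = struct{}{}
	}
	var stale []string
	for _, ns := range ksmNamespaces {
		if _, ok := live[ns]; !ok {
			stale = append(stale, ns)
		}
	}
	sort.Strings(stale) // deterministic order for logging and tests
	return stale
}
```

Running this periodically, or on HCP delete events, would handle cleanup without relying on owner references to the HostedControlPlane.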
What
Controller to deploy KSM per HCP
Why
We are implementing end-to-end alerting for Customer Notification. Initially we need metrics on customer node health, and we are using KSM to collect them directly from the HCP kube-apiserver.
Testing
Special notes for your reviewer
Still in draft.