Controller to deploy KSM per HCP #5383

Draft
venkateshsredhat wants to merge 1 commit into main from ksm-controller

@venkateshsredhat
Collaborator

What

Controller to deploy KSM per HCP

Why

We are implementing end-to-end alerting for customer notifications. As a first step we need metrics on customer node health, and we are using kube-state-metrics (KSM) to collect them directly from the HCP kube-apiserver.

Testing

Special notes for your reviewer

Still in draft.

Copilot AI review requested due to automatic review settings May 26, 2026 07:35
@openshift-ci

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: venkateshsredhat
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment

Pull request overview

Adds a new mgmt-agent controller that watches HyperShift HostedControlPlane objects and deploys kube-state-metrics plus a ServiceMonitor into each HCP namespace, enabling node-health metrics scraping via the HCP kube-apiserver.

Changes:

  • Introduces ksmhcp controller code to reconcile Deployment, Service, and ServiceMonitor per HCP namespace.
  • Wires the new controller into the mgmt-agent binary with a --ksm-image flag and Helm chart plumbing for the image.
  • Expands RBAC and updates Go dependencies to include HyperShift clients and Prometheus Operator monitoring types.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.

File | Description
mgmt-agent/values.yaml | Adds Helm values for the kube-state-metrics image (registry/repo/digest).
mgmt-agent/deploy/templates/deployment.yaml | Passes --ksm-image=... into the controller container args.
mgmt-agent/deploy/templates/clusterrole.yaml | Adds RBAC for HostedControlPlanes + managing Deployments/Services/ServiceMonitors.
mgmt-agent/cmd/options.go | Creates HyperShift + dynamic clients, instantiates and runs the new controller, adds --ksm-image flag.
mgmt-agent/pkg/controller/ksmhcp/controller.go | New controller reconcile loop + ensure helpers for Deployment/Service/ServiceMonitor.
mgmt-agent/pkg/controller/ksmhcp/resources.go | Builders for Deployment/Service/ServiceMonitor resources.
mgmt-agent/go.mod | Adds/adjusts module dependencies (HyperShift + Prometheus Operator APIs) and replace directives.
mgmt-agent/go.sum | Dependency checksum updates.

Comment on lines +252 to +266
func (c *KSMHCPController) ensureServiceMonitor(ctx context.Context, desired *unstructured.Unstructured) error {
	client := c.dynamicClient.Resource(serviceMonitorGVR).Namespace(desired.GetNamespace())

	_, err := client.Get(ctx, desired.GetName(), metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		_, err = client.Create(ctx, desired, metav1.CreateOptions{FieldManager: fieldManager})
		return err
	}
	if err != nil {
		return err
	}

	_, err = client.Update(ctx, desired, metav1.UpdateOptions{FieldManager: fieldManager})
	return err
}
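The get-then-update flow quoted above discards the object returned by Get and sends `desired` to Update as it was built. Update calls on custom resources such as ServiceMonitor generally need `metadata.resourceVersion` copied from the live object, so one possible adjustment is to carry it over before updating. The sketch below is not the PR's code: it uses only the standard library, and `object`, `store`, and `ensure` are hypothetical stand-ins for the unstructured object, the API server, and the ensure helper.

```go
package main

import (
	"errors"
	"fmt"
)

// object is a hypothetical stand-in for an unstructured resource.
type object struct {
	Name            string
	ResourceVersion int
	Spec            string
}

var (
	errNotFound = errors.New("not found")
	errConflict = errors.New("resourceVersion missing or stale")
)

// store is a hypothetical in-memory stand-in for the API server.
type store struct{ objects map[string]object }

func (s *store) Get(name string) (object, error) {
	o, ok := s.objects[name]
	if !ok {
		return object{}, errNotFound
	}
	return o, nil
}

func (s *store) Create(o object) error {
	o.ResourceVersion = 1
	s.objects[o.Name] = o
	return nil
}

// Update rejects writes whose resourceVersion does not match the stored one.
func (s *store) Update(o object) error {
	cur, ok := s.objects[o.Name]
	if !ok {
		return errNotFound
	}
	if o.ResourceVersion != cur.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.objects[o.Name] = o
	return nil
}

// ensure creates the object if it is absent; otherwise it copies the live
// resourceVersion onto the desired object before updating.
func ensure(s *store, desired object) error {
	existing, err := s.Get(desired.Name)
	if errors.Is(err, errNotFound) {
		return s.Create(desired)
	}
	if err != nil {
		return err
	}
	desired.ResourceVersion = existing.ResourceVersion
	return s.Update(desired)
}

func main() {
	s := &store{objects: map[string]object{}}
	fmt.Println(ensure(s, object{Name: "ksm", Spec: "v1"})) // first call creates
	fmt.Println(ensure(s, object{Name: "ksm", Spec: "v2"})) // second call updates
	fmt.Println(s.Update(object{Name: "ksm", Spec: "v3"}))  // no resourceVersion set: rejected
}
```

With the real dynamic client the equivalent step would be `desired.SetResourceVersion(existing.GetResourceVersion())` between Get and Update; whether a server-side apply patch is a better fit here is left to the author.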
Comment on lines +197 to +198
u, _ := toUnstructured(sm)
return u
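The two lines above drop the error from `toUnstructured`, so a failed conversion would hand a nil or partial object to the caller. A minimal sketch of propagating it instead follows. It assumes a JSON round trip as the conversion and uses a hypothetical trimmed-down `serviceMonitor` type and `buildServiceMonitor` builder; the PR's actual helper and types may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serviceMonitor is a hypothetical, trimmed-down typed object.
type serviceMonitor struct {
	Kind string `json:"kind"`
	Name string `json:"name"`
	// Extra holds arbitrary values; a non-serializable one makes conversion fail.
	Extra any `json:"extra,omitempty"`
}

// toUnstructured converts a typed object to a generic map via a JSON round trip.
func toUnstructured(obj any) (map[string]any, error) {
	raw, err := json.Marshal(obj)
	if err != nil {
		return nil, fmt.Errorf("marshal %T: %w", obj, err)
	}
	out := map[string]any{}
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, fmt.Errorf("unmarshal %T: %w", obj, err)
	}
	return out, nil
}

// buildServiceMonitor returns the conversion error instead of discarding it.
func buildServiceMonitor(name string, extra any) (map[string]any, error) {
	sm := serviceMonitor{Kind: "ServiceMonitor", Name: name, Extra: extra}
	u, err := toUnstructured(sm)
	if err != nil {
		return nil, fmt.Errorf("building ServiceMonitor %q: %w", name, err)
	}
	return u, nil
}

func main() {
	u, err := buildServiceMonitor("ksm", nil)
	fmt.Println(u["kind"], err)

	// A channel cannot be marshalled, so the error reaches the caller.
	_, err = buildServiceMonitor("ksm", make(chan int))
	fmt.Println(err != nil)
}
```

Returning `(object, error)` from the builder lets the reconcile loop requeue or log the failure rather than attempting a Create with a nil object.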
Comment thread mgmt-agent/cmd/options.go
Comment on lines 69 to +73
cmd.Flags().StringVar(&o.HealthAddress, "health-address", o.HealthAddress, "The bind address for the health check server (e.g., ':8080')")
cmd.Flags().StringVar(&o.Kubeconfig, "kubeconfig", "", "Path to a kubeconfig. Optional.")
cmd.Flags().StringVar(&o.Namespace, "namespace", os.Getenv("POD_NAMESPACE"), "The namespace where the mgmt-agent controller is deployed.")
cmd.Flags().IntVar(&o.Workers, "workers", o.Workers, "Number of reconcile workers to run")
cmd.Flags().StringVar(&o.KSMImage, "ksm-image", o.KSMImage, "Container image for kube-state-metrics deployed per HCP namespace")
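Since the new controller cannot deploy anything useful without an image, the `--ksm-image` value could be checked once at startup. The sketch below is one possible check, not part of the PR: `validateKSMImage` is a hypothetical helper, the image references are made-up examples, and the digest-pinning rule is an assumption based on the Helm values carrying a digest.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateKSMImage is a hypothetical startup check for the --ksm-image value.
func validateKSMImage(image string) error {
	if strings.TrimSpace(image) == "" {
		return errors.New("--ksm-image must be set")
	}
	// Assumption: images are pinned by digest, i.e. repo@sha256:<64 hex chars>.
	_, digest, ok := strings.Cut(image, "@sha256:")
	if !ok || len(digest) != 64 {
		return fmt.Errorf("--ksm-image %q is not pinned to a sha256 digest", image)
	}
	for _, r := range digest {
		if !strings.ContainsRune("0123456789abcdef", r) {
			return fmt.Errorf("--ksm-image %q has a non-hex digest", image)
		}
	}
	return nil
}

func main() {
	pinned := "registry.example.com/ksm@sha256:" + strings.Repeat("ab", 32)
	fmt.Println(validateKSMImage(""))                             // rejected: empty
	fmt.Println(validateKSMImage("registry.example.com/ksm:v2")) // rejected: tag only
	fmt.Println(validateKSMImage(pinned))                         // accepted
}
```

A check like this would typically run in the options validation step, before any clients or controllers are constructed.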
Comment thread mgmt-agent/cmd/options.go
Comment on lines +256 to +260
// ksm-hcp controller with leader election
g.Go(func() error {
	logger.Info("Starting KSM HCP controller")
	if err := controller.RunWithLeaderElection(ctx, "ksm-hcp", o.leaderElectionCfg, func(leaderCtx context.Context) error {
		return o.ksmCtrl.Run(leaderCtx, o.workers)
Comment thread mgmt-agent/go.mod
Comment on lines +5 to +9
replace (
	github.com/aws/karpenter-provider-aws => github.com/aws/karpenter-provider-aws v1.0.0
	github.com/openshift/hypershift/api => github.com/openshift/hypershift/api v0.0.0-20260226113135-8ab86680f975
	sigs.k8s.io/karpenter => sigs.k8s.io/karpenter v1.0.0
)
- create
- update
- patch
- delete
- create
- update
- patch
- delete
- create
- update
- patch
- delete
Kind: "HostedControlPlane",
Name: hcp.Name,
UID: hcp.UID,
}
Collaborator Author

This needs to be checked and changed.

Collaborator Author

Garbage collection currently ties the KSM resources we create to HyperShift's owner-based cleanup. We should have our own deletion reconciler to do the cleanup.
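One shape such a deletion reconciler could take is a periodic sweep: list the namespaces that still hold KSM resources managed by this controller, compare them with the namespaces that have a live HostedControlPlane, and delete the leftovers. The sketch below shows only the comparison step. It assumes the managed resources can be found by a label, and `orphanedNamespaces` and the namespace names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedNamespaces returns namespaces that still hold KSM resources
// managed by this controller but no longer have a HostedControlPlane.
func orphanedNamespaces(managed []string, liveHCPs map[string]bool) []string {
	var orphans []string
	for _, ns := range managed {
		if !liveHCPs[ns] {
			orphans = append(orphans, ns)
		}
	}
	sort.Strings(orphans) // stable order keeps logs and tests deterministic
	return orphans
}

func main() {
	managed := []string{"ocm-a", "ocm-b", "ocm-c"}
	live := map[string]bool{"ocm-a": true, "ocm-c": true}
	for _, ns := range orphanedNamespaces(managed, live) {
		fmt.Println("would delete KSM Deployment, Service, and ServiceMonitor in", ns)
	}
}
```

A finalizer on the HostedControlPlane would be another option, but it would couple HCP deletion to this controller's availability, so the sweep is shown here as the less invasive variant.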
