
Add fleet controller with data dump controller#5389

Open
geoberle wants to merge 1 commit into Azure:main from geoberle:phase2-fleet-controller-base

Conversation

@geoberle
Collaborator

What

Introduce the fleet controller component with the controller manager and the data dump controller.

This commit includes:

  • Fleet Go module, Dockerfile, Makefile, and build integration
  • Helm chart with deployment, RBAC, service account, and metrics
  • Deployment pipeline with image mirroring and Helm deploy steps
  • Azure infrastructure: managed identity, CosmosDB access, fleet lookup
  • Config schema and values for fleet (managedIdentityName, k8s namespace)
  • Prometheus rules and observability queries
  • Topology entry for the service cluster fleet deployment

https://redhat.atlassian.net/browse/ARO-27232

Why

Testing

Special notes for your reviewer

PR Checklist

  • PR is scoped to a single task (no mixed concerns)
  • Title follows Conventional Commits format
  • Summary explains the "Why" behind the change
  • Linked to relevant ticket/issue
  • Screenshots included (if graph/UI/metrics changes)
  • Self-reviewed the diff
  • CI/CD checks are passing (ignore Tide)
  • Draft PR used for WIP (if applicable)
  • Commit history is clean (rebased/squashed)
  • Tricky code blocks are commented
  • Specific reviewers tagged
  • All comment threads resolved before merge

Copilot AI review requested due to automatic review settings May 26, 2026 12:38

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: geoberle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment


Pull request overview

Introduces a new fleet service/component (Go module + controller manager + Helm deployment + pipeline + infra wiring) and integrates it into service-cluster topology and observability so it can be built, deployed, and monitored like existing ARO-HCP services.

Changes:

  • Add the Fleet controller binary/module (manager, leader election, base controller helpers, initial “data dump” controller) plus build/docker integration.
  • Add Fleet Helm chart + deployment pipeline and wire Azure infra dependencies (managed identity + CosmosDB access + lookup templates).
  • Extend observability (Prometheus rules + ad-hoc query panels) and register Fleet rollout in topology/config.

Reviewed changes

Copilot reviewed 45 out of 46 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| topology.yaml | Registers Fleet as a serviceGroup with its own pipeline. |
| test/cmd/aro-hcp-tests/gather-observability/queries.yaml | Adds Fleet-focused Prometheus queries/panels. |
| observability/observability.yaml | Includes Fleet PrometheusRule in the aggregated rule set. |
| Makefile | Adds build-fleet and includes it in build-services/record-services-override. |
| internal/azsdk/clientoptions.go | Adds ComponentFleet and a cloud-name→azcore cloud configuration mapper. |
| go.work | Adds the fleet module to the workspace. |
| fleet/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_fleet.yaml | Adds rendered Helm fixture output for Fleet chart. |
| fleet/values.yaml | Fleet Helm values (templated via config preprocessing). |
| fleet/pkg/manager/manager.go | Fleet controller manager: metrics/health HTTP servers + leader election + controller wiring. |
| fleet/pkg/manager/leader_election_wiring.go | Builds Leases-based leader election lock and defines timing constants. |
| fleet/pkg/controllers/datadump/stamp_data_dumper.go | Initial “data dump” controller watching stamps/management clusters. |
| fleet/pkg/controllers/base/stamp_watching_controller.go | Shared controller base for stamp-scoped reconciliation with cooldown + etag gating. |
| fleet/pkg/controllers/base/management_cluster_watching_controller.go | Shared controller base for management-cluster reconciliation with cooldown + etag gating. |
| fleet/pkg/controllers/base/management_cluster_watching_controller_test.go | Unit tests for the management-cluster controller base. |
| fleet/pkg/controllers/base/controller_metrics.go | Adds fleet_controller_reconcile_total metric. |
| fleet/pipeline.yaml | Fleet rollout pipeline: image mirroring + infra lookup + Helm deploy. |
| fleet/namespace.yaml | Fleet namespace manifest applied before Helm release. |
| fleet/Makefile | Fleet build/image/push and override recording targets. |
| fleet/main.go | Fleet CLI entrypoint and logging setup; wires controller subcommand. |
| fleet/go.sum | Fleet module dependency lockfile. |
| fleet/go.mod | Fleet module definition and dependencies. |
| fleet/Env.mk | Fleet environment template variables for build/deploy. |
| fleet/Dockerfile | Fleet container build (builder + distroless runtime). |
| fleet/deploy/templates/servicemonitor.yaml | ServiceMonitor for scraping Fleet metrics. |
| fleet/deploy/templates/serviceaccount.yaml | ServiceAccount with workload identity annotations. |
| fleet/deploy/templates/peerauthentication.yaml | Istio PeerAuthentication for metrics port. |
| fleet/deploy/templates/metrics.service.yaml | Metrics Service exposing port 8081. |
| fleet/deploy/templates/leader-election.rolebinding.yaml | RBAC binding for lease operations. |
| fleet/deploy/templates/leader-election.role.yaml | RBAC role for coordination.k8s.io leases. |
| fleet/deploy/templates/controller.deployment.yaml | Fleet controller Deployment spec and runtime args/probes. |
| fleet/deploy/templates/allow-metrics.authorizationpolicy.yaml | Istio AuthorizationPolicy allowing /metrics. |
| fleet/deploy/Chart.yaml | Fleet Helm chart metadata. |
| fleet/cmd/controller/options.go | Controller command options/validation/completion wiring (Cosmos + leader election lock). |
| fleet/cmd/controller/options_test.go | Unit tests for controller options validation. |
| fleet/cmd/controller/cmd.go | Controller subcommand creation and run pipeline. |
| fleet/alerts/fleet-prometheusRule.yaml | Prometheus alerts for Fleet workqueues and panics. |
| fleet/alerts/fleet-prometheusRule_test.yaml | promtool-style rule tests for Fleet alerts. |
| fleet/.gitignore | Ignores the built fleet binary in-module. |
| dev-infrastructure/templates/svc-cluster.bicep | Adds Fleet workload identity and includes it in Cosmos role assignments. |
| dev-infrastructure/templates/mgmt-infra-lookup.bicep | Extends infra lookup to include MSI Key Vault lookup/outputs. |
| dev-infrastructure/modules/fleet/fleet-lookup.bicep | New lookup module for Fleet MI client ID + Cosmos endpoint + DNS zone id. |
| dev-infrastructure/configurations/svc-cluster.tmpl.bicepparam | Wires Fleet MI + namespace + serviceaccount params into svc-cluster template. |
| dev-infrastructure/configurations/mgmt-infra-lookup.tmpl.bicepparam | Supplies msiKeyVaultName param for mgmt-infra lookup template. |
| dev-infrastructure/configurations/fleet-lookup.tmpl.bicepparam | Parameter file for fleet-lookup module. |
| config/config.yaml | Adds Fleet defaults for managed identity name + namespace/service account. |
| config/config.schema.json | Adds/requires fleet.managedIdentityName and fleet.k8s fields in schema. |

Comment thread fleet/deploy/templates/controller.deployment.yaml Outdated
Comment thread fleet/pkg/controllers/datadump/stamp_data_dumper.go
Comment thread config/config.schema.json
Comment thread fleet/pkg/controllers/base/stamp_watching_controller.go
Comment thread fleet/Dockerfile Outdated
geoberle force-pushed the phase2-fleet-controller-base branch 2 times, most recently from 710df46 to 8e3c451 on May 26, 2026 13:13
Copilot AI review requested due to automatic review settings May 26, 2026 13:13
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 52 out of 53 changed files in this pull request and generated 5 comments.

Comment thread fleet/zz_fixture_TestHelmTemplate_dev_westus3_svc_1_fleet.yaml Outdated
Comment thread fleet/pkg/controllers/datadump/stamp_data_dumper.go
Comment thread observability/observability.yaml
Comment thread fleet/pkg/manager/manager.go
Comment thread topology.yaml Outdated
geoberle force-pushed the phase2-fleet-controller-base branch from 8e3c451 to 4b56ae2 on May 26, 2026 14:29
Copilot AI review requested due to automatic review settings May 26, 2026 14:47
geoberle force-pushed the phase2-fleet-controller-base branch from 4b56ae2 to a505c26 on May 26, 2026 14:47
geoberle force-pushed the phase2-fleet-controller-base branch from a505c26 to 07f3146 on May 26, 2026 14:52
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 51 out of 52 changed files in this pull request and generated 2 comments.

Comment thread fleet/pkg/manager/manager.go
Comment thread fleet/cmd/controller/options.go
Introduce the fleet controller component with the controller manager
and the data dump controller.

- Fleet Go module, Dockerfile, Makefile, and build integration
- Helm chart with deployment, RBAC, service account, and metrics
- Deployment pipeline with image mirroring and Helm deploy steps
- Azure infrastructure: managed identity, CosmosDB access, fleet lookup
- Config schema and values for fleet (managedIdentityName, k8s namespace)
- Prometheus rules and observability queries
- Topology entry for the service cluster fleet deployment
- StampWatchingController as the single base controller for all fleet
  controllers — management clusters share the same StampKey since each
  stamp maps to exactly one management cluster
- HTTP server goroutines cancel the manager context on failure, matching
  the backend's fail-fast pattern
geoberle force-pushed the phase2-fleet-controller-base branch from 07f3146 to de00ee7 on May 26, 2026 15:33

openshift-ci Bot commented May 26, 2026

@geoberle: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify | de00ee7 | link | true | /test verify |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

