ARO-26057: add ADX Grafana datasource provisioning #4878

Open: swiencki wants to merge 13 commits into main from adx-grafana-datasource-upstream

swiencki (Collaborator) commented Apr 14, 2026

ARO-26057

What

Adds ADX/Kusto Grafana datasource provisioning through the geography rollout using the typed GrafanaDatasources action. The step forwards ADX desired state to grafanactl modify datasource reconcile, grants the Grafana managed identity Kusto Viewer access to ServiceLogs, and removes the prior rollout-time go run shell path.

Why

This follows the existing typed/prebuilt-binary pattern for Grafana datasource rollout while enabling SRE dashboards to query Kusto data from Grafana.

Testing

  • GOWORK=/tmp/aro-hcp-adx-local.work go test ./pkg/pipeline from tooling/templatize
  • git diff --check
  • Local validation against the paired ARO-Tools change

Special notes for your reviewer

Depends on Azure/ARO-Tools#228. Downstream sdp-pipelines rollout packaging is https://dev.azure.com/msazure/AzureRedHatOpenShift/_git/sdp-pipelines/pullrequest/15393796.

Required merge order: ARO-Tools first, then this ARO-HCP PR with the ARO-Tools pin bump, then sdp-pipelines with the ARO-Tools/ARO-HCP pin bumps and make -C hcp/ update run twice.

The Kusto role-assignment rename fixes a name collision between ingest and viewer grants. After first deployment, old grant-${guid(...)} Kusto role assignments may remain as harmless duplicates; ops can do a one-time cleanup if desired.
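The collision described above can be illustrated with a minimal sketch. It assumes the old name hashed inputs that did not include the role; the actual arguments to guid(...) are not shown in this PR, so `database` and `principal_id` are hypothetical, and `uuid.uuid5` is only a deterministic stand-in for Bicep's guid(), not the same hash.

```python
import uuid


def guid(*parts: str) -> str:
    """Deterministic stand-in for Bicep's guid(); not the same algorithm."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "/".join(parts)))


def old_grant_name(database: str, principal_id: str) -> str:
    # Assumed old scheme: the role is not part of the name, so an ingest
    # grant and a viewer grant with the same hashed inputs share one name.
    return f"grant-{guid(database, principal_id)}"


def new_grant_name(role: str, database: str, principal_id: str) -> str:
    # New scheme: the role prefix (ingest or viewer) keeps the names apart.
    return f"grant-{role}-{guid(database, principal_id)}"
```

Under that assumption, `old_grant_name` returns the same string for both grants, while `new_grant_name("ingest", ...)` and `new_grant_name("viewer", ...)` differ. Because the new names do not match the old ones, the old assignments are not replaced in place, which is consistent with the leftover duplicates mentioned above.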

Copilot AI review requested due to automatic review settings April 14, 2026 21:56
@openshift-ci openshift-ci Bot requested review from geoberle and janboll April 14, 2026 21:56

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds ARO-HCP support to provision and clean up Azure Data Explorer (Kusto) data sources in the shared Managed Grafana workspace, integrated into the existing geography rollout flow.

Changes:

  • Extend Kusto IaC to optionally grant Grafana’s managed identity DB-level Viewer access and output the cluster URI.
  • Add geography pipeline steps to fetch global Grafana outputs and provision/update ADX Grafana datasources via Azure CLI.
  • Introduce configuration flags/schema + docs, and update the Kusto delete cleanup to remove the matching Grafana datasource.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| docs/monitoring.md | Documents ADX/Kusto datasource provisioning gates, naming, validation, and teardown behavior. |
| dev-infrastructure/templates/kusto.bicep | Threads optional Grafana resource ID into the Kusto module and exposes kustoUri output. |
| dev-infrastructure/modules/logs/kusto/main.bicep | Grants Grafana MI Viewer on ServiceLogs when provided and surfaces cluster URI output. |
| dev-infrastructure/modules/logs/kusto/cluster.bicep | Exposes Kusto cluster uri as an output. |
| dev-infrastructure/geography-pipeline.yaml | Adds global output step and a shell step to create/update ADX Grafana datasources. |
| dev-infrastructure/configurations/kusto.tmpl.bicepparam | Adds grafanaResourceId parameter placeholder for pipeline substitution. |
| dev-infrastructure/cleanup/delete.kusto.instance.sh | Extends cleanup to also delete the corresponding Grafana datasource. |
| dev-infrastructure/cleanup/delete.kusto.instance.pipeline.yaml | Wires Grafana resource ID from global outputs into the cleanup script run. |
| config/rendered/dev/swft/uksouth.yaml | Adds default monitoring flags for ADX datasource provisioning (disabled). |
| config/rendered/dev/prow/westus3.yaml | Adds default monitoring flags for ADX datasource provisioning (disabled). |
| config/rendered/dev/pers/westus3.yaml | Adds default monitoring flags for ADX datasource provisioning (disabled). |
| config/rendered/dev/perf/westus3.yaml | Adds default monitoring flags for ADX datasource provisioning (disabled). |
| config/rendered/dev/dev/westus3.yaml | Adds default monitoring flags for ADX datasource provisioning (disabled). |
| config/rendered/dev/cspr/westus3.yaml | Adds default monitoring flags for ADX datasource provisioning (disabled). |
| config/config.yaml | Introduces new monitoring defaults for ADX provisioning flags. |
| config/config.schema.json | Defines schema/validation for the new monitoring configuration knobs. |

Comment thread dev-infrastructure/geography-pipeline.yaml Outdated
Comment thread dev-infrastructure/geography-pipeline.yaml Outdated
Comment thread dev-infrastructure/configurations/kusto.tmpl.bicepparam Outdated
Comment thread dev-infrastructure/cleanup/delete.kusto.instance.sh Outdated
@swiencki swiencki changed the title from "ARO-22279: add ADX Grafana datasource provisioning" to "ARO-26057: add ADX Grafana datasource provisioning" Apr 15, 2026
janboll (Collaborator) commented Apr 16, 2026

/hold
Please use https://github.com/Azure/ARO-Tools/blob/main/tools/grafanactl/cmd/modify/cmd.go#L32
for modifications to Grafana. Ideally we should update Grafana ONCE in the pipeline, because running an update on the instance blocks it for several minutes.

swiencki (Collaborator, Author) commented

Thanks for the feedback, I'll move this to grafanactl in Azure/ARO-Tools.

On the "update once" concern: the existing modify datasource reconcile runs globally because UpdateGrafanaIntegrations is an ARM-level mutation that blocks the Grafana instance for several minutes. ADX datasources go through the Grafana REST API directly (POST/PUT /api/datasources), which is a lightweight config write with no ARM-level instance lock, so per-geography invocation is safe here. Is this good with you?
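To make the create-versus-update distinction concrete, here is a minimal sketch of how a reconcile step might choose between POST and PUT. It is not grafanactl's implementation, which this thread does not show. The plugin type string, the `jsonData.clusterUrl` field, and the `/api/datasources/uid/{uid}` update path are assumptions, and no HTTP call is made; the function only plans the request.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed plugin type id for the Azure Data Explorer datasource.
ADX_PLUGIN_TYPE = "grafana-azure-data-explorer-datasource"


def plan_datasource_request(
    name: str,
    cluster_url: str,
    existing_uid: Optional[str] = None,
) -> Tuple[str, str, Dict[str, Any]]:
    """Return (method, path, payload) for creating or updating a datasource."""
    payload: Dict[str, Any] = {
        "name": name,
        "type": ADX_PLUGIN_TYPE,
        "access": "proxy",
        "jsonData": {"clusterUrl": cluster_url},  # assumed field name
    }
    if existing_uid is None:
        # No datasource with this name yet: create it.
        return "POST", "/api/datasources", payload
    # Datasource already exists: update it in place by UID.
    return "PUT", f"/api/datasources/uid/{existing_uid}", payload
```

Either branch is a single configuration write against the Grafana API rather than an ARM deployment, which is the property the comment above relies on.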

Copilot AI review requested due to automatic review settings April 28, 2026 18:45
@swiencki swiencki force-pushed the adx-grafana-datasource-upstream branch from 8d14303 to a80a076 Compare April 28, 2026 18:45

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.


Comment thread dev-infrastructure/geography-pipeline.yaml
Copilot AI review requested due to automatic review settings April 28, 2026 19:51

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.


swiencki (Collaborator, Author) commented

/unhold

Updated to use grafanactl.

Comment thread dev-infrastructure/geography-pipeline.yaml Outdated
Copilot AI review requested due to automatic review settings April 30, 2026 21:02
@swiencki swiencki force-pushed the adx-grafana-datasource-upstream branch from 5d069ae to c091e9c Compare April 30, 2026 21:02

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.


Comment thread tooling/templatize/pkg/pipeline/grafana_datasources_test.go
swiencki added a commit that referenced this pull request May 8, 2026
…peline

Per PR #4878 review. Drops the new GrafanaDatasources call from
geography-pipeline.yaml and extends the existing add-grafana-datasource
in region-pipeline.yaml with the ADX block. clusterUrl sources from
regional kusto-lookup output. AzureMonitor reconcile preserved via the
ARO-Tools default-true. Triple-AND gate keeps the three feature flags
agreeing by construction. No Go/Bicep/config/test changes.
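The triple-AND gate mentioned in this commit message can be sketched as follows. The commit does not name the three feature flags, so the field names here are hypothetical; the point is only that a single conjunction feeds every consumer, so the flags cannot disagree about whether ADX provisioning is on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringFlags:
    # Hypothetical flag names; the three real flags are not listed in the PR.
    kusto_enabled: bool
    grafana_enabled: bool
    adx_datasource_enabled: bool


def adx_provisioning_enabled(flags: MonitoringFlags) -> bool:
    """ADX provisioning runs only when all three flags are true."""
    return (
        flags.kusto_enabled
        and flags.grafana_enabled
        and flags.adx_datasource_enabled
    )
```

Of the eight possible flag combinations, only the all-true one enables the ADX block; every other combination evaluates to false.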
Copilot AI review requested due to automatic review settings May 8, 2026 15:56

Copilot AI left a comment

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comment thread docs/monitoring.md
Comment thread docs/monitoring.md
@swiencki swiencki force-pushed the adx-grafana-datasource-upstream branch from ef9d6e8 to 656403e Compare May 8, 2026 16:09
SudoBrendan (Collaborator) left a comment

Overall this looks clean, though there may be some orphaned resources; can you validate? I'll have a look at the ARO-Tools change as well. Thanks Simon!

Comment thread dev-infrastructure/modules/logs/kusto/grant-access.bicep
Comment thread tooling/templatize/pkg/pipeline/grafana_datasources.go
Copilot AI review requested due to automatic review settings May 13, 2026 20:20

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: swiencki
Once this PR has been reviewed and has the lgtm label, please assign janboll for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Comment thread tooling/templatize/pkg/pipeline/grafana_datasources.go
Comment thread tooling/templatize/pkg/pipeline/grafana_datasources.go
Comment thread dev-infrastructure/region-pipeline.yaml
Comment thread tooling/templatize/pkg/pipeline/grafana_datasources.go
@swiencki swiencki requested a review from janboll May 13, 2026 21:22
swiencki added 13 commits May 28, 2026 09:12
Replace the temporary shell datasource script with a direct grafanactl invocation and keep cleanup datasource deletion out of this rollout scope.

Depends on Azure/ARO-Tools#228.
…peline

Per PR #4878 review. Drops the new GrafanaDatasources call from
geography-pipeline.yaml and extends the existing add-grafana-datasource
in region-pipeline.yaml with the ADX block. clusterUrl sources from
regional kusto-lookup output. AzureMonitor reconcile preserved via the
ARO-Tools default-true. Triple-AND gate keeps the three feature flags
agreeing by construction. No Go/Bicep/config/test changes.

Move the Grafana ServiceLogs Viewer grant out of the core Kusto
deployment (main.bicep) into its own geography-pipeline step with
automatedRetry on AAD principal errors.

This follows the established pattern from kusto-grant-ingest.bicep
(used in mgmt/svc pipelines with automatedRetry) so that a transient
principal-not-found failure does not fail the entire Kusto rollout.
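The retry behaviour described above can be sketched roughly as below. This illustrates the idea only; it is not the pipeline's automatedRetry implementation. The `PrincipalNotFound` marker string and the helper names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_principal_not_found(err: Exception) -> bool:
    # Hypothetical marker for a transient AAD principal error.
    return "PrincipalNotFound" in str(err)


def retry_on_transient(
    step: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Run step, retrying only when the failure is classified as transient."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as err:
            # Give up on the last attempt or on any non-transient error.
            if attempt == attempts or not is_transient(err):
                raise
            time.sleep(delay_seconds)
    raise RuntimeError("unreachable: loop always returns or raises")
```

Isolating the grant in its own step means only that step is retried, so a principal that has not yet propagated does not fail the rest of the Kusto rollout.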

Also add an early return in resolveGrafanaADXOptions when ADX is
disabled: skip resolving cluster URL, datasource name, and other
fields that are only needed for the create/update path. This keeps
non-owner regions (where the triple-AND gate evaluates to false)
from holding a resolved DatasourceName that could interact with
deleteWhenDisabled. The delete-on-disable path for disallowed
geographies is unaffected because enabled=true there and the
geography allowlist check runs inside grafanactl Validate.

Note: renaming grant-access.bicep resource names from grant-* to
grant-ingest-*/grant-viewer-* (done in a prior commit) will leave
the old principalAssignments in place as harmless duplicates.
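The early return in resolveGrafanaADXOptions described in this commit message can be sketched as follows. The real function is Go code in tooling/templatize and is not shown here; the `ADXOptions` shape, the `lookup` callback, and its keys are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ADXOptions:
    enabled: bool
    cluster_url: Optional[str] = None
    datasource_name: Optional[str] = None


def resolve_adx_options(enabled: bool, lookup: Callable[[str], str]) -> ADXOptions:
    """Resolve ADX datasource options, skipping all lookups when disabled."""
    if not enabled:
        # Early return: nothing is resolved, so no DatasourceName is held
        # that could interact with a delete-when-disabled path.
        return ADXOptions(enabled=False)
    return ADXOptions(
        enabled=True,
        cluster_url=lookup("clusterUrl"),
        datasource_name=lookup("datasourceName"),
    )
```

With the gate false, the returned options carry no cluster URL or datasource name, and `lookup` is never called.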
@swiencki swiencki force-pushed the adx-grafana-datasource-upstream branch from e3d6689 to 159012b Compare May 28, 2026 16:12
swiencki (Collaborator, Author) commented

/hold - This PR requires Azure/ARO-Tools#228 to be merged first; CI is expected to fail without it.


openshift-ci Bot commented May 28, 2026

@swiencki: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/cspr | 8d14303 | link | true | /test cspr |
| ci/prow/verify | 159012b | link | true | /test verify |
| ci/prow/config-change-detection | 159012b | link | true | /test config-change-detection |
| ci/prow/lint | 159012b | link | true | /test lint |
| ci/prow/test-unit | 159012b | link | true | /test test-unit |
| ci/prow/e2e-images | 159012b | link | true | /test e2e-images |
| ci/prow/mega-linter | 159012b | link | true | /test mega-linter |
| ci/prow/image-updater-images | 159012b | link | true | /test image-updater-images |
| ci/prow/images | 159012b | link | true | /test images |
| ci/prow/e2e-parallel | 159012b | link | true | /test e2e-parallel |

