@sradco sradco commented Jan 7, 2026

This enhancement describes the alert rule classification
mapping: its details, logic, and motivation.

This enhancement is related to #1822

@openshift-ci openshift-ci bot requested review from jan--f and moadz January 7, 2026 11:08

sradco commented Jan 7, 2026

@simonpasquier @avlitman @machadovilaca Hi,
I added this enhancement proposal to outline the mapping of group (renamed to layer, to align with incident detection terminology) and component.
Since we had to move operator alerts to an external mapping, I propose here to use an external mapping for all alert types.

@sradco sradco force-pushed the add_alert_rule_classification_mapping_enhancement branch from 1bd174b to 8daffb4 Compare January 7, 2026 12:01
@simonpasquier
Contributor

/unassign @moadz
/assign

openshift-ci bot commented Jan 8, 2026

@sradco: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test markdownlint

Use /test all to run all jobs.

In response to this:

/test ci/prow/markdownlint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sradco commented Jan 8, 2026

/test markdownlint

@sradco sradco force-pushed the add_alert_rule_classification_mapping_enhancement branch from 8daffb4 to 948ae4b Compare January 8, 2026 19:23
openshift-ci bot commented Jan 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from simonpasquier. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sradco sradco force-pushed the add_alert_rule_classification_mapping_enhancement branch from 948ae4b to 86158ff Compare January 8, 2026 19:24
@sradco sradco force-pushed the add_alert_rule_classification_mapping_enhancement branch 10 times, most recently from 8303044 to 269ec99 Compare January 26, 2026 10:31
Signed-off-by: Shirly Radco <sradco@redhat.com>
@sradco sradco force-pushed the add_alert_rule_classification_mapping_enhancement branch from 269ec99 to f828ba8 Compare January 26, 2026 13:12
openshift-ci bot commented Jan 26, 2026

@sradco: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/markdownlint | f828ba8 | link | true | /test markdownlint |


- **Special-cases first**: alert families that require per-alert-instance classification (for example, where the effective component depends on runtime alert labels) are evaluated before general rules.
- **General mappings next**: stable mappings for well-known platform and workload areas.
- **Fallback last**: when nothing matches, we apply the documented fallback behavior.
- Adding or updating heuristics is a code change in the monitoring plugin (see `monitoring-plugin-machadovilaca/pkg/alertcomponent/matcher.go`), and should remain aligned with `cluster-health-analyzer` to avoid drift between backend and UI behavior.
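
The evaluation order in the list above (special cases, then general mappings, then fallback) could be sketched roughly as follows. This is only an illustration, not the code in `matcher.go`: the type names, the alert name `ExampleOperatorDown`, the namespace `example-platform-ns`, the layer values, and the `unknown` fallback are all hypothetical placeholders.

```go
package main

import "fmt"

// Classification is the (layer, component) pair assigned to an alert.
// Field names and values are assumptions made for this sketch.
type Classification struct {
	Layer     string
	Component string
}

// matcher reports a classification and true when it recognizes the labels.
type matcher func(labels map[string]string) (Classification, bool)

// specialCases depend on runtime labels of the alert instance.
// Hypothetical rule: the component is taken from the "name" label.
var specialCases = []matcher{
	func(l map[string]string) (Classification, bool) {
		if l["alertname"] == "ExampleOperatorDown" && l["name"] != "" {
			return Classification{Layer: "cluster", Component: l["name"]}, true
		}
		return Classification{}, false
	},
}

// namespaceComponents is a hypothetical stable mapping for well-known areas.
var namespaceComponents = map[string]string{
	"example-platform-ns": "example-platform",
}

// generalMappings are stable rules that do not depend on a specific alert.
var generalMappings = []matcher{
	func(l map[string]string) (Classification, bool) {
		if c, ok := namespaceComponents[l["namespace"]]; ok {
			return Classification{Layer: "namespace", Component: c}, true
		}
		return Classification{}, false
	},
}

// fallback stands in for the documented fallback behavior (assumed here).
var fallback = Classification{Layer: "unknown", Component: "unknown"}

// Classify tries special cases first, general mappings next, fallback last.
func Classify(labels map[string]string) Classification {
	for _, group := range [][]matcher{specialCases, generalMappings} {
		for _, m := range group {
			if c, ok := m(labels); ok {
				return c
			}
		}
	}
	return fallback
}

func main() {
	labels := map[string]string{
		"alertname": "ExampleOperatorDown",
		"name":      "etcd",
		"namespace": "example-platform-ns",
	}
	// The special case wins even though the namespace also has a general mapping.
	fmt.Printf("%+v\n", Classify(labels))
}
```

Keeping each tier as an ordered slice makes the precedence explicit, which may also make it easier to compare the rule set against `cluster-health-analyzer` when checking for drift.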
Contributor

Would it make sense to extend the cluster-health-analyzer then? Health analyzer is deployed via COO as well so it seems a little bit redundant to have the same logic in the monitoring-plugin (which is also deployed via COO if I am not mistaken).

Contributor

Actually, is the plan to introduce it in monitoring-plugin or monitoring-console-plugin? I am always a little confused by these two, but one can be deployed via COO whereas the other is the default in OpenShift, AFAIK.

## Proposal

### User Stories
- As a multi-cluster admin, I will be able to see which clusters require my attention and drill down by impacted component and scope of impact to the specific related alerts.
Contributor

But the components don't have any hierarchy, right? I think a hierarchical structure of the components (as proposed in the cluster/component health feature for the cluster-health-analyzer) might be more interesting.
