OCPBUGS-79699: rules: add rhel to label_node_openshift_io_os_id#582
OCPBUGS-79699: rules: add rhel to label_node_openshift_io_os_id#582danielmellado wants to merge 1 commit intoopenshift:mainfrom
Conversation
4.19+ reports label_node_openshift_io_os_id=rhel while remaining CoreOS; the previous rhcos-only selector dropped cluster:capacity_effective_cpu_cores from telemetry. This commit adds rhel to the rule and a test case. Signed-off-by: Daniel Mellado <dmellado@fedoraproject.org>
|
@danielmellado: This pull request references Jira Issue OCPBUGS-79699, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThe changes extend OpenShift CPU capacity monitoring to include both RHCOS and RHEL nodes. Recording rules are updated to match both OS types using regex patterns, and a test case is added to validate the new behavior for RHEL worker nodes. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danielmellado The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/jira refresh |
|
@juzhao: This pull request references Jira Issue OCPBUGS-79699, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| record: 'cluster:cpu_capacity_cores:_id', | ||
| expr: ||| | ||
| group by(_id, tenant_id) (cluster:capacity_cpu_cores:sum{label_node_openshift_io_os_id="rhcos"}) * 0 | ||
| group by(_id, tenant_id) (cluster:capacity_cpu_cores:sum{label_node_openshift_io_os_id=~"rhcos|rhel"}) * 0 |
There was a problem hiding this comment.
label_node_openshift_io_os_id=~"rhcos|rhel"} is safe
I think since 4.19+ the os id is changed to rhel now, how about leave rhel only
label_node_openshift_io_os_id="rhel"}), same for others
There was a problem hiding this comment.
Thanks for the review! If we stop backporting this in 4.19+ then that'd be fine. Using only label_node_openshift_io_os_id="rhel" would fix 4.19+ but would stop matching clusters still on 4.18 and earlier, where the label remains rhcos.
My main concern here is that telemeter rules run against the full fleet, so we may need both values. The regex keeps backward compatibility with the original rhcos filter and adds rhel for 4.19+, wdyt?
There was a problem hiding this comment.
I see, you are right, since the telemeter server side configuration only have main branch for production, example: https://github.com/rhobs/configuration/blob/main/resources/services/rhobs-thanos-operator/staging/telemeter-rules.yaml#L149-L160, keep rhcos is right for 4.18 and below, add rhel is for 4.19+, and seems we also need to update in the rhobs for the same after this PR merged
|
thanks, LGTM, waiting for others to review |
|
/verified by @juzhao |
|
@juzhao: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold |
|
@danielmellado: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
4.19+ reports label_node_openshift_io_os_id=rhel while remaining CoreOS;
the previous rhcos-only selector dropped cluster:capacity_effective_cpu_cores
from telemetry. This commit adds rhel to the rule and a test case.
Signed-off-by: Daniel Mellado dmellado@fedoraproject.org