Application Log Health: Shift all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles #606
akshayrw25 wants to merge 10 commits into main
@cursor review
bugbot run
@akshayrw25 This codebundle should either a) be renamed to include the word "deployment" if it is only focused on deployments, or b) be made to support other resource types (e.g. statefulsets, daemonsets, etc.) if it is to keep the current name. This isn't a full review, just a comment for now, since a decision will need to be made regarding the scope of the codebundle.
- Rename task to "Scan Application Logs for Errors and Stacktraces" for clarity
- Add timestamp tracking for log extraction to support accurate issue reporting
- Enhance log contents display with structured format showing last N lines per file
- Add LOG_START/LOG_END markers for better log parsing
- Use recorded timestamp when no issues are found
- Re-enable log cleanup after analysis
- Add LOG_SIZE/MAX_LOG_SIZE variable to control max log bytes fetched (default 2MB)
- Increase default LOG_LINES from 100 to 1000 for better log coverage
- Pass max_log_lines and max_log_bytes to log fetching in runbook
- Remove unused EVENT_AGE, EVENT_THRESHOLD, CHECK_SERVICE_ENDPOINTS from SLI
- Remove unnecessary "no issues found" dummy issue creation
- Rename final SLI task to "Generate Application Health Score"
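The two limits in this commit can be read as successive caps on what gets analyzed. A minimal sketch in Python, assuming the line cap is applied before the byte cap and that both keep the most recent output. `limit_log` is a hypothetical helper for illustration, not code from the codebundle, and "2MB" is interpreted here as 2 MiB:

```python
DEFAULT_LOG_LINES = 1000                  # new LOG_LINES default from this commit
DEFAULT_MAX_LOG_BYTES = 2 * 1024 * 1024   # "2MB" default, read as 2 MiB (assumption)


def limit_log(text: str,
              max_lines: int = DEFAULT_LOG_LINES,
              max_bytes: int = DEFAULT_MAX_LOG_BYTES) -> str:
    """Keep at most the last max_lines lines, then at most the last max_bytes bytes."""
    if max_lines <= 0 or max_bytes <= 0:
        return ""
    # Line cap first: keep only the newest lines.
    tail = "\n".join(text.splitlines()[-max_lines:])
    encoded = tail.encode("utf-8")
    if len(encoded) <= max_bytes:
        return tail
    # Byte cap second: drop the oldest bytes and ignore a multi-byte
    # character that happens to be cut at the boundary.
    return encoded[-max_bytes:].decode("utf-8", errors="ignore")
```

Applying the line cap first means the byte cap only matters for workloads that emit very long lines, which is one plausible reason to expose both settings.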
…shifted "Analyze workload stacktraces" task from stacktrace-CB to applog-health-CB
20a424d to 2591933
… shorten interval
- Templates: use WORKLOAD_NAME + WORKLOAD_TYPE (from match_resource.kind) instead of DEPLOYMENT_NAME
- runbook.robot: remove 'Analyze Workload Stacktraces' task; rely on log pattern analysis only; cleanup temp files inside conditional
- sli.robot: replace DEPLOYMENT_NAME with WORKLOAD_NAME/WORKLOAD_TYPE; remove 'Get Stacktrace Health Score' task; final health score from log_health_score only; scale-down timestamp logic only for deployment kind
- delete the "Detect Log Anomalies" task from daemonset-healthcheck and statefulset-healthcheck
…);removed the redundant "detect log anomalies" task from statefulset-healthcheck
Cursor Bugbot has reviewed your changes and found 3 potential issues.
spec:
  generationRules:
    - resourceTypes:
        - deployment
Generation excludes non-deployment workloads
High Severity
k8s-applog-health only generates for deployment resources, while log tasks were removed from the statefulset and daemonset healthcheck runbooks. This leaves statefulset and daemonset workloads without generated applog SLIs/runbooks, so the consolidation does not actually apply to all supported workload types.
@stewartshea @Rohit-Ekbote
To include the other two, i.e. statefulset and daemonset, will this be correct:
- deployment
- statefulset
- daemonset
Also, do we have any other codebundle with generation rules applicable to more than one resource type?
yes, that looks correct (you can validate against the other generation rules for those resource types)
and yes, we have other examples with azure codebundles etc.
codebundles/k8s-applog-health/.runwhen/templates/k8s-applog-health-sli.yaml
- rectified the k8s-applog-health runbook and sli SKIP_HEALTH_CHECKS evaluation
- removed the "Critical Log Errors" sub metric from deployment-healthcheck SLI (looks at logs, this codebundle shouldn't ideally be looking at logs)
stewartshea left a comment
Needs to be split into two codebundles according to data type. Please also add tags to the tasks, either "data:logs-bulk" or "data:logs-stacktrace".


Summary
What’s new
- `codebundles/k8s-applog-health/`
- `Analyze Application Log Patterns for ${WORKLOAD_TYPE} ${WORKLOAD_NAME} in Namespace ${NAMESPACE}` (fetch logs, scan with configurable patterns, create issues, report health score) and `Fetch Workload Logs for ${WORKLOAD_TYPE} ${WORKLOAD_NAME} in Namespace ${NAMESPACE}` (raw logs for manual review).
- `WORKLOAD_NAME` + `WORKLOAD_TYPE` (from `match_resource.kind`) so the same codebundle is used for deployments, statefulsets, and daemonsets.
- `runbook_patterns.json` (full categories for runbook) and `sli_critical_patterns.json` (critical-only for SLI).
- … (`WORKLOAD_TYPE`/`WORKLOAD_NAME` in config).
What’s removed from existing runbooks
- `Analyze Application Log Patterns for Deployment ...`, `Fetch Deployment Logs for ...`.
- `Analyze Application Log Patterns for StatefulSet ...`, `Detect Log Anomalies for StatefulSet ...`.
- `Analyze Application Log Patterns for DaemonSet ...`, `Detect Log Anomalies for DaemonSet ...`.
Note: “Detect Log Anomalies” (log anomaly analysis) is not re-added in k8s-applog-health in this PR; the consolidated codebundle currently focuses on pattern-based log scanning and fetch-only.
Focus points for reviewers
Variable naming and wiring
Confirm that `WORKLOAD_NAME` and `WORKLOAD_TYPE` are used consistently in k8s-applog-health (runbook, SLI, templates) and that TaskSet/SLI templates pass `match_resource.resource.metadata.name` and `match_resource.kind | lower` correctly for deployment/statefulset/daemonset.
Parity with previous behavior
`LOGS_EXCLUDE_PATTERN`, custom patterns file, health score, and issue creation should match the previous deployment runbook behavior (with workload type generalized). … `${WORKLOAD_TYPE}`/`${WORKLOAD_NAME}`.
Removed tasks and discoverability
Users who previously used deployment/statefulset/daemonset runbooks for log analysis will now need to use the k8s-applog-health codebundle (or a TaskSet that includes it). Confirm that removal of the log tasks from the three healthcheck runbooks is intentional and that docs/runbook names make the migration path clear.
Log anomaly tasks
“Detect Log Anomalies” was removed from statefulset and daemonset and is not reimplemented in k8s-applog-health. Confirm whether that’s intentional for this PR or if we should track adding it to k8s-applog-health (or elsewhere) in a follow-up.
SLI behavior and scaling
SLI uses critical-pattern scan and handles scaled-to-zero (e.g. score 1.0 when replicas = 0). Check that `Get Critical Log Errors and Score` and `Generate Application Health Score` behave as expected for 0 replicas and that metric/reporting still make sense.
Secrets and config
Spot-check that k8s-applog-health templates use the same secret/auth pattern as other K8s codebundles (e.g. `kubeconfig`, `kubernetes-auth.yaml`) and that required config (e.g. `NAMESPACE`, `CONTEXT`, `WORKLOAD_NAME`, `WORKLOAD_TYPE`, log limits) is documented in the README and wired in the TaskSet/SLI templates.
Note
Medium Risk
Moderate risk due to behavioral/UX changes: existing healthcheck runbooks lose log/anomaly tasks and users must adopt the new bundle, and the new runbook/SLI rely on templated variable wiring and log-pattern configuration to produce correct results.
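The variable-wiring risk can be made concrete with a small check. The sketch below, in Python, mimics what the templates are described as doing: `match_resource.kind | lower` feeds `WORKLOAD_TYPE` and the resource name feeds `WORKLOAD_NAME`. The function `workload_config` and the returned dictionary shape are assumptions for illustration, not the RunWhen template engine:

```python
SUPPORTED_WORKLOAD_TYPES = {"deployment", "statefulset", "daemonset"}


def workload_config(kind: str, name: str, namespace: str) -> dict:
    """Build the config values a rendered template would carry for one workload."""
    workload_type = kind.lower()  # same effect as the `| lower` filter
    if workload_type not in SUPPORTED_WORKLOAD_TYPES:
        raise ValueError(f"unsupported workload kind: {kind!r}")
    if not name:
        raise ValueError("WORKLOAD_NAME must not be empty")
    return {
        "WORKLOAD_TYPE": workload_type,
        "WORKLOAD_NAME": name,
        "NAMESPACE": namespace,
    }
```

Failing loudly on an unexpected kind is a deliberate choice in this sketch: a silently mis-wired `WORKLOAD_TYPE` would otherwise surface only later as an empty log fetch.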
Overview
Centralizes Kubernetes application log health checks into a new `k8s-applog-health` codebundle, including a generalized runbook (analyze patterns + fetch logs), an SLI that scores critical log errors and treats scaled-to-zero workloads as healthy, and JSON pattern catalogs for runbook vs SLI scans.
Adds RunWhen generation rules and templates to emit `SLX`/`SLI`/`Runbook` instances wired via `WORKLOAD_TYPE` + `WORKLOAD_NAME`, and documents configuration/usage in a new README.
Removes the log-analysis/log-fetch and anomaly-related tasks from the existing `k8s-deployment-healthcheck`, `k8s-statefulset-healthcheck`, and `k8s-daemonset-healthcheck` runbooks so log triage is no longer duplicated across those bundles.
Written by Cursor Bugbot for commit 284c11e. This will update automatically on new commits.