Application Log Health: Shift all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles by akshayrw25 · Pull Request #606 · runwhen-contrib/rw-cli-codecollection

akshayrw25 · 2026-01-15T14:05:00Z

Summary

This PR introduces a dedicated k8s-applog-health codebundle and moves all application-log health logic into it.
Log-related tasks are removed from the deployment, statefulset, and daemonset healthcheck runbooks so log analysis is handled in one place and works for any supported workload type (deployment, statefulset, daemonset).

What’s new

New codebundle codebundles/k8s-applog-health/
- Runbook tasks:
  - Analyze Application Log Patterns for ${WORKLOAD_TYPE} ${WORKLOAD_NAME} in Namespace ${NAMESPACE}: fetch logs, scan with configurable patterns, create issues, report health score.
  - Fetch Workload Logs for ${WORKLOAD_TYPE} ${WORKLOAD_NAME} in Namespace ${NAMESPACE}: raw logs for manual review.
- SLI: Critical log-error scoring; final applog health score; supports scaled-to-zero.
- Config: Uses WORKLOAD_NAME + WORKLOAD_TYPE (from match_resource.kind) so the same codebundle is used for deployments, statefulsets, and daemonsets.
- Patterns: runbook_patterns.json (full categories for runbook) and sli_critical_patterns.json (critical-only for SLI).
- RunWhen: Generation rules and templates for SLI, SLX, and TaskSet (including WORKLOAD_TYPE / WORKLOAD_NAME in config).
- README: Tasks, pattern categories, configuration variables, and requirements.
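
To make the pattern-based scan described above concrete, here is a minimal Python sketch. It is not the codebundle's implementation: the catalog layout, the category names, the `scan_logs` helper, and the "lower number means more severe" convention are all assumptions for illustration, since this PR description does not show the schema of runbook_patterns.json.

```python
import re
from dataclasses import dataclass

# Hypothetical catalog; the real runbook_patterns.json schema is not shown in the PR.
PATTERNS = {
    "GenericError": {"severity": 2, "regex": r"\b(ERROR|FATAL)\b"},
    "Timeout": {"severity": 3, "regex": r"timed? ?out"},
    "Deprecation": {"severity": 4, "regex": r"\bdeprecated\b"},
}


@dataclass
class Finding:
    category: str
    severity: int
    line: str


def scan_logs(lines, patterns, severity_threshold=3, exclude_pattern=None):
    """Collect matches at or above the threshold (assumed: lower number = more severe)."""
    exclude = re.compile(exclude_pattern, re.IGNORECASE) if exclude_pattern else None
    compiled = {
        name: (spec["severity"], re.compile(spec["regex"], re.IGNORECASE))
        for name, spec in patterns.items()
    }
    findings = []
    for line in lines:
        # Lines matching the exclude pattern (e.g. health-check noise) are skipped entirely.
        if exclude and exclude.search(line):
            continue
        for name, (severity, regex) in compiled.items():
            if severity <= severity_threshold and regex.search(line):
                findings.append(Finding(name, severity, line))
    return findings


if __name__ == "__main__":
    sample = [
        "GET /healthz 200",
        "ERROR db connection timed out",
        "warning: deprecated flag",
    ]
    for finding in scan_logs(sample, PATTERNS, exclude_pattern="healthz"):
        print(finding.category, finding.severity, finding.line)
```

In this sketch an SLI-style "critical-only" scan would simply pass a smaller catalog or a stricter threshold to the same function.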

What’s removed from existing runbooks

k8s-deployment-healthcheck: Removed tasks: Analyze Application Log Patterns for Deployment ..., Fetch Deployment Logs for ....
k8s-statefulset-healthcheck: Removed tasks: Analyze Application Log Patterns for StatefulSet ..., Detect Log Anomalies for StatefulSet ....
k8s-daemonset-healthcheck: Removed tasks: Analyze Application Log Patterns for DaemonSet ..., Detect Log Anomalies for DaemonSet ....

Note: “Detect Log Anomalies” (log anomaly analysis) is not re-added in k8s-applog-health in this PR; the consolidated codebundle currently covers only pattern-based log scanning and raw log fetching.

Focus points for reviewers

Variable naming and wiring
Confirm that WORKLOAD_NAME and WORKLOAD_TYPE are used consistently in k8s-applog-health (runbook, SLI, templates) and that TaskSet/SLI templates pass match_resource.resource.metadata.name and match_resource.kind | lower correctly for deployment/statefulset/daemonset.
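
As a reviewing aid, the wiring above can be mirrored in a few lines of Python. This is only a sketch of the intended mapping, not the templates themselves; the `workload_config` helper and the shape of the sample `match_resource` dict are assumptions based on the field names quoted in this description.

```python
SUPPORTED_WORKLOAD_TYPES = {"deployment", "statefulset", "daemonset"}


def workload_config(match_resource: dict) -> dict:
    """Mirror `match_resource.kind | lower` and `match_resource.resource.metadata.name`."""
    workload_type = match_resource["kind"].lower()
    if workload_type not in SUPPORTED_WORKLOAD_TYPES:
        raise ValueError(f"unsupported workload kind: {match_resource['kind']}")
    return {
        "WORKLOAD_TYPE": workload_type,
        "WORKLOAD_NAME": match_resource["resource"]["metadata"]["name"],
    }


if __name__ == "__main__":
    # Hypothetical matched resource, for illustration only.
    example = {"kind": "StatefulSet", "resource": {"metadata": {"name": "db"}}}
    print(workload_config(example))
```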
Parity with previous behavior
- Runbook: Error/pattern scanning, severity threshold, LOGS_EXCLUDE_PATTERN, custom patterns file, health score, and issue creation should match the previous deployment runbook behavior (with workload type generalized).
- “Fetch logs” task: Behavior (including health-check filtering and report format) should align with the old “Fetch Deployment Logs” task, generalized for ${WORKLOAD_TYPE} / ${WORKLOAD_NAME}.
Removed tasks and discoverability
Users who previously used deployment/statefulset/daemonset runbooks for log analysis will now need to use the k8s-applog-health codebundle (or a TaskSet that includes it). Confirm that removal of the log tasks from the three healthcheck runbooks is intentional and that docs/runbook names make the migration path clear.
Log anomaly tasks
“Detect Log Anomalies” was removed from statefulset and daemonset and is not reimplemented in k8s-applog-health. Confirm whether that’s intentional for this PR or if we should track adding it to k8s-applog-health (or elsewhere) in a follow-up.
SLI behavior and scaling
The SLI uses a critical-pattern scan and handles scaled-to-zero workloads (e.g. score 1.0 when replicas = 0). Check that the “Get Critical Log Errors and Score” and “Generate Application Health Score” tasks behave as expected for 0 replicas and that the metric and reporting still make sense.
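
The scaled-to-zero expectation can be stated as a small Python sketch. The only behavior taken from this description is that zero replicas yields a score of 1.0; the binary 0.0/1.0 scoring for running workloads and the `applog_health_score` name are assumptions, and the actual SLI may weight errors differently.

```python
def applog_health_score(replicas: int, critical_error_count: int) -> float:
    """Return 1.0 for healthy, 0.0 for unhealthy (binary scoring is an assumption)."""
    if replicas == 0:
        # Scaled-to-zero workloads produce no logs, so they are treated as healthy.
        return 1.0
    return 0.0 if critical_error_count > 0 else 1.0


if __name__ == "__main__":
    print(applog_health_score(replicas=0, critical_error_count=0))
    print(applog_health_score(replicas=3, critical_error_count=2))
```

Checking the replica count before inspecting any log results keeps stale or empty log output from dragging down the score of an intentionally scaled-down workload.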
Secrets and config
Spot-check that k8s-applog-health templates use the same secret/auth pattern as other K8s codebundles (e.g. kubeconfig, kubernetes-auth.yaml) and that required config (e.g. NAMESPACE, CONTEXT, WORKLOAD_NAME, WORKLOAD_TYPE, log limits) is documented in the README and wired in the TaskSet/SLI templates.

Note

Medium Risk
Moderate risk due to behavioral/UX changes: existing healthcheck runbooks lose log/anomaly tasks and users must adopt the new bundle, and the new runbook/SLI rely on templated variable wiring and log-pattern configuration to produce correct results.

Overview
Centralizes Kubernetes application log health checks into a new k8s-applog-health codebundle, including a generalized runbook (analyze patterns + fetch logs), an SLI that scores critical log errors and treats scaled-to-zero workloads as healthy, and JSON pattern catalogs for runbook vs SLI scans.

Adds RunWhen generation rules and templates to emit SLX/SLI/Runbook instances wired via WORKLOAD_TYPE + WORKLOAD_NAME, and documents configuration/usage in a new README.

Removes the log-analysis/log-fetch and anomaly-related tasks from the existing k8s-deployment-healthcheck, k8s-statefulset-healthcheck, and k8s-daemonset-healthcheck runbooks so log triage is no longer duplicated across those bundles.

Written by Cursor Bugbot for commit 284c11e. This will update automatically on new commits.

akshayrw25 · 2026-01-15T14:07:24Z

@cursor review

cursor · 2026-01-16T13:42:46Z

You have run out of free Bugbot PR reviews for this billing cycle. This will reset on February 11.

stewartshea · 2026-01-29T01:50:13Z

bugbot run

stewartshea · 2026-01-29T01:53:18Z

@akshayrw25 This codebundle should either a) be renamed to include the word "deployment" if it is focused only on deployments, or b) be made to support other resource types (e.g. statefulsets, daemonsets) if it is to keep the current name.

This isn't a full review, just a current comment since a decision will need to be made regarding the scope of the codebundle.

- Rename task to "Scan Application Logs for Errors and Stacktraces" for clarity
- Add timestamp tracking for log extraction to support accurate issue reporting
- Enhance log contents display with structured format showing last N lines per file
- Add LOG_START/LOG_END markers for better log parsing
- Use recorded timestamp when no issues are found
- Re-enable log cleanup after analysis

…ppLog Analysis

- Add LOG_SIZE/MAX_LOG_SIZE variable to control max log bytes fetched (default 2MB)
- Increase default LOG_LINES from 100 to 1000 for better log coverage
- Pass max_log_lines and max_log_bytes to log fetching in runbook
- Remove unused EVENT_AGE, EVENT_THRESHOLD, CHECK_SERVICE_ENDPOINTS from SLI
- Remove unnecessary "no issues found" dummy issue creation
- Rename final SLI task to "Generate Application Health Score"

…shifted "Analyze workload stacktraces" task from stacktrace-CB to applog-health-CB

… shorten interval
- Templates: use WORKLOAD_NAME + WORKLOAD_TYPE (from match_resource.kind) instead of DEPLOYMENT_NAME
- runbook.robot: remove 'Analyze Workload Stacktraces' task; rely on log pattern analysis only; cleanup temp files inside conditional
- sli.robot: replace DEPLOYMENT_NAME with WORKLOAD_NAME/WORKLOAD_TYPE; remove 'Get Stacktrace Health Score' task; final health score from log_health_score only; scale-down timestamp logic only for deployment kind

- delete the "Detect Log Anomalies" task from daemonset-healthcheck and statefulset-healthcheck

…);removed the redundant "detect log anomalies" task from statefulset-healthcheck

cursor

Cursor Bugbot has reviewed your changes and found 3 potential issues.

cursor · 2026-02-12T05:24:27Z

codebundles/k8s-applog-health/.runwhen/generation-rules/k8s-applog-health.yaml

+spec:
+  generationRules:
+    - resourceTypes:
+        - deployment


Generation excludes non-deployment workloads

High Severity

k8s-applog-health only generates for deployment resources, while log tasks were removed from the statefulset and daemonset healthcheck runbooks. This leaves statefulset and daemonset workloads without generated applog SLIs/runbooks, so the consolidation does not actually apply to all supported workload types.

Additional Locations (2)

codebundles/k8s-statefulset-healthcheck/runbook.robot#L21-L22

codebundles/k8s-daemonset-healthcheck/runbook.robot#L21-L22

@stewartshea @Rohit-Ekbote
To include the other two, i.e. statefulset and daemonset, would this be correct?

        - deployment
        - statefulset
        - daemonset

Also, do we have any other codebundle with generation rules applicable to more than one resource type?

Yes, that looks correct (you can validate against the other generation rules for those resource types).
And yes, we have other examples, such as the Azure codebundles.
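
One way to sanity-check the fix discussed in this thread is a small Python sketch that compares a generation-rules document against the three workload kinds whose log tasks were removed. The `spec.generationRules[].resourceTypes` layout follows the diff excerpt above; the `missing_resource_types` helper and the in-memory dicts are hypothetical stand-ins for the parsed YAML file.

```python
# Workload kinds whose healthcheck runbooks lose their log tasks in this PR.
REQUIRED_TYPES = {"deployment", "statefulset", "daemonset"}


def missing_resource_types(generation_rules: dict) -> set:
    """Return the required workload kinds not covered by any generation rule."""
    covered = set()
    for rule in generation_rules.get("spec", {}).get("generationRules", []):
        covered.update(t.lower() for t in rule.get("resourceTypes", []))
    return REQUIRED_TYPES - covered


if __name__ == "__main__":
    current = {"spec": {"generationRules": [{"resourceTypes": ["deployment"]}]}}
    proposed = {
        "spec": {
            "generationRules": [
                {"resourceTypes": ["deployment", "statefulset", "daemonset"]}
            ]
        }
    }
    print(sorted(missing_resource_types(current)))   # ['daemonset', 'statefulset']
    print(sorted(missing_resource_types(proposed)))  # []
```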

codebundles/k8s-applog-health/sli.robot

codebundles/k8s-applog-health/.runwhen/templates/k8s-applog-health-sli.yaml

- rectified the SKIP_HEALTH_CHECKS evaluation in the k8s-applog-health runbook and SLI
- removed the "Critical Log Errors" sub-metric from the deployment-healthcheck SLI (it looks at logs, and that codebundle ideally shouldn't be looking at logs)

stewartshea

Needs to be split into two codebundles according to data type. Please also add tags to the tasks, either "data:logs-bulk" or "data:logs-stacktrace".

akshayrw25 requested a review from a team as a code owner January 15, 2026 14:05

akshayrw25 self-assigned this Jan 15, 2026

akshayrw25 added 5 commits February 11, 2026 15:28

RWENGG-1350: initial writeup of the applog health codebundle

97ffeb3

Update display name from Kubernetes Deployment Triage to Kubernetes A…

8f8d391

…ppLog Analysis

shifted "Analyze applog " task from healthcheck to applog-health CB; …

2591933

…shifted "Analyze workload stacktraces" task from stacktrace-CB to applog-health-CB

akshayrw25 force-pushed the RWENGG-1350 branch from 20a424d to 2591933 February 11, 2026 09:58

- shift the "Fetch Deployment Logs" task to applog codebundle

4573ff4

- delete the "Detect Log Anomalies" task from daemonset-healthcheck and statefulset-healthcheck

added README for k8s-applog-health(the new application log codebundle…

cc77008

…);removed the redundant "detect log anomalies" task from statefulset-healthcheck

akshayrw25 changed the title ~~Application Log Health: Combine all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles~~ Application Log Health: Shift all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles Feb 12, 2026

add next_action kwarg to distinguish applog issues in platform

284c11e

cursor bot reviewed Feb 12, 2026

stewartshea requested changes Feb 12, 2026

akshayrw25 wants to merge 10 commits into main from RWENGG-1350

2 participants
