Application Log Health: Shift all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles #606

Open
akshayrw25 wants to merge 10 commits into main from RWENGG-1350

@akshayrw25 akshayrw25 commented Jan 15, 2026

Summary

  • This PR introduces a dedicated k8s-applog-health codebundle and moves all application-log health logic into it.
  • Log-related tasks are removed from the deployment, statefulset, and daemonset healthcheck runbooks so log analysis is handled in one place and works for any supported workload type (deployment, statefulset, daemonset).

What’s new

  • New codebundle codebundles/k8s-applog-health/
    • Runbook: Analyze Application Log Patterns for ${WORKLOAD_TYPE} ${WORKLOAD_NAME} in Namespace ${NAMESPACE} (fetch logs, scan with configurable patterns, create issues, report health score) and Fetch Workload Logs for ${WORKLOAD_TYPE} ${WORKLOAD_NAME} in Namespace ${NAMESPACE} (raw logs for manual review).
    • SLI: Critical log-error scoring; final applog health score; supports scaled-to-zero.
    • Config: Uses WORKLOAD_NAME + WORKLOAD_TYPE (from match_resource.kind) so the same codebundle is used for deployments, statefulsets, and daemonsets.
    • Patterns: runbook_patterns.json (full categories for runbook) and sli_critical_patterns.json (critical-only for SLI).
    • RunWhen: Generation rules and templates for SLI, SLX, and TaskSet (including WORKLOAD_TYPE / WORKLOAD_NAME in config).
    • README: Tasks, pattern categories, configuration variables, and requirements.
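
As a rough illustration of the runbook/SLI split described above, the following Python sketch scans log lines against a small pattern catalog. The catalog schema (`category`, `severity`, `regex`), the convention that a lower severity number is more severe, and the function name `scan_logs` are assumptions made for this example; the actual layout of runbook_patterns.json and sli_critical_patterns.json is not shown in this description.

```python
import json
import re

# Hypothetical catalog contents; the real JSON files may use a different schema.
CATALOG = json.loads("""
[
  {"category": "Exceptions", "severity": 1, "regex": "Traceback|Exception in thread"},
  {"category": "GenericError", "severity": 2, "regex": "ERROR|FATAL"},
  {"category": "Timeout", "severity": 3, "regex": "timed out|deadline exceeded"}
]
""")


def scan_logs(lines, catalog, severity_threshold, exclude_pattern=None):
    """Return one finding per (line, pattern) match within the threshold.

    Assumes a lower severity number means a more severe pattern.
    """
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    active = [
        (p["category"], p["severity"], re.compile(p["regex"]))
        for p in catalog
        if p["severity"] <= severity_threshold
    ]
    findings = []
    for number, line in enumerate(lines, start=1):
        if exclude and exclude.search(line):
            continue  # plays the role of LOGS_EXCLUDE_PATTERN
        for category, severity, regex in active:
            if regex.search(line):
                findings.append(
                    {"line": number, "category": category, "severity": severity}
                )
    return findings


sample = [
    "INFO request served in 12ms",
    "ERROR failed to connect to db",
    "WARN upstream call timed out",
    "ERROR GET /healthz probe failed",
]
# Runbook-style scan: all categories. SLI-style scan: critical patterns only.
runbook_findings = scan_logs(sample, CATALOG, severity_threshold=3, exclude_pattern="/healthz")
critical_findings = scan_logs(sample, CATALOG, severity_threshold=1, exclude_pattern="/healthz")
```

Here a single catalog is filtered by threshold for brevity; the PR instead ships two separate files, which keeps the SLI scan cheap and independent of runbook-only categories.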

What’s removed from existing runbooks

  • k8s-deployment-healthcheck: Removed tasks: Analyze Application Log Patterns for Deployment ..., Fetch Deployment Logs for ....
  • k8s-statefulset-healthcheck: Removed tasks: Analyze Application Log Patterns for StatefulSet ..., Detect Log Anomalies for StatefulSet ....
  • k8s-daemonset-healthcheck: Removed tasks: Analyze Application Log Patterns for DaemonSet ..., Detect Log Anomalies for DaemonSet ....

Note: “Detect Log Anomalies” (log anomaly analysis) is not re-added in k8s-applog-health in this PR; the consolidated codebundle currently focuses on pattern-based log scanning and fetch-only.


Focus points for reviewers

  1. Variable naming and wiring
    Confirm that WORKLOAD_NAME and WORKLOAD_TYPE are used consistently in k8s-applog-health (runbook, SLI, templates) and that TaskSet/SLI templates pass match_resource.resource.metadata.name and match_resource.kind | lower correctly for deployment/statefulset/daemonset.

  2. Parity with previous behavior

    • Runbook: Error/pattern scanning, severity threshold, LOGS_EXCLUDE_PATTERN, custom patterns file, health score, and issue creation should match the previous deployment runbook behavior (with workload type generalized).
    • “Fetch logs” task: Behavior (including health-check filtering and report format) should align with the old “Fetch Deployment Logs” task, generalized for ${WORKLOAD_TYPE} / ${WORKLOAD_NAME}.
  3. Removed tasks and discoverability
    Users who previously used deployment/statefulset/daemonset runbooks for log analysis will now need to use the k8s-applog-health codebundle (or a TaskSet that includes it). Confirm that removal of the log tasks from the three healthcheck runbooks is intentional and that docs/runbook names make the migration path clear.

  4. Log anomaly tasks
    “Detect Log Anomalies” was removed from statefulset and daemonset and is not reimplemented in k8s-applog-health. Confirm whether that’s intentional for this PR or if we should track adding it to k8s-applog-health (or elsewhere) in a follow-up.

  5. SLI behavior and scaling
    SLI uses critical-pattern scan and handles scaled-to-zero (e.g. score 1.0 when replicas = 0). Check that Get Critical Log Errors and Score and Generate Application Health Score behave as expected for 0 replicas and that metric/reporting still make sense.

  6. Secrets and config
    Spot-check that k8s-applog-health templates use the same secret/auth pattern as other K8s codebundles (e.g. kubeconfig, kubernetes-auth.yaml) and that required config (e.g. NAMESPACE, CONTEXT, WORKLOAD_NAME, WORKLOAD_TYPE, log limits) is documented in the README and wired in the TaskSet/SLI templates.
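
For focus point 5, the scaled-to-zero behavior can be sketched in Python as below. Only the "score 1.0 when replicas = 0" rule comes from this description; the binary scoring for running workloads and the function name `applog_health_score` are assumptions, and the real SLI may use a graded formula.

```python
def applog_health_score(replicas, critical_matches):
    """Return an SLI score in [0.0, 1.0].

    Assumption: any critical match drops the score to 0.0 for a running
    workload. The actual SLI may grade the score differently.
    """
    if replicas < 0 or critical_matches < 0:
        raise ValueError("replicas and critical_matches must be non-negative")
    if replicas == 0:
        # Scaled-to-zero workloads are treated as healthy: there are no
        # pods whose logs could be scanned.
        return 1.0
    return 0.0 if critical_matches > 0 else 1.0
```

A reviewer checking the 0-replica path would expect the score to stay at 1.0 regardless of any stale match count, which is what the early return guarantees in this sketch.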


Note

Medium Risk
Moderate risk due to behavioral/UX changes: existing healthcheck runbooks lose log/anomaly tasks and users must adopt the new bundle, and the new runbook/SLI rely on templated variable wiring and log-pattern configuration to produce correct results.

Overview
Centralizes Kubernetes application log health checks into a new k8s-applog-health codebundle, including a generalized runbook (analyze patterns + fetch logs), an SLI that scores critical log errors and treats scaled-to-zero workloads as healthy, and JSON pattern catalogs for runbook vs SLI scans.

Adds RunWhen generation rules and templates to emit SLX/SLI/Runbook instances wired via WORKLOAD_TYPE + WORKLOAD_NAME, and documents configuration/usage in a new README.
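
The wiring can be pictured with a short Python sketch that mirrors what the templates are described as doing: lowercasing `match_resource.kind` and reading `match_resource.resource.metadata.name`. Modeling the match as a plain dict, reading the namespace from `metadata.namespace`, and the helper name `workload_config` are assumptions for illustration, not the generator's actual API.

```python
SUPPORTED_WORKLOAD_TYPES = {"deployment", "statefulset", "daemonset"}


def workload_config(match_resource):
    """Build the config variables for one matched workload.

    `match_resource` is modeled as a plain dict here (an assumption); the
    real templates read the same fields through template expressions.
    """
    workload_type = match_resource["kind"].lower()  # match_resource.kind | lower
    if workload_type not in SUPPORTED_WORKLOAD_TYPES:
        raise ValueError(f"unsupported workload type: {workload_type}")
    metadata = match_resource["resource"]["metadata"]
    return {
        "WORKLOAD_TYPE": workload_type,
        "WORKLOAD_NAME": metadata["name"],
        "NAMESPACE": metadata["namespace"],  # assumed field path
    }


example = workload_config(
    {"kind": "StatefulSet", "resource": {"metadata": {"name": "db", "namespace": "prod"}}}
)
```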

Removes the log-analysis/log-fetch and anomaly-related tasks from the existing k8s-deployment-healthcheck, k8s-statefulset-healthcheck, and k8s-daemonset-healthcheck runbooks so log triage is no longer duplicated across those bundles.

Written by Cursor Bugbot for commit 284c11e. This will update automatically on new commits. Configure here.

@akshayrw25 akshayrw25 requested a review from a team as a code owner January 15, 2026 14:05
@akshayrw25 akshayrw25 self-assigned this Jan 15, 2026

akshayrw25 commented Jan 15, 2026

@cursor review

cursor bot commented Jan 16, 2026

You have run out of free Bugbot PR reviews for this billing cycle. This will reset on February 11.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

@stewartshea

bugbot run

stewartshea commented Jan 29, 2026

@akshayrw25 This codebundle should either (a) be renamed to include the word "deployment" if it focuses only on deployments, or (b) be made to support other resource types (e.g. statefulsets, daemonsets) if it is to keep the current name.

This isn't a full review, just an interim comment, since a decision needs to be made about the scope of the codebundle.

- Rename task to "Scan Application Logs for Errors and Stacktraces" for clarity
- Add timestamp tracking for log extraction to support accurate issue reporting
- Enhance log contents display with structured format showing last N lines per file
- Add LOG_START/LOG_END markers for better log parsing
- Use recorded timestamp when no issues are found
- Re-enable log cleanup after analysis
- Add LOG_SIZE/MAX_LOG_SIZE variable to control max log bytes fetched (default 2MB)
- Increase default LOG_LINES from 100 to 1000 for better log coverage
- Pass max_log_lines and max_log_bytes to log fetching in runbook
- Remove unused EVENT_AGE, EVENT_THRESHOLD, CHECK_SERVICE_ENDPOINTS from SLI
- Remove unnecessary "no issues found" dummy issue creation
- Rename final SLI task to "Generate Application Health Score"
…shifted "Analyze workload stacktraces" task from stacktrace-CB to applog-health-CB
… shorten interval
- Templates: use WORKLOAD_NAME + WORKLOAD_TYPE (from match_resource.kind) instead of DEPLOYMENT_NAME
- runbook.robot: remove 'Analyze Workload Stacktraces' task; rely on log pattern analysis only; cleanup temp files inside conditional
- sli.robot: replace DEPLOYMENT_NAME with WORKLOAD_NAME/WORKLOAD_TYPE; remove 'Get Stacktrace Health Score' task; final health score from log_health_score only; scale-down timestamp logic only for deployment kind
@akshayrw25 akshayrw25 changed the title on Feb 11, 2026
  from: Application Log Health: Combination of "Analyze application log patterns..." and "Analyze Workload Stacktraces..." to detect and segregate log-health based issues
  to: Application Log Health: Combination of "Analyze application log patterns..." across deployment + statefulset + daemonset healthcheck codebundles to detect and segregate log-health based issues
- delete the "Detect Log Anomalies" task from daemonset-healthcheck and statefulset-healthcheck
@akshayrw25 akshayrw25 changed the title on Feb 11, 2026
  from: Application Log Health: Combination of "Analyze application log patterns..." across deployment + statefulset + daemonset healthcheck codebundles to detect and segregate log-health based issues
  to: Application Log Health: Combine all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles
…); removed the redundant "detect log anomalies" task from statefulset-healthcheck
@akshayrw25 akshayrw25 changed the title on Feb 12, 2026
  from: Application Log Health: Combine all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles
  to: Application Log Health: Shift all Logs-related tasks from deployment + statefulset + daemonset healthcheck codebundles
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

spec:
  generationRules:
    - resourceTypes:
        - deployment

Generation excludes non-deployment workloads

High Severity

k8s-applog-health only generates for deployment resources, while log tasks were removed from the statefulset and daemonset healthcheck runbooks. This leaves statefulset and daemonset workloads without generated applog SLIs/runbooks, so the consolidation does not actually apply to all supported workload types.

Contributor Author

@stewartshea @Rohit-Ekbote
To include the other two (statefulset and daemonset), would this be correct?

- deployment
- statefulset
- daemonset

Also, do we have any other codebundle whose generation rules apply to more than one resource type?

Contributor

Yes, that looks correct (you can validate against the other generation rules for those resource types).
And yes, we have other examples, e.g. among the Azure codebundles.
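
A quick way to sanity-check the proposed change is to compare the `resourceTypes` in the generation rules against the workload types the codebundle claims to support. The sketch below does this in Python on an already-parsed rule list; the helper name `missing_resource_types` and the dict shape are assumptions for illustration, not part of the generator.

```python
REQUIRED_TYPES = ("deployment", "statefulset", "daemonset")


def missing_resource_types(generation_rules, required=REQUIRED_TYPES):
    """Return the required resource types that no generation rule covers."""
    covered = {
        resource_type.lower()
        for rule in generation_rules
        for resource_type in rule.get("resourceTypes", [])
    }
    return [resource_type for resource_type in required if resource_type not in covered]


# Rules as flagged by Bugbot (deployment only) and as proposed in the reply.
rules_before = [{"resourceTypes": ["deployment"]}]
rules_after = [{"resourceTypes": ["deployment", "statefulset", "daemonset"]}]

gap_before = missing_resource_types(rules_before)
gap_after = missing_resource_types(rules_after)
```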

- rectified the k8s-applog-health runbook and sli SKIP_HEALTH_CHECKS evaluation
- removed the "Critical Log Errors" sub-metric from the deployment-healthcheck SLI (it looks at logs; this codebundle ideally shouldn't be looking at logs)

@stewartshea stewartshea left a comment

Needs to be split into two codebundles according to data type. Please also add tags to the tasks, either "data:logs-bulk" or "data:logs-stacktrace".
