From 03bc2b753fdc2bb260b5aaaf56cbd20506f3460d Mon Sep 17 00:00:00 2001 From: Robusta Runner Date: Thu, 6 Nov 2025 10:50:44 +0200 Subject: [PATCH 1/5] fixes --- docs/conf.py | 3 +- .../grafana-self-hosted.rst | 2 +- docs/index.rst | 3 +- .../builtin-alert-enrichment.rst | 46 ++++++++++++++++--- .../defining-playbooks/builtin-playbooks.rst | 2 +- .../creating-notifications.rst | 2 +- .../defining-playbooks/playbook-advanced.rst | 2 +- .../defining-playbooks/playbook-basics.rst | 2 +- docs/playbook-reference/index.rst | 2 +- docs/playbook-reference/overview.rst | 5 ++ .../prometheus-examples/index.rst | 24 ---------- .../link-alert-enrichment.rst | 2 +- docs/setup-robusta/installation-faq.rst | 2 +- 13 files changed, 56 insertions(+), 41 deletions(-) delete mode 100644 docs/playbook-reference/prometheus-examples/index.rst diff --git a/docs/conf.py b/docs/conf.py index c533376c2..3497561f4 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -108,7 +108,8 @@ "configuration/configuring-sinks.html": "/master/notification-routing/configuring-sinks.html", "how-it-works/privacy-and-security.html": "/master/setup-robusta/privacy-and-security.html", "how-it-works/index.html": "/master/playbook-reference/what-are-playbooks.html", - "playbook-reference/examples.html": "/master/playbook-reference/prometheus-examples/index.html", + "playbook-reference/examples.html": "/master/playbook-reference/builtin-alert-enrichment.html", + "playbook-reference/prometheus-examples/index.html": "/master/playbook-reference/builtin-alert-enrichment.html", "tutorials/playbook-track-changes.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html", "tutorials/playbook-job-failure.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html", "tutorials/playbook-failed-liveness.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html", diff --git 
a/docs/configuration/alertmanager-integration/grafana-self-hosted.rst b/docs/configuration/alertmanager-integration/grafana-self-hosted.rst index 28630d6d1..e70756243 100644 --- a/docs/configuration/alertmanager-integration/grafana-self-hosted.rst +++ b/docs/configuration/alertmanager-integration/grafana-self-hosted.rst @@ -93,7 +93,7 @@ To enable Robusta to correlate your Grafana alerts with the specific Kubernetes This is only required for Kubernetes alerts. You can send any alert to the Robusta timeline, including non-Kubernetes alerts. Option 2: Inline Alert Enrichment and Routing -=========================================== +============================================== Use Robusta to enrich alerts inline with extra context and route them to :doc:`other systems ` (Slack, Microsoft Teams, etc.). Learn more about :doc:`alert routing `. diff --git a/docs/index.rst b/docs/index.rst index 86477de80..4345f82da 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -144,8 +144,7 @@ :hidden: playbook-reference/index - Builtin Alert Enrichment - Custom Alert Enrichment + Enrich Alerts Kubernetes Change Notifications Cost Savings - KRR diff --git a/docs/playbook-reference/builtin-alert-enrichment.rst b/docs/playbook-reference/builtin-alert-enrichment.rst index adbc038d1..84e2d3ece 100644 --- a/docs/playbook-reference/builtin-alert-enrichment.rst +++ b/docs/playbook-reference/builtin-alert-enrichment.rst @@ -1,26 +1,60 @@ .. _builtin-alert-enrichment: -Builtin Alert Enrichment +Enrich Alerts ######################################## -Robusta takes Prometheus to the next level by correlating alerts with other observability data. +Ever feel overwhelmed by alerts that lack context? Robusta enriches alerts automatically and lets you create custom enrichment rules. + +.. 
note:: -Testing out Prometheus alerts + **Looking for automatic AI enrichment?** Check out :doc:`HolmesGPT ` for zero-configuration AI-powered alert enrichment that automatically investigates alerts and provides root cause analysis. + +Builtin Alert Enrichment ********************************* -1. Deploy a broken pod that will be stuck in pending state: + +Robusta automatically enriches Prometheus alerts with relevant Kubernetes context: + +* **Pod events** - Recent events related to the affected pod +* **Pod logs** - Relevant log excerpts from crashing or failing containers +* **Resource metrics** - CPU, memory, and other resource usage data +* **Related Kubernetes objects** - Deployments, ReplicaSets, ConfigMaps, etc. + +This happens automatically for common Prometheus alerts without any configuration. To extend it to your own Prometheus alerts you can define custom playbooks. + +Testing Alert Enrichment +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +1. Deploy a broken pod: .. code-block:: bash :name: cb-apply-pendingpod kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/pending_pods/pending_pod_resources.yaml -2. Trigger a Prometheus alert immediately, skipping the normal delays: +2. Trigger a Prometheus alert immediately: .. code-block:: bash :name: cb-trigger-prometheus-alert robusta playbooks trigger prometheus_alert alert_name=KubePodCrashLooping namespace=default pod_name=example-pod -.. admonition:: Example Slack Message +.. admonition:: Example Enriched Alert .. image:: /images/simulatedprometheusalert.png + +Custom Alert Enrichment +********************************* + +Create custom enrichment rules to: + +* Reduce MTTR by automatically gathering system state and logs when alerts fire +* Make faster decisions on which team needs to investigate +* Link alerts to runbooks and documentation for better knowledge sharing + +Get started with these examples: + +.. 
toctree:: + :maxdepth: 1 + + prometheus-examples/bash-alert-enrichment + prometheus-examples/link-alert-enrichment diff --git a/docs/playbook-reference/defining-playbooks/builtin-playbooks.rst b/docs/playbook-reference/defining-playbooks/builtin-playbooks.rst index f26d6c100..b866ed404 100644 --- a/docs/playbook-reference/defining-playbooks/builtin-playbooks.rst +++ b/docs/playbook-reference/defining-playbooks/builtin-playbooks.rst @@ -25,7 +25,7 @@ The following default playbook handles all Prometheus alerts that Robusta receiv There are additional enrichments for specific alerts. For example: -To define additional playbooks for your own alerts, refer to the :doc:`Custom Alert Enrichment ` guide. +To define additional playbooks for your own alerts, refer to the :doc:`Enrich Alerts ` guide. Default Prometheus Silencing -------------------------------- diff --git a/docs/playbook-reference/defining-playbooks/creating-notifications.rst b/docs/playbook-reference/defining-playbooks/creating-notifications.rst index e4022dedc..c128f8af6 100644 --- a/docs/playbook-reference/defining-playbooks/creating-notifications.rst +++ b/docs/playbook-reference/defining-playbooks/creating-notifications.rst @@ -1,4 +1,4 @@ -Creating Notifications +Playbook Notifications ###################### Playbooks can generate notifications to *let a human know* about something in your cluster. diff --git a/docs/playbook-reference/defining-playbooks/playbook-advanced.rst b/docs/playbook-reference/defining-playbooks/playbook-advanced.rst index 56e6c991a..98d8120f7 100644 --- a/docs/playbook-reference/defining-playbooks/playbook-advanced.rst +++ b/docs/playbook-reference/defining-playbooks/playbook-advanced.rst @@ -3,7 +3,7 @@ Advanced Playbook Techniques ################################ -This guide assumes you already know :ref:`playbook basics ` and how to :ref:`create notifications `. It explains +This guide assumes you already know :ref:`playbook basics ` and how to :ref:`create notifications `. 
It explains implementation details and common techniques. Using Filters to Restrict Triggers diff --git a/docs/playbook-reference/defining-playbooks/playbook-basics.rst b/docs/playbook-reference/defining-playbooks/playbook-basics.rst index cd839c190..e40986f6e 100644 --- a/docs/playbook-reference/defining-playbooks/playbook-basics.rst +++ b/docs/playbook-reference/defining-playbooks/playbook-basics.rst @@ -186,7 +186,7 @@ Understanding Notifications In Robusta, notifications are called Findings, as they represent something the playbook discovered. -In the above example, a Finding was generated by the ``create_finding`` action. Refer to :ref:`Creating Notifications` +In the above example, a Finding was generated by the ``create_finding`` action. Refer to :ref:`Playbook Notifications` for more details. Matching Actions to Triggers diff --git a/docs/playbook-reference/index.rst b/docs/playbook-reference/index.rst index 232b8caf6..79838a5ed 100644 --- a/docs/playbook-reference/index.rst +++ b/docs/playbook-reference/index.rst @@ -6,7 +6,7 @@ overview Playbook Basics - Creating Notifications + Playbook Notifications Advanced Playbook Techniques Matching Actions to Triggers Loading External Actions diff --git a/docs/playbook-reference/overview.rst b/docs/playbook-reference/overview.rst index 711214089..1bc3b7a84 100644 --- a/docs/playbook-reference/overview.rst +++ b/docs/playbook-reference/overview.rst @@ -6,6 +6,11 @@ Playbooks are deterministic rules for responding to alerts and unhealthy conditi Playbooks are recommended for advanced use cases. Most users should start with :doc:`AI Analysis ` of alerts first, which requires far less configuration. +Quick Start +--------------------- + +New to playbooks? Start with the :doc:`Playbook Basics ` guide to learn how to create your first playbook. 
+ How Playbooks Work --------------------- diff --git a/docs/playbook-reference/prometheus-examples/index.rst b/docs/playbook-reference/prometheus-examples/index.rst deleted file mode 100644 index 55b7e856a..000000000 --- a/docs/playbook-reference/prometheus-examples/index.rst +++ /dev/null @@ -1,24 +0,0 @@ -:hide-toc: - -Custom Alert Enrichment -============================== - -Ever feel overwhelmed by Prometheus alerts that lack context? In this section, you will learn to enrich alerts with deterministic rules using Robusta. - -.. note:: - - **Looking for automatic AI enrichment?** Check out :doc:`HolmesGPT ` for zero-configuration AI-powered alert enrichment that automatically investigates alerts and provides root cause analysis. - -By creating custom enrichment rules, you can: - -* Reduce mean time to resolution (MTTR) by automatically gathering system state and logs when alerts fire -* Make faster decisions on which team needs to investigate the alert -* Link alerts to runbooks and documentation, to improve knowledge sharing - -Get started: - -.. toctree:: - :maxdepth: 1 - - bash-alert-enrichment - link-alert-enrichment diff --git a/docs/playbook-reference/prometheus-examples/link-alert-enrichment.rst b/docs/playbook-reference/prometheus-examples/link-alert-enrichment.rst index e4746424e..f0b522fe5 100644 --- a/docs/playbook-reference/prometheus-examples/link-alert-enrichment.rst +++ b/docs/playbook-reference/prometheus-examples/link-alert-enrichment.rst @@ -8,7 +8,7 @@ This guide demonstrates how to link Prometheus alerts with external links to you Implementation ----------------- -In this example, we add links to the alert ``KubeContainerCPURequestAlert`` that we created in a :ref:`previous tutorial `. +In this example, we'll add links to a Prometheus alert. We're using ``KubeContainerCPURequestAlert`` as an example, but you can apply this to any alert. Below there are three alternatives ways to enrich the alert with links. 
Apply the YAML to the ``customPlaybooks`` Helm value and :ref:`update Robusta `. diff --git a/docs/setup-robusta/installation-faq.rst b/docs/setup-robusta/installation-faq.rst index 7c2ed67b0..a42da90a7 100644 --- a/docs/setup-robusta/installation-faq.rst +++ b/docs/setup-robusta/installation-faq.rst @@ -64,6 +64,6 @@ It's being planned, speak to us on Slack. Does Robusta replace monitoring tools? ============================================================ -Robusta's :ref:`all-in-one package ` is a complete monitoring and observability solution. +Robusta's :ref:`all-in-one package ` is a complete monitoring and observability solution. Alternatively, you can keep your existing tools and add-on robusta. From 4b83b0179a3e864b974dcb908019ff0e0a558e46 Mon Sep 17 00:00:00 2001 From: Robusta Runner Date: Thu, 6 Nov 2025 20:11:20 +0200 Subject: [PATCH 2/5] more improvements --- docs/conf.py | 16 +- docs/help.rst | 2 + docs/index.rst | 10 +- .../routing-silencing.rst | 1 + .../actions/develop-actions/index.rst | 1 + .../loading-custom-actions.rst | 2 + .../develop-actions/playbook-repositories.rst | 2 + docs/playbook-reference/actions/index.rst | 2 + .../builtin-alert-enrichment.rst | 2 +- .../external-playbook-repositories.rst | 2 + .../defining-playbooks/index.rst | 5 +- .../defining-playbooks/playbook-advanced.rst | 120 ------ .../defining-playbooks/playbook-basics.rst | 203 ---------- .../defining-playbooks/silencer-playbooks.rst | 111 ++++++ .../trigger-action-binding.rst | 70 ---- docs/playbook-reference/index.rst | 367 +++++++++++++++++- .../kubernetes-change-notifications.rst | 2 +- docs/playbook-reference/overview.rst | 23 -- docs/playbook-reference/triggers/index.rst | 2 + .../triggers/kubernetes.rst | 2 +- .../installation/_generate_config.jinja | 2 +- docs/track-changes/kubernetes-changes.rst | 2 +- 22 files changed, 500 insertions(+), 449 deletions(-) delete mode 100644 docs/playbook-reference/defining-playbooks/playbook-advanced.rst delete mode 100644 
docs/playbook-reference/defining-playbooks/playbook-basics.rst create mode 100644 docs/playbook-reference/defining-playbooks/silencer-playbooks.rst delete mode 100644 docs/playbook-reference/defining-playbooks/trigger-action-binding.rst delete mode 100644 docs/playbook-reference/overview.rst diff --git a/docs/conf.py b/docs/conf.py index 3497561f4..ea71868d6 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -92,9 +92,11 @@ "configuration/defining-playbooks/creating-notifications.html": "/master/playbook-reference/defining-playbooks/creating-notifications.html", "configuration/defining-playbooks/external-playbook-repositories.html": "/master/playbook-reference/defining-playbooks/external-playbook-repositories.html", "configuration/defining-playbooks/index.html": "/master/playbook-reference/defining-playbooks/index.html", - "configuration/defining-playbooks/playbook-advanced.html": "/master/playbook-reference/defining-playbooks/playbook-advanced.html", - "configuration/defining-playbooks/playbook-basics.html": "/master/playbook-reference/defining-playbooks/playbook-basics.html", - "configuration/defining-playbooks/trigger-action-binding.html": "/master/playbook-reference/defining-playbooks/trigger-action-binding.html", + "configuration/defining-playbooks/playbook-advanced.html": "/master/playbook-reference/index.html", + "playbook-reference/defining-playbooks/playbook-advanced.html": "/master/playbook-reference/index.html", + "configuration/defining-playbooks/playbook-basics.html": "/master/playbook-reference/index.html", + "configuration/defining-playbooks/trigger-action-binding.html": "/master/playbook-reference/index.html#matching-actions-to-triggers", + "playbook-reference/defining-playbooks/trigger-action-binding.html": "/master/playbook-reference/index.html#matching-actions-to-triggers", "configuration/additional-settings.html": "/master/setup-robusta/additional-settings.html", "developer-guide/writing-playbooks.html": 
"/master/playbook-reference/defining-playbooks/index.html", "user-guide/slack.html": "/master/configuration/sinks/slack.html", @@ -168,7 +170,7 @@ "user-guide/embedded-prometheus.html": "/master/configuration/alertmanager-integration/embedded-prometheus.html#enabling-the-embedded-prometheus", "user-guide/node-selector.html": "/master/setup-robusta/node-selector.html", "user-guide/interactivity.html": "/master/setup-robusta/additional-settings.html#two-way-interactivity", - "user-guide/flow-control.html": "/master/playbook-reference/defining-playbooks/playbook-advanced.html#using-filters-to-restrict-triggers", + "user-guide/flow-control.html": "/master/playbook-reference/index.html#using-filters-to-restrict-triggers", "catalog/triggers/index.html": "/master/playbook-reference/triggers/index.html", "catalog/triggers/kubernetes.html": "/master/playbook-reference/triggers/kubernetes.html", "catalog/triggers/smart.html": "/master/playbook-reference/triggers/kubernetes.html", @@ -190,7 +192,7 @@ "catalog/sinks/webex.html": "/master/configuration/sinks/webex.html", "catalog/sinks/VictorOps.html": "/master/configuration/sinks/VictorOps.html", "catalog/sinks/file.html": "/master/configuration/sinks/file.html", - "user-guide/trigger-action-binding.html": "/master/playbook-reference/defining-playbooks/playbook-basics.html#understanding-actions", + "user-guide/trigger-action-binding.html": "/master/playbook-reference/index.html#understanding-actions", "advanced/privacy-and-security.html": "/master/setup-robusta/privacy-and-security.html", "advanced/robusta-ui-triggers.html": "/master/setup-robusta/installation/index.html", "developer-guide/actions/index.html": "/master/playbook-reference/actions/index.html", @@ -210,7 +212,9 @@ "configuration/alertmanager-integration/troubleshooting-alertmanager.html": "/master/configuration/exporting/send-alerts-api.html", "configuration/alertmanager-integration/grafana-alert-manager.html": 
"/master/configuration/alertmanager-integration/grafana-self-hosted.html", "configuration/alertmanager-integration/grafana-cloud-mimir.html": "/master/configuration/alertmanager-integration/grafana-cloud.html", - "playbook-reference/what-are-playbooks.html": "/master/playbook-reference/overview.html", + "playbook-reference/what-are-playbooks.html": "/master/playbook-reference/index.html", + "playbook-reference/overview.html": "/master/playbook-reference/index.html", + "playbook-reference/defining-playbooks/playbook-basics.html": "/master/playbook-reference/index.html", "how-it-works/alert-builtin-enrichment.html": "/master/playbook-reference/builtin-alert-enrichment.html", "setup-robusta/installation/extend-prometheus-installation.html": "/master/setup-robusta/installation/standalone-installation.html", "playbook-reference/defining-playbooks/index.html": "/master/playbook-reference/index.html", diff --git a/docs/help.rst b/docs/help.rst index 81abd3f9c..f0e3557cc 100644 --- a/docs/help.rst +++ b/docs/help.rst @@ -1,5 +1,7 @@ :hide-toc: +.. _Getting Support: + Getting Support ================ diff --git a/docs/index.rst b/docs/index.rst index 4345f82da..6099376bf 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -144,8 +144,14 @@ :hidden: playbook-reference/index - Enrich Alerts - Kubernetes Change Notifications + Playbook Notifications + Alert Enrichment + Automatic Remediation + Silencer Playbooks + Triggers Reference + Actions Reference + Log Based Alerting + K8s Change Notification Cost Savings - KRR .. toctree:: diff --git a/docs/notification-routing/routing-silencing.rst b/docs/notification-routing/routing-silencing.rst index fbcf269c8..51fa3054c 100644 --- a/docs/notification-routing/routing-silencing.rst +++ b/docs/notification-routing/routing-silencing.rst @@ -35,4 +35,5 @@ node that just restarted. 
Further Reading ----------------- +* Learn more about :ref:`Silencer Playbooks ` * View all :ref:`Prometheus Silencer ` actions diff --git a/docs/playbook-reference/actions/develop-actions/index.rst b/docs/playbook-reference/actions/develop-actions/index.rst index f30f184df..046036d88 100644 --- a/docs/playbook-reference/actions/develop-actions/index.rst +++ b/docs/playbook-reference/actions/develop-actions/index.rst @@ -18,6 +18,7 @@ Please consider sharing your custom actions with the community. my-first-custom-action playbook-repositories loading-custom-actions + Loading External Actions <../../defining-playbooks/external-playbook-repositories> overriding-builtin-actions findings-api triggers-and-events diff --git a/docs/playbook-reference/actions/develop-actions/loading-custom-actions.rst b/docs/playbook-reference/actions/develop-actions/loading-custom-actions.rst index f0674a3f5..b627faf89 100644 --- a/docs/playbook-reference/actions/develop-actions/loading-custom-actions.rst +++ b/docs/playbook-reference/actions/develop-actions/loading-custom-actions.rst @@ -1,3 +1,5 @@ +.. _Loading Custom Actions into Robusta: + Loading Custom Actions into Robusta #################################### diff --git a/docs/playbook-reference/actions/develop-actions/playbook-repositories.rst b/docs/playbook-reference/actions/develop-actions/playbook-repositories.rst index 027677e8d..608764ec5 100644 --- a/docs/playbook-reference/actions/develop-actions/playbook-repositories.rst +++ b/docs/playbook-reference/actions/develop-actions/playbook-repositories.rst @@ -1,3 +1,5 @@ +.. _Creating Playbook Repositories: + Creating Playbook Repositories ################################ diff --git a/docs/playbook-reference/actions/index.rst b/docs/playbook-reference/actions/index.rst index 39f82cdee..defe8ba53 100644 --- a/docs/playbook-reference/actions/index.rst +++ b/docs/playbook-reference/actions/index.rst @@ -1,5 +1,7 @@ :hide-toc: +.. 
_Actions Reference: + Actions Reference ================== diff --git a/docs/playbook-reference/builtin-alert-enrichment.rst b/docs/playbook-reference/builtin-alert-enrichment.rst index 84e2d3ece..46c9729f7 100644 --- a/docs/playbook-reference/builtin-alert-enrichment.rst +++ b/docs/playbook-reference/builtin-alert-enrichment.rst @@ -1,6 +1,6 @@ .. _builtin-alert-enrichment: -Enrich Alerts +Alert Enrichment ######################################## Ever feel overwhelmed by alerts that lack context? Robusta enriches alerts automatically and lets you create custom enrichment rules. diff --git a/docs/playbook-reference/defining-playbooks/external-playbook-repositories.rst b/docs/playbook-reference/defining-playbooks/external-playbook-repositories.rst index abf95e0e6..25638f6b1 100644 --- a/docs/playbook-reference/defining-playbooks/external-playbook-repositories.rst +++ b/docs/playbook-reference/defining-playbooks/external-playbook-repositories.rst @@ -1,3 +1,5 @@ +.. _Loading External Actions: + Loading External Actions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/playbook-reference/defining-playbooks/index.rst b/docs/playbook-reference/defining-playbooks/index.rst index 13a44a063..6e9bd2bad 100644 --- a/docs/playbook-reference/defining-playbooks/index.rst +++ b/docs/playbook-reference/defining-playbooks/index.rst @@ -8,11 +8,8 @@ Learn how to define Robusta playbooks. .. toctree:: :maxdepth: 1 - playbook-basics creating-notifications - playbook-advanced - trigger-action-binding - external-playbook-repositories + silencer-playbooks List of All Triggers and Actions --------------------------------- diff --git a/docs/playbook-reference/defining-playbooks/playbook-advanced.rst b/docs/playbook-reference/defining-playbooks/playbook-advanced.rst deleted file mode 100644 index 98d8120f7..000000000 --- a/docs/playbook-reference/defining-playbooks/playbook-advanced.rst +++ /dev/null @@ -1,120 +0,0 @@ -.. 
_playbooks-201: - -Advanced Playbook Techniques -################################ - -This guide assumes you already know :ref:`playbook basics ` and how to :ref:`create notifications `. It explains -implementation details and common techniques. - -Using Filters to Restrict Triggers -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Many triggers have parameters that restrict when they fire: - -.. code-block:: - - - triggers: - - on_pod_crash_loop: - restart_reason: "CrashLoopBackOff" - name_prefix: fluentbit - namespace_prefix: kube-system - -Most Kubernetes-related triggers support at least ``name`` and ``namespace``. Refer to :ref:`Triggers Reference` for -details. - -Running Multiple Playbooks on the Same Event -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -If multiple triggers match an incoming event, all relevant playbooks execute in the order they were defined. For example: - -.. code-block:: yaml - - # first playbook - - triggers: - - on_deployment_create: {} - actions: - - my_first_action: {} - - # second playbook - - triggers: - - on_deployment_create: {} - actions: - - my_second_action: {} - -In the example above, ``my_first_action`` runs before ``my_second_action``. - -Multiple Playbook Instances -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Likewise, you can enable identical playbooks multiple times with different parameters: - -.. 
code-block:: yaml - - customPlaybooks: - - triggers: - - on_deployment_update: - name_prefix: MyApp - actions: - - add_deployment_lines_to_grafana: - grafana_api_key: grafana_key_goes_here - grafana_dashboard_uid: id_for_dashboard1 - grafana_url: http://grafana.namespace.svc - sinks: - - "main_slack_sink" - - - triggers: - - on_deployment_update: - name_prefix: OtherApp - actions: - - add_deployment_lines_to_grafana: - grafana_api_key: grafana_key_goes_here - grafana_dashboard_uid: id_for_dashboard2 - grafana_url: http://grafana.namespace.svc - sinks: - - "main_slack_sink" - -If the triggers in multiple playbooks match the same incoming event, all relevant playbooks will run. - -Global Configuration for Playbook Parameters -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In the previous example, ``grafana_api_key`` and ``grafana_url`` were defined multiple times with the same value. - -To avoid repeating yourself, you can define parameters globally for all playbooks. These parameters will be applied to -any action or trigger which expects a parameter with the same name. - -.. code-block:: yaml - - globalConfig: - cluster_name: "my-staging-cluster" - grafana_api_key: "grafana_key_goes_here" - grafana_url: http://grafana.namespace.svc - - customPlaybooks: - - triggers: - - on_deployment_update: - name_prefix: MyApp - actions: - - add_deployment_lines_to_grafana: - grafana_dashboard_uid: id_for_dashboard1 - sinks: - - "main_slack_sink" - - - triggers: - - on_deployment_update: - name_prefix: OtherApp - actions: - - add_deployment_lines_to_grafana: - grafana_dashboard_uid: id_for_dashboard2 - sinks: - - "main_slack_sink" - -Stopping Playbook Execution -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -An action can :ref:`stop the processing flow ` if needed, preventing subsequent actions from being run. - -This is useful for *silencing* actions like :ref:`node_restart_silencer `. These actions -need to stop alerts from being propogated to other playbooks. 
- -Only actions following the current action will be stopped. Therefore, silencers must be defined before other playbooks. diff --git a/docs/playbook-reference/defining-playbooks/playbook-basics.rst b/docs/playbook-reference/defining-playbooks/playbook-basics.rst deleted file mode 100644 index e40986f6e..000000000 --- a/docs/playbook-reference/defining-playbooks/playbook-basics.rst +++ /dev/null @@ -1,203 +0,0 @@ -.. _customPlaybooks: - -Playbook Basics -################## - -A playbook is an automation rule for detecting, investigating, or fixing problems in your cluster. - -For a gentle introduction, see :doc:`Playbook Overview ` - -Overview -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Every playbook consists of a condition (*trigger*) and instructions (*actions*) defining the response. - -Playbooks behave like pipelines: - -1. Events come into Robusta and are checked against triggers. -2. When there is a match, a trigger fires -3. The relevant playbook runs -4. All playbook actions execute, receiving the event as context -5. If notifications were generated by the playbook, they are sent to :ref:`sinks `. - -Defining Custom Playbooks -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Using a custom playbook, we can get notified in Slack whenever a Pod's Liveness probe fails. - -Use the ``customPlaybooks`` Helm value: - -.. code-block:: yaml - - customPlaybooks: - - triggers: - - on_kubernetes_warning_event_create: - include: ["Liveness"] # fires on failed Liveness probes - actions: - - create_finding: - severity: HIGH - title: "Failed liveness probe: $name" - - event_resource_events: {} - -Perform a :ref:`Helm Upgrade ` to apply the custom playbook. - -Next time a Liveness probe fails, you will get notified. - - .. image:: /images/failedlivenessprobe.png - :alt: Failing Kubernetes liveness probe notification on Slack - :align: center - -Apply the following command the simulate a failing liveness probe. - -.. 
code-block:: yaml - - kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml - - -Let's explore each part of the above playbook in depth. - -Modifying Default Playbooks -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -By default, Robusta has a default set of ``playbooks`` configured. These are used to create notifications for all common Kubernetes issues and Prometheus alerts. - -You can disable any of the ``default playbooks``, or change the configuration of a given ``playbook``. - -In order to disable a default playbook, add the playbook name to the ``disabledPlayooks`` helm value (Playbook name is in the ``name`` attribute of each playbook) - -For example, to disable the ``ImagePullBackOff`` playbook, use: - -.. code-block:: yaml - - disabledPlaybooks: - - ImagePullBackOff - -In order to override the default configuration of the same playbook, both disable it, and add it to ``customPlaybooks`` with the override configuration: - -.. code-block:: yaml - - disabledPlaybooks: - - ImagePullBackOff - - customPlaybooks: - - name: "CustomImagePullBackOff" - triggers: - - on_image_pull_backoff: - fire_delay: 300 # fire only if failing to pull the image for 5 min - actions: - - image_pull_backoff_reporter: {} - - -Organizing Playbooks -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Using ``namedCustomPlaybooks``, you can define playbooks by name. This is useful when you want to define a base set of playbooks for all clusters/teams and then use additional Helm values files to override some of the base playbooks or add new ones. - -They are all merged together into a single playbooks list. This allows you to split away the custom playbooks from ``generated_values.yaml`` to separate files and organize your playbooks. - -First, add the custom playbooks as a dictionary into a file named ``app_a_playbooks.yaml`` as shown below: - - -.. 
code-block:: yaml - - namedCustomPlaybooks: - team-a-app-a: - - triggers: - - on_prometheus_alert: - namespace_prefix: "app-a" - actions: - - create_finding: - aggregation_key: "This is app-a - Requires your attention" - severity: HIGH - title: "Check app-a out" - description: "@monitoring.monitoring this is for you" - team-b-app-b: - - triggers: - - on_prometheus_alert: - namespace_prefix: "app-b" - actions: - # Actions for team-b-app-b here - -Then run a Helm upgrade by passing the new file using the ``-f`` flag. - -.. code-block:: yaml - - helm ugprade --install robusta -f generated_values.yaml -f app_a_playbooks.yaml - - -Understanding Triggers -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -**Triggers** are event-driven, firing at specific moments when something occurs in your cluster. Even a Kubernetes cluster doing nothing generates a constant stream of events. Using triggers, you can find and react to the events that matter. - -Going back to the above example, we saw the trigger ``on_kubernetes_warning_event_create``. -Breaking down the name, you'll notice the format ``on__``. This is a general pattern. -``on_kubernetes_warning_event_create`` fires when new Warning Events (``kubectl get events --all-namespaces --field-selector type=Warning``) are created. - -The trigger also had an *include* filter, limiting which Warning Events cause the playbook to run. In this case its a Liveness probe event. -See each trigger's documentation to learn which filters are supported. - -Common Triggers -******************************** -Popular triggers include: - -* :ref:`on_prometheus_alert` -* :ref:`on_pod_crash_loop` -* :ref:`on_deployment_update` - -All triggers can be found under :ref:`Triggers Reference`. - -Understanding Actions -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -**Actions** perform tasks in response to triggers, such as collecting information, investigating issues, or fixing problems. - -In the above example, there were two actions. 
When playbooks contain multiple actions, they are executed in order: - -* ``create_finding`` - this generates the notification message -* ``event_resource_events`` - this is a specific action for ``on_kubernetes_warning_event_create`` which attaches relevant events to the notification - -The latter action has a funny name, which reflects that it takes a Kubernetes Warning Event as input, finds the related Kubernetes -resource (e.g. a Pod), and then fetches all the related Kubernetes Warning Events for that resource. - -.. _actions-vs-enrichers-vs-silencers: - -.. admonition:: Actions, Enrichers, and Silencers - - Many actions in Robusta were written for a specific purpose, like *enriching* alerts or *silencing* them. - - By convention, these actions are called *enrichers* and *silencers*, but those names are just convention. - - Under the hood, enrichers and silencers are plain old actions, nothing more. - -Common Actions -******************************** -Popular actions include: - -* :ref:`logs_enricher` - fetch a Pod's logs -* :ref:`node_bash_enricher` - run a bash command on a Node -* :ref:`pod_bash_enricher` - run a bash command on a Pod -* :ref:`pod_graph_enricher` - attach a graph of Pod memory/CPU/disk usage - -All actions can be found under :ref:`Actions Reference`. - -Understanding Notifications -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -In Robusta, notifications are called Findings, as they represent something the playbook discovered. - -In the above example, a Finding was generated by the ``create_finding`` action. Refer to :ref:`Playbook Notifications` -for more details. - -Matching Actions to Triggers -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Triggers output *typed events* when they fire. For example: - -* The ``on_prometheus_alert`` trigger outputs a *PrometheusAlert* event -* The ``on_pod_update`` trigger outputs a *PodChangeEvent* event - -Each action is compatible with a subset of event types. 
- -For instance, ``logs_enricher`` requires an event with a Pod object, such as *PrometheusAlert*, *PodEvent*, or *PodChangeEvent*. - -Refer to docs :ref:`for each action ` , to see supported events. diff --git a/docs/playbook-reference/defining-playbooks/silencer-playbooks.rst b/docs/playbook-reference/defining-playbooks/silencer-playbooks.rst new file mode 100644 index 000000000..58772861d --- /dev/null +++ b/docs/playbook-reference/defining-playbooks/silencer-playbooks.rst @@ -0,0 +1,111 @@ +.. _Silencer Playbooks: + +Silencer Playbooks +################## + +Silencer playbooks prevent alerts from being sent by stopping playbook execution before notifications reach sinks. + +They're useful for: + +* Implementing *silencing as code* in a YAML file +* Selectively silencing with *smart logic*, not just according to labels +* Reducing alert noise by filtering expected transient conditions + +How Silencers Work +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +An action can :ref:`stop the processing flow ` if needed, preventing subsequent actions from being run. + +Silencer actions evaluate conditions and stop alert propagation when those conditions are met. This prevents alerts from being sent to other playbooks and notification sinks. + +**Important:** Silencers must be defined before other playbooks to work correctly. Only actions following the silencer will be stopped. + +Example: Node Restart Silencer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This example silences ``KubePodCrashLooping`` alerts when they fire within 10 minutes of a node restart: + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_prometheus_alert: + alert_name: KubePodCrashLooping + actions: + - node_restart_silencer: + post_restart_silence: 600 # seconds + +The ``node_restart_silencer`` is context-aware. It will only silence ``KubePodCrashLooping`` for Pods running on the node that just restarted. 
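+
+The ordering rule above can be sketched as follows. This is a minimal illustration, assuming a second playbook that runs ``logs_enricher`` on the same alert; the pairing is hypothetical and only meant to show why the silencer playbook is listed first:
+
+.. code-block:: yaml
+
+    customPlaybooks:
+    # Defined first: stops processing if the node restarted recently
+    - triggers:
+      - on_prometheus_alert:
+          alert_name: KubePodCrashLooping
+      actions:
+      - node_restart_silencer:
+          post_restart_silence: 600 # seconds
+    # Defined second: only reached when the alert was not silenced
+    - triggers:
+      - on_prometheus_alert:
+          alert_name: KubePodCrashLooping
+      actions:
+      - logs_enricher: {}
+
+If the two playbooks were swapped, the enricher would already have run by the time the silencer was evaluated.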
+ +Example: Severity-Based Silencing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Silence alerts below a certain severity threshold: + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_prometheus_alert: {} + actions: + - severity_silencer: + severity: LOW # silence all LOW severity alerts + +Example: Name-Based Silencing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Silence specific alerts by name pattern: + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_prometheus_alert: {} + actions: + - name_silencer: + names: + - "Watchdog" + - "InfoInhibitor" + +Example: Pod Status Silencing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Silence alerts for pods in specific states. + +Exclude pods in certain states (stop processing if pod is in these states): + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_prometheus_alert: {} + actions: + - pod_status_silencer: + exclude: + - "Pending" + - "ContainerCreating" + +Or include only certain states (stop processing unless pod is in these states): + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_prometheus_alert: {} + actions: + - pod_status_silencer: + include: + - "Running" + +Available Silencer Actions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +View the complete reference for all silencer actions and their parameters: + +* :ref:`node_restart_silencer ` - Silence alerts during node restarts +* :ref:`severity_silencer ` - Filter alerts by severity level +* :ref:`name_silencer ` - Silence specific alerts by name +* :ref:`silence_alert ` - General-purpose silencing mechanism +* :ref:`pod_status_silencer ` - Silence alerts for pods in specific states + +See the full :ref:`Prometheus Silencers ` reference for detailed documentation on each action. 
diff --git a/docs/playbook-reference/defining-playbooks/trigger-action-binding.rst b/docs/playbook-reference/defining-playbooks/trigger-action-binding.rst deleted file mode 100644 index bc42328ee..000000000 --- a/docs/playbook-reference/defining-playbooks/trigger-action-binding.rst +++ /dev/null @@ -1,70 +0,0 @@ -Matching Actions to Triggers -################################ -Each trigger outputs an event of a specific type, and each action expects a typed event as input. - -For example, the ``on_prometheus_alert`` trigger outputs a *PrometheusAlert* event, while ``on_pod_update`` outputs a *PodChangeEvent.* - -These events flow into the actions section, where each action is compatible with a subset of event types. -For instance, the ``logs_enricher`` action expects to receive events that have a Pod object, such as *PrometheusAlert*, *PodEvent*, or *PodChangeEvent*. - -When configuring Robusta playbooks you don't need to worry about all these details. You can just look at each trigger and see which actions are supported. - -This page defines in-depth how triggers are bound to actions. - -Simple actions ------------------ - -Simple actions take no special parameters and can therefore run on every trigger. - -Resource-related actions --------------------------- - -Some actions require Kubernetes resources as input. - -For example, the ``logs_enricher`` action requires a pod as input. - -Therefore, ``logs_enricher`` can only be connected to triggers which output a pod. For example: - -* ``on_pod_create`` -* ``on_pod_update`` -* ``on_prometheus_alert`` - for alerts with a ``pod`` label -* :ref:`manual trigger ` - by passing the pod's name as a cli argument - -Trigger hierarchies -------------------------------- - -All of the triggers in Robusta form a hierarchy. If an action supports a specific trigger, it also supports -descendants of that trigger. - -For example, the trigger ``on_deployment_all_changes`` has a child trigger ``on_deployment_create``. 
-The latter may be used wherever the former is expected. - - -.. graphviz:: - - digraph trigger_inheritance { - bgcolor=transparent; - rankdir=LR; - size="8.0, 12.0"; - node [fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled"] - "on_schedule"; - "on_prometheus_alert**"; - "on_kubernetes_any_resource_all_changes"; - "on_deployment_all_changes"; - "on_deployment_create"; - "on_deployment_update"; - "on_deployment_delete"; - "on_deployment_all_changes" -> "on_deployment_create" [arrowsize=0.5,style="setlinewidth(0.5)"]; - "on_deployment_all_changes" -> "on_deployment_update" [arrowsize=0.5,style="setlinewidth(0.5)"]; - "on_deployment_all_changes" -> "on_deployment_delete" [arrowsize=0.5,style="setlinewidth(0.5)"]; - "on_kubernetes_any_resource_all_changes" -> "on_deployment_all_changes" [arrowsize=0.5,style="setlinewidth(0.5)"]; - } - -.. note:: - - ``on_prometheus_alert`` is compatible with most Robusta actions that take Kubernetes resources. - -Developer Guide ------------------ - -If you extend Robusta with custom actions in Python, refer to :ref:`the developer guide `. diff --git a/docs/playbook-reference/index.rst b/docs/playbook-reference/index.rst index 79838a5ed..1624ddb44 100644 --- a/docs/playbook-reference/index.rst +++ b/docs/playbook-reference/index.rst @@ -1,16 +1,351 @@ -:hide-toc: - -.. toctree:: - :maxdepth: 1 - :hidden: - - overview - Playbook Basics - Playbook Notifications - Advanced Playbook Techniques - Matching Actions to Triggers - Loading External Actions - ⚡️ Triggers - 💥 Actions - automatic-remediation-examples/index - Log Based Alerting +.. _customPlaybooks: +.. _Playbook Basics: + +Playbooks Basics +################## + +Playbooks are deterministic rules for responding to alerts and unhealthy conditions in a Kubernetes cluster. + +Playbooks are recommended for advanced use cases. 
Most users should start with :doc:`AI Analysis ` of alerts first, which requires far less configuration. + +How Playbooks Work +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Every playbook consists of two parts: + +* A :ref:`Trigger ` condition that defines when the automation runs +* An :ref:`Action ` that defines what the automation does + +Playbooks behave like pipelines: + +1. Events come into Robusta and are checked against triggers +2. When there is a match, a trigger fires +3. The relevant playbook runs +4. All playbook actions execute, receiving the event as context +5. If :ref:`notifications ` were generated, they are sent to :ref:`sinks ` + +Defining Custom Playbooks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Using a custom playbook, we can get notified in Slack whenever a Pod's Liveness probe fails. + +Use the ``customPlaybooks`` Helm value: + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_kubernetes_warning_event_create: + include: ["Liveness"] # fires on failed Liveness probes + actions: + - create_finding: + severity: HIGH + title: "Failed liveness probe: $name" + - event_resource_events: {} + +Perform a :ref:`Helm Upgrade ` to apply the custom playbook. + +Next time a Liveness probe fails, you will get notified. + + .. image:: /images/failedlivenessprobe.png + :alt: Failing Kubernetes liveness probe notification on Slack + :align: center + +Run the following command to simulate a failing liveness probe: + +.. code-block:: bash + + kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml + + +Let's explore each part of the above playbook in depth. + +Using Filters to Restrict Triggers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Many triggers have parameters that restrict when they fire: + +..
code-block:: yaml + + - triggers: + - on_pod_crash_loop: + restart_reason: "CrashLoopBackOff" + name_prefix: fluentbit + namespace_prefix: kube-system + +Most Kubernetes-related triggers support at least ``name`` and ``namespace``. Refer to :ref:`Triggers Reference` for details. + +Running Multiple Playbooks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If multiple triggers match an incoming event, all relevant playbooks execute in the order they were defined. For example: + +.. code-block:: yaml + + # first playbook + - triggers: + - on_deployment_create: {} + actions: + - my_first_action: {} + + # second playbook + - triggers: + - on_deployment_create: {} + actions: + - my_second_action: {} + +In the example above, ``my_first_action`` runs before ``my_second_action``. + +You can enable identical playbooks multiple times with different parameters: + +.. code-block:: yaml + + customPlaybooks: + - triggers: + - on_deployment_update: + name_prefix: MyApp + actions: + - add_deployment_lines_to_grafana: + grafana_api_key: grafana_key_goes_here + grafana_dashboard_uid: id_for_dashboard1 + grafana_url: http://grafana.namespace.svc + + - triggers: + - on_deployment_update: + name_prefix: OtherApp + actions: + - add_deployment_lines_to_grafana: + grafana_api_key: grafana_key_goes_here + grafana_dashboard_uid: id_for_dashboard2 + grafana_url: http://grafana.namespace.svc + +If the triggers in multiple playbooks match the same incoming event, all relevant playbooks will run. + +Understanding Triggers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +**Triggers** are event-driven, firing at specific moments when something occurs in your cluster. Even a Kubernetes cluster doing nothing generates a constant stream of events. Using triggers, you can find and react to the events that matter. + +Going back to the above example, we saw the trigger ``on_kubernetes_warning_event_create``. +Breaking down the name, you'll notice the format ``on__``. This is a general pattern. 
+``on_kubernetes_warning_event_create`` fires when new Warning Events (``kubectl get events --all-namespaces --field-selector type=Warning``) are created. + +The trigger also had an *include* filter, limiting which Warning Events cause the playbook to run. In this case, it's a Liveness probe event. +See each trigger's documentation to learn which filters are supported. + +Common Triggers +******************************** +Popular triggers include: + +* :ref:`on_prometheus_alert` +* :ref:`on_pod_crash_loop` +* :ref:`on_deployment_update` + +All triggers can be found under :ref:`Triggers Reference`. + +Understanding Actions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +**Actions** perform tasks in response to triggers, such as collecting information, investigating issues, or fixing problems. + +In the above example, there were two actions. When playbooks contain multiple actions, they are executed in order: + +* ``create_finding`` - this generates the notification message +* ``event_resource_events`` - this is a specific action for ``on_kubernetes_warning_event_create`` which attaches relevant events to the notification + +The latter action has a funny name, which reflects that it takes a Kubernetes Warning Event as input, finds the related Kubernetes +resource (e.g. a Pod), and then fetches all the related Kubernetes Warning Events for that resource. + +.. _actions-vs-enrichers-vs-silencers: + +.. admonition:: Actions, Enrichers, and Silencers + + Many actions in Robusta were written for a specific purpose, like *enriching* alerts or *silencing* them. + + By convention, these actions are called *enrichers* and *silencers*, but those names are just convention. + + Under the hood, enrichers and silencers are plain old actions, nothing more.
+ +Common Actions +******************************** +Popular actions include: + +* :ref:`logs_enricher` - fetch a Pod's logs +* :ref:`node_bash_enricher` - run a bash command on a Node +* :ref:`pod_bash_enricher` - run a bash command on a Pod +* :ref:`pod_graph_enricher` - attach a graph of Pod memory/CPU/disk usage + +All actions can be found under :ref:`Actions Reference`. + +Understanding Notifications +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In Robusta, notifications are called Findings, as they represent something the playbook discovered. + +In the above example, a Finding was generated by the ``create_finding`` action. Refer to :ref:`Playbook Notifications` +for more details. + +.. _Matching Actions to Triggers: + +Matching Actions to Triggers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Each trigger outputs an event of a specific type, and each action expects a typed event as input. + +For example, the ``on_prometheus_alert`` trigger outputs a *PrometheusAlert* event, while ``on_pod_update`` outputs a *PodChangeEvent*. + +These events flow into the actions section, where each action is compatible with a subset of event types. +For instance, the ``logs_enricher`` action expects to receive events that have a Pod object, such as *PrometheusAlert*, *PodEvent*, or *PodChangeEvent*. + +When configuring Robusta playbooks you don't need to worry about all these details. You can just look at each trigger and see which actions are supported. + +Simple Actions +******************************** + +Simple actions take no special parameters and can therefore run on every trigger. + +Resource-Related Actions +******************************** + +Some actions require Kubernetes resources as input. + +For example, the ``logs_enricher`` action requires a pod as input. + +Therefore, ``logs_enricher`` can only be connected to triggers which output a pod. 
For example: + +* ``on_pod_create`` +* ``on_pod_update`` +* ``on_prometheus_alert`` - for alerts with a ``pod`` label +* :ref:`manual trigger ` - by passing the pod's name as a cli argument + +Trigger Hierarchies +******************************** + +All of the triggers in Robusta form a hierarchy. If an action supports a specific trigger, it also supports +descendants of that trigger. + +For example, the trigger ``on_deployment_all_changes`` has a child trigger ``on_deployment_create``. +The latter may be used wherever the former is expected. + +.. graphviz:: + + digraph trigger_inheritance { + bgcolor=transparent; + rankdir=LR; + size="8.0, 12.0"; + node [fillcolor=white,fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans",fontsize=10,height=0.25,shape=box,style="setlinewidth(0.5),filled"] + "on_schedule"; + "on_prometheus_alert**"; + "on_kubernetes_any_resource_all_changes"; + "on_deployment_all_changes"; + "on_deployment_create"; + "on_deployment_update"; + "on_deployment_delete"; + "on_deployment_all_changes" -> "on_deployment_create" [arrowsize=0.5,style="setlinewidth(0.5)"]; + "on_deployment_all_changes" -> "on_deployment_update" [arrowsize=0.5,style="setlinewidth(0.5)"]; + "on_deployment_all_changes" -> "on_deployment_delete" [arrowsize=0.5,style="setlinewidth(0.5)"]; + "on_kubernetes_any_resource_all_changes" -> "on_deployment_all_changes" [arrowsize=0.5,style="setlinewidth(0.5)"]; + } + +.. note:: + + ``on_prometheus_alert`` is compatible with most Robusta actions that take Kubernetes resources. + +For more details, refer to :ref:`Actions Reference` to see which events each action supports. If you extend Robusta with custom actions in Python, refer to :ref:`the developer guide `. + +Modifying Default Playbooks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +By default, Robusta has a default set of ``playbooks`` configured. These are used to create notifications for all common Kubernetes issues and Prometheus alerts. 
+ +You can disable any of the ``default playbooks``, or change the configuration of a given ``playbook``. + +To disable a default playbook, add the playbook name to the ``disabledPlaybooks`` Helm value. The playbook name is in the ``name`` attribute of each playbook. + +For example, to disable the ``ImagePullBackOff`` playbook, use: + +.. code-block:: yaml + + disabledPlaybooks: + - ImagePullBackOff + +To override the default configuration of the same playbook, disable it and add it to ``customPlaybooks`` with the override configuration: + +.. code-block:: yaml + + disabledPlaybooks: + - ImagePullBackOff + + customPlaybooks: + - name: "CustomImagePullBackOff" + triggers: + - on_image_pull_backoff: + fire_delay: 300 # fire only if failing to pull the image for 5 min + actions: + - image_pull_backoff_reporter: {} + + +Organizing Playbooks +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Using ``namedCustomPlaybooks``, you can define playbooks by name. This is useful when you want to define a base set of playbooks for all clusters/teams and then use additional Helm values files to override some of the base playbooks or add new ones. + +They are all merged together into a single playbooks list. This allows you to split away the custom playbooks from ``generated_values.yaml`` to separate files and organize your playbooks. + +First, add the custom playbooks as a dictionary into a file named ``app_a_playbooks.yaml`` as shown below: + + +.. code-block:: yaml + + namedCustomPlaybooks: + team-a-app-a: + - triggers: + - on_prometheus_alert: + namespace_prefix: "app-a" + actions: + - create_finding: + aggregation_key: "This is app-a - Requires your attention" + severity: HIGH + title: "Check app-a out" + description: "@monitoring.monitoring this is for you" + team-b-app-b: + - triggers: + - on_prometheus_alert: + namespace_prefix: "app-b" + actions: + # Actions for team-b-app-b here + +Then run a Helm upgrade by passing the new file using the ``-f`` flag. + +..
code-block:: yaml + + helm ugprade --install robusta -f generated_values.yaml -f app_a_playbooks.yaml + +Global Configuration +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +To avoid repeating parameters across multiple playbooks, define them globally. These parameters will be applied to any action or trigger that expects a parameter with the same name. + +For example, instead of repeating ``grafana_api_key`` and ``grafana_url``: + +.. code-block:: yaml + + globalConfig: + cluster_name: "my-staging-cluster" + grafana_api_key: "grafana_key_goes_here" + grafana_url: http://grafana.namespace.svc + + customPlaybooks: + - triggers: + - on_deployment_update: + name_prefix: MyApp + actions: + - add_deployment_lines_to_grafana: + grafana_dashboard_uid: id_for_dashboard1 + + - triggers: + - on_deployment_update: + name_prefix: OtherApp + actions: + - add_deployment_lines_to_grafana: + grafana_dashboard_uid: id_for_dashboard2 diff --git a/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst b/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst index e4673df99..82a7c3429 100644 --- a/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst +++ b/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst @@ -1,4 +1,4 @@ -Kubernetes Change Notifications +K8s Change Notification ################################ You can configure Robusta to send push notifications when Kubernetes resources change or become unhealthy. This is done by listening to API Server changes with `kubewatch `_ and then filtering the stream of events in a Robusta playbook. 
diff --git a/docs/playbook-reference/overview.rst b/docs/playbook-reference/overview.rst deleted file mode 100644 index 1bc3b7a84..000000000 --- a/docs/playbook-reference/overview.rst +++ /dev/null @@ -1,23 +0,0 @@ - -Overview -=========== - -Playbooks are deterministic rules for responding to alerts and unhealthy conditions in a Kubernetes cluster. - -Playbooks are recommended for advanced use cases. Most users should start with :doc:`AI Analysis ` of alerts first, which requires far less configuration. - -Quick Start ---------------------- - -New to playbooks? Start with the :doc:`Playbook Basics ` guide to learn how to create your first playbook. - -How Playbooks Work ---------------------- - -Automations in Robusta are called playbooks and they are defined in YAML in your Robusta Helm values. - -Every playbook consists of two parts: - -* A :ref:`Trigger ` condition that defines when the automation runs -* An :ref:`Action ` that defines what the automation does (typically falling into the above categories like enrich, remediate, silence, etc) - diff --git a/docs/playbook-reference/triggers/index.rst b/docs/playbook-reference/triggers/index.rst index 13623511f..91a9bc24c 100644 --- a/docs/playbook-reference/triggers/index.rst +++ b/docs/playbook-reference/triggers/index.rst @@ -1,5 +1,7 @@ :hide-toc: +.. _Triggers Reference: + Triggers Reference =================== diff --git a/docs/playbook-reference/triggers/kubernetes.rst b/docs/playbook-reference/triggers/kubernetes.rst index 2e5753095..adb5c8e6f 100644 --- a/docs/playbook-reference/triggers/kubernetes.rst +++ b/docs/playbook-reference/triggers/kubernetes.rst @@ -9,7 +9,7 @@ These triggers work even when Prometheus is not connected to Robusta. They're tr .. 
details:: Related Tutorials - * :doc:`Kubernetes Change Notifications ` + * :doc:`K8s Change Notification ` Crashing Pod Triggers diff --git a/docs/setup-robusta/installation/_generate_config.jinja b/docs/setup-robusta/installation/_generate_config.jinja index 520fd072e..496ed8705 100644 --- a/docs/setup-robusta/installation/_generate_config.jinja +++ b/docs/setup-robusta/installation/_generate_config.jinja @@ -40,7 +40,7 @@ Choose a configuration method below: * Python 3.7 or higher is required. * Use ``pip3`` on systems with both Python 2 and Python 3. - * A ``command not found: robusta`` error means :ref:`Python's script directory is not your PATH.`. + * A ``command not found: robusta`` error means :ref:`Python's script directory is not in your PATH `. .. tab-item:: docker :name: docker-cli-tab diff --git a/docs/track-changes/kubernetes-changes.rst b/docs/track-changes/kubernetes-changes.rst index 90f7f6c48..7a49524ef 100644 --- a/docs/track-changes/kubernetes-changes.rst +++ b/docs/track-changes/kubernetes-changes.rst @@ -5,4 +5,4 @@ When using Robusta SaaS, Robusta automatically tracks all Kubernetes changes and This provides context about recent changes when investigating issues, helping you quickly identify if a deployment, configuration update, or other change caused a problem. -Looking to get push notifications (e.g. Slack or other sinks) when Kubernetes resources change? See the :doc:`Kubernetes Change Notifications ` guide in the Advanced section. +Looking to get push notifications (e.g. Slack or other sinks) when Kubernetes resources change? See the :doc:`K8s Change Notification ` guide in the Advanced section. 
From 219de1240fc654df1d15a59c403cc439fc283b67 Mon Sep 17 00:00:00 2001 From: Robusta Runner Date: Thu, 6 Nov 2025 20:12:17 +0200 Subject: [PATCH 3/5] update menu order --- docs/index.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index 6099376bf..3e22e6116 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -148,11 +148,11 @@ Alert Enrichment Automatic Remediation Silencer Playbooks - Triggers Reference - Actions Reference Log Based Alerting K8s Change Notification Cost Savings - KRR + Triggers Reference + Actions Reference .. toctree:: :maxdepth: 4 From 2f57b11aaeaeda62f27f889dbd0cfc5e2d404efe Mon Sep 17 00:00:00 2001 From: Robusta Runner Date: Thu, 6 Nov 2025 20:30:10 +0200 Subject: [PATCH 4/5] fixes --- .../alert-custom-prometheus.rst | 7 +++---- docs/configuration/holmesgpt/getting-started.rst | 2 ++ docs/configuration/resource-recommender.rst | 10 +++++----- docs/help.rst | 2 +- docs/how-it-works/coverage.rst | 5 ++--- docs/index.rst | 2 +- docs/notification-routing/configuring-sinks.rst | 2 +- docs/notification-routing/index.rst | 2 +- .../kubernetes-change-notifications.rst | 2 +- docs/playbook-reference/triggers/kubernetes.rst | 2 +- docs/track-changes/kubernetes-changes.rst | 2 +- 11 files changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/configuration/alertmanager-integration/alert-custom-prometheus.rst b/docs/configuration/alertmanager-integration/alert-custom-prometheus.rst index f97fb6182..3664329f9 100644 --- a/docs/configuration/alertmanager-integration/alert-custom-prometheus.rst +++ b/docs/configuration/alertmanager-integration/alert-custom-prometheus.rst @@ -8,11 +8,10 @@ Create Custom Prometheus Alerts You can define new alerts in two ways using Robusta: 1. Prometheus Alerts - Using PromQL -2. Robusta Playbooks - Using :ref:`customPlaybooks YAML ` +2. Robusta Playbooks - Using :ref:`customPlaybooks YAML ` These methods are not mutually exclusive. 
Robusta playbooks can respond to Prometheus alerts, or they can generate -alerts themselves by listening directly to the Kubernetes APIServer. To better understand the trade-offs, refer to -:ref:`Should I generate alerts with Robusta or with Prometheus? ` +alerts themselves by listening directly to the Kubernetes APIServer. In this tutorial, we use the first method to generate a custom Prometheus alert using PromQL. In the next tutorial, we define a custom Robusta playbook that enhances the alert and makes it better. @@ -109,4 +108,4 @@ Next Steps Learn how to enrich Prometheus alerts with more context, so that you can respond faster: -* :ref:`Prometheus Alert Enrichment` +* :ref:`Alert Enrichment ` diff --git a/docs/configuration/holmesgpt/getting-started.rst b/docs/configuration/holmesgpt/getting-started.rst index 937a57c6b..f66083d18 100644 --- a/docs/configuration/holmesgpt/getting-started.rst +++ b/docs/configuration/holmesgpt/getting-started.rst @@ -142,6 +142,8 @@ Instead of Robusta AI, you can use your own OpenAI, Azure, or AWS Bedrock accoun name: holmes-secrets key: awsSecretAccessKey +.. _Reading the Robusta UI Token from a secret in HolmesGPT: + Using Existing Secrets ---------------------- diff --git a/docs/configuration/resource-recommender.rst b/docs/configuration/resource-recommender.rst index c112bfc1d..7d6897a25 100644 --- a/docs/configuration/resource-recommender.rst +++ b/docs/configuration/resource-recommender.rst @@ -1,13 +1,13 @@ Cost Savings (KRR) ************************************************************ -Robustas `KRR `_ is a CLI tool for optimizing resource allocation in Kubernetes clusters. -It gathers pod usage data from Prometheus and recommends requests and limits for CPU and memory. This reduces costs and improves performance. +`KRR `_ is a CLI tool that optimizes resource allocation in Kubernetes clusters. 
+It gathers pod usage data from Prometheus and recommends requests and limits for CPU and memory, reducing costs and improving performance. -By optionally integrating KRR with Robusta you can: +Robusta can run KRR scans on a :ref:`schedule ` using playbooks. Because KRR is so popular, it has dedicated documentation here. You can: -1. Get weekly KRR scan reports in Slack via Robusta OSS (disabled by default, see below to configure) -2. View KRR scans from all your clusters in the Robusta UI (enabled by default for UI users) +1. Send weekly scan reports to Slack or other sinks via Robusta OSS (disabled by default, configure below) +2. View scans from all your clusters in the Robusta UI (enabled by default for UI users) Sending Weekly KRR Scan Reports to Slack diff --git a/docs/help.rst b/docs/help.rst index f0e3557cc..cea2849d7 100644 --- a/docs/help.rst +++ b/docs/help.rst @@ -250,7 +250,7 @@ Holmes It's often because the ``Robusta UI Token`` is pulled from a secret, and Holmes cannot read it. - See :ref:`Reading the Robusta UI Token from a secret in HolmesGPT` to configure Holmes to read the ``token`` + See :ref:`Using Existing Secrets ` to configure Holmes to read the ``token`` Phase 4: Integration Issues ^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/docs/how-it-works/coverage.rst b/docs/how-it-works/coverage.rst index 295a1b8cf..65288de8b 100644 --- a/docs/how-it-works/coverage.rst +++ b/docs/how-it-works/coverage.rst @@ -20,7 +20,7 @@ Prometheus Alerts .. warning:: - You must :ref:`send your Prometheus alerts to Robusta by webhook ` for these to work. + You must send your Prometheus alerts to Robusta by webhook for these to work. See :doc:`AlertManager Integration `. Other errors ---------------- @@ -39,8 +39,7 @@ Change Tracking By default all changes to Deployments, DaemonSets, and StatefulSets are sent to the Robusta UI for correlation with Prometheus alerts and other errors. -These changes are not sent to other sinks (e.g. 
Slack) by default because they are spammy. :ref:`Routing Cookbook` -explains how to selectively track changes you care about in Slack as well. +These changes are not sent to other sinks (e.g. Slack) by default because they are spammy. See :doc:`Notification Routing ` to learn how to selectively track changes you care about in Slack as well. We also wrote a blog post `Why everyone should track Kubernetes changes and top four ways to do so `_ diff --git a/docs/index.rst b/docs/index.rst index 3e22e6116..8b1e7c3c4 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -147,9 +147,9 @@ Playbook Notifications Alert Enrichment Automatic Remediation + Change Tracking Playbooks Silencer Playbooks Log Based Alerting - K8s Change Notification Cost Savings - KRR Triggers Reference Actions Reference diff --git a/docs/notification-routing/configuring-sinks.rst b/docs/notification-routing/configuring-sinks.rst index e70e28e68..eea558ae2 100644 --- a/docs/notification-routing/configuring-sinks.rst +++ b/docs/notification-routing/configuring-sinks.rst @@ -9,7 +9,7 @@ A Simple Sink Configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Sinks are defined in Robusta's Helm chart, using the ``sinksConfig`` value. -For example, lets add a :ref:`Microsoft Teams Sink `: +For example, let's add a :ref:`Microsoft Teams Sink `: .. code-block:: yaml diff --git a/docs/notification-routing/index.rst b/docs/notification-routing/index.rst index 570c36897..b491d6ab5 100644 --- a/docs/notification-routing/index.rst +++ b/docs/notification-routing/index.rst @@ -68,7 +68,7 @@ Popular Sinks .. grid-item-card:: PagerDuty :class-card: sd-bg-light sd-bg-text-light - :link: ../configuration/sinks/pagerduty + :link: ../configuration/sinks/PagerDuty :link-type: doc ..
grid-item-card:: View All Sinks diff --git a/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst b/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst index 82a7c3429..d78357a67 100644 --- a/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst +++ b/docs/playbook-reference/kubernetes-examples/kubernetes-change-notifications.rst @@ -1,4 +1,4 @@ -K8s Change Notification +Change Tracking Playbooks ################################ You can configure Robusta to send push notifications when Kubernetes resources change or become unhealthy. This is done by listening to API Server changes with `kubewatch `_ and then filtering the stream of events in a Robusta playbook. diff --git a/docs/playbook-reference/triggers/kubernetes.rst b/docs/playbook-reference/triggers/kubernetes.rst index adb5c8e6f..6c515ffa8 100644 --- a/docs/playbook-reference/triggers/kubernetes.rst +++ b/docs/playbook-reference/triggers/kubernetes.rst @@ -9,7 +9,7 @@ These triggers work even when Prometheus is not connected to Robusta. They're tr .. details:: Related Tutorials - * :doc:`K8s Change Notification ` + * :doc:`Change Tracking Playbooks ` Crashing Pod Triggers diff --git a/docs/track-changes/kubernetes-changes.rst b/docs/track-changes/kubernetes-changes.rst index 7a49524ef..bca7e4b1a 100644 --- a/docs/track-changes/kubernetes-changes.rst +++ b/docs/track-changes/kubernetes-changes.rst @@ -5,4 +5,4 @@ When using Robusta SaaS, Robusta automatically tracks all Kubernetes changes and This provides context about recent changes when investigating issues, helping you quickly identify if a deployment, configuration update, or other change caused a problem. -Looking to get push notifications (e.g. Slack or other sinks) when Kubernetes resources change? See the :doc:`K8s Change Notification ` guide in the Advanced section. +Looking to get push notifications (e.g. Slack or other sinks) when Kubernetes resources change? 
See the :doc:`Change Tracking Playbooks ` guide in the Advanced section. From 3b4ca837392e9651a676a2ef287fe488ba919dd5 Mon Sep 17 00:00:00 2001 From: Natan Yellin Date: Thu, 6 Nov 2025 20:53:10 +0200 Subject: [PATCH 5/5] Update docs/playbook-reference/index.rst Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> --- docs/playbook-reference/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/playbook-reference/index.rst b/docs/playbook-reference/index.rst index 1624ddb44..7a5ab9e1f 100644 --- a/docs/playbook-reference/index.rst +++ b/docs/playbook-reference/index.rst @@ -319,7 +319,7 @@ Then run a Helm upgrade by passing the new file using the ``-f`` flag. .. code-block:: yaml - helm ugprade --install robusta -f generated_values.yaml -f app_a_playbooks.yaml + helm upgrade --install robusta -f generated_values.yaml -f app_a_playbooks.yaml Global Configuration ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^