24 changes: 19 additions & 5 deletions docs/conf.py
@@ -109,10 +109,15 @@
"how-it-works/privacy-and-security.html": "/master/setup-robusta/privacy-and-security.html",
"how-it-works/index.html": "/master/playbook-reference/what-are-playbooks.html",
"playbook-reference/examples.html": "/master/playbook-reference/prometheus-examples/index.html",
"tutorials/playbook-track-changes.html": "/master/playbook-reference/kubernetes-examples/playbook-failed-liveness.html",
"tutorials/playbook-job-failure.html": "/master/playbook-reference/kubernetes-examples/playbook-job-failure.html",
"tutorials/playbook-failed-liveness.html": "/master/playbook-reference/kubernetes-examples/playbook-failed-liveness.html",
"tutorials/playbook-track-secrets.html": "/master/playbook-reference/kubernetes-examples//playbook-track-secrets.html",
"tutorials/playbook-track-changes.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"tutorials/playbook-job-failure.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"tutorials/playbook-failed-liveness.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"tutorials/playbook-track-secrets.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"playbook-reference/kubernetes-examples/playbook-failed-liveness.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"playbook-reference/kubernetes-examples/playbook-job-failure.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"playbook-reference/kubernetes-examples/playbook-track-changes.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"playbook-reference/kubernetes-examples/playbook-track-secrets.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"playbook-reference/kubernetes-examples/track-kubernetes-changes.html": "/master/playbook-reference/kubernetes-examples/kubernetes-change-notifications.html",
"tutorials/alert-remediation.html": "/master/playbook-reference/prometheus-examples/alert-remediation.html",
"tutorials/alert-custom-enrichment.html": "/master/playbook-reference/prometheus-examples/alert-custom-enrichment.html",
"catalog/sinks/slack.html": "/master/configuration/sinks/slack.html",
@@ -201,7 +206,16 @@
"user-guide/robusta-cli.html": "/master/setup-robusta/installation/index.html",
"advanced/index.html": "/master/setup-robusta/installation/index.html",
"configuration/exporting/exporting-data.html": "/master/configuration/exporting/send-alerts-api.html",
"configuration/alertmanager-integration/troubleshooting-alertmanager.html": "/master/configuration/exporting/send-alerts-api.html"
"configuration/alertmanager-integration/troubleshooting-alertmanager.html": "/master/configuration/exporting/send-alerts-api.html",
"configuration/alertmanager-integration/grafana-alert-manager.html": "/master/configuration/alertmanager-integration/grafana-self-hosted.html",
"configuration/alertmanager-integration/grafana-cloud-mimir.html": "/master/configuration/alertmanager-integration/grafana-cloud.html",
"playbook-reference/what-are-playbooks.html": "/master/playbook-reference/overview.html",
"how-it-works/alert-builtin-enrichment.html": "/master/playbook-reference/builtin-alert-enrichment.html",
"setup-robusta/installation/extend-prometheus-installation.html": "/master/setup-robusta/installation/standalone-installation.html",
"playbook-reference/defining-playbooks/index.html": "/master/playbook-reference/index.html",
"configuration/alertmanager-integration/alert-custom-prometheus.html": "/master/configuration/alertmanager-integration/embedded-prometheus.html#creating-custom-prometheus-alerts",
"configuration/alertmanager-integration/index.html": "/master/configuration/index.html",
"notification-routing/notification-routing-examples.html": "/master/notification-routing/index.html"
}


This file was deleted.

6 changes: 3 additions & 3 deletions docs/configuration/alertmanager-integration/alert-manager.rst
@@ -1,9 +1,9 @@
In-cluster AlertManager Integration
****************************************
AlertManager - in-cluster
**************************

This guide shows how to send alerts from an existing AlertManager to Robusta in the same cluster.

If your AlertManager is in a different cluster, refer to :ref:`External Prometheus`.
If your AlertManager is in a different cluster, refer to :doc:`AlertManager - external <outofcluster-prometheus>`.

Send Alerts to Robusta
============================
@@ -0,0 +1,97 @@
Customize Labels and Priorities
=================================

Relabel Prometheus Alerts
--------------------------

When sending Prometheus alerts to Robusta, alerts are mapped onto related Kubernetes resources, when possible. This mapping relies on the alerts having the following labels:

+---------------------------+-------------------------------------------+
| Kubernetes Resource       | Alert Labels                              |
+===========================+===========================================+
| Deployment                | deployment, namespace                     |
+---------------------------+-------------------------------------------+
| DaemonSet                 | daemonset, namespace                      |
+---------------------------+-------------------------------------------+
| StatefulSet               | statefulset, namespace                    |
+---------------------------+-------------------------------------------+
| Job                       | job_name, namespace                       |
+---------------------------+-------------------------------------------+
| Pod                       | pod, namespace                            |
+---------------------------+-------------------------------------------+
| HorizontalPodAutoscaler   | horizontalpodautoscaler, namespace        |
+---------------------------+-------------------------------------------+
| Node                      | node or instance (fallback if node        |
|                           | doesn't exist)                            |
+---------------------------+-------------------------------------------+

If your alerts use different labels, you can change the mapping with the ``alertRelabel`` Helm value.

A relabeling has three attributes:

* ``source``: The label's name on your alerts (which differs from the expected value in the above table)
* ``target``: The standard label name that Robusta expects (a value from the table above)
* ``operation``: Either ``add`` (default) or ``replace``. If ``add``, your custom mapping will be recognized in addition to Robusta's default mapping.

For example:

.. code-block:: yaml

    alertRelabel:
      - source: "pod_name"
        target: "pod"
        operation: "add"
      - source: "deployment_name"
        target: "deployment"
        operation: "replace"
      - source: "job_name"
        target: "job"
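
The difference between ``add`` and ``replace`` can be pictured with a small Python sketch. This is only an illustrative model, not Robusta's implementation: the function names are hypothetical, and it assumes that ``replace`` discards the default label for that target and that default labels are checked before added ones.

.. code-block:: python

    from typing import Dict, List, Optional


    def build_label_lookup(
        default_targets: List[str], relabels: List[Dict[str, str]]
    ) -> Dict[str, List[str]]:
        """Return, per target, the alert label names that are accepted for it."""
        # By default each target is read from the label with the same name.
        lookup = {target: [target] for target in default_targets}
        for relabel in relabels:
            source, target = relabel["source"], relabel["target"]
            operation = relabel.get("operation", "add")  # "add" is the default
            if operation == "add":
                # Accept the custom label in addition to the default one.
                accepted = lookup.setdefault(target, [])
                if source not in accepted:
                    accepted.append(source)
            elif operation == "replace":
                # Assumption: the custom label is accepted instead of the default.
                lookup[target] = [source]
            else:
                raise ValueError(f"unknown operation: {operation}")
        return lookup


    def resolve(
        labels: Dict[str, str], target: str, lookup: Dict[str, List[str]]
    ) -> Optional[str]:
        """Return the first matching label value for a target, or None."""
        for name in lookup.get(target, []):
            if name in labels:
                return labels[name]
        return None


    lookup = build_label_lookup(
        ["pod", "deployment"],
        [
            {"source": "pod_name", "target": "pod"},
            {"source": "deployment_name", "target": "deployment", "operation": "replace"},
        ],
    )

In this model, an alert labeled ``pod_name`` resolves to a Pod because the custom label was added, while an alert that carries only ``deployment`` no longer resolves because that mapping was replaced.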

Mapping Custom Alert Severity
------------------------------

To help you prioritize alerts from different sources, Robusta maps alert severity to four standard levels:

* **HIGH** - requires your immediate attention - may indicate a service outage
* **LOW** - minor problems and areas for improvement (e.g. performance) - to be reviewed periodically on a weekly or bi-weekly cadence
* **INFO** - you probably want to be aware of these, but do not necessarily need to take action
* **DEBUG** - debug only - can be ignored unless you're actively debugging an issue

You are free to interpret these levels differently, but the above is a good starting point for most companies.

Prometheus alerts are normalized to the above levels as follows:

+----------------------+--------------------+
| Prometheus Severity  | Robusta Severity   |
+======================+====================+
| critical             | HIGH               |
+----------------------+--------------------+
| high                 | HIGH               |
+----------------------+--------------------+
| medium               | HIGH               |
+----------------------+--------------------+
| error                | HIGH               |
+----------------------+--------------------+
| warning              | LOW                |
+----------------------+--------------------+
| low                  | LOW                |
+----------------------+--------------------+
| info                 | INFO               |
+----------------------+--------------------+
| debug                | DEBUG              |
+----------------------+--------------------+

Prometheus alerts with a severity not in the above list are mapped to Robusta's INFO level.

You can map your own Prometheus severities, using the ``custom_severity_map`` Helm value. For example:

.. code-block:: yaml

    globalConfig:
      custom_severity_map:
        # maps a p1 value on your own alerts to Robusta's HIGH value
        p1: high
        # maps a p2 value on your own alerts to Robusta's LOW value
        p2: low

Each mapped value must be one of ``high``, ``low``, ``info``, or ``debug``.
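
The normalization rules in this section can be summarized in a short Python sketch. It is a model of the documented behavior rather than Robusta's actual code: the function and constant names are hypothetical, and it assumes that custom entries take precedence over the defaults and that severities are matched exactly as written.

.. code-block:: python

    from typing import Dict, Optional

    # Default normalization, copied from the table above.
    DEFAULT_SEVERITY_MAP = {
        "critical": "HIGH",
        "high": "HIGH",
        "medium": "HIGH",
        "error": "HIGH",
        "warning": "LOW",
        "low": "LOW",
        "info": "INFO",
        "debug": "DEBUG",
    }

    # Values allowed on the right-hand side of custom_severity_map.
    ALLOWED_CUSTOM_VALUES = {"high", "low", "info", "debug"}


    def normalize_severity(
        prometheus_severity: str,
        custom_severity_map: Optional[Dict[str, str]] = None,
    ) -> str:
        """Map a Prometheus severity label to one of the four standard levels."""
        custom = custom_severity_map or {}
        for value in custom.values():
            if value not in ALLOWED_CUSTOM_VALUES:
                raise ValueError(f"invalid mapped severity: {value}")
        # Assumption: custom entries are consulted before the defaults.
        if prometheus_severity in custom:
            return custom[prometheus_severity].upper()
        # Severities not in any map fall back to INFO.
        return DEFAULT_SEVERITY_MAP.get(prometheus_severity, "INFO")


    custom_map = {"p1": "high", "p2": "low"}
    p1_level = normalize_severity("p1", custom_map)
    unknown_level = normalize_severity("p3", custom_map)

With the example map above, ``p1`` is treated as HIGH, while an unmapped value such as ``p3`` falls back to INFO.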
4 changes: 2 additions & 2 deletions docs/configuration/alertmanager-integration/dynatrace.rst
@@ -35,13 +35,13 @@ Step 2: Create a Dynatrace Problems Webhook

5. Set the **Custom payload** to the Dynatrace macro:

.. code-block:: json
.. code-block:: text

{ProblemDetailsJSONv2}

6. Add the following **HTTP headers**:

.. code-block:: http
.. code-block:: text

Authorization: Bearer <api-key>
account-id: <account_id>
@@ -18,7 +18,7 @@ Since AWS Managed Prometheus doesn't have a built-in AlertManager, you'll need t

1. Set up Amazon Managed Grafana with your AMP workspace
2. Configure Grafana alerts to send to Robusta
3. See :doc:`grafana-alert-manager` for detailed Grafana alerting setup
3. See :doc:`grafana-self-hosted` for detailed Grafana alerting setup

Configure Metric Querying
=========================
@@ -51,6 +51,76 @@ To allow the Grafana dashboard to persist after the Grafana instance restarts, y

Apply the change by performing a :ref:`Helm Upgrade <Simple Upgrade>`.

Creating Custom Prometheus Alerts
----------------------------------

Prometheus Alerts are defined on Kubernetes using the PrometheusRule CRD.

Prerequisites
^^^^^^^^^^^^^

Enable global rule selection for the Prometheus Operator by adding the following config to your ``generated_values.yaml``. By default, the Prometheus Operator picks up only certain new alerts; this config tells it to pick up all of them.

.. code-block:: yaml

    kube-prometheus-stack:
      prometheus:
        prometheusSpec:
          ruleNamespaceSelector: {} # (1)
          ruleSelector: {} # (2)
          ruleSelectorNilUsesHelmValues: false # (3)

.. code-annotations::
    1. Add a namespace if you want Prometheus to identify rules created in specific namespaces. Leave ``{}`` to detect rules from any namespace.
    2. Add a label if you want Prometheus to detect rules with a specific selector. Leave ``{}`` to detect rules with any label.
    3. When set to ``false``, Prometheus detects rules that are created directly, not just rules created through a Helm values file.

Defining an Alert
^^^^^^^^^^^^^^^^^

As an example, we'll define an alert to find Pods with CPU usage over their request.

Save the following YAML into ``my_alert.yaml`` and run ``kubectl apply -f my_alert.yaml``:

.. code-block:: yaml

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: container-cpu-alert
      labels:
        prometheus: kube-prometheus
        role: alert-rules
    spec:
      groups:
        - name: container-cpu-usage
          rules:
            - alert: KubeContainerCPURequestAlert
              expr: |
                (rate(container_cpu_usage_seconds_total{container="stress"}[5m]) /
                on (container) kube_pod_container_resource_requests{resource="cpu", container="stress"}) > 0.75
              for: 1m
              labels:
                severity: warning
              annotations:
                summary: "Container CPU usage is above 75% of request for 5 minutes"
                description: "The container is using more than 75% of its requested CPU for 5 minutes."
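
To make the threshold in the expression concrete, here is a rough worked example in Python. It is a simplification under stated assumptions: the function name and sample numbers are made up for illustration, and the real Prometheus ``rate`` function also handles counter resets and extrapolation, which this sketch ignores.

.. code-block:: python

    def cpu_request_ratio(
        cpu_seconds_start: float,
        cpu_seconds_end: float,
        window_seconds: float,
        requested_cores: float,
    ) -> float:
        """Approximate the ratio of average CPU usage to the CPU request."""
        # Average cores used over the window: growth of the CPU-seconds counter
        # divided by the window length (a simplified stand-in for rate()).
        used_cores = (cpu_seconds_end - cpu_seconds_start) / window_seconds
        return used_cores / requested_cores


    # Hypothetical sample: the counter grew by 240 CPU-seconds over a 5-minute
    # window, and the container requests 1 core.
    ratio = cpu_request_ratio(1000.0, 1240.0, 300.0, 1.0)
    exceeds_threshold = ratio > 0.75

Here the container averages about 0.8 cores against a 1-core request, so the condition holds. The alert itself fires only after the condition has held for the ``for: 1m`` duration.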

Testing the Alert
^^^^^^^^^^^^^^^^^

To test the alert, deploy a pod that uses more CPU than its request.

.. code-block:: bash

kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/cpu_throttling/throttling.yaml

You will know the alert was defined successfully when Prometheus fires an alert and you receive a notification in all configured sinks.

.. image:: /images/container_cpu_request_alert.png
    :width: 600
    :align: center

Troubleshooting
---------------------
