Docs: reorganize and rewrite content#1894
Conversation
- Remove sinks/destinations from integrations overview - Restructure page to highlight "Forward Alerts to Robusta" - Add dedicated section for Prometheus, Nagios, and SolarWinds - Simplify description to focus on alert forwarding and AI analysis 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Move sinks reference from Integrations nav to Notifications & Routing nav - Place sinks as first item in Notifications & Routing section - Remove sinks from Integrations section to reduce confusion - Keep logical grouping of alert destinations with notification routing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Redesign integrations overview to be truly high-level with categories - Rename "Integrating with Prometheus" to "Alert Sources" for broader scope - Remove duplicate content between overview and subpages - Focus overview on integration categories rather than detailed setup - Update navigation labels to match new page structure - Make user journey clearer: overview → specific alert source setup 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
… Automation - Rename "Integrations" section to "Alert Sources" - much clearer purpose - Create dedicated "AI Analysis" section for Holmes GPT only - Move KRR, Popeye, and data export to "Automation" section - Update Alert Sources overview to focus purely on monitoring integrations - Remove mixed content categories that confused users - Improve logical flow: Alert Sources → AI Analysis → Notifications → Automation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Create new notifications overview page as entry point - Reorder navigation: Overview → Configuring Sinks → Sink Reference - Fix information hierarchy so users start with concepts, not specific sinks - Prevent sphinx menu expansion pushing basic setup pages down - Add clear getting started flow and common workflows - Update sink reference title to match new structure 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Move Nagios and SolarWinds from Prometheus subpages to top-level Alert Sources - Remove confusing navigation hierarchy where non-Prometheus systems appeared under Prometheus - Clean up Prometheus page to focus purely on Prometheus/AlertManager integrations - Remove "Other Alerting Systems" section since they're now top-level - Create clearer navigation: Overview → Prometheus → Nagios → SolarWinds 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Remove verbose explanations and get straight to the point - Lead with clear action: "Choose your monitoring system" - Make card descriptions more direct and useful - Add practical note about webhook compatibility for other systems - Cut word count by ~60% while improving clarity 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Change 'Legacy systems' to 'Nagios monitoring' - Change 'Enterprise monitoring' to 'SolarWinds monitoring' - Keep descriptions neutral and product-focused - Avoid terms that might make users feel bad about their tech choices 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Link 'HTTP webhooks' to webhook triggers documentation - Provides users with concrete next steps if their system isn't listed - Makes the fallback option actionable rather than just informational 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Link HTTP webhooks to exporting-data.rst which contains POST /api/alerts endpoint - This is the correct endpoint for sending alerts from any system via HTTP - Previous link was to webhook triggers which is for different use case 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Move exporting-data.rst from Automation to Alert Sources as "Custom Webhooks" - This contains the POST /api/alerts endpoint for sending alerts TO Robusta - Makes it discoverable for users wanting to integrate custom monitoring systems - Logical placement with other alert source integrations 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Remove navigation.tabs and navigation.tabs.sticky features - Add navigation.sections and navigation.expand for better sidebar - This provides more room for additional top-level sections - Makes navigation more scalable for future section additions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/configuration/alertmanager-integration/google-managed-prometheus.rst (2)
62-66: Fix code block language and missing closing quoteThe “Verify it Works” command is a shell command but marked as YAML, and it’s missing a closing quote. This will render incorrectly and mislead users.
-.. code-block:: yaml +.. code-block:: bash - robusta demo-alert --alertmanager-url='http://alertmanager.gmp-system.svc.cluster.local:9093 + robusta demo-alert --alertmanager-url='http://alertmanager.gmp-system.svc.cluster.local:9093'
53-56: Fix namespace mismatch: use the same namespace for both commandsThe secret is created in
gmp-public, but the demo-alert URL points togmp-system. This inconsistency will confuse users and likely cause failures. Please standardize the namespace across both snippets:• docs/configuration/alertmanager-integration/google-managed-prometheus.rst:
– Line 54:kubectl create secret generic alertmanager -n gmp-public …
– Line 64:robusta demo-alert --alertmanager-url='http://alertmanager.gmp-system.svc.cluster.local:9093'Options (pick one):
If AlertManager runs in gmp-system, update the secret command:
- kubectl create secret generic alertmanager \ - -n gmp-public \ + kubectl create secret generic alertmanager \ + -n gmp-system \ --from-file=alertmanager.yamlIf gmp-public is correct, update the demo-alert URL:
- robusta demo-alert --alertmanager-url='http://alertmanager.gmp-system.svc.cluster.local:9093' + robusta demo-alert --alertmanager-url='http://alertmanager.gmp-public.svc.cluster.local:9093'
♻️ Duplicate comments (4)
docs/configuration/metric-providers-victoria.rst (4)
48-51: Clarify prometheus_auth is sent verbatim as the Authorization header (duplicate of prior feedback).This avoids ambiguity about including the "Bearer " prefix or using Basic auth.
globalConfig: prometheus_url: "http://vmauth-victoria-metrics.default.svc.cluster.local:8427" - prometheus_auth: "Bearer YOUR_TOKEN" + # Will be sent as the Authorization header value (e.g., "Bearer <token>" or "Basic <base64(user:pass)>") + prometheus_auth: "Bearer YOUR_TOKEN"
59-64: Expand service discovery to include vmauth and vmalertmanager; add common namespace (duplicate of prior feedback).Covers typical VM deployments and improves discoverability.
# Find VictoriaMetrics services - kubectl get svc -A | grep -E "vmsingle|vmselect|vmalert" + kubectl get svc -A | grep -E "vmsingle|vmselect|vmauth|vmalert(manager)?" - # Check specific namespace - kubectl get svc -n victoria-metrics-system + # Check common namespaces + # Operator-based installs typically use: + kubectl get svc -n victoria-metrics-system + # Helm chart installs often default to: + kubectl get svc -n victoria-metrics
106-117: Document default and startup impact for check_prometheus_flags (duplicate of prior feedback).Make it explicit that this is read from globalConfig, defaults to true, and that disabling it reduces early validation by skipping /api/v1/status/flags at startup.
.. code-block:: yaml globalConfig: prometheus_url: "http://vmsingle-victoria-metrics.default.svc.cluster.local:8429" check_prometheus_flags: false - Some VictoriaMetrics configurations may not fully implement the Prometheus flags API, which can cause connection issues during Robusta initialization. + Some VictoriaMetrics configurations may not fully implement the Prometheus flags API, which can cause connection issues during Robusta initialization. + Note: check_prometheus_flags is read from globalConfig (Helm value) and defaults to true. Setting it to false disables the Prometheus flags API check during Robusta startup (skips calls to /api/v1/status/flags), which reduces early validation of the Prometheus endpoint.
127-129: Make kubectl run example portable across kubectl versions (duplicate of prior feedback).Adding --restart=Never and quoting the URL increases portability and avoids shell parsing issues.
- kubectl run test-curl --image=curlimages/curl --rm -it -- \ - curl -v http://vmsingle-victoria-metrics.default.svc.cluster.local:8429/api/v1/query?query=up + kubectl run -it --rm --restart=Never vm-curl --image=curlimages/curl -- \ + curl -v "http://vmsingle-victoria-metrics.default.svc.cluster.local:8429/api/v1/query?query=up"
🧹 Nitpick comments (10)
docs/configuration/exporting/namespace-resources-api.rst (5)
6-8: Replace vague “may be outdated” warning with a concrete, dated stability note (or remove).As written, this undermines trust in the page. Either pin the docs to a specific “last updated” date and link to a changelog, or remove the warning.
Apply this diff:
- Due to updates in the Namespace Resources API, these instructions may be outdated. - Please contact our team for support on Slack (https://bit.ly/robusta-slack) or by email (support@robusta.dev). - We're working on updating the documentation. + This API is in active development. The information below reflects the latest schema as of 2025-08-20. + For breaking changes, see the Robusta Release Notes. If you encounter issues, contact us on Slack (https://bit.ly/robusta-slack) or by email (support@robusta.dev).
74-87: Avoid embedding secrets in examples; prefer env var for the API key.Even though this uses a placeholder, examples often get copy-pasted with real keys. Use an environment variable in the Authorization header.
Apply this diff:
- --header 'Authorization: Bearer API-KEY-HERE' \ + --header 'Authorization: Bearer '"$ROBUSTA_API_KEY" \Add this snippet near the example to guide users:
# Export your API key securely before running the curl command export ROBUSTA_API_KEY='YOUR-API-KEY'Note: Static analysis flagged the current pattern as a potential secret leak (curl-auth-header).
91-95: Permissions naming and source-of-truth check.Confirm the exact permission name/path in the UI (e.g., “Clusters → Read” vs “Clusters: Read”) and whether any extra scope is required for namespace resource counts. Update the text to match the UI precisely.
96-146: Tighten response documentation: enumerate possible error codes and status codes.Current guidance says “varies depending on the error.” If there’s a defined set (e.g., 400 validation errors with specific
error_codes like MISSING_FIELD, INVALID_RESOURCE, 403 FORBIDDEN for missing permission, 404 for unknown cluster/namespace, 500 for internal), list them for integrators.I can help draft a small table of status codes and
error_codevalues if you share the canonical list.
31-57: Clarify validation semantics for the request schema.
- Is
resourcesallowed to be empty? If not, specify “non-empty array”.- How are duplicates handled? First-wins/last-wins/aggregated?
- Input size limits (number of resources) and rate limits?
Documenting these reduces integration friction.
docs/configuration/metric-providers-aws.rst (4)
25-31: Optional: anticipate session-token in the secretIf users rely on temporary credentials (STS), having an optional session-token makes the flow complete.
- kubectl create secret generic aws-secret-key -n robusta \ - --from-literal=access-key=YOUR_ACCESS_KEY \ - --from-literal=secret-key=YOUR_SECRET_ACCESS_KEY + kubectl create secret generic aws-secret-key -n robusta \ + --from-literal=access-key=YOUR_ACCESS_KEY_ID \ + --from-literal=secret-key=YOUR_SECRET_ACCESS_KEY \ + --from-literal=session-token=YOUR_SESSION_TOKEN # optional
71-79: Align the “Required Environment Variables” list with standard namesReflect the standard AWS env var names and note the optional session token.
-**Required Environment Variables**: +**Required Environment Variables** (standard AWS): -- ``PROMETHEUS_SSL_ENABLED``: Always ``"true"`` for AMP -- ``AWS_SERVICE_NAME``: Always ``"aps"`` for Amazon Prometheus Service -- ``AWS_REGION``: The AWS region where your workspace is located +- ``PROMETHEUS_SSL_ENABLED``: Always ``"true"`` for AMP +- ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` (or use IRSA) +- ``AWS_SESSION_TOKEN``: Optional if using temporary credentials +- ``AWS_SERVICE_NAME``: Always ``"aps"`` for Amazon Prometheus Service +- ``AWS_REGION``: The AWS region where your workspace is located
86-108: Tighten IAM policy Resource scopeCurrent Resource uses a wildcard across all workspaces. Prefer least-privilege by scoping to the specific workspace ARN.
- "Resource": "arn:aws:aps:*:*:workspace/*" + "Resource": "arn:aws:aps:us-east-1:123456789012:workspace/ws-12345678"If you want to keep it generic in docs, add a comment indicating users should replace region, account ID, and workspace ID.
132-135: Clarify alerting path via Amazon Managed Grafana“AlerManager URL is not needed (AWS handles alerting separately)” can be more concrete. AMP alerting is typically configured via Amazon Managed Grafana. Cross-link the alert-forwarding guide to reduce ambiguity.
-- AlertManager URL is not needed (AWS handles alerting separately) +- AMP does not expose a built-in AlertManager. Configure alerting via Amazon Managed Grafana + and forward alerts to Robusta. See :doc:`/configuration/alertmanager-integration/eks-managed-prometheus` + and :doc:`/configuration/alertmanager-integration/grafana-alert-manager`.docs/configuration/alertmanager-integration/eks-managed-prometheus.rst (1)
10-13: Good separation: alerts via Grafana, metrics via provider docClear guidance that AMP alerting flows through Amazon Managed Grafana, with metric querying deferred to the AWS metric provider docs. Consider adding an inline link to AMG docs for completeness.
-AWS Managed Prometheus uses Amazon Managed Grafana for alerting. To send alerts to Robusta, configure your Grafana instance to forward alerts. +AWS Managed Prometheus uses Amazon Managed Grafana for alerting. To send alerts to Robusta, configure your Grafana instance to forward alerts. +See the `Amazon Managed Grafana documentation <https://docs.aws.amazon.com/grafana/latest/userguide/what-is-Amazon-Managed-Grafana.html>`_ for setup details.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
docs/configuration/alertmanager-integration/eks-managed-prometheus.rst(1 hunks)docs/configuration/alertmanager-integration/google-managed-prometheus.rst(2 hunks)docs/configuration/exporting/namespace-resources-api.rst(1 hunks)docs/configuration/metric-providers-aws.rst(1 hunks)docs/configuration/metric-providers-google.rst(1 hunks)docs/configuration/metric-providers-victoria.rst(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/configuration/metric-providers-google.rst
🧰 Additional context used
🪛 Gitleaks (8.27.2)
docs/configuration/exporting/namespace-resources-api.rst
74-75: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_tests
- GitHub Check: Deploy docs
🔇 Additional comments (6)
docs/configuration/metric-providers-victoria.rst (1)
23-23: Verified Sphinx reference “Simple Upgrade” existsI confirmed that
.. _Simple Upgrade:is defined in docs/setup-robusta/upgrade.rst (line 18), so the cross-reference in metric-providers-victoria.rst is valid. No changes needed.docs/configuration/exporting/namespace-resources-api.rst (2)
25-27: Cross-reference target existsThe anchor
cb-robusta-ui-sink-namespace-configis defined indocs/configuration/sinks/RobustaUI.rst(at line 64), so the reference innamespace-resources-api.rstis valid—no changes needed.
63-66: Fix resource schema examples: use singular Kubernetes Kind and proper reST code markupThe snippet in docs/configuration/exporting/namespace-resources-api.rst currently shows plural Kinds and uses single backticks. Please update it to:
• use singular Kubernetes Kind names (e.g.
Deployment)
• wrap all inline literals in double backticks
• explicitly render the core API group as an empty‐string literal:""Apply this diff in docs/configuration/exporting/namespace-resources-api.rst (lines 63–66):
-* ``kind`` (e.g., `Deployments`) -* ``apiGroup`` (e.g., `apps`, or empty string for core group) -* ``apiVersion`` (e.g., `v1`, `v2`) +* ``kind`` (e.g., ``Deployment``) +* ``apiGroup`` (e.g., ``apps``, or empty string ``""`` for the core group) +* ``apiVersion`` (e.g., ``v1``, ``v2``)Additionally, please confirm whether the API accepts a core‐group value of an empty string (
""),null, or both, and document accordingly.docs/configuration/alertmanager-integration/google-managed-prometheus.rst (1)
71-71: LGTM: linking metric-querying to the provider docPointing users to the dedicated metric-provider page for querying is the right split of concerns.
docs/configuration/alertmanager-integration/eks-managed-prometheus.rst (2)
26-26: LGTM: deferring metric-querying to the AWS metric provider docThis keeps alert-forwarding docs focused and avoids duplication.
17-22: Grafana alert-manager cross-link validated
Verified thatdocs/configuration/alertmanager-integration/grafana-alert-manager.rstexists and is listed in the toctree ofdocs/configuration/alertmanager-integration/index.rst. No further action needed.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/configuration/exporting/namespace-resources-api.rst (1)
57-60: Use singular Kubernetes Kind values across the doc (Deployment, Service, Ingress, CronJob).Kubernetes Kind is singular. Using plurals is misleading and can cause request validation issues. This appears both in the resource schema example, the request payload, and the response example.
Apply these diffs:
- * ``kind`` (e.g., `Deployments`) + * ``kind`` (e.g., ``Deployment``)- {"kind": "Deployments", "apiGroup": "apps", "apiVersion": "v1"}, - {"kind": "Services", "apiGroup": "", "apiVersion": "v1"}, - {"kind": "Ingresses", "apiGroup": "networking.k8s.io", "apiVersion": "v1"}, - {"kind": "CronJobs", "apiGroup": "batch", "apiVersion": "v1"} + {"kind": "Deployment", "apiGroup": "apps", "apiVersion": "v1"}, + {"kind": "Service", "apiGroup": "", "apiVersion": "v1"}, + {"kind": "Ingress", "apiGroup": "networking.k8s.io", "apiVersion": "v1"}, + {"kind": "CronJob", "apiGroup": "batch", "apiVersion": "v1"}- "kind": "Deployments" + "kind": "Deployment" @@ - "kind": "Services" + "kind": "Service" @@ - "kind": "Ingresses" + "kind": "Ingress"Also applies to: 76-80, 104-121
🧹 Nitpick comments (4)
docs/configuration/exporting/namespace-resources-api.rst (4)
58-60: Clarify core group handling for apiGroup.For core resources (e.g., Service, Pod, ConfigMap), confirm whether the API requires an explicit empty string for apiGroup or allows omitting the field. Document the requirement to prevent client-side confusion.
Proposed addition:
* ``apiGroup`` (e.g., `apps`, or empty string for core group) + +.. note:: + For core Kubernetes resources (no API group), set ``apiGroup`` to an empty string ``""`` as shown in the examples. Do not omit the field unless the API supports it.
68-71: Avoid secret scanner false positives; improve placeholder clarity.Use a conventional placeholder and add a short caution to prevent accidental key exposure in copy-pastes.
- --header 'Authorization: Bearer API-KEY-HERE' \ + --header 'Authorization: Bearer <API-KEY>' \-- ``API-KEY-HERE`` with your API Key from **Settings → API Keys → New API Key**. +- ``<API-KEY>`` with your API Key from **Settings → API Keys → New API Key**. Make sure the key has **Clusters → Read** permissions to access namespace resource data.Optional addition:
+.. warning:: + Never paste real API keys into code snippets committed to the repo or shared publicly.Also applies to: 83-86
7-10: Define what “active count” means.“Active” can be interpreted differently per Kind (e.g., CronJobs vs. Jobs vs. Pods). Add one sentence to avoid ambiguity.
-Use this endpoint to retrieve an **active count of specific Kubernetes resources** within a namespace. This is the same data displayed in the **Namespaces** tab of the Robusta UI. +Use this endpoint to retrieve an **active count of specific Kubernetes resources** within a namespace. This is the same data displayed in the **Namespaces** tab of the Robusta UI. + +“Active” refers to currently existing objects in the cluster (not deleted/terminating). It does not include historical objects. If there are Kind-specific nuances, they are documented below.
126-140: Include 401 Unauthorized in error examples (if applicable).Most auth-protected APIs return 401 when the Authorization header is missing/invalid and 403 when valid but forbidden. If that’s the case here, reflect it in the examples.
-- **Status Code**: Varies depending on the error (e.g., `400`, `403`, `500`) +- **Status Code**: Varies depending on the error (e.g., `400`, `401`, `403`, `500`)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
docs/configuration/exporting/namespace-resources-api.rst(1 hunks)
🧰 Additional context used
🪛 Gitleaks (8.27.2)
docs/configuration/exporting/namespace-resources-api.rst
68-69: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_tests
- GitHub Check: Deploy docs
🔇 Additional comments (2)
docs/configuration/exporting/namespace-resources-api.rst (2)
22-51: Request schema section is clear and actionable.Well-structured list-table, good field descriptions, and explicit required flags.
19-21: Cross‐reference Verified
The labelcb-robusta-ui-sink-namespace-configis defined in docs/configuration/sinks/RobustaUI.rst (line 64), so the:ref:cb-robusta-ui-sink-namespace-config`` link resolves correctly. No changes required.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/configuration/alertmanager-integration/victoria-metrics.rst (3)
11-12: Use the correct brand: “VictoriaMetrics Alertmanager”Brand should be “VictoriaMetrics” (one word). This is user-facing text, so we should fix it.
-Add the following to your Victoria Metrics Alertmanager configuration (e.g., Helm values file or VMAlertmanagerConfig CRD): +Add the following to your VictoriaMetrics Alertmanager configuration (e.g., Helm values file or VMAlertmanagerConfig CRD):
15-30: YAML indentation is invalid; webhook and route fields won’t parseThe current snippet mis-indents webhook_configs and the route list item fields. Alertmanager/VictoriaMetrics Alertmanager will reject this.
Apply this corrected YAML layout:
- receivers: - - name: 'robusta' - webhook_configs: - - url: 'http://<ROBUSTA-HELM-RELEASE-NAME>-runner.<NAMESPACE>.svc.cluster.local/api/alerts' - send_resolved: true # (3) - - route: # (1) - routes: - - receiver: 'robusta' - group_by: [ '...' ] - group_wait: 1s - group_interval: 1s - matchers: - - severity =~ ".*" - repeat_interval: 4h - continue: true # (2) + receivers: + - name: 'robusta' + webhook_configs: + - url: 'http://<ROBUSTA-HELM-RELEASE-NAME>-runner.<NAMESPACE>.svc.cluster.local/api/alerts' + send_resolved: true # (3) + + route: # (1) + routes: + - receiver: 'robusta' + group_by: ['...'] + group_wait: 1s + group_interval: 1s + matchers: + - severity =~ ".*" + repeat_interval: 4h + continue: true # (2)
32-36: Grammar fix in code-annotationsSubject-verb agreement: “routes have …”.
- 1. Put Robusta's route as the first route, to guarantee it receives alerts. If you can't do so, you must guarantee all previous routes has ``continue: true`` set. + 1. Put Robusta's route as the first route, to guarantee it receives alerts. If you can't do so, you must guarantee all previous routes have ``continue: true`` set.
🧹 Nitpick comments (3)
docs/configuration/alertmanager-integration/victoria-metrics.rst (3)
1-2: Heading underline length nitpickThe underline is longer than the title. reST allows it, but most of this repo’s docs keep it equal for consistency.
Apply this small tweak if you want to align with common style:
-VictoriaMetrics Alerts -*********************** +VictoriaMetrics Alerts +**********************
4-4: Clarify direction: “from … to Robusta”“with Robusta” is ambiguous. “to Robusta” better reflects the alert flow.
-This guide shows how to send alerts from `VictoriaMetrics <https://victoriametrics.com/>`_ with Robusta. +This guide shows how to send alerts from `VictoriaMetrics <https://victoriametrics.com/>`_ to Robusta.
6-7: Eliminate duplicate cross-reference to VictoriaMetricsThe link to
/configuration/metric-providers-victoriaappears both near the top (line 6) and again under “Configure Metrics Querying” (lines 39–43). Since the brief note at the top suffices in this alert-forwarding guide, remove the bottom section to avoid repetition.• File: docs/configuration/alertmanager-integration/victoria-metrics.rst
Remove lines 39–43:- Configure Metrics Querying - ==================================== - - To enable Robusta to query metrics from VictoriaMetrics, see :doc:`/configuration/metric-providers-victoria`.If you’d rather keep the detailed section instead, drop line 6:
- For configuring metric querying from VictoriaMetrics, see :doc:`/configuration/metric-providers-victoria`. + .. (cross-reference moved to a dedicated section below)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
docs/configuration/alertmanager-integration/victoria-metrics.rst(2 hunks)docs/configuration/metric-providers-aws.rst(1 hunks)docs/configuration/metric-providers-victoria.rst(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/configuration/metric-providers-victoria.rst
- docs/configuration/metric-providers-aws.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_tests
- GitHub Check: Deploy docs
🔇 Additional comments (1)
docs/configuration/alertmanager-integration/victoria-metrics.rst (1)
38-38: Include path validatedThe file
docs/configuration/alertmanager-integration/_testing_integration.rstexists at the specified relative path, so the.. include:: ./_testing_integration.rstdirective is correct. No changes needed.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/configuration/alertmanager-integration/victoria-metrics.rst (2)
15-20: YAML indentation for webhook_configs is invalid
webhook_configsmust be a key inside the receiver list item, andsend_resolvedmust be nested under the webhook entry. Current indentation will not parse.- receivers: - - name: 'robusta' - webhook_configs: - - url: 'http://<ROBUSTA-HELM-RELEASE-NAME>-runner.<NAMESPACE>.svc.cluster.local/api/alerts' - send_resolved: true # (3) + receivers: + - name: 'robusta' + webhook_configs: + - url: 'http://<ROBUSTA-HELM-RELEASE-NAME>-runner.<NAMESPACE>.svc.cluster.local/api/alerts' + send_resolved: true # (3)Optional: if using Robusta Cloud/SaaS, consider documenting the cloud webhook URL variant as well.
21-30: Fix route block indentation and matcher formattingList item properties under the
- receiver:entry are over-indented. Also, use a consistent matcher string (no spaces around operator) to match common Alertmanager examples.- route: # (1) - routes: - - receiver: 'robusta' - group_by: [ '...' ] - group_wait: 1s - group_interval: 1s - matchers: - - severity =~ ".*" - repeat_interval: 4h - continue: true # (2) + route: # (1) + routes: + - receiver: 'robusta' + group_by: ['...'] + group_wait: 1s + group_interval: 1s + matchers: + - severity=~".*" + repeat_interval: 4h + continue: true # (2)Nit: 1s group_wait/group_interval is very aggressive and can increase notification noise. Consider more typical values (e.g., group_wait: 30s, group_interval: 5m) in examples.
♻️ Duplicate comments (1)
docs/configuration/exporting/robusta-pro-features.rst (1)
9-16: Fix broken HolmesGPT doc link targetThe link points to ../holmesgpt/index which doesn’t exist in this PR. Update to an existing target (main-features). This avoids a broken internal reference in Sphinx.
-:doc:`AI Analysis (HolmesGPT) <../holmesgpt/index>` +:doc:`AI Analysis (HolmesGPT) <../holmesgpt/main-features>`
🧹 Nitpick comments (3)
docs/configuration/alertmanager-integration/victoria-metrics.rst (2)
4-4: Fix preposition: “with Robusta” → “to Robusta”This reads like Robusta is co-sending the alerts. “to Robusta” is clearer.
-This guide shows how to send alerts from `VictoriaMetrics <https://victoriametrics.com/>`_ with Robusta. +This guide shows how to send alerts from `VictoriaMetrics <https://victoriametrics.com/>`_ to Robusta.
32-35: Grammar nit in annotation (plural agreement)“previous routes has” → “preceding routes have”.
- 1. Put Robusta's route as the first route, to guarantee it receives alerts. If you can't do so, you must guarantee all previous routes has ``continue: true`` set. + 1. Put Robusta's route as the first route, to guarantee it receives alerts. If you can't do so, you must guarantee all preceding routes have ``continue: true`` set.docs/configuration/exporting/robusta-pro-features.rst (1)
1-2: Use a more descriptive page title to match the file’s purpose and slug“Overview” is too generic for a page named robusta-pro-features.rst and used in navigation. Prefer a title that reflects the content (e.g., “Robusta Pro Features” or “Robusta Pro Overview”) to improve clarity and TOC/SEO.
Apply one of the following diffs:
Option A (most explicit):
-Overview -======== +Robusta Pro Features +====================Option B:
-Overview -======== +Robusta Pro Overview +====================
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
docs/configuration/alertmanager-integration/_prometheus_flags_check.rst(1 hunks)docs/configuration/alertmanager-integration/victoria-metrics.rst(2 hunks)docs/configuration/exporting/robusta-pro-features.rst(2 hunks)docs/configuration/metric-providers-coralogix.rst(1 hunks)docs/configuration/metric-providers-external.rst(1 hunks)docs/configuration/metric-providers-victoria.rst(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/configuration/alertmanager-integration/_prometheus_flags_check.rst
- docs/configuration/metric-providers-external.rst
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/configuration/metric-providers-coralogix.rst
- docs/configuration/metric-providers-victoria.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_tests
- GitHub Check: Deploy docs
🔇 Additional comments (6)
docs/configuration/alertmanager-integration/victoria-metrics.rst (5)
1-2: Heading level and style look goodTop-level title and underline are consistent with RST conventions.
9-9: Sub-heading style is consistentGood use of a secondary heading underline for the section.
41-41: Section heading looks goodKeeping the metrics querying content minimal here and linking out is the right move.
43-43: Cross-reference phrasing is fineThe link text is clear and matches the new docs split.
6-6: Cross-reference target verified and will build correctly
- Found docs/configuration/metric-providers-victoria.rst in the repository.
- Confirmed it’s included in the toctree in docs/index.rst (
VictoriaMetrics <configuration/metric-providers-victoria>).The :doc:
/configuration/metric-providers-victorialink will resolve properly in the Sphinx build.docs/configuration/exporting/robusta-pro-features.rst (1)
41-44: Ensure API docs exist and remove redundant overview linksWe verified that
exporting-data.rstalready includes a toctree for the new API pages (send-alerts-api,configuration-changes-api,alert-export-api,alert-statistics-api), so those docs will be discoverable once present.Please:
Confirm that the following files are created under
docs/configuration/exporting/:
•alert-export-api.rst
•alert-statistics-api.rst
•send-alerts-api.rst
•configuration-changes-api.rstStreamline navigation in
robusta-pro-features.rstby choosing one of the two approaches:
- Keep the overview link to
exporting-data.rstand remove the four individual API bullets.- Retain the four API bullets and remove or retitle the duplicate
:doc:Alert History Import and Export API `
links (currently on lines 36 and 67).This will eliminate duplication and ensure a single, clear entry point to your exporting APIs.
* docs for krr api * wip * docs * Update docs/configuration/resource-recommender.rst Co-authored-by: moshemorad <moshemorad12340@gmail.com> --------- Co-authored-by: Arik Alon <alon.arik@gmail.com> Co-authored-by: moshemorad <moshemorad12340@gmail.com>
* feat: runner binding address configurable using env variable * chore: leave default bind address to the original value
* Add krr security context * Add to docs + helm * Fix comments
|
|
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
src/robusta/integrations/slack/sender.py (2)
370-372: Fix exception logging: star-unpacking into a set will crash for dict blocksThe f-string uses a set literal with a starred list:
{*output_blocks,}. Sinceoutput_blockscontains dicts, this raisesTypeError: unhashable type: 'dict'while handling an exception, masking the root cause.Apply this diff:
- logging.error( - f"error sending message to slack\ne={e}\ntext={message}\nchannel={channel}\nblocks={*output_blocks,}\nattachment_blocks={*attachment_blocks,}" - ) + logging.error( + f"error sending message to slack\ne={e}\ntext={message}\nchannel={channel}\n" + f"blocks={output_blocks}\nattachment_blocks={attachment_blocks}" + )
290-303: Return type inconsistency can break callers expecting a tuple
prepare_slack_textsometimes returns a string (whenmessageis empty) and elsewhere returns a(message, error_msg)tuple. Callers here always unpack a tuple, so the string path would raise a ValueError.Apply this diff to always return a tuple:
- if len(message) == 0: - return "empty-message" # blank messages aren't allowed + if len(message) == 0: + message = "empty-message" # blank messages aren't allowed message = Transformer.apply_length_limit(message, MAX_BLOCK_CHARS) if error_files: error_msg = ( "_Failed to send file(s) " + ", ".join(error_files) + " to slack._\n _See robusta-runner logs for details._" ) - return message, error_msg + return message, error_msg - return message, None + return message, Noneplaybooks/robusta_playbooks/krr.py (2)
5-5: ImportError: pickle has no symbol named NONEfrom pickle import NONE will raise an ImportError at module import time. Use the Python built-in None instead.
Apply this diff:
-from pickle import NONEAlso update call sites that pass NONE (see below).
447-447: Use None (built-in) instead of nonexistent pickle.NONEThis aligns with the fix to remove the invalid import.
- _publish_krr_finding(event=event,krr_json=params.result, scan_id=params.scan_id, start_time=params.start_time, metadata=metadata, timeout=None, config_json=NONE) + _publish_krr_finding(event=event, krr_json=params.result, scan_id=params.scan_id, start_time=params.start_time, metadata=metadata, timeout=None, config_json=None)
🧹 Nitpick comments (24)
docs/configuration/alertmanager-integration/grafana-cloud-mimir.rst (12)
53-55: Avoid hardcoding tokens in curl; prefer env vars and caution against leaksInline tokens are easy to leak via shell history, screenshots, or logs. Recommend using an environment variable in the example.
Apply this diff:
- curl -H "Authorization: Bearer YOUR_GLSA_TOKEN" \ + # Recommended: export your token first to avoid leaking secrets in shell history/logs + # export GLSA_TOKEN='<your token>' + curl -H "Authorization: Bearer ${GLSA_TOKEN}" \ "https://YOUR-INSTANCE.grafana.net/api/datasources" | jqOptionally add a short admonition below the code block reminding users not to commit tokens, redact them in screenshots, and rotate if exposed.
11-14: Clarify prerequisites and terminology (Alertmanager vs datasource; add jq)Grafana Cloud provides a managed Alertmanager; you don’t necessarily “configure an AlertManager datasource.” Also, the guide uses jq—call that out.
Apply this diff:
-* Prometheus and AlertManager datasources configured in Grafana Cloud +* A Prometheus datasource configured in Grafana Cloud +* Grafana Cloud Alerting (managed Alertmanager) enabled * Access to create service accounts and API tokens in Grafana Cloud * Your Robusta ``account_id`` and ``signing_key`` from ``generated_values.yaml`` +* ``jq`` installed locally (used in curl examples)
30-34: Specify minimum token permissions/role for service accountSpell out the least-privilege guidance so users don’t over-scope their token.
Proposed addition:
2. Create a new service account named "robusta-integration" 3. Generate a service account token 4. Save the token (starts with ``glsa_``) +5. Assign the minimal role needed: + - Viewer role is typically sufficient to query datasources via the Grafana proxy. + - If you plan to manage alerting configuration via API, an Editor role may be required. + - Ensure the token has permission to query the Prometheus datasource and read Alertmanager APIs.Please confirm the exact minimum role requirements for your Robusta/Grafana setup and adjust accordingly.
38-40: Fix minor grammar and spacing in the cluster-name sectionSmall readability polish.
Apply this diff:
-If your grafana setup covers multiple clusters, the cluster name is required and used to -identify your specific cluster in Prometheus queries: +If your Grafana setup covers multiple clusters, the cluster name is required and is used to +identify your specific cluster in Prometheus queries:
56-56: Tighten wording on datasource UIDDuplicated “UID” in the sentence.
Apply this diff:
-Note the UID for Prometheus datasource UID (typically ``grafanacloud-prom``) +Note the Prometheus datasource UID (typically ``grafanacloud-prom``)
77-83: Consistency: use “Alertmanager” (Grafana’s preferred casing) and clarify base URLConsistent casing improves clarity, and a short note helps users avoid appending /api paths manually.
Apply this diff:
-# Grafana Cloud AlertManager Configuration +# Grafana Cloud Alertmanager Configuration alertmanager_url: https://YOUR-INSTANCE.grafana.net alertmanager_flavor: grafana alertmanager_auth: Bearer YOUR_GLSA_TOKENConsider adding this note after the config block:
+.. note:: + + When using ``alertmanager_flavor: grafana``, set ``alertmanager_url`` to your Grafana base URL + (e.g., ``https://YOUR-INSTANCE.grafana.net``). Robusta will call the appropriate + Alertmanager-compatible API paths under this base URL. Do not append ``/api/alertmanager`` yourself.
66-87: Security note: avoid committing tokens in generated_values.yaml; prefer secretsThese values often end up in repos or CI logs. Encourage secret management.
Proposed addition just after the YAML block:
+.. warning:: + + The values shown above contain long-lived credentials. Avoid committing ``generated_values.yaml`` + to source control. Where possible, store tokens in Kubernetes Secrets and reference them via the + chart's supported secret values or use your secret manager/CI to inject them at deploy time. Rotate + tokens if exposure is suspected.If the Helm chart supports referencing secrets directly for these fields, consider adding an example.
99-109: Generalize namespace in kubectl/helm commandsMany users don’t run in “default”. Use a placeholder to reduce copy-paste errors.
Apply this diff:
- helm upgrade robusta robusta/robusta -f generated_values.yaml -n default + # Replace <NAMESPACE> with the namespace where Robusta is installed (e.g., robusta or default) + helm upgrade robusta robusta/robusta -f generated_values.yaml -n <NAMESPACE>- kubectl rollout restart deployment/robusta-runner -n default + kubectl rollout restart deployment/robusta-runner -n <NAMESPACE>
128-129: Generalize namespace for Holmes restart as wellKeep consistency with earlier commands.
Apply this diff:
- helm upgrade robusta robusta/robusta -f generated_values.yaml -n default - kubectl rollout restart deployment/robusta-holmes -n default + helm upgrade robusta robusta/robusta -f generated_values.yaml -n <NAMESPACE> + kubectl rollout restart deployment/robusta-holmes -n <NAMESPACE>
119-123: Holmes settings: call out why these flags matter for MimirA brief justification reduces misconfiguration churn.
Proposed rewording:
-* Set ``fetch_labels_with_labels_api: false`` (important for Mimir compatibility) -* Set ``fetch_metadata_with_series_api: true`` (important for Mimir compatibility) +* Set ``fetch_labels_with_labels_api: false`` (avoids Mimir labels API incompatibilities) +* Set ``fetch_metadata_with_series_api: true`` (uses the series API which Mimir supports reliably)
145-151: Optional: suggest a quick query to validate Prometheus connectivityGive users a concrete smoke test.
Proposed addition:
2. Check if CPU and memory graphs are displayed 3. If graphs are shown, the metrics integration is working correctly +4. Alternatively, run a quick query via the proxy to validate connectivity: + + .. code-block:: bash + + curl -H "Authorization: Bearer ${GLSA_TOKEN}" \ + "https://YOUR-INSTANCE.grafana.net/api/datasources/proxy/uid/PROMETHEUS_DATASOURCE_UID/api/v1/query?query=up%5B5m%5D" | jq '.status'
171-176: Troubleshooting Holmes: include 403/401 guidanceAdd actionable hints for common auth failures.
Proposed additions:
* Verify ``fetch_metadata_with_series_api`` is set to ``true`` * Check that the Holmes deployment has restarted after configuration changes * Review Holmes logs for authentication errors: ``kubectl logs -n default deployment/robusta-holmes`` +* For 401/403 errors: + - Confirm the service account role (Viewer or above) and that the token is valid (not expired/revoked). + - Verify the Prometheus datasource UID is correct and accessible to the token. + - Ensure the Authorization header is sent to the Grafana proxy (``Bearer <token>``).src/robusta/integrations/slack/sender.py (1)
646-651: Nit: capitalize “Holmes” and tighten punctuation for claritySmall copy improvements for consistency with product naming and readability.
Apply this diff:
- return MarkdownBlock("_Ask AI questions about this alert, by connecting <https://platform.robusta.dev/create-account|Robusta SaaS> and tagging holmes._") + return MarkdownBlock("_Ask AI questions about this alert by connecting <https://platform.robusta.dev/create-account|Robusta SaaS> and tagging Holmes._") @@ - return MarkdownBlock("_Ask AI questions about this alert, by adding holmes to your <https://docs.robusta.dev/master/configuration/holmesgpt/index.html#enable-holmes-in-slack-in-the-platform|Slack>._") + return MarkdownBlock("_Ask AI questions about this alert by adding Holmes to your <https://docs.robusta.dev/master/configuration/holmesgpt/index.html#enable-holmes-in-slack-in-the-platform|Slack>._") @@ - return MarkdownBlock("_Ask AI questions about this alert, by tagging holmes in a threaded reply_") + return MarkdownBlock("_Ask AI questions about this alert by tagging Holmes in a threaded reply_")helm/robusta/values.yaml (1)
712-714: Expose runner bind address via Helm valuesYou added RUNNER_BIND_ADDR support in code, but it's not configurable via Helm. If you want users to override it (e.g., bind only to 127.0.0.1 in certain setups), add a Helm value and wire it into the template.
Apply this diff here to introduce the value:
# Enable hardened filesystem security (read-only root filesystem with writable volume mounts) hardenedFs: false setKRRSecurityContext: false + # Bind address for the runner web server (consumed by RUNNER_BIND_ADDR) + bindAddr: "0.0.0.0"Then add the env var in helm/robusta/templates/runner.yaml alongside the other runner envs (see my comment there for the exact patch).
src/robusta/runner/web.py (1)
69-72: Log the bind address/port on startup for easier opsTiny traceability improvement: log the bind host and port before starting Flask so it’s visible in pod logs.
@staticmethod def run(): - app.run(host=RUNNER_BIND_ADDR, port=PORT, use_reloader=False) + logging.info("Starting runner web server on %s:%s", RUNNER_BIND_ADDR, PORT) + app.run(host=RUNNER_BIND_ADDR, port=PORT, use_reloader=False)helm/robusta/templates/runner.yaml (1)
121-126: Also wire RUNNER_BIND_ADDR from valuesSince RUNNER_BIND_ADDR is now supported in code, expose it here so users can override the default bind address via Helm.
- name: CLUSTER_DOMAIN value: {{ .Values.global.clusterDomain }} - name: SET_KRR_SECURITY_CONTEXT value: {{ .Values.runner.setKRRSecurityContext | quote }} + - name: RUNNER_BIND_ADDR + value: {{ .Values.runner.bindAddr | default "0.0.0.0" | quote }} {{- if .Values.openshift.enabled }} - name: IS_OPENSHIFT value: "True"docs/configuration/resource-recommender.rst (1)
228-235: Avoid embedding secrets directly in curl examplesTo reduce accidental leakage (shell history, copy/paste), prefer an env var for the API key in the example.
-.. code-block:: bash - - curl -X GET "https://api.robusta.dev/api/query/krr/recommendations?account_id=YOUR_ACCOUNT_ID&cluster_id=my-cluster&namespace=default" \ - -H "Authorization: Bearer YOUR_API_KEY" \ - -H "Content-Type: application/json" +.. code-block:: bash + + export ROBUSTA_API_KEY="YOUR_API_KEY" + curl -X GET "https://api.robusta.dev/api/query/krr/recommendations?account_id=YOUR_ACCOUNT_ID&cluster_id=my-cluster&namespace=default" \ + -H "Authorization: Bearer $ROBUSTA_API_KEY" \ + -H "Content-Type: application/json"This also addresses the static secret-header warning from scanners like gitleaks.
playbooks/robusta_playbooks/krr.py (3)
504-514: Good least-privilege securityContext; consider adding runAsUser/readOnlyRootFilesystem (optional)The new SecurityContext is solid: non-root, drop ALL, no privilege escalation, RuntimeDefault seccomp. Two optional hardening tweaks you may consider (ensure the image supports them):
- Set runAsUser to a non-root UID (e.g., 65532) to satisfy clusters enforcing MustRunAsNonRoot regardless of image metadata.
- Set readOnlyRootFilesystem=True if KRR doesn’t need to write to the container FS.
Apply only if you’ve validated the krr image does not require root or a writable root FS.
Here is a minimal diff for the context creation (optional):
- security_context = SecurityContext( - runAsNonRoot=True, - capabilities=Capabilities( - drop=["ALL"] - ), - allowPrivilegeEscalation=False, - seccompProfile=SeccompProfile(type="RuntimeDefault") - ) + security_context = SecurityContext( + runAsNonRoot=True, + runAsUser=65532, # optional: only if image supports non-root UID + readOnlyRootFilesystem=True, # optional: only if image doesn't write to root FS + capabilities=Capabilities(drop=["ALL"]), + allowPrivilegeEscalation=False, + seccompProfile=SeccompProfile(type="RuntimeDefault"), + )
525-525: Avoid passing None for securityContext to keep the serialized spec cleanPassing securityContext=None may serialize an explicit null (depending on the client), which is noisy. Only set the field when enabled.
Apply this refactor:
- spec = PodSpec( - serviceAccountName=params.serviceAccountName, - containers=[ - Container( - name="krr", - image=IMAGE, - imagePullPolicy="Always", - command=["/bin/sh", "-c", python_command], - env=env_var, - resources=resources, - securityContext=security_context, - ) - ], - restartPolicy="Never", - **params.krr_job_spec, - ) + container = Container( + name="krr", + image=IMAGE, + imagePullPolicy="Always", + command=["/bin/sh", "-c", python_command], + env=env_var, + resources=resources, + ) + if security_context: + container.securityContext = security_context + + spec = PodSpec( + serviceAccountName=params.serviceAccountName, + containers=[container], + restartPolicy="Never", + **params.krr_job_spec, + )
43-43: Type hint mismatch: KRR_PUSH_SCAN is a bool, not a strload_bool returns a boolean. Adjust the annotation to match.
-KRR_PUSH_SCAN: str = load_bool("KRR_PUSH_SCAN", True) +KRR_PUSH_SCAN: bool = load_bool("KRR_PUSH_SCAN", True)src/robusta/utils/error_codes.py (1)
40-40: Confirmed: Proper Usage of HOLMES_RATE_LIMIT_EXCEEDED; Documentation Update NeededI ran the suggested search and found the new error code is only referenced in the centralized handler:
- src/robusta/utils/error_codes.py (line 40)
- src/robusta/core/playbooks/internal/ai_integration.py (line 58)
However, there are no mentions of
HOLMES_RATE_LIMIT_EXCEEDEDor5204in the docs or error-code listings. Please add an entry (e.g., in your error-codes documentation or README) to surface this new Holmes rate-limit code.src/robusta/core/playbooks/internal/ai_integration.py (3)
51-62: Harden the centralized Holmes error mapping; add Timeout handling, guard HTTPError without response, fix typos, and include status info.
- Handle requests.Timeout as a connection class error.
- Guard against HTTPError with no attached response (defensive).
- Surface HTTP status in messages for better operability; include Retry-After on 429 when present.
- Fix typos (“occurred”, “API”).
Apply this diff:
-def handle_holmes_error(e: Exception) -> NoReturn: - if isinstance(e, requests.ConnectionError): - raise ActionException(ErrorCodes.HOLMES_CONNECTION_ERROR, "Holmes endpoint is currently unreachable.") - elif isinstance(e, requests.HTTPError): - if e.response.status_code == 401 and "invalid_api_key" in e.response.text: - raise ActionException(ErrorCodes.HOLMES_REQUEST_ERROR, "Holmes invalid api key.") - elif e.response.status_code == 429: - raise ActionException(ErrorCodes.HOLMES_RATE_LIMIT_EXCEEDED, "Holmes rate limit exceeded.") - raise ActionException(ErrorCodes.HOLMES_REQUEST_ERROR, "Holmes internal configuration error.") - else: - raise ActionException(ErrorCodes.HOLMES_UNEXPECTED_ERROR, "An unexpected error occured.") +def handle_holmes_error(e: Exception) -> NoReturn: + # Connection and timeout errors + if isinstance(e, (requests.ConnectionError, requests.Timeout)): + raise ActionException( + ErrorCodes.HOLMES_CONNECTION_ERROR, + "Holmes endpoint is unreachable or timed out.", + ) + # HTTP status errors + elif isinstance(e, requests.HTTPError): + resp = getattr(e, "response", None) + if resp is not None: + status = resp.status_code + if status == 401: + # Prefer a clear message; avoid brittle body matching + if "invalid_api_key" in (resp.text or ""): + msg = "Holmes unauthorized: invalid API key (HTTP 401)." + else: + msg = "Holmes unauthorized (HTTP 401). Check API key." + raise ActionException(ErrorCodes.HOLMES_REQUEST_ERROR, msg) + elif status == 429: + retry_after = resp.headers.get("Retry-After") + msg = "Holmes rate limit exceeded (HTTP 429)." + if retry_after: + msg += f" Retry after {retry_after}s." + raise ActionException(ErrorCodes.HOLMES_RATE_LIMIT_EXCEEDED, msg) + else: + raise ActionException( + ErrorCodes.HOLMES_REQUEST_ERROR, + f"Holmes API request failed (HTTP {status}).", + ) + # Fallback when no response is attached + raise ActionException(ErrorCodes.HOLMES_REQUEST_ERROR, "Holmes API request failed (no response).") + else: + raise ActionException(ErrorCodes.HOLMES_UNEXPECTED_ERROR, "An unexpected error occurred.")
170-174: Reduce repetition: extract a small helper to build FindingSubject from ResourceInfo.The same conditional subject construction appears in multiple places. A tiny helper would reduce duplication and make future changes safer.
For example, add this helper once in the module:
def _make_subject(resource: ResourceInfo | None) -> FindingSubject: return FindingSubject( name=resource.name if resource else "", namespace=resource.namespace if resource else "", subject_type=FindingSubjectType.from_kind(resource.kind) if resource else FindingSubjectType.TYPE_NONE, node=resource.node if resource else "", container=resource.container if resource else "", )Then replace repeated inline constructions with:
subject=_make_subject(params.resource)Also applies to: 234-238, 311-315, 439-443
342-344: Consider sending JSON via requests’ json= parameter (and add a timeout).Using data=holmes_req.json() works but sends a raw string without automatic Content-Type. json=holmes_req.dict() sets headers correctly and avoids double serialization. Also consider a sensible timeout to avoid indefinite hangs (esp. in streaming).
You could refactor like:
holmes_req = HolmesChatRequest( ask=params.ask, conversation_history=params.conversation_history, model=params.model, stream=params.stream, ) # Streaming with requests.post(url, json=holmes_req.dict(), stream=True, headers={"Connection": "keep-alive"}, timeout=30) as resp: ... # Non-streaming result = requests.post(url, json=holmes_req.dict(), timeout=30)Apply similarly to other requests in this module.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (3)
docs/images/krr-api-key.pngis excluded by!**/*.pngdocs/images/new-krr-api-key.pngis excluded by!**/*.pngpoetry.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
docs/configuration/alertmanager-integration/grafana-alert-manager.rst(3 hunks)docs/configuration/alertmanager-integration/grafana-cloud-mimir.rst(1 hunks)docs/configuration/alertmanager-integration/index.rst(2 hunks)docs/configuration/resource-recommender.rst(1 hunks)docs/setup-robusta/installation/dev-setup.rst(2 hunks)helm/robusta/templates/runner.yaml(1 hunks)helm/robusta/values.yaml(1 hunks)playbooks/robusta_playbooks/krr.py(4 hunks)src/robusta/core/model/env_vars.py(2 hunks)src/robusta/core/playbooks/internal/ai_integration.py(16 hunks)src/robusta/integrations/slack/sender.py(1 hunks)src/robusta/runner/web.py(2 hunks)src/robusta/utils/error_codes.py(1 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/configuration/alertmanager-integration/index.rst
- docs/setup-robusta/installation/dev-setup.rst
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/configuration/alertmanager-integration/grafana-alert-manager.rst
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/robusta/integrations/slack/sender.py (1)
src/robusta/core/reporting/blocks.py (1)
MarkdownBlock(42-61)
playbooks/robusta_playbooks/krr.py (1)
src/robusta/core/model/env_vars.py (1)
load_bool(7-9)
src/robusta/core/playbooks/internal/ai_integration.py (3)
src/robusta/utils/error_codes.py (2)
ActionException(42-47)ErrorCodes(5-40)src/robusta/core/reporting/consts.py (2)
FindingSubjectType(42-72)from_kind(54-72)src/robusta/core/reporting/holmes.py (1)
HolmesChatRequest(37-41)
🪛 Gitleaks (8.27.2)
docs/configuration/alertmanager-integration/grafana-cloud-mimir.rst
53-53: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
docs/configuration/resource-recommender.rst
232-233: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run_tests
- GitHub Check: Deploy docs
🔇 Additional comments (17)
docs/configuration/alertmanager-integration/grafana-cloud-mimir.rst (1)
1-4: Overall: clear, actionable integration doc—nice workStructure, flow, and cross-references are solid. The proxy-based approach for Mimir is correctly highlighted, and the Holmes-specific flags are called out.
Also applies to: 58-66, 110-117, 142-188
src/robusta/integrations/slack/sender.py (2)
646-651: LGTM: copy updated to remove the Slack “@” mentionThe new wording aligns with platform-agnostic docs (no implicit Slack-only syntax). Functionality unchanged.
648-648: Doc link confirmed canonical
The URL returns HTTP 200 with no redirects, so it remains valid and doesn’t need updating.helm/robusta/values.yaml (1)
713-713: New KRR securityContext flag looks goodThe new runner.setKRRSecurityContext toggle with a safe default (false) is clear and aligns with the Helm/env wiring in the PR.
src/robusta/runner/web.py (1)
13-15: Importing RUNNER_BIND_ADDR is correct and used belowThe new env var is properly imported and consumed. No issues.
helm/robusta/templates/runner.yaml (1)
121-123: KRR security context env wiring is correctSET_KRR_SECURITY_CONTEXT is correctly surfaced from values and quoted as a string for consistent parsing.
src/robusta/core/model/env_vars.py (2)
91-92: RUNNER_BIND_ADDR addition is correct and non-breakingDefaulting to 0.0.0.0 preserves existing behavior while enabling overrides. Looks good.
142-142: SET_KRR_SECURITY_CONTEXT toggle added cleanlyBoolean toggle via load_bool with a safe default. Matches Helm/env wiring.
docs/configuration/resource-recommender.rst (2)
160-166: Docs for enabling KRR security context are accurateThe key matches the Helm value (runner.setKRRSecurityContext). Clear and actionable.
187-189: Confirm and align KRR API endpoint pathThe heading at line 187 shows
GET /api/krr/recommendations
but the curl example at lines 232–235 calls
GET /api/query/krr/recommendationsPlease verify which path the server actually exposes and update both locations in docs/configuration/resource-recommender.rst accordingly:
• Line 187: replace the “GET …” heading
• Lines 232–235: update the curl example URLplaybooks/robusta_playbooks/krr.py (2)
11-11: Hikaru imports for SecurityContext are correct and align with k8s 1.26 modelsImporting SecurityContext, Capabilities, and SeccompProfile from hikaru 1.26 is appropriate for attaching a container-level securityContext.
33-33: No Action Needed:SET_KRR_SECURITY_CONTEXTIs Already a BooleanVerified that in
src/robusta/core/model/env_vars.py:SET_KRR_SECURITY_CONTEXT = load_bool("SET_KRR_SECURITY_CONTEXT", False)This ensures
SET_KRR_SECURITY_CONTEXTis a boolean, so the existingif SET_KRR_SECURITY_CONTEXT:check behaves as intended.src/robusta/core/playbooks/internal/ai_integration.py (5)
4-4: Type annotation import (NoReturn) is appropriate.Import is correct and matches the handler’s always-raise contract.
136-136: Verified: All exception handlers delegate tohandle_holmes_errorEvery
except Exception as eblock insrc/robusta/core/playbooks/internal/ai_integration.py(lines 132, 186, 251, 328, 409, 456) correctly callshandle_holmes_error(e). This centralizes error handling and eliminates duplication.
83-83: Confirmed: HolmesRequest defines amodelfield, so passingmodel=params.modelon line 83 is valid.
Approved.
298-299: HolmesIssueChatRequest already supportsmodel
Themodel: Optional[str] = Nonefield is declared on the parentHolmesChatRequest(line 40 in src/robusta/core/reporting/holmes.py), soHolmesIssueChatRequestinherits it and will includemodelwithout validation errors.
426-427: Confirmed:modelfield is supported via inheritanceThe
model: Optional[str]attribute is declared on HolmesChatRequest (line 40) and inherited by HolmesWorkloadHealthRequest, so no changes are needed.
|
Merged with a new PR #1902 |
Pull request was closed
Main changes:
Alert Sourcesdocs intoAlert SourcesandMetrics Providers