
Singh, Prabhu (He/Him/His) edited this page Apr 8, 2026 · 3 revisions

Alerting

Logsenta supports real-time alerting when error patterns are detected in your Kubernetes pod logs. Alerts can be sent via email (SMTP) and/or webhooks (Slack, Microsoft Teams, PagerDuty, or custom endpoints).

Table of Contents

  1. How Alerting Works
  2. Enabling Alerting
  3. Threshold Configuration
  4. Email Alerting
  5. Webhook Alerting
  6. Multiple Alert Rules (Pattern-Based Routing)
  7. Configuration Reference
  8. Examples
  9. Troubleshooting

How Alerting Works

┌─────────────────────────────────────────────────────────────────────────┐
│                        Logsenta Alerting Flow                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Pod Logs ──► Error Pattern ──► Alert Tracker ──► Threshold Check      │
│                  Detected           (count)           (reached?)        │
│                                        │                   │            │
│                                        │              ┌────┴────┐       │
│                                        │              │         │       │
│                                        ▼              ▼         ▼       │
│                                   Increment       No Alert   Send Alert │
│                                    Counter                     │        │
│                                                                │        │
│                                              ┌─────────────────┼────┐   │
│                                              │                 │    │   │
│                                              ▼                 ▼    ▼   │
│                                           Email            Webhooks     │
│                                           (SMTP)    (Slack/Teams/PD)    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Alert Threshold Logic

The alerting system uses a threshold-based approach to prevent alert fatigue:

  1. Error Detection: When an error pattern is detected in pod logs
  2. Tracking: The AlertTracker records the occurrence with timestamp
  3. Window Check: Old occurrences outside the time window are removed
  4. Threshold Check: If occurrences ≥ threshold count within the window, trigger alert
  5. Cooldown: After alerting, wait for cooldown period before re-alerting the same pattern

Example: With count: 2, windowSeconds: 300, cooldownSeconds: 600:

  • First error at 10:00:00 → No alert (count: 1)
  • Second error at 10:03:00 → Alert triggered! (count: 2 within 5 min)
  • Third error at 10:05:00 → No alert (in cooldown until 10:13:00)
  • Fourth error at 10:15:00 → No alert (cooldown expired, but count: 1 in the new window); a fifth error before 10:20:00 would trigger another alert
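
The window/cooldown logic above can be sketched in Python. This is a simplified illustration, not Logsenta's actual `AlertTracker` code; the class and method names here are hypothetical:

```python
import time

class AlertTracker:
    """Simplified sketch of threshold-based alert suppression."""

    def __init__(self, count=2, window_seconds=300, cooldown_seconds=600):
        self.count = count
        self.window = window_seconds
        self.cooldown = cooldown_seconds
        self.occurrences = {}   # key -> list of occurrence timestamps
        self.last_alert = {}    # key -> timestamp of last alert sent

    def record(self, key, now=None):
        """Record one error occurrence; return True if an alert should fire."""
        now = time.time() if now is None else now
        # Cooldown: suppress re-alerting until the cooldown expires
        last = self.last_alert.get(key)
        if last is not None and now - last < self.cooldown:
            return False
        # Window check: drop occurrences that fell out of the time window
        hits = [t for t in self.occurrences.get(key, []) if now - t < self.window]
        hits.append(now)
        self.occurrences[key] = hits
        # Threshold check
        if len(hits) >= self.count:
            self.last_alert[key] = now
            self.occurrences[key] = []  # start a fresh window after alerting
            return True
        return False
```

Whether the occurrence list resets after an alert is an implementation detail; this sketch resets it, so each alert starts a fresh counting window.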

Enabling Alerting

Enable alerting globally in your values.yaml:

alerting:
  enabled: true

Or via Helm install:

helm install logsenta ./charts/logsenta-engine \
  --set alerting.enabled=true

Threshold Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| alerting.threshold.count | Minimum errors before alerting | 2 |
| alerting.threshold.windowSeconds | Time window for counting errors | 300 (5 min) |
| alerting.threshold.cooldownSeconds | Cooldown before re-alerting same pattern | 600 (10 min) |
| alerting.threshold.groupByNamespace | Group alerts by namespace | true |
| alerting.threshold.groupByPod | Group alerts by pod name | false |

alerting:
  enabled: true
  threshold:
    count: 2              # Alert after 2 errors
    windowSeconds: 300    # Within 5 minutes
    cooldownSeconds: 600  # Wait 10 min before re-alerting
    groupByNamespace: true
    groupByPod: false

Grouping Explained

  • groupByNamespace: true (default): Errors from different namespaces are tracked separately
  • groupByPod: true: Errors from different pods are tracked separately (more granular)

| groupByNamespace | groupByPod | Behavior |
| --- | --- | --- |
| true | false | Same pattern in same namespace = single alert |
| true | true | Same pattern in same pod = single alert |
| false | false | Same pattern anywhere = single alert |
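
One way to picture the grouping flags is as the key under which occurrences are counted. The sketch below is illustrative only; the key format Logsenta actually uses internally is not documented here:

```python
def tracking_key(pattern, namespace, pod,
                 group_by_namespace=True, group_by_pod=False):
    """Build the key used to count occurrences of an error pattern.

    More parts in the key = more granular (separate) tracking.
    """
    parts = [pattern]
    if group_by_namespace:
        parts.append(namespace)
    if group_by_pod:
        parts.append(pod)
    return "/".join(parts)
```

With both flags off, the same pattern in any pod or namespace increments a single shared counter.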

Email Alerting

Send alerts via SMTP email.

| Parameter | Description | Default |
| --- | --- | --- |
| alerting.email.enabled | Enable email alerts | false |
| alerting.email.smtpHost | SMTP server hostname | smtp.gmail.com |
| alerting.email.smtpPort | SMTP server port | 587 |
| alerting.email.useTls | Use STARTTLS | true |
| alerting.email.useSsl | Use SSL (for port 465) | false |
| alerting.email.username | SMTP username | "" |
| alerting.email.password | SMTP password | "" |
| alerting.email.fromAddress | Sender email address | "" |
| alerting.email.toAddresses | Recipient email addresses | [] |
| alerting.email.ccAddresses | CC email addresses | [] |
| alerting.email.subjectPrefix | Email subject prefix | [Logsenta Alert] |

Email Configuration Example

alerting:
  enabled: true
  email:
    enabled: true
    smtpHost: "smtp.gmail.com"
    smtpPort: 587
    useTls: true
    username: "alerts@yourcompany.com"
    password: "your-app-password"  # Use App Password for Gmail
    fromAddress: "logsenta@yourcompany.com"
    toAddresses:
      - "sre-team@yourcompany.com"
      - "oncall@yourcompany.com"
    ccAddresses:
      - "devops-lead@yourcompany.com"
    subjectPrefix: "[PROD Alert]"
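
The settings above map directly onto Python's standard smtplib and email modules. The sketch below shows how such a configuration might be used; the dictionary keys mirror the values.yaml fields, but this is an illustration, not Logsenta's code:

```python
import smtplib
from email.message import EmailMessage

def build_alert_message(cfg, subject, body):
    """Compose an alert email; cfg keys mirror the values.yaml fields above."""
    msg = EmailMessage()
    msg["Subject"] = f"{cfg['subjectPrefix']} {subject}"
    msg["From"] = cfg["fromAddress"]
    msg["To"] = ", ".join(cfg["toAddresses"])
    if cfg.get("ccAddresses"):
        msg["Cc"] = ", ".join(cfg["ccAddresses"])
    msg.set_content(body)
    return msg

def send_alert_email(cfg, msg):
    """Deliver via SMTP, upgrading to TLS with STARTTLS (port 587)."""
    with smtplib.SMTP(cfg["smtpHost"], cfg["smtpPort"], timeout=30) as smtp:
        if cfg.get("useTls"):
            smtp.starttls()
        smtp.login(cfg["username"], cfg["password"])
        smtp.send_message(msg)
```

Note that useTls corresponds to a STARTTLS upgrade on a plain connection (typically port 587), whereas useSsl would correspond to smtplib.SMTP_SSL on port 465.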

Gmail Setup

For Gmail, you need to use an App Password:

  1. Enable 2-Factor Authentication on your Google account
  2. Go to Security → App Passwords
  3. Generate a new App Password for "Mail"
  4. Use this password in alerting.email.password

Webhook Alerting

Send alerts to Slack, Microsoft Teams, PagerDuty, or custom endpoints.

Common Webhook Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| name | Webhook identifier | - |
| enabled | Enable this webhook | false |
| type | Webhook type: slack, teams, pagerduty, generic | - |
| url | Webhook URL | "" |
| timeout | Request timeout (seconds) | 30 |
| verifySsl | Verify SSL certificate | true |
| headers | Custom HTTP headers | {} |

Slack Configuration

alerting:
  enabled: true
  webhooks:
    - name: "slack"
      enabled: true
      type: "slack"
      url: "https://hooks.slack.com/services/T00/B00/XXXX"
      timeout: 30
      verifySsl: true

Slack Alert Format:

🚨 Logsenta Alert
┌──────────────────────────────────
│ Pattern:     NullPointerException
│ Namespace:   production
│ Pod:         api-server-abc123
│ Container:   main
│ Occurrences: 5
│ Time:        2026-04-03 10:15:30
├──────────────────────────────────
│ Error:
│ java.lang.NullPointerException at...
└──────────────────────────────────
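
Delivering an alert like the one above comes down to a single JSON POST; Slack's incoming webhooks only require a text field. A minimal sketch (the exact message formatting Logsenta produces may differ):

```python
import json
import urllib.request

def slack_payload(alert):
    """Render an alert dict into a Slack incoming-webhook JSON body."""
    text = (
        f":rotating_light: *Logsenta Alert*\n"
        f"*Pattern:* {alert['pattern']}\n"
        f"*Namespace:* {alert['namespace']}\n"
        f"*Pod:* {alert['pod']}\n"
        f"*Occurrences:* {alert['occurrences']}\n"
        f"*Error:* {alert['error_line']}"
    )
    return {"text": text}

def post_webhook(url, payload, timeout=30):
    """POST the JSON payload to the webhook URL; returns the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```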

Microsoft Teams Configuration

alerting:
  enabled: true
  webhooks:
    - name: "teams"
      enabled: true
      type: "teams"
      url: "https://outlook.office.com/webhook/..."
      timeout: 30
      verifySsl: true

PagerDuty Configuration

alerting:
  enabled: true
  webhooks:
    - name: "pagerduty"
      enabled: true
      type: "pagerduty"
      url: "https://events.pagerduty.com/v2/enqueue"
      timeout: 30
      verifySsl: true
      headers:
        X-Routing-Key: "YOUR_PAGERDUTY_ROUTING_KEY"

Generic Webhook Configuration

For custom alerting endpoints:

alerting:
  enabled: true
  webhooks:
    - name: "custom"
      enabled: true
      type: "generic"
      url: "https://your-alerting-system.com/api/alerts"
      timeout: 30
      verifySsl: true
      headers:
        Authorization: "Bearer YOUR_API_TOKEN"
        X-Custom-Header: "value"

Generic Payload Format:

{
  "alert_type": "logsenta_error",
  "pattern": "NullPointerException",
  "namespace": "production",
  "pod_name": "api-server-abc123",
  "container": "main",
  "error_line": "java.lang.NullPointerException at...",
  "occurrences": 5,
  "timestamp": "2026-04-03T10:15:30Z",
  "log_context": ["line1", "line2", "..."]
}
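
A custom endpoint receiving this payload can validate it before acting. A minimal sketch, with the required field names taken from the payload shown above:

```python
import json

REQUIRED_FIELDS = {
    "alert_type", "pattern", "namespace", "pod_name",
    "container", "error_line", "occurrences", "timestamp",
}

def parse_alert(raw_body):
    """Parse and sanity-check a generic Logsenta alert payload."""
    alert = json.loads(raw_body)
    missing = REQUIRED_FIELDS - alert.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if alert["alert_type"] != "logsenta_error":
        raise ValueError(f"unexpected alert_type: {alert['alert_type']}")
    return alert
```

log_context is treated as optional here since it may be empty; whether Logsenta always includes it is not specified above.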

Multiple Alert Rules (Pattern-Based Routing)

For advanced use cases, you can define multiple alert rules to route specific error patterns to different recipients with different thresholds.

How Rules Work

┌─────────────────────────────────────────────────────────────────────────┐
│                     Rule-Based Alert Routing                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Error Detected ──► Match Rules ──► Found?                             │
│                           │            │                                │
│                           │       ┌────┴────┐                           │
│                           │       │         │                           │
│                           ▼       ▼         ▼                           │
│                      Rule 1    No Match   Use Default                   │
│                      Rule 2  ──────────► Global Config                  │
│                      Rule N                                             │
│                           │                                             │
│                           ▼                                             │
│                   ┌──────────────┐                                      │
│                   │ Rule Config  │                                      │
│                   ├──────────────┤                                      │
│                   │ • Threshold  │                                      │
│                   │ • Email      │──► Team-specific alerts              │
│                   │ • Webhooks   │                                      │
│                   └──────────────┘                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Rule Configuration

Each rule can have:

| Field | Description | Required |
| --- | --- | --- |
| name | Unique rule identifier | Yes |
| patterns | List of regex/string patterns to match | Yes |
| threshold | Custom threshold for this rule | No |
| email | Rule-specific email recipients | No |
| webhooks | List of webhook names to trigger | No |
| namespaces | Limit rule to specific namespaces | No |

Example: Multiple Rules Configuration

alerting:
  enabled: true
  
  # Global SMTP configuration (shared by all rules)
  email:
    enabled: true
    smtpHost: "smtp.example.com"
    smtpPort: 587
    useTls: true
    username: "alerts@example.com"
    password: "your-password"
    fromAddress: "logsenta@example.com"
  
  # Webhooks available for rules to reference
  webhooks:
    - name: "slack"
      enabled: true
      type: "slack"
      url: "https://hooks.slack.com/services/T00/B00/xxx"
    - name: "pagerduty"
      enabled: true
      type: "pagerduty"
      url: "https://events.pagerduty.com/v2/enqueue"
      headers:
        X-Routing-Key: "YOUR_KEY"
    - name: "teams"
      enabled: true
      type: "teams"
      url: "https://outlook.office.com/webhook/..."
  
  # Pattern-based alert rules
  rules:
    # Critical errors - immediate PagerDuty + Email to on-call
    - name: "critical-errors"
      patterns:
        - "CRITICAL"
        - "FATAL"
        - "OOMKilled"
        - "panic:"
      threshold:
        count: 1              # Alert immediately
        windowSeconds: 60
        cooldownSeconds: 300  # 5 min cooldown
      email:
        enabled: true
        toAddresses:
          - "oncall@example.com"
          - "sre-critical@example.com"
      webhooks:
        - "pagerduty"
        - "slack"
    
    # Java exceptions - notify Java dev team
    - name: "java-exceptions"
      patterns:
        - "NullPointerException"
        - "IllegalArgumentException"
        - "ClassNotFoundException"
        - "at\\s+[\\w.$]+\\([\\w]+\\.java:\\d+\\)"
      threshold:
        count: 3              # Alert after 3 occurrences
        windowSeconds: 300    # Within 5 minutes
        cooldownSeconds: 600
      email:
        enabled: true
        toAddresses:
          - "java-team@example.com"
      webhooks:
        - "slack"
      namespaces:             # Only for these namespaces
        - "production"
        - "staging"
    
    # Python errors - notify Python team
    - name: "python-errors"
      patterns:
        - "Traceback"
        - "File\\s+\"[^\"]+\",\\s+line\\s+\\d+"
        - "TypeError"
        - "ValueError"
        - "ImportError"
      threshold:
        count: 2
        windowSeconds: 300
        cooldownSeconds: 600
      email:
        enabled: true
        toAddresses:
          - "python-team@example.com"
      webhooks:
        - "teams"
    
    # Database errors - DBA team with high priority
    - name: "database-errors"
      patterns:
        - "ConnectionRefused"
        - "connection timeout"
        - "database.*unavailable"
        - "SQLSTATE"
        - "deadlock"
      threshold:
        count: 2
        windowSeconds: 120    # 2 minutes
        cooldownSeconds: 300
      email:
        enabled: true
        toAddresses:
          - "dba-team@example.com"
      webhooks:
        - "pagerduty"
    
    # Security alerts - Security team
    - name: "security-alerts"
      patterns:
        - "unauthorized"
        - "authentication failed"
        - "access denied"
        - "permission denied"
        - "invalid token"
      threshold:
        count: 5              # Higher threshold to avoid noise
        windowSeconds: 300
        cooldownSeconds: 600
      email:
        enabled: true
        toAddresses:
          - "security-team@example.com"
      webhooks:
        - "slack"

Rule Matching Behavior

  1. Pattern Matching: Each error is checked against all rules in order
  2. Multiple Matches: An error can match multiple rules (triggers all matching rules)
  3. Namespace Filter: If namespaces is specified, rule only applies to those namespaces
  4. No Match: Errors not matching any rule use the global threshold/email/webhooks config
  5. Independent Tracking: Each rule has its own threshold counter and cooldown
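
The matching behavior above can be sketched as a simple loop; this is an illustration of the documented semantics, not Logsenta's actual matcher:

```python
import re

def match_rules(rules, error_line, namespace):
    """Return the names of every rule that matches; an error can trigger
    multiple rules, and each matched rule alerts independently."""
    matched = []
    for rule in rules:
        # Namespace filter: if present, the rule only applies there
        if rule.get("namespaces") and namespace not in rule["namespaces"]:
            continue
        # A rule matches if any of its patterns is found in the line
        if any(re.search(p, error_line) for p in rule["patterns"]):
            matched.append(rule["name"])
    return matched
```

An empty result would mean the error falls through to the global threshold/email/webhooks configuration.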

Use Cases

| Use Case | Configuration |
| --- | --- |
| Critical alerts → PagerDuty | patterns: ["FATAL", "OOMKilled"], webhooks: ["pagerduty"] |
| Team-specific routing | Different email.toAddresses per rule |
| Namespace isolation | namespaces: ["production"] to only alert on prod |
| High-severity = low threshold | threshold.count: 1 for critical patterns |
| Low-severity = high threshold | threshold.count: 10 for warnings |

Rule vs Global Alerting

| Feature | Global Alerting | Rule-Based Alerting |
| --- | --- | --- |
| Configuration | Single threshold/recipients | Multiple rules with different configs |
| Pattern routing | All patterns → same destination | Pattern → specific destination |
| Threshold flexibility | One threshold for all | Per-rule thresholds |
| Complexity | Simple | More complex but flexible |
| Use when | Small teams, simple needs | Large orgs, multiple teams |

Configuration Reference

Environment Variables

These environment variables are automatically set by the Helm chart:

| Variable | Description |
| --- | --- |
| ALERTING_ENABLED | Enable alerting (true/false) |
| ALERTING_THRESHOLD_COUNT | Error count threshold |
| ALERTING_THRESHOLD_WINDOW_SECONDS | Time window in seconds |
| ALERTING_THRESHOLD_COOLDOWN_SECONDS | Cooldown period in seconds |
| ALERTING_GROUP_BY_NAMESPACE | Group by namespace |
| ALERTING_GROUP_BY_POD | Group by pod |
| ALERTING_EMAIL_ENABLED | Enable email alerts |
| ALERTING_EMAIL_SMTP_HOST | SMTP host |
| ALERTING_EMAIL_SMTP_PORT | SMTP port |
| ALERTING_EMAIL_USE_TLS | Use TLS |
| ALERTING_EMAIL_FROM | From address |
| ALERTING_EMAIL_TO | To addresses (comma-separated) |
| ALERTING_EMAIL_SUBJECT_PREFIX | Subject prefix |
| ALERTING_SMTP_USERNAME | SMTP username (from Secret) |
| ALERTING_SMTP_PASSWORD | SMTP password (from Secret) |
| ALERTING_WEBHOOKS_CONFIG | Webhooks JSON config |
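
Inside the engine, reading these variables back into a config might look like the following sketch. The defaults mirror the Threshold Configuration table; the parsing itself is an assumption, not Logsenta's actual code:

```python
import os

def load_threshold_config(env=None):
    """Read alerting threshold settings from the environment variables above."""
    env = os.environ if env is None else env
    truthy = lambda v: v.lower() == "true"
    return {
        "enabled": truthy(env.get("ALERTING_ENABLED", "false")),
        "count": int(env.get("ALERTING_THRESHOLD_COUNT", "2")),
        "window_seconds": int(env.get("ALERTING_THRESHOLD_WINDOW_SECONDS", "300")),
        "cooldown_seconds": int(env.get("ALERTING_THRESHOLD_COOLDOWN_SECONDS", "600")),
        "group_by_namespace": truthy(env.get("ALERTING_GROUP_BY_NAMESPACE", "true")),
        "group_by_pod": truthy(env.get("ALERTING_GROUP_BY_POD", "false")),
    }
```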

Secrets

Sensitive values are stored in Kubernetes Secrets:

| Secret Key | Description |
| --- | --- |
| alerting-smtp-username | SMTP username (base64 encoded) |
| alerting-smtp-password | SMTP password (base64 encoded) |
| alerting-webhook-{name}-url | Webhook URL (base64 encoded) |

Examples

Production Setup with Slack and Email

alerting:
  enabled: true
  
  threshold:
    count: 3                # Alert after 3 errors
    windowSeconds: 300      # Within 5 minutes
    cooldownSeconds: 1800   # 30 min cooldown (reduce noise)
    groupByNamespace: true
    groupByPod: false
  
  email:
    enabled: true
    smtpHost: "smtp.sendgrid.net"
    smtpPort: 587
    useTls: true
    username: "apikey"
    password: "SG.xxxx"     # SendGrid API key
    fromAddress: "alerts@yourcompany.com"
    toAddresses:
      - "sre-oncall@yourcompany.com"
    subjectPrefix: "[PROD K8s Alert]"
  
  webhooks:
    - name: "slack-critical"
      enabled: true
      type: "slack"
      url: "https://hooks.slack.com/services/T00/B00/xxx"

High-Sensitivity Setup (Low Threshold)

For critical namespaces where every error matters:

alerting:
  enabled: true
  
  threshold:
    count: 1                # Alert on first error
    windowSeconds: 60       # 1 minute window
    cooldownSeconds: 300    # 5 min cooldown
    groupByNamespace: true
    groupByPod: true        # Per-pod tracking
  
  webhooks:
    - name: "pagerduty"
      enabled: true
      type: "pagerduty"
      url: "https://events.pagerduty.com/v2/enqueue"
      headers:
        X-Routing-Key: "YOUR_CRITICAL_SERVICE_KEY"

Development/Testing Setup

For non-critical environments with reduced alerting:

alerting:
  enabled: true
  
  threshold:
    count: 10               # High threshold
    windowSeconds: 600      # 10 minute window
    cooldownSeconds: 3600   # 1 hour cooldown
    groupByNamespace: false # Aggregate all namespaces
  
  webhooks:
    - name: "slack-dev"
      enabled: true
      type: "slack"
      url: "https://hooks.slack.com/services/T00/B00/dev-channel"

Troubleshooting

Alerts Not Firing

  1. Check alerting is enabled:

    kubectl get configmap -n logsenta -o yaml | grep ALERTING_ENABLED
  2. Check threshold settings:

    • Is count too high?
    • Is windowSeconds too short?
    • Is the pattern still in cooldown?
  3. Check pod logs:

    kubectl logs -n logsenta deployment/logsenta-engine | grep -i alert

Email Not Sending

  1. Verify SMTP credentials:

    kubectl get secret -n logsenta logsenta-engine-credentials -o jsonpath='{.data.alerting-smtp-password}' | base64 -d
  2. Check firewall/network:

    • Ensure pod can reach SMTP server (port 587 or 465)
  3. Gmail issues:

    • Use an App Password, not your regular account password
    • Google has removed "Less secure app access", so an App Password is the only supported option

Webhook Not Triggering

  1. Verify webhook URL:

    kubectl get secret -n logsenta logsenta-engine-credentials -o jsonpath='{.data.alerting-webhook-slack-url}' | base64 -d
  2. Test webhook manually:

    curl -X POST -H "Content-Type: application/json" \
      -d '{"text":"Test message"}' \
      "YOUR_WEBHOOK_URL"
  3. Check SSL settings:

    • Set verifySsl: false if using self-signed certificates
