feat: add alerts for indexer lag and source failures #834

Open
supremeproton01 wants to merge 4 commits into Pulsefy:main from supremeproton01:feature/indexer-lag-failure-alerting

@supremeproton01

Summary

  1. Lag Metrics Produced ✅
    Stellar Ledger Lag: Tracks how far the indexer trails the latest Stellar ledger, as reported by the Horizon API
    Table Ingestion Lag: Measures staleness for articles, social_posts, analytics_records, and contract_events
    Configurable Thresholds: Warning and critical severity levels per metric type
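
As a rough illustration of the lag metrics listed above, here is a minimal sketch. It is not the code in src/metrics/indexer_lag.py; the names `Severity`, `LagThresholds`, `ledger_lag`, `table_lag`, and `classify` are hypothetical. The default thresholds mirror the 2-minute and 10-minute limits quoted in this PR, and treating a table with no rows as critical is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass(frozen=True)
class LagThresholds:
    # Defaults taken from the alert rules in this PR description.
    warning: timedelta = timedelta(minutes=2)
    critical: timedelta = timedelta(minutes=10)


def ledger_lag(latest_ledger: int, last_indexed_ledger: int) -> int:
    """Number of ledgers the indexer is behind; clamped at zero."""
    return max(0, latest_ledger - last_indexed_ledger)


def table_lag(last_ingested_at: Optional[datetime], now: datetime) -> Optional[timedelta]:
    """Staleness of a table, or None if nothing has been ingested yet."""
    if last_ingested_at is None:
        return None
    return max(timedelta(0), now - last_ingested_at)


def classify(lag: Optional[timedelta], thresholds: LagThresholds) -> Severity:
    """Map a lag value to a severity level.

    Assumption: a missing lag (empty table) is treated as critical.
    """
    if lag is None or lag > thresholds.critical:
        return Severity.CRITICAL
    if lag > thresholds.warning:
        return Severity.WARNING
    return Severity.OK


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    lag = table_lag(now - timedelta(minutes=3), now)
    print(classify(lag, LagThresholds()).value)
```

Keeping the functions free of database and HTTP calls makes the thresholds easy to unit test; the caller would supply the latest ledger sequence and the most recent row timestamp per table.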

  2. Alerts Configured ✅ (Log-Based for MVP)
    4 Alert Rules:
    🚨 Critical Indexer Lag (>10 minutes)
    ⚠️ Warning Indexer Lag (>2 minutes)
    ⚠️ Data Source Failures (3+ in 5 min window)
    🚨 Pipeline Falling Behind (2+ stale sources)
    Dispatch Methods: Structured JSON logging + optional Telegram/webhooks
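
The failure-window and stale-source rules, together with structured JSON logging, could look roughly like the sketch below. This is an illustration under stated assumptions, not the contents of src/metrics/alerting_rules.py: `FailureWindow`, `pipeline_falling_behind`, `format_alert`, and `dispatch` are hypothetical names, failures are assumed to arrive in chronological order, and the staleness cutoff is a parameter because the PR description does not define it.

```python
import json
import logging
from collections import deque
from datetime import datetime, timedelta, timezone
from typing import Deque, Dict

logger = logging.getLogger("indexer.alerts")  # hypothetical logger name


class FailureWindow:
    """Sliding window that fires once `threshold` failures fall within `window`."""

    def __init__(self, threshold: int = 3, window: timedelta = timedelta(minutes=5)):
        self.threshold = threshold
        self.window = window
        self._failures: Deque[datetime] = deque()

    def record_failure(self, at: datetime) -> bool:
        # Assumes timestamps are recorded in non-decreasing order.
        self._failures.append(at)
        cutoff = at - self.window
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


def pipeline_falling_behind(
    lags: Dict[str, timedelta], stale_after: timedelta, min_stale: int = 2
) -> bool:
    """True when at least `min_stale` sources exceed the staleness cutoff."""
    stale = [name for name, lag in lags.items() if lag > stale_after]
    return len(stale) >= min_stale


def format_alert(rule: str, severity: str, details: dict, at: datetime) -> str:
    """Serialize an alert as a single JSON line for log-based alerting."""
    payload = {
        "timestamp": at.isoformat(),
        "rule": rule,
        "severity": severity,
        "details": details,
    }
    return json.dumps(payload, sort_keys=True)


def dispatch(rule: str, severity: str, details: dict) -> None:
    """Emit the alert through the standard logging module."""
    level = logging.CRITICAL if severity == "critical" else logging.WARNING
    logger.log(level, format_alert(rule, severity, details, datetime.now(timezone.utc)))


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    dispatch("data_source_failures", "warning", {"source": "articles", "failures": 3})
```

A Telegram or webhook sender could consume the same JSON string, so the log line and the outbound notification stay identical.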

  3. Runbook Documented ✅
    ALERTING_RUNBOOK.md (1000+ lines): Complete operations guide with procedures
    ALERTING_INTEGRATION_GUIDE.md (400+ lines): Integration steps with code examples
    QUICK_REFERENCE.md: Quick lookup guide

📦 Deliverables
Core Implementation (1,100 LOC)
src/metrics/indexer_lag.py - Metrics collection
src/metrics/alerting_rules.py - Alert rules engine
src/metrics/ingestion_monitoring.py - Job orchestrator
src/metrics/__init__.py - Module exports

Testing (380 LOC)
tests/test_metrics_alerting.py - 30+ unit tests

Documentation & Demo
ALERTING_RUNBOOK.md - Operations guide
ALERTING_INTEGRATION_GUIDE.md - Integration guide
IMPLEMENTATION_SUMMARY.md - Delivery summary
QUICK_REFERENCE.md - Quick lookup
demo_indexer_lag_alerting.py - Interactive demo

Closes #745

Type of Change

  • feat
  • test

Validation

  • Lint passed for affected area(s)
  • Tests passed for affected area(s)
  • Manual verification completed (if applicable)

Documentation

  • Documentation updated (or N/A with explanation)

Checklist

  • Branch name uses feat/, fix/, or docs/
  • Commit messages follow Conventional Commits
  • PR scope matches linked issue acceptance criteria

@drips-wave

drips-wave Bot commented Jun 1, 2026

@supremeproton01 Great news! 🎉 Based on an automated assessment of this PR, the linked Wave issue(s) no longer count against your application limits.

You can now apply to more issues while you wait for this PR to be reviewed. Keep up the great work! 🚀

Learn more about application limits

@Cedarich

Cedarich commented Jun 4, 2026

Contributor

@supremeproton01

Development

Successfully merging this pull request may close these issues.

Data-processing: Alerting Rules for Indexer Lag and Failed Sources

2 participants