Summary
Enhance deduplication to group alerts that share the same root cause, even if they have different fingerprints. Dedup is currently fingerprint-based: alerts are merged only when they share the same alert name, service, and namespace. This misses cases where five different alerts (OOMKill, 5xx spike, latency increase, pod restart, failed health check) all stem from one bad deployment.
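To make the gap concrete, here is a minimal sketch of fingerprint-based dedup. It assumes the fingerprint is a hash over alert name, service, and namespace; the field names, the hashing scheme, and the `checkout` service are illustrative assumptions, not the actual implementation.

```python
import hashlib
from collections import defaultdict

def fingerprint(alert: dict) -> str:
    """Hypothetical fingerprint: hash of alert name + service + namespace."""
    key = "|".join([alert["name"], alert["service"], alert["namespace"]])
    return hashlib.sha256(key.encode()).hexdigest()[:12]

def dedup_by_fingerprint(alerts: list[dict]) -> dict[str, list[dict]]:
    """Group alerts whose fingerprints are identical."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[fingerprint(alert)].append(alert)
    return dict(groups)

# Five different symptoms of one bad deployment, plus one exact repeat.
names = ["OOMKill", "5xxSpike", "LatencyIncrease", "PodRestart", "FailedHealthCheck"]
alerts = [{"name": n, "service": "checkout", "namespace": "prod"} for n in names]
alerts.append({"name": "OOMKill", "service": "checkout", "namespace": "prod"})

groups = dedup_by_fingerprint(alerts)
print(len(alerts), "alerts ->", len(groups), "groups")
```

Only the repeated OOMKill collapses. The other symptoms remain separate groups because their alert names differ, even though they share a cause.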
Why This Matters
Fingerprint dedup handles identical alerts. RCA-based grouping handles related alerts. That second step is what reduces the noise from 50 alerts to 3 actionable incidents.
Acceptance Criteria
Example
Input: OOMKill on pod-A, 5xx on service-B (calls pod-A), latency spike on service-C (calls B)
Output: 1 incident — "pod-A OOMKill causing cascading failures to B and C"
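One possible way to arrive at that output is to walk the service call graph from each alerting entity toward its dependencies and group alerts by the deepest entity that is also alerting. The sketch below assumes a call graph is available as a simple caller-to-callees mapping; `CALLS`, `find_root`, and `group_by_root` are hypothetical names, and a real implementation would likely also need time windows and deployment events.

```python
from collections import defaultdict

# Assumed call graph: caller -> list of callees (from the example above).
CALLS = {"service-C": ["service-B"], "service-B": ["pod-A"], "pod-A": []}

def find_root(entity, alerting, calls, seen=None):
    """Follow call edges while the callee is also alerting.

    Returns the deepest alerting entity reachable from `entity`.
    Simplification: only the first alerting dependency is followed.
    """
    seen = seen or set()
    seen.add(entity)  # guards against cycles in the call graph
    for dep in calls.get(entity, []):
        if dep in alerting and dep not in seen:
            return find_root(dep, alerting, calls, seen)
    return entity

def group_by_root(alerts, calls):
    """Group alerts into incidents keyed by their inferred root entity."""
    alerting = {a["entity"] for a in alerts}
    incidents = defaultdict(list)
    for a in alerts:
        incidents[find_root(a["entity"], alerting, calls)].append(a)
    return dict(incidents)

alerts = [
    {"name": "OOMKill", "entity": "pod-A"},
    {"name": "5xx", "entity": "service-B"},
    {"name": "latency spike", "entity": "service-C"},
]

incidents = group_by_root(alerts, CALLS)
for root, members in incidents.items():
    print(root, "<-", [m["name"] for m in members])
```

All three alerts land in a single incident rooted at pod-A, which matches the expected output in the example.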