Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TDD setup - tests will pass once feature is implemented.
Send alert when destination is auto-disabled after reaching consecutive failure threshold.
I think we should rename … One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.
Yes, your proposition makes more sense
No strong opinion, but the current payload does lack …
yes, let's do
that's just the computed value of … also do you know …
I would represent the threshold itself, which may not line up with the current / max exactly.
Changes
Alert Payload Schema
- `delivery_response` → `attempt_response`
- `tenant_id` to top-level of alert data
- `AlertDestination` with `filter`, `metadata`, `updated_at`

New `alert.destination.disabled` Callback

Error Handling
- `DestinationDisabler` returns disabled destination for timestamp consistency

Alert Payloads
`alert.consecutive_failure`

    {
      "topic": "alert.consecutive_failure",
      "timestamp": "2025-01-15T10:30:00Z",
      "data": {
        "tenant_id": "tenant_123",
        "event": {
          "id": "evt_abc",
          "topic": "user.created",
          "metadata": {},
          "data": {}
        },
        "max_consecutive_failures": 20,
        "consecutive_failures": 10,
        "will_disable": false,
        "destination": {
          "id": "dest_xyz",
          "tenant_id": "tenant_123",
          "type": "webhook",
          "topics": ["*"],
          "filter": {},
          "config": {},
          "metadata": {},
          "created_at": "2025-01-01T00:00:00Z",
          "updated_at": "2025-01-01T00:00:00Z",
          "disabled_at": null
        },
        "attempt_response": {
          "status": "500",
          "data": {"error": "Internal Server Error"}
        }
      }
    }

`alert.destination.disabled`

    {
      "topic": "alert.destination.disabled",
      "timestamp": "2025-01-15T10:30:00Z",
      "data": {
        "tenant_id": "tenant_123",
        "destination": {
          "id": "dest_xyz",
          "tenant_id": "tenant_123",
          "type": "webhook",
          "topics": ["*"],
          "filter": {},
          "config": {},
          "metadata": {},
          "created_at": "2025-01-01T00:00:00Z",
          "updated_at": "2025-01-15T10:30:00Z",
          "disabled_at": "2025-01-15T10:30:00Z"
        },
        "disabled_at": "2025-01-15T10:30:00Z",
        "triggering_event": {
          "id": "evt_abc",
          "topic": "user.created",
          "metadata": {},
          "data": {}
        },
        "consecutive_failures": 20,
        "max_consecutive_failures": 20,
        "attempt_response": {
          "status": "500",
          "data": {"error": "Internal Server Error"}
        }
      }
    }

Questions
1. Should we include `attempt` instead of `event`?

Current payload uses `event` / `triggering_event`, but conceptually the alert is triggered by a delivery attempt, not the event itself. The event is context within the attempt.

Also, `attempt_response` is really `response_data` from `models.Attempt`, and the other attempt fields are currently missing from the payload.

Proposed structure using full attempt data:
    {
      "attempt": {
        "id": "atm_xyz",
        "attempt_number": 3,
        "manual": false,
        "status": "failed",
        "time": "2025-01-15T10:30:00Z",
        "code": "500",
        "response_data": {
          "status": "500",
          "data": {...}
        }
      },
      "event": {
        "id": "evt_abc",
        "topic": "user.created",
        "metadata": {},
        "data": {}
      }
    }

2. Note: Alternative `consecutive_failures` structure

An alternative structure was mentioned:

The current payload has this info but flat (`consecutive_failures`, `max_consecutive_failures`, `will_disable`). Happy to refactor if the nested structure is preferred.
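
For context on how the flat fields read from the consumer side, here is a minimal sketch of a receiver that dispatches on `topic` and uses `consecutive_failures`, `max_consecutive_failures`, and `will_disable` as they appear in the payloads above. The function name `summarize_alert` and the summary format are hypothetical and not part of this PR. The percentage is derived on the consumer side purely for illustration.

```python
import json


def summarize_alert(raw: str) -> str:
    """Return a one-line summary of an alert payload (hypothetical helper).

    Assumes the flat field layout shown in this PR description.
    """
    alert = json.loads(raw)
    topic = alert["topic"]
    data = alert["data"]
    dest_id = data["destination"]["id"]
    current = data["consecutive_failures"]
    maximum = data["max_consecutive_failures"]

    if topic == "alert.consecutive_failure":
        # Derived on the consumer side; the payload itself carries only the counts.
        percent = round(100 * current / maximum) if maximum else 0
        suffix = " (will disable)" if data["will_disable"] else ""
        return f"{dest_id}: {current}/{maximum} consecutive failures ({percent}%){suffix}"

    if topic == "alert.destination.disabled":
        return f"{dest_id}: disabled at {data['disabled_at']} after {current}/{maximum} failures"

    raise ValueError(f"unexpected alert topic: {topic}")
```

With a nested structure the lookups would move one level down, but the dispatch on `topic` would stay the same.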