feat: update alert & implement alert.destination.disabled #672

Open
alexluong wants to merge 4 commits into main from alert

@alexluong (Collaborator) commented Feb 3, 2026

Changes

Alert Payload Schema

  • Rename delivery_response → attempt_response
  • Add tenant_id to top-level of alert data
  • Expand AlertDestination with filter, metadata, updated_at
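For consumers that might see both the old and the new field name during a rollout, the rename can be absorbed with a small lookup. This is a consumer-side sketch in Python, not code from this PR; the helper name `get_attempt_response` is hypothetical, and whether old-style payloads will still be in flight is an assumption.

```python
from typing import Any, Dict, Optional


def get_attempt_response(data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the response object from alert data, preferring the new key."""
    # New payloads use "attempt_response"; older ones used "delivery_response".
    if "attempt_response" in data:
        return data["attempt_response"]
    return data.get("delivery_response")
```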

New alert.destination.disabled Callback

  • Sent when destination is auto-disabled after consecutive failures

Error Handling

  • Notifications are best-effort (logged, not propagated)
  • DestinationDisabler returns disabled destination for timestamp consistency
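The best-effort behavior described above can be illustrated as follows. This is a language-agnostic sketch written in Python, not the PR's implementation; `notify_best_effort` and the `send` callable are hypothetical names used only for illustration.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("alert")


def notify_best_effort(
    send: Callable[[Dict[str, Any]], Any], alert: Dict[str, Any]
) -> bool:
    """Try to deliver an alert; log and swallow any failure."""
    try:
        send(alert)
        return True
    except Exception:
        # Best-effort: record the failure but do not propagate it to the caller.
        logger.exception("failed to send alert topic=%s", alert.get("topic"))
        return False
```

The caller gets a boolean it can ignore, so a failed notification never interrupts the delivery path that triggered it.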

Alert Payloads

alert.consecutive_failure

{
  "topic": "alert.consecutive_failure",
  "timestamp": "2025-01-15T10:30:00Z",
  "data": {
    "tenant_id": "tenant_123",
    "event": {
      "id": "evt_abc",
      "topic": "user.created",
      "metadata": {},
      "data": {}
    },
    "max_consecutive_failures": 20,
    "consecutive_failures": 10,
    "will_disable": false,
    "destination": {
      "id": "dest_xyz",
      "tenant_id": "tenant_123",
      "type": "webhook",
      "topics": ["*"],
      "filter": {},
      "config": {},
      "metadata": {},
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-01T00:00:00Z",
      "disabled_at": null
    },
    "attempt_response": {
      "status": "500",
      "data": {"error": "Internal Server Error"}
    }
  }
}
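A receiver might reduce this payload to the few fields it acts on. The sketch below assumes the payload shape shown above and the current topic name; the function name and the `remaining_before_disable` field are hypothetical.

```python
import json
from typing import Any, Dict


def summarize_consecutive_failure(raw: str) -> Dict[str, Any]:
    """Extract the actionable fields from an alert.consecutive_failure payload."""
    alert = json.loads(raw)
    if alert["topic"] != "alert.consecutive_failure":
        raise ValueError("unexpected topic: %s" % alert["topic"])
    data = alert["data"]
    # Failures left before auto-disable; never negative.
    remaining = max(
        data["max_consecutive_failures"] - data["consecutive_failures"], 0
    )
    return {
        "tenant_id": data["tenant_id"],
        "destination_id": data["destination"]["id"],
        "remaining_before_disable": remaining,
        "will_disable": data["will_disable"],
    }
```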

alert.destination.disabled

{
  "topic": "alert.destination.disabled",
  "timestamp": "2025-01-15T10:30:00Z",
  "data": {
    "tenant_id": "tenant_123",
    "destination": {
      "id": "dest_xyz",
      "tenant_id": "tenant_123",
      "type": "webhook",
      "topics": ["*"],
      "filter": {},
      "config": {},
      "metadata": {},
      "created_at": "2025-01-01T00:00:00Z",
      "updated_at": "2025-01-15T10:30:00Z",
      "disabled_at": "2025-01-15T10:30:00Z"
    },
    "disabled_at": "2025-01-15T10:30:00Z",
    "triggering_event": {
      "id": "evt_abc",
      "topic": "user.created",
      "metadata": {},
      "data": {}
    },
    "consecutive_failures": 20,
    "max_consecutive_failures": 20,
    "attempt_response": {
      "status": "500",
      "data": {"error": "Internal Server Error"}
    }
  }
}
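With two alert topics, a receiver would typically route on the top-level `topic` field. A minimal sketch, assuming the two payload shapes shown above; the handler names and the returned strings are hypothetical.

```python
from typing import Any, Callable, Dict, Optional


def on_consecutive_failure(data: Dict[str, Any]) -> str:
    return "%s: %d/%d failures" % (
        data["destination"]["id"],
        data["consecutive_failures"],
        data["max_consecutive_failures"],
    )


def on_destination_disabled(data: Dict[str, Any]) -> str:
    return "%s: disabled at %s" % (data["destination"]["id"], data["disabled_at"])


# Topic names as they appear in the payloads above.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "alert.consecutive_failure": on_consecutive_failure,
    "alert.destination.disabled": on_destination_disabled,
}


def dispatch(alert: Dict[str, Any]) -> Optional[str]:
    """Route an alert to its handler; unknown topics are ignored (None)."""
    handler = HANDLERS.get(alert["topic"])
    return handler(alert["data"]) if handler else None
```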

Questions

1. Should we include attempt instead of event?

Current payload uses event / triggering_event, but conceptually the alert is triggered by a delivery attempt, not the event itself. The event is context within the attempt.

Also, attempt_response is really response_data from models.Attempt, and the payload currently omits the other attempt fields.

Proposed structure using full attempt data:

{
  "attempt": {
    "id": "atm_xyz",
    "attempt_number": 3,
    "manual": false,
    "status": "failed",
    "time": "2025-01-15T10:30:00Z",
    "code": "500",
    "response_data": { "status": "500", "data": {...} }
  },
  "event": {
    "id": "evt_abc",
    "topic": "user.created",
    "metadata": {},
    "data": {}
  }
}

2. Note: Alternative consecutive_failures structure

An alternative structure was mentioned:

"consecutive_failures": {
  "current": 3,
  "max": 5,
  "progress": 0.6
}

The current payload carries the same information as flat fields (consecutive_failures, max_consecutive_failures, will_disable). Happy to refactor if the nested structure is preferred.
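To make the relationship between the two shapes concrete, the nested form can be derived from the flat fields. This sketch assumes `progress` is simply current / max, which is one possible reading rather than a settled definition; the function name is hypothetical.

```python
from typing import Any, Dict


def nest_consecutive_failures(data: Dict[str, Any]) -> Dict[str, Any]:
    """Build the nested consecutive_failures object from the flat fields."""
    current = data["consecutive_failures"]
    maximum = data["max_consecutive_failures"]
    if maximum <= 0:
        raise ValueError("max_consecutive_failures must be positive")
    # Assumption: progress is the plain ratio current / max.
    return {"current": current, "max": maximum, "progress": current / maximum}
```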

alexluong and others added 4 commits February 3, 2026 12:21
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TDD setup - tests will pass once feature is implemented.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Send alert when destination is auto-disabled after reaching
consecutive failure threshold.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@alexbouchardd (Contributor)

I think we should rename alert.consecutive_failure to something like alert.destination.consecutive_failure or alert.destination.failure.

One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.

> 1. Should we include attempt instead of event?

Yes, your proposal makes more sense.

> 2. Note: Alternative consecutive_failures structure

No strong opinion, but the current payload does lack progress, which is the threshold that was triggered.

@alexluong (Collaborator, Author)

> One thing to consider here is what we'll do once we have alerts based on failure rate and the associated event type.

Yes, let's go with alert.destination.consecutive_failure in case we want to expand in the future.

> but the current payload does lack progress

That's just the computed value of current / max though, right? Or do you want the threshold itself, e.g. 50/70/90/100?

Also, do you know whether current can be higher than max? In that case, would progress be capped at 100 (or 1), or would it keep increasing?

@alexbouchardd (Contributor)

I would represent the threshold itself, which may not line up exactly with current / max.
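One way to read "the threshold itself" is the highest configured threshold that the failure count has reached. The sketch below assumes thresholds of 50/70/90/100 percent (the values floated earlier in this thread) and assumes the result stays at 100 once current exceeds max; both are open points in the discussion, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def triggered_threshold(
    current: int,
    maximum: int,
    thresholds: Sequence[int] = (50, 70, 90, 100),
) -> Optional[int]:
    """Return the highest threshold (in percent) reached, or None if none is."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    percent = current * 100 / maximum
    # Assumption: once current exceeds maximum, the top threshold still applies.
    reached = [t for t in thresholds if percent >= t]
    return max(reached) if reached else None
```

With these assumptions, 3 of 5 failures (60%) reports the 50 threshold rather than 0.6, which is the case where the threshold and current / max do not line up.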
