Problem
Squadron emits a rich vocabulary of mission lifecycle events (mission started/completed/failed/stopped, task started/completed/failed, agent reasoning, tool calls, routing decisions) but has no way to forward terminal outcomes — especially failures — to the operators or systems that care. There's no built-in webhook, no email, no PagerDuty, no Slack channel ping, no UI toast. Today an operator finds out a mission failed by going to look.
There's also a smaller gap: the runner transitions to MissionFailed and returns the error to its caller, but never calls MissionHandler.MissionFailed(...), so the handler chain never sees a mission-level failure event. Even before notifications enter the picture, terminal failures don't land in the event store.
Proposal
A unified Notifier interface, with three implementation routes that share one HCL surface area.
Architecture
// squadron/notify/notifier.go
package notify

import (
	"context"
	"time"
)

// Notifier delivers a single notification event to one destination.
type Notifier interface {
	Send(ctx context.Context, ev NotificationEvent) error
}

// NotificationEvent is the transport-agnostic payload handed to every Notifier.
type NotificationEvent struct {
	Type        string            // "mission_completed" | "mission_failed"
	MissionName string
	MissionID   string
	Title       string            // pre-rendered short subject
	Body        string            // pre-rendered long body
	Severity    string            // "info" | "warning" | "critical"
	Hints       map[string]string // routing hints (e.g. "channel": "#ops")
	Timestamp   time.Time
}
Three sources of Notifier implementations:
- Built-in (in-process Go): webhook, email, command_center. Simple HTTP/SMTP/wsbridge transports. No subprocess, no plugin SDK. Compiled into squadron.
- Notifier plugin (new SDK squadron-notifier-sdk): gRPC subprocess via hashicorp/go-plugin, mirroring squadron-sdk and squadron-gateway-sdk. Single RPC: Notify(NotificationEvent). v1 ships the SDK as the extension point but no concrete plugin yet. Designed for community/custom transports (PagerDuty, Datadog, OpsGenie, MS Teams).
- Gateway-as-notifier: extend squadron-gateway-sdk with an optional NotifyingGateway interface (one method: OnNotification). Gateways that implement it can be referenced as notifiers via kind = "gateway". Existing gateways without it keep compiling; squadron does an interface assertion before calling. v1: gateway_slack implements it.
Why hybrid (not pure-plugin)
The gateway SDK already dropped its PagerDuty example (squadron-gateway-sdk#2) because the gateway protocol is Q&A-shaped (buttons, multi-select, free-text), not fire-and-forget. A webhook POST does not need an OS subprocess, SMTP does not need an OS subprocess, and the command-center wsbridge is already in-process — forcing those through a plugin SDK pays the install/release/handshake/process-supervision tax for nothing. Conversely, building a brand-new Slack notifier from scratch when the existing gateway already holds the bot token + channel + websocket is duplicative. The hybrid lets each transport pay only the complexity it actually needs.
HCL surface
# Built-in destinations
notifier "ops_webhook" {
  kind    = "webhook"
  url     = "https://hooks.example.com/squadron"
  headers = { Authorization = "Bearer ${vars.hook_token}" }
}

notifier "alerts_email" {
  kind      = "email"
  smtp_host = "smtp.sendgrid.net"
  smtp_port = 587
  from      = "squadron@example.com"
  to        = ["oncall@example.com"]
  username  = vars.smtp_user
  password  = vars.smtp_pass
}

notifier "ui_toast" {
  kind = "command_center"
}

# Gateway-as-notifier (delegates to existing gateway "slack" block)
notifier "ops_slack" {
  kind    = "gateway"
  gateway = "slack"
  channel = "#squadron-ops"
}

# Notifier plugin (post-v1, shape preview)
notifier "pd_critical" {
  kind     = "pagerduty"
  source   = "github.com/foo/notifier_pagerduty"
  version  = "v1.0.0"
  settings = { integration_key = vars.pd_key }
}

# Global subscriptions: apply to ALL missions
notify {
  on      = ["mission_failed"]
  targets = [notifiers.ops_slack, notifiers.alerts_email]
}

notify {
  on      = ["mission_completed"]
  targets = [notifiers.ui_toast]
}

# Mission-level subscription: ADDS to global rules (additive merge, dedupe)
mission "nightly_etl" {
  notify {
    on      = ["mission_failed"]
    targets = [notifiers.pd_critical]
  }
  task "extract" { ... }
}

# Mission with no notify block: receives only the global rules
mission "boring_check" {
  task "check" { ... }
}
Block conventions:
- notifier "name" {}: top-level destination definition (noun, mirrors gateway "name" {}).
- notifiers.*: variable namespace for cross-references (mirrors plugins.*, mcp.*).
- notify { on = []; targets = [] }: subscription (verb). Supported at both global (top-level) and mission level. Additive merge: a mission's effective rules are the union of its own blocks and all global blocks; targets dedupe per delivery. No opt-out from globals in v1.
Resolution. Notifiers load after gateways (so kind = "gateway" can resolve) and before missions (so missions can reference them) in the staged-evaluation pipeline.
Mission failure event-emission gap fix
Add MissionFailed(name string, err error) to MissionHandler and emit at the three runner failure paths immediately before transitioning state. Implement in CLI handler, StoringMissionHandler, wsbridge streamer, and debug logger. Wire payload (EventMissionFailed / MissionFailedData) already exists in squadron-wire/protocol/events.go — currently never produced.
v1 scope
- Internal Notifier interface + dispatcher
- Mission failure event-emission gap fixed
- HCL: notifier "name" {} destinations + notify {} subscriptions at global and mission level
- Built-in destinations: webhook, email, command_center
- Gateway-as-notifier extension in squadron-gateway-sdk; Slack implements it
- squadron-notifier-sdk shipped as new module (interface + proto + plugin host glue), no concrete plugin built
- Events accepted in v1: mission_completed, mission_failed only
- Defaults-only message rendering (hardcoded title/body templates)
- Best-effort delivery: 10 s per-call timeout, failures logged, no retry queue
- Command center: new wire message type + React toast component
Implementation order (suggested PR sequence)
Each step independently mergeable + testable.
1. Plumbing fix: add MissionHandler.MissionFailed + emit from runner + update all in-tree handlers. PR in squadron.
2. Internal Notifier interface + dispatcher + webhook built-in. PR in squadron.
3. Email built-in. PR in squadron.
4. NotifyingGateway extension to squadron-gateway-sdk. PR in squadron-gateway-sdk.
5. Slack gateway implements NotifyingGateway. PR in gateway_slack.
6. squadron-notifier-sdk skeleton: new module init + initial release tag. Separate repo.
7. Notifier plugin loader in squadron. PR in squadron.
8. Command center UI toast: built-in command_center notifier + React toast. PRs in squadron + commander.
Steps 1–3 + 8 alone constitute a usable product even if 4–7 slip.
Touched repos
- squadron: new notify/ package, config/notifier.go, runner emission fix, plugin loader, command_center notifier, dispatcher wiring
- squadron-gateway-sdk: NotifyingGateway interface + proto extension
- squadron-notifier-sdk: new repo/module
- gateway_slack: implement NotifyingGateway
- commander: React toast component + (possibly) a new wire message type
- squadron-sdk: no changes
What's deferred (v2+)
- Concrete notifier plugins (PagerDuty, Datadog, OpsGenie, MS Teams)
- Discord-as-notifier (Slack-only in v1)
- task_failed, mission_stopped, budget_breach, mission_issue event subscriptions
- User-provided message templates
- At-least-once delivery / persistent retry queue / outbox table
- Per-target severity overrides
- Mute windows / quiet hours
- Aggregation (e.g. "5 task_failed in 60s → one notification")
- Mission opt-out from global notify rules (notify { skip_global = true })