
spec: alpha & radar webhooks (public outbound delivery) #6

Open

0xchetan wants to merge 1 commit into main from feat/alpha-radar-webhooks-spec

Conversation

@0xchetan Collaborator

Summary

Design spec for re-enabling the public outbound webhook product, replacing the module deleted in commit beed758 (Feb 15 2026) after a DDoS incident.

  • Two feeds: radar (alpha_insights stream) + alpha (signal_created stream).
  • $0.50 / delivery charged via canonical credit_ledger path at enqueue time.
  • Hardening focused on the failure modes that allowed the original attack: paid-source ledger gate, atomic charge+delivery, SSRF + DNS-rebinding protection, sweeper + XAUTOCLAIM crash recovery.
  • No intelligence-monorepo changes needed — PR #184 already publishes both required Redis streams.

Two passes of spec review (issues found → fixed → approved). Ready to drive an implementation plan.
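As a sketch of the atomic charge+delivery path the summary refers to (table and column names here are hypothetical, not the spec's actual schema), the ledger debit and the delivery row can be written in one Postgres transaction so a crash can never produce a charge without a matching delivery record:

```javascript
// Sketch only: `credit_ledger` / `webhook_deliveries` column layouts are
// assumptions for illustration. `client` is any object exposing a
// node-postgres-style async query(sql, params) method.
async function chargeAndRecordDelivery(client, { accountId, endpointId, eventId, costCents }) {
  await client.query('BEGIN');
  try {
    // Debit the canonical ledger; in the real system a balance guard or
    // paid-source gate would reject accounts without eligible credit.
    await client.query(
      'INSERT INTO credit_ledger (account_id, amount_cents, reason) VALUES ($1, $2, $3)',
      [accountId, -costCents, 'webhook_delivery'],
    );
    // Record the delivery in the same txn so charge and delivery commit together.
    const res = await client.query(
      'INSERT INTO webhook_deliveries (endpoint_id, event_id, status) VALUES ($1, $2, $3) RETURNING id',
      [endpointId, eventId, 'pending'],
    );
    await client.query('COMMIT');
    return res.rows[0].id;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  }
}
```

The Bull enqueue then happens outside this transaction, with the sweeper reconciling any enqueue failures, since Redis cannot participate in the Postgres commit.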

Test plan

  • Reviewer reads end-to-end and confirms scope is implementable as a single plan.
  • Confirm DDoS hypothesis (§8.1) matches what the original incident actually was — postmortem gates Phase 4 rollout.
  • Sign off on the per-endpoint billing model (5 webhooks → $2.50 max fan-out per event).

🤖 Generated with Claude Code

Design for re-enabling public outbound webhook product:

- Two feeds: radar (alpha_insights) and alpha (signal_created)
- $0.50 per delivery via canonical credit_ledger path
- Restores deleted module from beed758^, hardened against original DDoS
- Atomic charge+delivery in single Postgres txn, sweeper for crash recovery
- DNS-rebinding-safe SSRF protection via pinned-lookup https.Agent
- XAUTOCLAIM cron for stream resilience
- Paid-source ledger gate at registration AND re-activation

Spec passed two-pass review. Ready for implementation planning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a design for re-enabling public outbound webhooks for Alpha and Radar feeds, featuring a credit-based charging model and hardened security measures against DDoS and SSRF attacks. Feedback focuses on ensuring the Bull queue jobId matches the delivery ID for the reconciliation sweeper to function correctly and extending SSRF protection to cover HTTP connections allowed in development environments.

// Enqueue OUTSIDE the txn (Bull/Redis is not transactional with Postgres).
// If this fails, sweeper will catch it.
try {
  await this.queue.add('deliver', { deliveryId: result.delivery.id }, retryOpts);

high

The Bull job is added without an explicit jobId. However, the reconciliation sweeper in §7.5 attempts to retrieve the job using the delivery ID (this.queue.getJob(d.id)). For this to work, the deliveryId must be passed as the jobId when enqueuing. Without this, the sweeper will fail to find active jobs and may incorrectly refund deliveries that are still in the queue.

Suggested change

- await this.queue.add('deliver', { deliveryId: result.delivery.id }, retryOpts);
+ await this.queue.add('deliver', { deliveryId: result.delivery.id }, { ...retryOpts, jobId: result.delivery.id });

Comment on lines +491 to +501
export const safeWebhookAgent = new Agent({
  lookup: (hostname, options, callback) => {
    lookup(hostname, options, (err, address, family) => {
      if (err) return callback(err);
      if (isPrivateOrMetadata(address)) {
        return callback(new Error(`SSRF blocked: ${hostname} resolved to ${address}`));
      }
      callback(null, address, family);
    });
  },
});

medium

The safeWebhookAgent is defined using https.Agent. Since the design allows http:// URLs in the development profile (line 160), the SSRF protection should also be applied to a corresponding http.Agent. Without this, the SSRF defense could be bypassed in development environments or if the protocol check is misconfigured.


@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a7eb9e373e



for (const d of stuck) {
  // Check Bull for a job with this deliveryId
  const job = await this.queue.getJob(d.id); // or scan active+waiting+delayed

P1: Set Bull job IDs before querying by delivery ID

The sweeper assumes getJob(d.id) can find the live delivery job, but earlier enqueueing only adds { deliveryId } payload and does not set jobId to that value. In Bull, auto-generated job IDs will then never match deliveryId, so this check can incorrectly conclude "no live job" and trigger refundAndMarkFailed for deliveries that are still queued/running, causing false refunds and dropped webhook deliveries under normal load.


Comment on lines +382 to +385
await redis.xautoclaim(
  stream, GROUP, CONSUMER, /* min-idle-time */ 300_000, '0',
  'COUNT', 50,
);

P1: Process reclaimed stream messages after XAUTOCLAIM

This cron calls XAUTOCLAIM but discards the returned messages, while the main consumer loop reads only XREADGROUP ... '>' (new entries). Claimed pending entries therefore are not re-dispatched or XACKed, so crash-recovered work can remain stuck in pending indefinitely and never reach customers; the reclaim path needs to explicitly process the returned entries (or add a 0-ID pending read path).
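A sketch of what a complete reclaim path could look like (the reply-parsing assumes an ioredis-style `[nextCursor, [[id, [field, value, ...]], ...]]` result, and `dispatch` is a hypothetical stand-in for the normal delivery handler):

```javascript
// Sketch only: dispatch() and the exact reply shape are assumptions; Redis 7+
// appends a third deleted-ids element to the XAUTOCLAIM reply, which this
// destructuring simply ignores.
function parseClaimedEntries(reply) {
  const [nextCursor, entries] = reply;
  const messages = (entries || [])
    .filter((e) => e !== null) // entries deleted from the stream come back as nil
    .map(([id, fields]) => {
      const data = {};
      for (let i = 0; i < fields.length; i += 2) data[fields[i]] = fields[i + 1];
      return { id, data };
    });
  return { nextCursor, messages };
}

async function reclaimLoop(redis, stream, group, consumer, dispatch) {
  let cursor = '0';
  do {
    const reply = await redis.xautoclaim(stream, group, consumer, 300_000, cursor, 'COUNT', 50);
    const { nextCursor, messages } = parseClaimedEntries(reply);
    for (const msg of messages) {
      await dispatch(msg);                     // re-run the normal delivery path
      await redis.xack(stream, group, msg.id); // then clear it from the PEL
    }
    cursor = nextCursor;
  } while (cursor !== '0'); // XAUTOCLAIM returns '0' once the scan wraps around
}
```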

