feat(payment-gated-subs): expire incomplete subscriptions on timeout#5519

Open
ancorcruz wants to merge 6 commits into main from feat/payment-gated-subs-2/expire-incomplete-subscriptions

@ancorcruz
Contributor

Context

A payment-gated subscription enters incomplete and waits for the payment to resolve. If the resolution never arrives (3DS abandoned, SEPA pending indefinitely, customer closes browser, mandate cancelled externally and never re-established), the subscription stays in that state indefinitely. There is no automatic cleanup, no PSP-side release of pending authorizations, and no visibility into stuck records.

This PR adds the timeout pathway that closes the gap.

Description

Adds a clock-driven chain that finds incomplete subscriptions whose activation rule has aged past its expires_at. For each one, it transitions the subscription to canceled with cancelation_reason: :timeout, closes the open invoice, and makes a best-effort attempt to cancel the pending payment with the PSP.

Subscriptions::ActivationRules::ExpireService

Core service. For a given incomplete subscription:

  1. Acquires subscription.with_lock. Race protection against a payment webhook resolution landing concurrently.
  2. Re-checks subscription.incomplete? post-lock and bails if a webhook resolved the subscription first.
  3. Transitions the payment activation rule to :expired via Payment::EvaluateService.
  4. Closes the open invoice.
  5. Calls ResolveSubscriptionStatusService, which performs the mark_as_canceled! transition and fires the subscription.canceled webhook + activity log (existing wiring, unchanged).
  6. Sets cancelation_reason: :timeout. Follows the established Payment::ResolveService#handle_failure pattern of caller-sets-reason; rule status alone doesn't disambiguate which actor expired the rule (today only timeout, tomorrow potentially manual force-expire).
  7. Enqueues PaymentProviders::CancelPaymentJob for the most recent pending/processing payment on the invoice. Runs after the transaction commits.
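The ordering of the seven steps can be sketched in plain Ruby. This is only an illustration under stated assumptions: FakeSubscription, FakeRule, FakeInvoice, FakePayment and the expire method are hypothetical stand-ins, not the classes in this PR, and a Monitor stands in for the row lock that with_lock takes in ActiveRecord.

```ruby
require "monitor"

# Hypothetical stand-ins; none of these are the real models.
class FakeSubscription
  attr_accessor :status, :cancelation_reason

  def initialize(status)
    @status = status
    @lock = Monitor.new
  end

  def incomplete?
    status == :incomplete
  end

  # Stand-in for ActiveRecord's with_lock (row lock plus transaction).
  def with_lock(&block)
    @lock.synchronize(&block)
  end
end

FakeRule = Struct.new(:status)
FakeInvoice = Struct.new(:status, :payments) # payments ordered oldest first
FakePayment = Struct.new(:id, :status)

# Returns the payment ids for which a PSP cancel would be enqueued.
def expire(subscription, rule, invoice)
  to_cancel = []

  subscription.with_lock do                      # step 1
    # Step 2: a webhook may have resolved the subscription while we waited.
    next unless subscription.incomplete?

    rule.status = :expired                       # step 3
    invoice.status = :closed                     # step 4
    subscription.status = :canceled              # step 5
    subscription.cancelation_reason = :timeout   # step 6

    # Step 7: most recent pending/processing payment, if there is one.
    payment = invoice.payments.reverse.find do |p|
      %i[pending processing].include?(p.status)
    end
    to_cancel << payment.id if payment
  end

  # Handed back only once the lock has been released, mirroring
  # "runs after the transaction commits".
  to_cancel
end
```

Calling expire on a subscription that a webhook already activated returns an empty list and leaves the rule and invoice untouched, which is the race case described in step 2.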

Subscriptions::ActivationRules::ExpireIncompleteJob

Thin async wrapper. Each clock tick enqueues one per expirable subscription so a slow expiration doesn't block others. Queue: :billing when SIDEKIQ_BILLING is enabled, :default otherwise — matches sibling Subscriptions::TerminateJob. unique :until_executed prevents duplicate enqueues if a clock tick fires before the previous tick's jobs have drained.
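A minimal sketch of the two behaviors described above, queue selection and until-executed uniqueness. Both expire_job_queue and UniqueQueue are hypothetical names introduced for illustration, and the set of values treated as "enabled" for SIDEKIQ_BILLING is an assumption, not taken from this PR.

```ruby
require "set"

# Hypothetical helper. Which values count as "enabled" is an assumption.
def expire_job_queue(env = ENV)
  enabled = %w[1 true yes].include?(env["SIDEKIQ_BILLING"].to_s.strip.downcase)
  enabled ? :billing : :default
end

# Toy model of "unique until executed": a key stays reserved from enqueue
# until the job has run, so a second clock tick cannot enqueue it again.
class UniqueQueue
  def initialize
    @pending = Set.new
    @jobs = []
  end

  # Returns true if the job was enqueued, false if it was a duplicate.
  def enqueue(subscription_id)
    return false unless @pending.add?(subscription_id)

    @jobs << subscription_id
    true
  end

  # Runs the oldest job and releases its uniqueness key.
  def perform_next
    id = @jobs.shift
    @pending.delete(id)
    id
  end
end
```

In this model a duplicate enqueue is dropped rather than raised, so an overlapping clock tick is harmless and the subscription can be enqueued again after the first job finishes.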

Clock::ExpireIncompleteSubscriptionsJob

Periodic batch worker. Uses the existing Subscription.expirable scope (subscription incomplete + activation rule pending + expires_at in the past). Inherits from ClockJob, which already configures the :clock_worker / :clock queue routing.
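The selection the clock job relies on can be restated as a plain Ruby predicate. This is a sketch of the three conditions only: Row and expirable below are hypothetical, the real scope is a database query, and treating "past" as a strict comparison is an assumption.

```ruby
# Hypothetical flattened view of a subscription joined to its activation rule.
Row = Struct.new(:id, :subscription_status, :rule_status, :expires_at)

# Keeps rows that are incomplete, whose rule is still pending,
# and whose expires_at is strictly before `now`.
def expirable(rows, now: Time.now)
  rows.select do |r|
    r.subscription_status == :incomplete &&
      r.rule_status == :pending &&
      !r.expires_at.nil? &&
      r.expires_at < now
  end
end

now = Time.utc(2026, 5, 15, 12, 20)
rows = [
  Row.new(1, :incomplete, :pending, now - 3600),   # aged past expires_at
  Row.new(2, :incomplete, :pending, now + 3600),   # still within the window
  Row.new(3, :active, :satisfied, now - 3600)      # already resolved
]
picked_ids = expirable(rows, now: now).map(&:id)
```

Only the first row is selected; the clock job would enqueue one ExpireIncompleteJob for it and skip the other two.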

Registered in clock.rb to run hourly at *:20, staggered between the existing *:15 and *:30 schedules. Sentry cron monitor under slug lago_expire_incomplete_subscriptions.
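As a worked example of what the hourly cadence implies, the sketch below computes how long after expires_at the first eligible *:20 tick fires. next_tick and lateness_seconds are hypothetical helpers; the sketch assumes UTC times, assumes a rule is picked up only by a tick strictly after its expires_at, and ignores queue latency.

```ruby
# First hourly tick at the given minute that falls strictly after `time`.
# `time` is expected to be a UTC Time.
def next_tick(time, minute: 20)
  base = Time.utc(time.year, time.month, time.day, time.hour, minute)
  base > time ? base : base + 3600
end

# Seconds between expires_at and the tick that would pick the rule up.
def lateness_seconds(expires_at)
  next_tick(expires_at) - expires_at
end

just_before = lateness_seconds(Time.utc(2026, 5, 15, 10, 19)) # tick at 10:20
just_after  = lateness_seconds(Time.utc(2026, 5, 15, 10, 21)) # tick at 11:20
```

Under these assumptions a rule expiring at 10:19 waits one minute and a rule expiring at 10:21 waits 59 minutes, so the delay before the clock job notices a rule never exceeds one hour.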

Subscriptions::ActivationRules::Payment::ResolveJob queue alignment

The existing ResolveJob was on queue_as "default". Updated to the same conditional :billing / :default pattern as the new ExpireIncompleteJob so both halves of the activation-rule machinery (success/failure resolution and timeout expiration) scale onto the same dedicated worker pool when SIDEKIQ_BILLING is enabled. No behavioral change when the env var is unset.

E2E scenarios

Two new scenarios in payment_gated_activation_spec.rb:

  • Gated subscription whose rule has aged past expires_at: clock → expire job → expire service → subscription canceled with cancelation_reason: :timeout, rule expired, invoice closed.
  • Gated subscription still within the timeout window: clock runs but subscription remains incomplete, confirming the Subscription.expirable scope correctly excludes future-expiry rules.

ancorcruz added 6 commits May 15, 2026 15:30
…on timeout

## Context

Gated subscriptions stuck in `incomplete` past their activation rule's
`expires_at` need to be canceled so authorized PSP funds are released
and the customer can be retried without conflicting state. M1 left the
foundation (`expires_at` column, `expirable` scope, payment evaluator
that accepts `:expired`) but no actor to run the transition.

## Description

Add `Subscriptions::ActivationRules::ExpireService` — the core
timeout-driven cancel:

1. Acquires `subscription.with_lock`. Race protection against a
   payment webhook landing concurrently.
2. Re-checks `subscription.incomplete?` after the lock; if it resolved
   between the clock-job pickup and the lock acquisition (success
   webhook won the race), bails cleanly.
3. Transitions the payment activation rule to `:expired` via
   `Payment::EvaluateService`.
4. Closes the open invoice (`invoice.closed!`).
5. Calls `ResolveSubscriptionStatusService` — the existing M1 service
   handles the actual `mark_as_canceled!` transition, webhook
   (`subscription.canceled`), and activity log.
6. Sets `cancelation_reason: :timeout` after the resolution. Matches
   M1's `Payment::ResolveService#handle_failure` pattern of caller-
   sets-reason: rule status alone doesn't disambiguate which actor
   triggered the rejection.
7. Best-effort: enqueues `PaymentProviders::CancelPaymentJob` for the
   most recent pending/processing payment on the invoice. The PSP-side
   cancel runs after the transaction commits.

Spec covers three contexts: happy path, race where the subscription
already resolved before lock acquisition, and the no-eligible-payment
case (rule expires, sub cancels, no PSP cancel job is enqueued).

`PaymentProviders::CancelPaymentJob` is still in an open PR (the
dispatcher); the spec defines a minimal stub inline so it works
against current main.
Thin async wrapper around ExpireService. Each clock-job tick enqueues
one ExpireIncompleteJob per expirable subscription; the job runs
independently so a slow expiration does not block others.

Queue routing follows the sibling subscription-billing convention:
:billing when SIDEKIQ_BILLING is enabled, :default otherwise. Matches
Subscriptions::TerminateJob.

unique :until_executed prevents the same subscription from being
enqueued twice if a clock tick runs before the previous tick's jobs
have drained. Inner state checks live in ExpireService where they can
see the post-lock state.
…ling

ExpireIncompleteJob uses the conditional :billing / :default queue
pattern so an operator can route subscription-billing jobs onto a
dedicated worker pool when SIDEKIQ_BILLING is enabled. The sibling
Payment ResolveJob — which handles the success/failure resolution
side of the same activation-rule machinery — was still on the plain
"default" queue.

Update ResolveJob to the same pattern so both jobs scale together.
No behavioral change when SIDEKIQ_BILLING is unset (still :default).
…tions

Periodic batch worker that scans for gated subscriptions whose payment
activation rule has timed out and enqueues per-subscription
ExpireIncompleteJob workers. Inherits from ClockJob, which already
configures the :clock_worker / :clock queue routing.

The query relies on the M1-era Subscription.expirable scope, which
joins activation_rules where status is pending and expires_at is in
the past — no new database access patterns are introduced here.

Spec covers three populations: expirable (incomplete + pending +
past), non-expirable pending (rule still within window), and active
with a satisfied rule. Only the first should be picked up.
…hourly

Run Clock::ExpireIncompleteSubscriptionsJob every hour at *:20,
staggered between the existing *:15 api_keys_track_usage and *:30
retry_generating_subscription_invoices schedules to spread load.

Sentry cron monitor registered under slug
lago_expire_incomplete_subscriptions so missed runs surface as alerts.

Hourly granularity follows the same cadence as the existing
terminate_ended_subscriptions schedule. With timeout_hours values of 1
or more, a subscription is picked up at most one hour after its
expires_at, which matches the M2 "best-effort" semantics.
Two scenarios exercise the full timeout chain end-to-end against the
test environment:

1. Gated subscription whose activation rule has aged past its
   expires_at: clock job picks it up, enqueues the expire job, the
   expire service runs, and the subscription ends up canceled with
   cancelation_reason: timeout, the rule expired, and the open invoice
   closed.

2. Gated subscription still within the timeout window: clock job runs
   but the subscription remains incomplete, demonstrating that the
   Subscription.expirable scope correctly excludes future-expiry rules.

The dispatcher PR is still open, so PaymentProviders::CancelPaymentJob
is stubbed via stub_const for the duration of the new describe block.
When that PR merges, the stub becomes a no-op (real class wins) and
the scenarios continue to pass.
@ancorcruz ancorcruz self-assigned this May 15, 2026
@ancorcruz ancorcruz marked this pull request as ready for review May 18, 2026 10:00