feat(payment-gated-subs): expire incomplete subscriptions on timeout #5519
Open
ancorcruz wants to merge 6 commits into
…on timeout

## Context

Gated subscriptions stuck in `incomplete` past their activation rule's `expires_at` need to be canceled so that authorized PSP funds are released and the customer can be retried without conflicting state. M1 left the foundation (the `expires_at` column, the `expirable` scope, and a payment evaluator that accepts `:expired`) but no actor to run the transition.

## Description

Add `Subscriptions::ActivationRules::ExpireService`, the core timeout-driven cancel:

1. Acquires `subscription.with_lock` as race protection against a payment webhook landing concurrently.
2. Re-checks `subscription.incomplete?` after the lock; if the subscription resolved between the clock-job pickup and the lock acquisition (the success webhook won the race), bails cleanly.
3. Transitions the payment activation rule to `:expired` via `Payment::EvaluateService`.
4. Closes the open invoice (`invoice.closed!`).
5. Calls `ResolveSubscriptionStatusService`; the existing M1 service handles the actual `mark_as_canceled!` transition, the webhook (`subscription.canceled`), and the activity log.
6. Sets `cancelation_reason: :timeout` after the resolution. This matches the caller-sets-reason pattern of M1's `Payment::ResolveService#handle_failure`: rule status alone doesn't disambiguate which actor triggered the rejection.
7. Best effort: enqueues `PaymentProviders::CancelPaymentJob` for the most recent pending/processing payment on the invoice. The PSP-side cancel runs after the transaction commits.

The spec covers three contexts: the happy path, the race where the subscription already resolved before lock acquisition, and the no-eligible-payment case (the rule expires and the subscription cancels, but no PSP cancel job is enqueued). `PaymentProviders::CancelPaymentJob` is still in an open PR (the dispatcher), so the spec defines a minimal stub inline to work against current main.
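The numbered steps above can be sketched in plain Ruby to show the ordering: lock, re-check, mutate, then enqueue the PSP cancel only once the locked section is finished. This is a framework-free illustration, not the PR's implementation: the Struct stand-ins, `ExpireServiceSketch`, and the Array used as a job queue are hypothetical, a `Mutex` stands in for `subscription.with_lock`, and plain assignments stand in for `Payment::EvaluateService`, `invoice.closed!` and `ResolveSubscriptionStatusService`.

```ruby
# Hypothetical stand-ins, NOT Lago's ActiveRecord models.
ActivationRule = Struct.new(:status)
Payment = Struct.new(:id, :status, :created_at)
Invoice = Struct.new(:status, :payments)

Subscription = Struct.new(:status, :rule, :invoice, :cancelation_reason) do
  def initialize(*args)
    super
    @lock = Mutex.new
  end

  def incomplete?
    status == :incomplete
  end

  # Stand-in for the row lock taken by ActiveRecord's `with_lock`.
  def with_lock(&block)
    @lock.synchronize(&block)
  end
end

class ExpireServiceSketch
  CANCELABLE_PAYMENT_STATUSES = %i[pending processing].freeze

  def initialize(subscription, job_queue)
    @subscription = subscription
    @job_queue = job_queue # Array standing in for the real job queue
  end

  def call
    payment = nil

    @subscription.with_lock do
      # Step 2: re-check after locking; a success webhook may have won the race.
      return :skipped unless @subscription.incomplete?

      @subscription.rule.status = :expired        # step 3 (EvaluateService stand-in)
      @subscription.invoice.status = :closed      # step 4 (invoice.closed! stand-in)
      @subscription.status = :canceled            # step 5 (status resolution stand-in)
      @subscription.cancelation_reason = :timeout # step 6: the caller sets the reason
      payment = latest_cancelable_payment
    end

    # Step 7: best-effort PSP cancel, enqueued only after the locked section ends
    # (a stand-in for "after the transaction commits").
    @job_queue << [:cancel_payment, payment.id] if payment
    :expired
  end

  private

  # Most recent pending/processing payment on the invoice, or nil if none.
  def latest_cancelable_payment
    @subscription.invoice.payments
      .select { |p| CANCELABLE_PAYMENT_STATUSES.include?(p.status) }
      .max_by(&:created_at)
  end
end
```

The three spec contexts map directly onto this sketch: a normal expiry enqueues one cancel, a subscription that is no longer `incomplete` returns `:skipped` untouched, and an invoice with no pending/processing payment still cancels the subscription but enqueues nothing.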
Thin async wrapper around `ExpireService`. Each clock-job tick enqueues one `ExpireIncompleteJob` per expirable subscription; each job runs independently, so a slow expiration does not block the others.

Queue routing follows the sibling subscription-billing convention: `:billing` when `SIDEKIQ_BILLING` is enabled, `:default` otherwise. This matches `Subscriptions::TerminateJob`.

`unique :until_executed` prevents the same subscription from being enqueued twice if a clock tick runs before the previous tick's jobs have drained. The inner state checks live in `ExpireService`, where they can see the post-lock state.
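The effect of `unique :until_executed` on overlapping clock ticks can be modeled in memory: a job key is locked at enqueue time and released only after that job has run. The sketch below is hypothetical (`UntilExecutedQueue` is not the uniqueness library's API) and ignores lock TTLs, failure handling, and the distributed locking a real backend needs.

```ruby
# Hypothetical in-memory model of "unique until executed" semantics.
class UntilExecutedQueue
  def initialize
    @jobs = []
    @locked_keys = {}
  end

  # Returns true if the job was enqueued, false if an identical one is still pending.
  def enqueue(job_name, subscription_id)
    key = [job_name, subscription_id]
    return false if @locked_keys.key?(key)

    @locked_keys[key] = true
    @jobs << key
    true
  end

  # Simulates workers draining the queue; each lock is released only after
  # its job has executed, so the same subscription can be enqueued again later.
  def drain
    until @jobs.empty?
      key = @jobs.shift
      yield(*key) if block_given?
      @locked_keys.delete(key)
    end
  end

  def size
    @jobs.size
  end
end
```

A second tick that fires before the first tick's jobs drain gets `false` for subscriptions that are already queued, which is the duplicate-enqueue case the commit describes.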
…ling

`ExpireIncompleteJob` uses the conditional `:billing` / `:default` queue pattern so an operator can route subscription-billing jobs onto a dedicated worker pool when `SIDEKIQ_BILLING` is enabled. The sibling `Payment::ResolveJob`, which handles the success/failure resolution side of the same activation-rule machinery, was still on the plain `"default"` queue.

Update `ResolveJob` to the same pattern so both jobs scale together. No behavioral change when `SIDEKIQ_BILLING` is unset (still `:default`).
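The queue decision shared by both jobs can be isolated into a small pure function. This is a sketch under assumptions: `subscription_billing_queue` is a hypothetical helper, and the exact rule for what counts as "enabled" (here, any non-empty value other than a few common false strings) is assumed rather than taken from the codebase. In an ActiveJob class, a decision like this would typically sit inside a `queue_as do ... end` block.

```ruby
# Values treated as "disabled" (assumption; compared case-insensitively).
FALSY_VALUES = %w[0 f false off].freeze

# Hypothetical helper: :billing when SIDEKIQ_BILLING is enabled, :default otherwise.
# Accepts any Hash-like env so it can be exercised without touching ENV.
def subscription_billing_queue(env = ENV)
  raw = env["SIDEKIQ_BILLING"].to_s.strip.downcase
  return :default if raw.empty? || FALSY_VALUES.include?(raw)

  :billing
end
```

Calling the same helper from both `ExpireIncompleteJob` and `ResolveJob` is one way to guarantee the two halves of the activation-rule machinery always land on the same pool.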
…tions

Periodic batch worker that scans for gated subscriptions whose payment activation rule has timed out and enqueues a per-subscription `ExpireIncompleteJob` for each. Inherits from `ClockJob`, which already configures the `:clock_worker` / `:clock` queue routing.

The query relies on the M1-era `Subscription.expirable` scope, which joins `activation_rules` where the status is pending and `expires_at` is in the past; no new database access patterns are introduced here.

The spec covers three populations: expirable (incomplete + pending + past expiry), non-expirable pending (rule still within its window), and active with a satisfied rule. Only the first should be picked up.
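The selection the clock job performs can be approximated over in-memory records to make the three spec populations concrete. This is a hypothetical sketch: the Structs, `expirable`, and `enqueue_expirations` stand in for the `Subscription.expirable` scope and the clock job; the real query runs in SQL through the existing scope.

```ruby
# Hypothetical in-memory stand-ins for subscriptions and their activation rules.
ActivationRule = Struct.new(:status, :expires_at)
GatedSubscription = Struct.new(:id, :status, :activation_rules)

# Approximates the expirable scope: incomplete subscriptions with a pending
# activation rule whose expires_at is already in the past.
def expirable(subscriptions, now: Time.now)
  subscriptions.select do |sub|
    sub.status == :incomplete &&
      sub.activation_rules.any? do |rule|
        rule.status == :pending && rule.expires_at && rule.expires_at < now
      end
  end
end

# Clock-job body: one per-subscription expire job for each expirable record.
def enqueue_expirations(subscriptions, queue, now: Time.now)
  expirable(subscriptions, now: now).each do |sub|
    queue << [:expire_incomplete, sub.id]
  end
end
```

With one record from each population, only the incomplete subscription whose pending rule has already expired is enqueued.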
…hourly

Run `Clock::ExpireIncompleteSubscriptionsJob` every hour at `*:20`, staggered between the existing `*:15` `api_keys_track_usage` and `*:30` `retry_generating_subscription_invoices` schedules to spread load. A Sentry cron monitor is registered under the slug `lago_expire_incomplete_subscriptions` so missed runs surface as alerts.

Hourly granularity follows the same cadence as the existing `terminate_ended_subscriptions` schedule. `timeout_hours` values of 1 or more are served with at most one hour of lateness, which matches the M2 "best-effort" semantics.
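The "at most one hour late" bound follows from the hourly `*:20` schedule (cron notation `20 * * * *`): a rule is first picked up by the next run strictly after its `expires_at`. The helper below is a hypothetical sketch that computes that run and ignores queue and worker latency.

```ruby
# Hypothetical helper: first hourly run at minute 20 strictly after expires_at,
# i.e. the first tick whose "expires_at in the past" check can see the rule.
def next_clock_run(expires_at, minute: 20)
  t = expires_at.getutc
  candidate = Time.utc(t.year, t.month, t.day, t.hour, minute)
  candidate += 3600 if candidate <= t # this hour's run is not after expiry; use next hour's
  candidate
end
```

A rule expiring at 10:21 UTC is picked up at 11:20, 59 minutes late; the worst case is a rule expiring exactly on a run boundary, which waits a full hour.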
Two scenarios exercise the full timeout chain end-to-end against the test environment:

1. A gated subscription whose activation rule has aged past its `expires_at`: the clock job picks it up, enqueues the expire job, the expire service runs, and the subscription ends up canceled with `cancelation_reason: :timeout`, the rule expired, and the open invoice closed.
2. A gated subscription still within the timeout window: the clock job runs but the subscription remains incomplete, demonstrating that the `Subscription.expirable` scope correctly excludes future-expiry rules.

The dispatcher PR is still open, so `PaymentProviders::CancelPaymentJob` is stubbed via `stub_const` for the duration of the new describe block. When that PR merges, the stub becomes a no-op (the real class wins) and the scenarios continue to pass.
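The "real class wins" behavior depends on the stub being installed only when the constant is missing: RSpec's `stub_const` replaces an already-defined constant for the duration of the example, so an unguarded stub would keep shadowing the real job after the dispatcher PR merges. The plain-Ruby sketch below shows such a guard; `cancel_payment_job_class` and the recording placeholder are hypothetical and not taken from this PR.

```ruby
# Namespace the real job will eventually live in (re-opened if it already exists).
module PaymentProviders; end

# Hypothetical helper: returns the real CancelPaymentJob when it is loaded,
# otherwise installs and returns a minimal placeholder that records enqueues.
def cancel_payment_job_class(namespace = PaymentProviders)
  if namespace.const_defined?(:CancelPaymentJob, false)
    return namespace.const_get(:CancelPaymentJob, false)
  end

  placeholder = Class.new do
    def self.enqueued
      @enqueued ||= []
    end

    def self.perform_later(*args)
      enqueued << args
    end
  end
  namespace.const_set(:CancelPaymentJob, placeholder)
end
```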
## Context

A payment-gated subscription enters `incomplete` and waits for the payment to resolve. If the resolution never arrives (3DS abandoned, SEPA pending indefinitely, customer closes the browser, mandate canceled externally and never re-established), the subscription stays in that state indefinitely. There is no automatic cleanup, no PSP-side release of pending authorizations, and no visibility into stuck records.

This PR adds the timeout pathway that closes the gap.
## Description

A clock-driven chain that finds incomplete subscriptions whose activation rule has aged past its `expires_at`, transitions them to `canceled` with `cancelation_reason: :timeout`, closes the open invoice, and best-effort cancels the pending payment with the PSP.

### `Subscriptions::ActivationRules::ExpireService`

Core service. For a given incomplete subscription:
1. Acquires `subscription.with_lock`. Race protection against a payment webhook resolution landing concurrently.
2. Re-checks `subscription.incomplete?` post-lock and bails if a webhook resolved the subscription first.
3. Transitions the payment activation rule to `:expired` via `Payment::EvaluateService`.
4. Closes the open invoice.
5. Calls `ResolveSubscriptionStatusService`, which performs the `mark_as_canceled!` transition and fires the `subscription.canceled` webhook and activity log (existing wiring, unchanged).
6. Sets `cancelation_reason: :timeout`. This follows the established caller-sets-reason pattern of `Payment::ResolveService#handle_failure`; rule status alone doesn't disambiguate which actor expired the rule (today only the timeout, potentially a manual force-expire in the future).
7. Best effort: enqueues `PaymentProviders::CancelPaymentJob` for the most recent pending/processing payment on the invoice. Runs after the transaction commits.

### `Subscriptions::ActivationRules::ExpireIncompleteJob`

Thin async wrapper. Each clock tick enqueues one job per expirable subscription, so a slow expiration doesn't block the others. Queue: `:billing` when `SIDEKIQ_BILLING` is enabled, `:default` otherwise, matching the sibling `Subscriptions::TerminateJob`. `unique :until_executed` prevents duplicate enqueues if a clock tick fires before the previous tick's jobs have drained.

### `Clock::ExpireIncompleteSubscriptionsJob`

Periodic batch worker. Uses the existing `Subscription.expirable` scope (incomplete + activation rule pending + past `expires_at`). Inherits from `ClockJob`, which already routes the `:clock_worker` / `:clock` queue.

Registered in `clock.rb` to run hourly at `*:20`, staggered between the existing `*:15` and `*:30` schedules. Sentry cron monitor under the slug `lago_expire_incomplete_subscriptions`.

### `Subscriptions::ActivationRules::Payment::ResolveJob` queue alignment

The existing `ResolveJob` was on `queue_as "default"`. It is updated to the same conditional `:billing` / `:default` pattern as the new `ExpireIncompleteJob`, so both halves of the activation-rule machinery (success/failure resolution and timeout expiration) scale onto the same dedicated worker pool when `SIDEKIQ_BILLING` is enabled. No behavioral change when the env var is unset.

### E2E scenarios

Two new scenarios in `payment_gated_activation_spec.rb`:

1. Rule aged past `expires_at`: clock → expire job → expire service → subscription canceled with `cancelation_reason: :timeout`, rule expired, invoice closed.
2. Rule still within the timeout window: the subscription stays incomplete, showing that the `Subscription.expirable` scope correctly excludes future-expiry rules.