
fix: log underlying errors when /account/limits returns HTTP 500 #4987

Merged
forestileao merged 2 commits into main from cursor/agent-ac77a424 on May 25, 2026

@cursor cursor Bot commented May 25, 2026

Summary

Sentry issue 7504868852 reports an HTTP 500 /account/limits event captured at level info by the captureHTTPError middleware in pkg/public/middleware/logging.go. That middleware only attaches the URL and status (CaptureMessage("HTTP %d %s", status, path)), so the Sentry event by itself has no information about what actually failed.
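To illustrate why the Sentry event carries so little, here is a minimal sketch of a middleware of this shape. It is a reconstruction under assumptions, not the code in pkg/public/middleware/logging.go: `statusRecorder`, the `capture` callback (standing in for Sentry's CaptureMessage), and `capturedMessages` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusRecorder remembers the status code written by the wrapped handler.
// Hypothetical helper, not taken from the repository.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// captureHTTPError is an assumed reconstruction of the middleware.
// capture stands in for Sentry's CaptureMessage.
func captureHTTPError(capture func(msg string), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		if rec.status >= http.StatusInternalServerError {
			// Only the status and the path are visible at this layer;
			// the handler's error value never reaches the middleware.
			capture(fmt.Sprintf("HTTP %d %s", rec.status, r.URL.Path))
		}
	})
}

// capturedMessages sends one failing request through the middleware
// and returns everything that was captured.
func capturedMessages() []string {
	var msgs []string
	failing := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		// The real cause (for example a gRPC Unavailable) stays inside the handler.
		http.Error(w, "internal error", http.StatusInternalServerError)
	})
	h := captureHTTPError(func(m string) { msgs = append(msgs, m) }, failing)
	req := httptest.NewRequest(http.MethodGet, "/account/limits", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return msgs
}

func main() {
	for _, m := range capturedMessages() {
		fmt.Println(m)
	}
}
```

Whatever the handler knew about the failure is gone by the time the middleware runs, which is why the diagnostic context has to be logged at the call site.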

The /account/limits endpoint is served by getOrganizationCreationStatus in pkg/public/server.go, which delegates to describeOrganizationCreationStatus. That function can fail at several distinct stages:

  • models.CountOrganizationsByBillingAccount (DB query)
  • usage.Service.CheckAccountLimits (gRPC to the usage service)
  • usage.Service.SetupAccount (lazy provisioning, called on a codes.NotFound)
  • the second CheckAccountLimits call after lazy provisioning
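The usage-service stages listed above can be sketched as a small control flow. This is an illustration under assumptions, not the code in pkg/public/server.go: `code`, `rpcError`, and `codeOf` are local stand-ins for the gRPC codes and status packages, the `usageService` signatures are simplified, and the stage labels `setup_account` and `check_account_limits_retry` are hypothetical (the PR only names `count_organizations` and `check_account_limits`). The AlreadyExists handling follows the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// code is a local stand-in for gRPC status codes such as codes.NotFound.
type code string

const (
	notFound      code = "NotFound"
	alreadyExists code = "AlreadyExists"
	unavailable   code = "Unavailable"
)

// rpcError is a hypothetical stand-in for a gRPC status error.
type rpcError struct {
	code code
	msg  string
}

func (e *rpcError) Error() string { return string(e.code) + ": " + e.msg }

// codeOf extracts the code from err, or "Unknown" for any other error.
func codeOf(err error) code {
	var re *rpcError
	if errors.As(err, &re) {
		return re.code
	}
	return "Unknown"
}

// usageService models the two calls named in the PR; signatures are assumed.
type usageService interface {
	CheckAccountLimits(accountID string) error
	SetupAccount(accountID string) error
}

// checkWithLazySetup returns the stage that failed together with its error,
// or an empty stage and nil on success.
func checkWithLazySetup(svc usageService, accountID string) (string, error) {
	err := svc.CheckAccountLimits(accountID)
	if err == nil {
		return "", nil
	}
	if codeOf(err) != notFound {
		return "check_account_limits", err
	}
	// NotFound triggers lazy provisioning; AlreadyExists is not a failure.
	if err := svc.SetupAccount(accountID); err != nil && codeOf(err) != alreadyExists {
		return "setup_account", err // hypothetical label
	}
	if err := svc.CheckAccountLimits(accountID); err != nil {
		return "check_account_limits_retry", err // hypothetical label
	}
	return "", nil
}

// scripted replays a fixed sequence of CheckAccountLimits results.
type scripted struct {
	checks []error
	setup  error
	calls  int
}

func (s *scripted) CheckAccountLimits(string) error {
	err := s.checks[s.calls]
	s.calls++
	return err
}

func (s *scripted) SetupAccount(string) error { return s.setup }

// failingStage runs the flow against a scripted service and reports the stage.
func failingStage(checks []error, setup error) string {
	stage, _ := checkWithLazySetup(&scripted{checks: checks, setup: setup}, "acct-1")
	return stage
}

func main() {
	svc := &scripted{checks: []error{
		&rpcError{code: notFound, msg: "no account"},
		&rpcError{code: unavailable, msg: "usage service down"},
	}}
	stage, err := checkWithLazySetup(svc, "acct-1")
	fmt.Println(stage, err)
}
```

Returning the stage next to the error keeps the decision of what to log at a single place and makes each failure path individually testable.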

The handler previously collapsed all of these into a single log.Errorf("Error loading organization creation status for account %s: %v", ...). Because %v prints the full message of a wrapped error, the underlying cause was technically present, but the entry was unstructured and exposed neither the failing stage nor the gRPC status code, either in the application logs or (via correlation) in Sentry.

Changes

pkg/public/server.go

  • describeOrganizationCreationStatus: emit a structured log.WithError(err).WithField("account_id", ...).WithField("stage", ...) entry at each failure point, identifying the stage (count_organizations, check_account_limits) so future Sentry occurrences can be filtered, alerted on, and triaged from logs.
  • checkAccountOrganizationCreationLimits: log the SetupAccount failure and the retry-CheckAccountLimits failure separately, both with account_id and grpc_code fields, instead of silently returning the raw gRPC error.
  • getOrganizationCreationStatus / createOrganization: drop the redundant top-level log.Errorf (the cause is now logged from the call site with structured fields) and replace it with a short tagged Error line, so we don't double-log the same chain.
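As a rough illustration of what such a structured entry looks like, the sketch below uses the standard library's log/slog instead of the `log.WithError(...).WithField(...)` API the PR uses, so the example stays self-contained. The field names `account_id`, `stage`, and `grpc_code` come from the PR; the log message, `logStageFailure`, `renderEntry`, and `errUsageUnavailable` are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"log/slog"
	"os"
)

// errUsageUnavailable imitates the text of a gRPC Unavailable error.
// Hypothetical value, used only to drive the example.
var errUsageUnavailable = errors.New("rpc error: code = Unavailable desc = usage service down")

// logStageFailure emits one structured entry per failure point.
// slog is a stand-in here; the PR itself uses log.WithError/WithField.
func logStageFailure(logger *slog.Logger, accountID, stage, grpcCode string, err error) {
	logger.Error("organization creation status failed", // assumed message
		slog.String("account_id", accountID),
		slog.String("stage", stage),
		slog.String("grpc_code", grpcCode),
		slog.String("error", err.Error()),
	)
}

// renderEntry logs into a buffer and decodes the JSON line so the
// individual fields can be inspected.
func renderEntry(accountID, stage, grpcCode string, err error) map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logStageFailure(logger, accountID, stage, grpcCode, err)

	entry := map[string]any{}
	if jsonErr := json.Unmarshal(buf.Bytes(), &entry); jsonErr != nil {
		panic(jsonErr)
	}
	return entry
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	logStageFailure(logger, "acct-1", "check_account_limits", "Unavailable", errUsageUnavailable)
}
```

Because each value is its own field, a log query can filter on the stage or the gRPC code directly instead of matching substrings of a formatted message.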

pkg/public/server_test.go

  • Extend fakePublicUsageService with a checkAccountErr field so tests can simulate gRPC failures.
  • Add Test__GetOrganizationCreationStatus/returns 500 with diagnostic context when the usage service is unavailable: when the usage service returns codes.Unavailable, the endpoint must still respond with HTTP 500 (no panic, no nil dereference) — exercising the previously-untested gRPC failure path that drives the Sentry alert.
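The idea behind the fake can be sketched as follows. Only the names `fakePublicUsageService` and `checkAccountErr` come from the PR; the method signature, `limitsHandler`, `statusFor`, and `errUnavailable` are hypothetical, and the real test presumably goes through the server's own handler and a genuine gRPC status error rather than this simplified stand-in.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errUnavailable stands in for a gRPC error with codes.Unavailable.
var errUnavailable = errors.New("rpc error: code = Unavailable desc = usage service down")

// fakePublicUsageService returns a configurable error from CheckAccountLimits.
// The shape of the type is assumed; only its name and field are from the PR.
type fakePublicUsageService struct {
	checkAccountErr error
}

func (f *fakePublicUsageService) CheckAccountLimits(accountID string) error {
	return f.checkAccountErr
}

// limitsHandler is a hypothetical, much reduced version of the endpoint:
// any usage-service failure becomes an HTTP 500.
func limitsHandler(svc *fakePublicUsageService) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := svc.CheckAccountLimits("acct-1"); err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// statusFor issues GET /account/limits against a handler backed by the fake
// and returns the recorded HTTP status code.
func statusFor(checkErr error) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/account/limits", nil)
	limitsHandler(&fakePublicUsageService{checkAccountErr: checkErr}).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(errUnavailable), statusFor(nil))
}
```

Injecting the error through a field keeps the fake usable for the success path too: leaving `checkAccountErr` nil exercises the normal response.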

Why diagnostic-only

This mirrors the diagnostic-logging pattern used to triage prior HTTP 500 ... Sentry issues (PRs #4810 / #4929).

The handler still returns the same HTTP 500 response when the upstream dependency really is broken, but the next occurrence of this Sentry issue will have the actual underlying error (DB failure / gRPC Unavailable / DeadlineExceeded / etc.) in the application logs with structured fields, so the root cause can be acted on directly.

Validation

The dev environment requires Docker (unavailable in this VM), so the full make test / make lint / make check.build.app chain was not run. As a substitute:

  • go build ./pkg/public/... ./pkg/models/... ./pkg/usage/... ./pkg/public/middleware/... — clean
  • go vet ./pkg/public/... — clean
  • gofmt -s -w / goimports -w on the touched files — no further diff

Refs

  • Sentry issue 7504868852


The /account/limits HTTP endpoint and the createOrganization handler both
call describeOrganizationCreationStatus, which can fail when:

  - the underlying CountOrganizationsByBillingAccount DB query errors;
  - the usage gRPC service is unreachable / times out / returns a
    non-NotFound error on CheckAccountLimits;
  - lazy SetupAccount fails with anything other than AlreadyExists; or
  - the second CheckAccountLimits call (after lazy provisioning) fails.

Today's Sentry event 7504868852 ("HTTP 500 /account/limits") was
captured by the public middleware (captureHTTPError) which only records
the URL and status — the actual error chain only existed in the
application log line, which used a single combined log entry and gave
no structured stage information for filtering or alerting.

Mirror the diagnostic-logging pattern from PRs #4810 / #4929: emit a
structured log entry at every failure path with the account ID, a
"stage" tag identifying which dependency failed (count_organizations
vs. check_account_limits), and the gRPC status code where applicable.
The handler-level Errorf calls are replaced with a short tagged Error
entry that carries no error detail, since the cause is already logged
from the call site with full context.

Adds a regression test that exercises the previously-untested gRPC
failure path: when the usage service returns codes.Unavailable, the
endpoint must still respond with HTTP 500 (no panic, no nil dereference)
so the existing middleware reports it to Sentry — but the next
occurrence will now have the underlying error in the logs for triage.

Refs: Sentry issue 7504868852
@superplanehq-integration

👋 Commands for maintainers:

  • /sp start - Start an ephemeral machine (takes ~30s)
  • /sp stop - Stop a running machine (auto-executed on pr close)

@forestileao forestileao marked this pull request as ready for review May 25, 2026 15:10
@forestileao forestileao self-requested a review May 25, 2026 15:11
@forestileao forestileao self-assigned this May 25, 2026
@forestileao forestileao merged commit e7458ef into main May 25, 2026
4 checks passed
@forestileao forestileao deleted the cursor/agent-ac77a424 branch May 25, 2026 15:19