fix: log underlying errors when /account/limits returns HTTP 500#4987
Merged
Conversation
The /account/limits HTTP endpoint and the createOrganization handler both
call describeOrganizationCreationStatus, which can fail when:
- the underlying CountOrganizationsByBillingAccount DB query errors;
- the usage gRPC service is unreachable / times out / returns a
non-NotFound error on CheckAccountLimits;
- lazy SetupAccount fails with anything other than AlreadyExists; or
- the second CheckAccountLimits call (after lazy provisioning) fails.
Today's Sentry event 7504868852 ("HTTP 500 /account/limits") was
captured by the public middleware (captureHTTPError) which only records
the URL and status — the actual error chain only existed in the
application log line, which used a single combined log entry and gave
no structured stage information for filtering or alerting.
Mirror the diagnostic-logging pattern from PRs #4810 / #4929: emit a
structured log entry at every failure path with the account ID, a
"stage" tag identifying which dependency failed (count_organizations
vs. check_account_limits), and the gRPC status code where applicable.
The handler-level Errorf calls are replaced with a structured info-free
entry, since the cause is already logged from the call site with full
context.
Adds a regression test that exercises the previously-untested gRPC
failure path: when the usage service returns codes.Unavailable, the
endpoint must still respond with HTTP 500 (no panic, no nil dereference)
so the existing middleware reports it to Sentry — but the next
occurrence will now have the underlying error in the logs for triage.
Refs: Sentry issue 7504868852
|
👋 Commands for maintainers:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Sentry issue 7504868852 reports an
HTTP 500 /account/limitsevent captured at levelinfoby thecaptureHTTPErrormiddleware inpkg/public/middleware/logging.go. That middleware only attaches the URL and status (CaptureMessage("HTTP %d %s", status, path)), so the Sentry event by itself has no information about what actually failed.The
/account/limitsendpoint is served bygetOrganizationCreationStatusinpkg/public/server.go, which delegates todescribeOrganizationCreationStatus. That function can fail at several distinct stages:models.CountOrganizationsByBillingAccount(DB query)usage.Service.CheckAccountLimits(gRPC to the usage service)usage.Service.SetupAccount(lazy provisioning, called on acodes.NotFound)CheckAccountLimitscall after lazy provisioningThe handler previously collapsed all of these into a single
log.Errorf("Error loading organization creation status for account %s: %v", ...). Because of%von a wrapped error the underlying cause was technically printed, but the entry was unstructured and didn't expose which stage failed or the gRPC status code — neither in the application logs nor (via correlation) in Sentry.Changes
pkg/public/server.godescribeOrganizationCreationStatus: emit a structuredlog.WithError(err).WithField("account_id", ...).WithField("stage", ...)entry at each failure point, identifying the stage (count_organizations,check_account_limits) so future Sentry occurrences can be filtered, alerted on, and triaged from logs.checkAccountOrganizationCreationLimits: log theSetupAccountfailure and the retry-CheckAccountLimitsfailure separately, both withaccount_idandgrpc_codefields, instead of silently returning the raw gRPC error.getOrganizationCreationStatus/createOrganization: drop the redundant top-levellog.Errorf(the cause is now logged from the call site with structured fields) and replace with a short taggedErrorline, so we don't double-log the same chain.pkg/public/server_test.gofakePublicUsageServicewith acheckAccountErrfield so tests can simulate gRPC failures.Test__GetOrganizationCreationStatus/returns 500 with diagnostic context when the usage service is unavailable: when the usage service returnscodes.Unavailable, the endpoint must still respond withHTTP 500(no panic, no nil dereference) — exercising the previously-untested gRPC failure path that drives the Sentry alert.Why diagnostic-only
This mirrors the same pattern used to triage prior
HTTP 500 ...Sentry issues:fix: log underlying errors when canvas dashboard returns HTTP 500(Sentry 7483010621)fix: log underlying errors when agent chat endpoint returns HTTP 500(Sentry 7495744767)The handler still returns the same HTTP 500 response when the upstream dependency really is broken, but the next occurrence of this Sentry issue will have the actual underlying error (DB failure / gRPC
Unavailable/DeadlineExceeded/ etc.) in the application logs with structured fields, so the root cause can be acted on directly.Validation
The dev environment requires Docker (unavailable in this VM), so the full
make test/make lint/make check.build.appchain was not run. As a substitute:go build ./pkg/public/... ./pkg/models/... ./pkg/usage/... ./pkg/public/middleware/...— cleango vet ./pkg/public/...— cleangofmt -s -w/goimports -won the touched files — no further diffRefs