fix: log underlying errors when agent chat endpoint returns HTTP 500 #4929
Draft
cursor[bot] wants to merge 1 commit into
When GET /api/v1/agents/canvases/{canvas_id}/chat returns codes.Internal,
the underlying error was either swallowed or logged with limited context.
The grpc error sanitizer then only forwarded the sanitized message to
logs, and Sentry only saw 'HTTP 500 ...' messages with no payload, making
it impossible to diagnose the root cause from a Sentry issue alone.
Capture the actual error with organization_id/user_id/canvas_id fields
before wrapping it in a status.Error, so the next 500 from this endpoint
has the context needed to investigate (DB error in FindCanvas, upstream
provider failure on session provisioning, or invalid JWT identifiers).
Refs: Sentry issue 7495744767
Co-authored-by: Aleksandar Mitrovic <AleksandarCole@users.noreply.github.com>
forestileao added a commit that referenced this pull request on May 25, 2026:
## Summary

Sentry issue [7504868852](https://superplane.sentry.io/issues/7504868852/) reports an `HTTP 500 /account/limits` event captured at level `info` by the `captureHTTPError` middleware in `pkg/public/middleware/logging.go`. That middleware only attaches the URL and status (`CaptureMessage("HTTP %d %s", status, path)`), so the Sentry event by itself has no information about what actually failed.

The `/account/limits` endpoint is served by `getOrganizationCreationStatus` in `pkg/public/server.go`, which delegates to `describeOrganizationCreationStatus`. That function can fail at several distinct stages:

- `models.CountOrganizationsByBillingAccount` (DB query)
- `usage.Service.CheckAccountLimits` (gRPC to the usage service)
- `usage.Service.SetupAccount` (lazy provisioning, called on a `codes.NotFound`)
- the second `CheckAccountLimits` call after lazy provisioning

The handler previously collapsed all of these into a single `log.Errorf("Error loading organization creation status for account %s: %v", ...)`. Because of `%v` on a wrapped error the underlying cause was technically printed, but the entry was unstructured and didn't expose which stage failed or the gRPC status code, neither in the application logs nor (via correlation) in Sentry.

## Changes

`pkg/public/server.go`

- `describeOrganizationCreationStatus`: emit a structured `log.WithError(err).WithField("account_id", ...).WithField("stage", ...)` entry at each failure point, identifying the stage (`count_organizations`, `check_account_limits`) so future Sentry occurrences can be filtered, alerted on, and triaged from logs.
- `checkAccountOrganizationCreationLimits`: log the `SetupAccount` failure and the retry-`CheckAccountLimits` failure separately, both with `account_id` and `grpc_code` fields, instead of silently returning the raw gRPC error.
- `getOrganizationCreationStatus` / `createOrganization`: drop the redundant top-level `log.Errorf` (the cause is now logged from the call site with structured fields) and replace it with a short tagged `Error` line, so we don't double-log the same chain.

`pkg/public/server_test.go`

- Extend `fakePublicUsageService` with a `checkAccountErr` field so tests can simulate gRPC failures.
- Add `Test__GetOrganizationCreationStatus/returns 500 with diagnostic context when the usage service is unavailable`: when the usage service returns `codes.Unavailable`, the endpoint must still respond with `HTTP 500` (no panic, no nil dereference), exercising the previously untested gRPC failure path that drives the Sentry alert.

## Why diagnostic-only

This mirrors the same pattern used to triage prior `HTTP 500 ...` Sentry issues:

- PR #4810: `fix: log underlying errors when canvas dashboard returns HTTP 500` (Sentry 7483010621)
- PR #4929: `fix: log underlying errors when agent chat endpoint returns HTTP 500` (Sentry 7495744767)

The handler still returns the same HTTP 500 response when the upstream dependency really is broken, but the next occurrence of this Sentry issue will have the actual underlying error (DB failure / gRPC `Unavailable` / `DeadlineExceeded` / etc.) in the application logs with structured fields, so the root cause can be acted on directly.

## Validation

The dev environment requires Docker (unavailable in this VM), so the full `make test` / `make lint` / `make check.build.app` chain was not run. As a substitute:

- `go build ./pkg/public/... ./pkg/models/... ./pkg/usage/... ./pkg/public/middleware/...`: clean
- `go vet ./pkg/public/...`: clean
- `gofmt -s -w` / `goimports -w` on the touched files: no further diff

## Refs

- Sentry issue [7504868852](https://superplane.sentry.io/issues/7504868852/)
- Prior precedents: PR #4810, PR #4929

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Pedro Leão <60622592+forestileao@users.noreply.github.com>
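The stage-tagged logging and the injectable test failure described in that commit can be illustrated with a minimal sketch. It is not the repository's code: it uses the standard library's `log/slog` in place of the `log.WithError(...).WithField(...)` calls quoted above, and `usageChecker`, `fakeUsage`, `describeStatus`, and `demo` are hypothetical names chosen for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
	"strings"
)

// usageChecker is a hypothetical, trimmed-down view of a usage service.
type usageChecker interface {
	CheckAccountLimits(accountID string) error
}

// fakeUsage lets a test inject a failure, similar in spirit to adding a
// checkAccountErr field to a test fake.
type fakeUsage struct{ checkAccountErr error }

func (f fakeUsage) CheckAccountLimits(string) error { return f.checkAccountErr }

// describeStatus runs two stages and logs any failure with the account ID
// and the name of the stage that failed before returning the error.
func describeStatus(logger *slog.Logger, accountID string, count func(string) (int, error), usage usageChecker) (int, error) {
	n, err := count(accountID)
	if err != nil {
		logger.Error("organization creation status failed",
			"error", err, "account_id", accountID, "stage", "count_organizations")
		return 0, err
	}
	if err := usage.CheckAccountLimits(accountID); err != nil {
		logger.Error("organization creation status failed",
			"error", err, "account_id", accountID, "stage", "check_account_limits")
		return 0, err
	}
	return n, nil
}

// demo makes the second stage fail and reports whether the call failed and
// whether only that stage was named in the log output.
func demo() (failed bool, stageLogged bool) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	count := func(string) (int, error) { return 2, nil }
	usage := fakeUsage{checkAccountErr: errors.New("usage service unavailable")}

	_, err := describeStatus(logger, "acct-1", count, usage)
	out := buf.String()
	stageLogged = strings.Contains(out, "check_account_limits") &&
		!strings.Contains(out, "count_organizations")
	return err != nil, stageLogged
}

func main() {
	failed, stageLogged := demo()
	fmt.Println(failed, stageLogged)
}
```

Because each failure point writes its own entry, the log line names the stage that broke, which a single top-level `Errorf` cannot do.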
Summary
When `GET /api/v1/agents/canvases/{canvas_id}/chat` returns `codes.Internal`, the underlying error was either swallowed (in the `ensureCanvas` / `parseOrgUser` helpers) or logged with limited context (in the `EnsureSession` error path). The gRPC error sanitizer then only forwarded the sanitized message to logs, and the Sentry HTTP middleware only saw `"HTTP 500 ..."` messages with no error payload, making it impossible to diagnose the root cause from a Sentry issue alone.

This change applies the same diagnostic-logging pattern that PR #4790 introduced for the canvas dashboard endpoints (commit `753ca8408`):

- `parseOrgUser` now logs the bad organization/user UUID before wrapping it in a `status.Error(codes.Internal, ...)`.
- `ensureCanvas` now logs the underlying DB error from `models.FindCanvas` before falling through to `codes.Internal`.
- `GetCanvasAgentChat` now includes `organization_id` and `user_id` (in addition to the existing `canvas_id`) on its `EnsureSession` error log line.

The next time this endpoint returns 500, the application logs will include the actual error chain (DB error, upstream Anthropic provider failure during session provisioning, JWT/UUID misconfiguration, etc.), so the Sentry issue can be triaged without runtime instrumentation.
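The log-before-wrap pattern listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's diff: `log/slog` stands in for the project's logger, `internalError` is a hypothetical stand-in for a sanitized `codes.Internal` status error, and `ensureCanvasSketch` and its `find` parameter are invented names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log/slog"
	"strings"
)

// internalError is a hypothetical stand-in for a sanitized internal status
// error: the caller only ever sees its generic message.
type internalError struct{ msg string }

func (e internalError) Error() string { return e.msg }

// ensureCanvasSketch logs the real cause with identifying fields and only
// then returns an error that hides the cause from the client.
// find is a hypothetical lookup standing in for the database call.
func ensureCanvasSketch(logger *slog.Logger, orgID, canvasID string, find func(orgID, canvasID string) error) error {
	if err := find(orgID, canvasID); err != nil {
		logger.Error("failed to load canvas",
			"error", err,
			"organization_id", orgID,
			"canvas_id", canvasID,
		)
		return internalError{msg: "failed to load canvas"}
	}
	return nil
}

// demo forces the lookup to fail and reports the message the client would
// see and whether the underlying cause reached the log output.
func demo() (clientMsg string, causeLogged bool) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewTextHandler(&buf, nil))
	failing := func(string, string) error { return errors.New("connection refused") }

	err := ensureCanvasSketch(logger, "org-1", "canvas-1", failing)
	return err.Error(), strings.Contains(buf.String(), "connection refused")
}

func main() {
	clientMsg, causeLogged := demo()
	fmt.Println(clientMsg, causeLogged)
}
```

The client-facing message stays generic, while the log entry carries both the cause and the identifiers needed to find the affected organization and canvas.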
Why diagnostic-only
Sentry only captured `HTTP 500 /api/v1/agents/canvases/{id}/chat` for issue 7495744767, with `level=info` and no error payload: the `captureHTTPError` middleware just records `CaptureMessage("HTTP %d %s", status, path)`. Without runtime visibility into the underlying error, the safest fix is to ensure the next occurrence is logged with full context, mirroring the same pattern used for the previous "HTTP 500 dashboard" Sentry issue.
Validation
Inside the dev `app` container:

- `make test PKG_TEST_PACKAGES="./pkg/grpc/actions/agents ./pkg/agents/..."` → all 53 tests pass
- `make format.go` → no diffs
- `make lint` → clean
- `make check.build.app` → builds successfully

Refs
- `753ca8408` ("fix: log underlying errors when canvas dashboard returns HTTP 500")
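To make the limitation described under "Why diagnostic-only" concrete, here is a minimal sketch of a middleware that reports only the status and path. It is an assumption-laden illustration, not the contents of `pkg/public/middleware/logging.go`: `captureHTTPErrorSketch`, `statusRecorder`, the `capture` callback (standing in for a Sentry message capture), and the `>= 500` threshold are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// captureHTTPErrorSketch reports server errors through capture. It can only
// see the status code and the request path, never the handler's error value.
func captureHTTPErrorSketch(capture func(string), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		if rec.status >= 500 {
			capture(fmt.Sprintf("HTTP %d %s", rec.status, req.URL.Path))
		}
	})
}

// demo runs a failing handler behind the middleware and returns the message
// that the capture callback received.
func demo() string {
	var captured string
	failing := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		cause := errors.New("db: connection refused")
		_ = cause // the cause stays inside the handler unless it is logged here
		http.Error(w, "internal error", http.StatusInternalServerError)
	})

	h := captureHTTPErrorSketch(func(m string) { captured = m }, failing)
	req := httptest.NewRequest(http.MethodGet, "/api/v1/agents/canvases/abc/chat", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return captured
}

func main() {
	fmt.Println(demo()) // prints "HTTP 500 /api/v1/agents/canvases/abc/chat"
}
```

Since the captured message has no room for the cause, logging the error at the point where it occurs is the only way for it to appear in any record of the failure.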