fix: log underlying errors when agent chat endpoint returns HTTP 500#4929

Draft
cursor[bot] wants to merge 1 commit into main from cursor/agent-1741a40c

Conversation

cursor[bot] commented May 20, 2026

Summary

When GET /api/v1/agents/canvases/{canvas_id}/chat returns codes.Internal, the underlying error was either swallowed (in the ensureCanvas/parseOrgUser helpers) or logged with limited context (in the EnsureSession error path). The gRPC error sanitizer then only forwarded the sanitized message to logs, and the Sentry HTTP middleware only saw "HTTP 500 ..." messages with no error payload, making it impossible to diagnose the root cause from a Sentry issue alone.

This change applies the same diagnostic-logging pattern that PR #4810 introduced for the canvas dashboard endpoints (commit 753ca8408):

  • parseOrgUser now logs the bad organization/user UUID before wrapping it in a status.Error(codes.Internal, ...).
  • ensureCanvas now logs the underlying DB error from models.FindCanvas before falling through to codes.Internal.
  • GetCanvasAgentChat now includes organization_id and user_id (in addition to the existing canvas_id) on its EnsureSession error log line.

The next time this endpoint returns 500, the application logs will include the actual error chain (DB error, upstream Anthropic provider failure during session provisioning, JWT/UUID misconfiguration, etc.), so the Sentry issue can be triaged without runtime instrumentation.

Why diagnostic-only

Sentry only captured HTTP 500 /api/v1/agents/canvases/{id}/chat for issue 7495744767, with level=info and no error payload — the captureHTTPError middleware just records CaptureMessage("HTTP %d %s", status, path). Without runtime visibility into the underlying error, the safest fix is to ensure the next occurrence is logged with full context, mirroring the same pattern used for the previous "HTTP 500 dashboard" Sentry issue.

Validation

Inside the dev app container:

  • make test PKG_TEST_PACKAGES="./pkg/grpc/actions/agents ./pkg/agents/..." → all 53 tests pass
  • make format.go → no diffs
  • make lint → clean
  • make check.build.app → builds successfully

Refs

  • Sentry issue 7495744767
  • Prior precedent: commit 753ca8408 ("fix: log underlying errors when canvas dashboard returns HTTP 500")
When GET /api/v1/agents/canvases/{canvas_id}/chat returns codes.Internal,
the underlying error was either swallowed or logged with limited context.
The grpc error sanitizer then only forwarded the sanitized message to
logs, and Sentry only saw 'HTTP 500 ...' messages with no payload, making
it impossible to diagnose the root cause from a Sentry issue alone.

Capture the actual error with organization_id/user_id/canvas_id fields
before wrapping it in a status.Error, so the next 500 from this endpoint
has the context needed to investigate (DB error in FindCanvas, upstream
provider failure on session provisioning, or invalid JWT identifiers).

Refs: Sentry issue 7495744767

Co-authored-by: Aleksandar Mitrovic <AleksandarCole@users.noreply.github.com>
@superplanehq-integration

👋 Commands for maintainers:

  • /sp start - Start an ephemeral machine (takes ~30s)
  • /sp stop - Stop a running machine (auto-executed on pr close)

forestileao added a commit that referenced this pull request May 25, 2026
<!-- CURSOR_AGENT_PR_BODY_BEGIN -->
## Summary

Sentry issue
[7504868852](https://superplane.sentry.io/issues/7504868852/) reports an
`HTTP 500 /account/limits` event captured at level `info` by the
`captureHTTPError` middleware in `pkg/public/middleware/logging.go`.
That middleware only attaches the URL and status (`CaptureMessage("HTTP
%d %s", status, path)`), so the Sentry event by itself has no
information about what actually failed.

The `/account/limits` endpoint is served by
`getOrganizationCreationStatus` in `pkg/public/server.go`, which
delegates to `describeOrganizationCreationStatus`. That function can
fail at several distinct stages:

- `models.CountOrganizationsByBillingAccount` (DB query)
- `usage.Service.CheckAccountLimits` (gRPC to the usage service)
- `usage.Service.SetupAccount` (lazy provisioning, called on a
`codes.NotFound`)
- the second `CheckAccountLimits` call after lazy provisioning

The handler previously collapsed all of these into a single
`log.Errorf("Error loading organization creation status for account %s:
%v", ...)`. Because of `%v` on a wrapped error the underlying cause was
technically printed, but the entry was unstructured and didn't expose
which stage failed or the gRPC status code — neither in the application
logs nor (via correlation) in Sentry.

## Changes

`pkg/public/server.go`

- `describeOrganizationCreationStatus`: emit a structured
`log.WithError(err).WithField("account_id", ...).WithField("stage",
...)` entry at each failure point, identifying the stage
(`count_organizations`, `check_account_limits`) so future Sentry
occurrences can be filtered, alerted on, and triaged from logs.
- `checkAccountOrganizationCreationLimits`: log the `SetupAccount`
failure and the retry-`CheckAccountLimits` failure separately, both with
`account_id` and `grpc_code` fields, instead of silently returning the
raw gRPC error.
- `getOrganizationCreationStatus` / `createOrganization`: drop the
redundant top-level `log.Errorf` (the cause is now logged from the call
site with structured fields) and replace it with a short tagged `Error`
line, so the same chain is not logged twice.

`pkg/public/server_test.go`

- Extend `fakePublicUsageService` with a `checkAccountErr` field so
tests can simulate gRPC failures.
- Add `Test__GetOrganizationCreationStatus/returns 500 with diagnostic
context when the usage service is unavailable`: when the usage service
returns `codes.Unavailable`, the endpoint must still respond with `HTTP
500` (no panic, no nil dereference) — exercising the previously-untested
gRPC failure path that drives the Sentry alert.

## Why diagnostic-only

This mirrors the same pattern used to triage prior `HTTP 500 ...` Sentry
issues:

- PR #4810 — `fix: log underlying errors when canvas dashboard returns
HTTP 500` (Sentry 7483010621)
- PR #4929 — `fix: log underlying errors when agent chat endpoint
returns HTTP 500` (Sentry 7495744767)

The handler still returns the same HTTP 500 response when the upstream
dependency really is broken, but the next occurrence of this Sentry
issue will have the actual underlying error (DB failure / gRPC
`Unavailable` / `DeadlineExceeded` / etc.) in the application logs with
structured fields, so the root cause can be acted on directly.

## Validation

The dev environment requires Docker (unavailable in this VM), so the
full `make test` / `make lint` / `make check.build.app` chain was not
run. As a substitute:

- `go build ./pkg/public/... ./pkg/models/... ./pkg/usage/...
./pkg/public/middleware/...` — clean
- `go vet ./pkg/public/...` — clean
- `gofmt -s -w` / `goimports -w` on the touched files — no further diff

## Refs

- Sentry issue
[7504868852](https://superplane.sentry.io/issues/7504868852/)
- Prior precedents: PR #4810, PR #4929
<!-- CURSOR_AGENT_PR_BODY_END -->

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Pedro Leão <60622592+forestileao@users.noreply.github.com>