Skip to content

fix(lifecycle): reduce GitHub API rate limiting from batch enrichment bypass#906

Open
harsh-batheja wants to merge 8 commits intomainfrom
fix/api-rate-limiting
Open

fix(lifecycle): reduce GitHub API rate limiting from batch enrichment bypass#906
harsh-batheja wants to merge 8 commits intomainfrom
fix/api-rate-limiting

Conversation

@harsh-batheja
Copy link
Copy Markdown
Collaborator

Summary

Fixes GitHub API rate limiting caused by commit 53ef778 (Apr 2) which added new poll-cycle handlers that created API call storms.

Before: ~8-10 API calls/PR/30s poll → 112 calls/min with 7 sessions → hits 5,000/hr limit in ~45min
After: ~2-4 API calls/PR/poll → 3-4x reduction, supporting 20-30+ sessions

Changes

1. CRITICAL — maybeDispatchMergeConflicts: Kill getMergeability() fallback

  • Previously called getMergeability() (3 REST calls) whenever hasConflicts was undefined in cached data — even when the batch had successfully fetched PR data
  • Now gates fallback on cachedData being absent (batch didn't run), not on hasConflicts being undefined
  • Uses cachedData.hasConflicts ?? false when batch ran

2. HIGH — maybeDispatchCIFailureDetails: Use batch ciChecks

  • Extended GraphQL PR_FIELDS to include statusCheckRollup.contexts.nodes (individual CI check names, statuses, URLs)
  • Added parseCheckContexts() to parse CheckRun and StatusContext nodes into CICheck[]
  • Added ciChecks?: CICheck[] to PREnrichmentData interface
  • maybeDispatchCIFailureDetails now uses cachedEnrichment.ciChecks when available, falls back to getCIChecks() REST call only when batch didn't run

3. MEDIUM — maybeDispatchReviewBacklog: 2-minute throttle

  • Added in-memory lastReviewBacklogCheckAt: Map<SessionId, number> per-session timestamp
  • Skips getPendingComments + getAutomatedComments API calls if checked within last 2 minutes
  • Comments don't change faster than this in practice; saves 2 REST calls per session per poll

Test plan

  • All 579 core package tests pass
  • All 136 scm-github package tests pass (56 new graphql-batch tests)
  • pnpm typecheck passes for core and scm-github
  • New tests verify batch enrichment bypass for merge conflicts detection
  • New tests verify ciChecks parsed correctly from GraphQL (CheckRun + StatusContext nodes)
  • New tests verify review backlog throttle skips API calls within 2-minute window
  • Existing tests for merge conflict, CI failure, and review backlog behavior unchanged

🤖 Generated with Claude Code

… bypass

Three optimizations to prevent API storms in the lifecycle manager poll cycle:

1. **CRITICAL - maybeDispatchMergeConflicts**: Gate the getMergeability()
   fallback to only run when batch enrichment didn't run at all. Previously
   it called getMergeability() (3 REST calls) whenever hasConflicts was
   undefined, even when the batch had already fetched PR data. Now uses
   cachedData.hasConflicts ?? false when the batch ran.

2. **HIGH - maybeDispatchCIFailureDetails**: Use batch enrichment ciChecks
   when available instead of calling getCIChecks() (separate REST call)
   on every poll. The GraphQL batch query now fetches statusCheckRollup
   contexts (individual check names, statuses, URLs) alongside the rollup
   state. Falls back to getCIChecks() only when batch didn't run.

3. **MEDIUM - maybeDispatchReviewBacklog**: Throttle getPendingComments +
   getAutomatedComments API calls to at most once per 2 minutes per session.
   These were called every 30s even when nothing had changed.

Impact: ~8-10 API calls/PR/poll reduced to ~2-4, enabling 3-4x more
concurrent sessions before hitting GitHub's 5,000/hr REST limit.

Also extends PREnrichmentData with ciChecks?: CICheck[] and adds
parseCheckContexts() helper to graphql-batch.ts for parsing CheckRun
and StatusContext nodes from the GraphQL statusCheckRollup.contexts field.
…ncated

When a PR has >20 CI checks, contexts(first: 20) silently truncates the
list. Setting ciChecks to undefined when pageInfo.hasNextPage is true
ensures maybeDispatchCIFailureDetails falls back to the getCIChecks()
REST call, which returns all checks without truncation.

Also adds pageInfo { hasNextPage } to the contexts GraphQL query so
truncation can be detected.
Add the new throttle map to the existing pruning loop that removes stale
entries for sessions no longer in the session list. Previously the map
was only cleared on terminal status transitions, leaving orphaned entries
for sessions removed externally (killed + cleaned up without transition).
…xt conclusion

Two fixes for automated review findings:

1. Bypass review backlog throttle when a transition reaction just fired for
   humanReactionKey or automatedReactionKey. The transitionReaction branch
   needs to read the current fingerprint via the API to record
   lastPendingReviewDispatchHash. Without bypassing, the throttle prevents
   this write and the next unthrottled poll sees a stale (empty) hash,
   clears the reaction tracker, and fires a duplicate dispatch.

2. Set conclusion on StatusContext nodes in parseCheckContexts() to match
   the REST getCIChecksFromStatusRollup() format (rawState.toUpperCase()).
   The CI failure fingerprint includes c.conclusion ?? '', so inconsistent
   conclusion values between GraphQL and REST paths caused phantom fingerprint
   changes when switching sources, triggering duplicate dispatches.
…pped

Two consistency fixes in parseCheckContexts() vs the REST path:

1. NEUTRAL conclusion: was mapped to 'passed' (with SUCCESS), but
   mapRawCheckStateToStatus() in the REST path maps NEUTRAL to 'skipped'.
   Changed to treat NEUTRAL the same as SKIPPED.

2. CheckRun conclusion: was stored as the raw GraphQL string (may be
   lowercase). REST getCIChecks/getCIChecksFromStatusRollup always store
   conclusion as rawState.toUpperCase(). Now stores rawConclusion which
   is already uppercased during the status branching logic.

Both fixes prevent phantom fingerprint changes when maybeDispatchCIFailureDetails
switches between GraphQL batch and REST fallback across poll cycles.
parseCheckContexts() was mapping these conclusions to 'failed' via the
else fallback, while mapRawCheckStateToStatus() in the REST path
explicitly maps all of them to 'skipped'. Added them to the skipped
branch alongside SKIPPED and NEUTRAL to fully mirror the REST mapping.
parseCheckContexts() mapped QUEUED and WAITING CheckRun statuses to
'running', but mapRawCheckStateToStatus() in the REST path maps both
to 'pending'. Only IN_PROGRESS maps to 'running' in the REST path.

Fixes fingerprint inconsistency when switching between GraphQL batch
and REST fallback across poll cycles.
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit b6813b5. Configure here.

- STARTUP_FAILURE conclusion now falls through to the "skipped" branch
  (matching mapRawCheckStateToStatus() REST default) instead of the
  explicit failure enumeration catch-all
- Null pageInfo guard prevents TypeError from typeof null === "object"
  JavaScript quirk when accessing hasNextPage on a null pageInfo field
- Tests added for both cases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant