chore: add GitHub repo diagnostics by Zeyuzhao · Pull Request #543 · ColeMurray/background-agents

Zeyuzhao · 2026-04-22T22:36:35Z

Summary

add control-plane logging around installation repo listing, repo cache serving, and branch fetching
add web-side logging for repo listing, branch listing, and sign-in access checks
add a script to clear the Cloudflare repo cache and cached installation tokens

Testing

Not run

Summary by CodeRabbit

Chores
- Added enhanced diagnostic logging across repository and branch listing operations, including performance metrics and sample data collection
- Added a cache-clearing utility script for Cloudflare KV management, supporting dry-run and local/remote targeting modes

coderabbitai · 2026-04-22T22:36:46Z

📝 Walkthrough

Walkthrough

The changes add comprehensive structured logging and observability throughout repository listing, branch fetching, and authentication flows in both control-plane and web services. A new bash utility script enables Cloudflare KV cache clearing. Logging enhancements capture operation timings, sample data, empty-list warnings, and requester context across multiple API endpoints.

Changes

Cohort / File(s)	Summary
GitHub App & Repository Listing Logging `packages/control-plane/src/auth/github-app.ts`, `packages/control-plane/src/routes/repos.ts`	Adds structured logging to repository and branch listing operations, including token-generation timing, sample repos/branches, per-page fetch latencies, cache freshness state, and warnings when empty lists are returned.
Web API Repository & Branch Endpoints `packages/web/src/app/api/repos/route.ts`, `packages/web/src/app/api/repos/[owner]/[name]/branches/route.ts`	Enhances API endpoints with requester-context logging (unauthorized access, session/email derivation), control-plane request/response tracking, sample data logging, error enrichment with metadata, and warnings on empty results.
Authentication Access Logging `packages/web/src/lib/auth.ts`	Augments `signIn` callback with normalized GitHub username/email processing, match indicators, allowlist-mode detection, and console logging of access-check inputs/results with denial warnings.
Cloudflare Cache Management `scripts/clear-cloudflare-repos-cache.sh`	New bash CLI utility for clearing Cloudflare KV cache entries (`repos:list` and installation token keys), with namespace-ID resolution from Terraform, local/remote targeting, and dry-run mode support.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop, hop! Through the repos now I see,
Each log and trace tells history!
From branches traced to caches clear,
The data's path is crystal here! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'chore: add GitHub repo diagnostics' clearly summarizes the main change—adding structured logging and diagnostic capabilities across multiple layers (control-plane, web, and cache-clearing script) to improve observability of GitHub repository operations.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/control-plane/src/auth/github-app.ts`:
- Around line 495-510: The log currently emits plain sample_repos (and similarly
sample_branches elsewhere) which can leak private names; change the logger.info
call in github.installation_repos.listed to avoid raw names by default: remove
sample_repos/sample_branches and instead emit counts (e.g., total_repos_sampled:
sampleRepos.length) and/or deterministic non-reversible fingerprints (e.g.,
sha256 hashes) only when an explicit opt-in debug flag is set (e.g.,
config.emitRepoSamples or logger.isDebugEnabled). Update the code that builds
sampleRepos/sampleBranches (used around the logger.info block and the block at
lines ~646-655) to compute either hashed values or skip populating them unless
the debug flag is true, and ensure the public keys in installationContext remain
unchanged.

In `@packages/control-plane/src/routes/repos.ts`:
- Around line 51-60: The logs currently include sensitive repo and branch names
via the sample_repos and sample_branches fields (see logger.info usages that map
repos.slice(0,5).map(repo => repo.fullName) and similar branch mappings); change
these logger payloads to avoid plain names by emitting only counts, boolean
cache/stale flags, or non-reversible hashes (or indices) instead, and ensure any
name-level sampling is gated behind an explicit debug flag (e.g., if (isDebug) {
... }) so production logging of logger.info with traceId and repos/branches uses
safe fields like total_repos, sample_count, or hashed_sample_ids rather than
repo.fullName or branch names.

In `@packages/web/src/app/api/repos/`[owner]/[name]/branches/route.ts:
- Around line 18-23: The logs currently emit raw identifiers (requester) and
branch names (sampleBranches); update the logging in the branches route to
redact those values: derive a requester type label from session.user (e.g.,
"github_user", "email_user", "unknown") instead of logging the actual
login/email, and when logging branch info from sampleBranches/log payloads only
emit safe diagnostics such as branch count and either masked branch-name samples
or a hash/first-character-and-length summary (no full names). Locate and change
the logging around the requester variable and any uses of sampleBranches in the
route handler (the requester variable and the branch-sampling/logging block
referenced in the file) to implement these redactions while preserving counts
and context.

In `@packages/web/src/app/api/repos/route.ts`:
- Around line 20-24: The current console.info calls log sensitive identifiers
(requester) and repository names (sampleRepos); update the logging in the route
to avoid PII by replacing direct requester values with a non-identifying
attribute (e.g., requesterType or a hashed/salted identifier) and by replacing
sampleRepos payloads with safe metadata (e.g., sampleRepos.length, cache status,
and optionally anonymized or hashed repo IDs) in the console.info statements
that reference requester and sampleRepos; locate and change the usages of the
variables requester and sampleRepos and any console.info/console.log that emits
them so logs contain only counts, cache metadata, or hashed identifiers instead
of raw emails/repo names.

In `@packages/web/src/lib/auth.ts`:
- Around line 64-83: The logs in the sign-in access check and denial paths in
auth.ts are emitting sensitive identifiers (githubUsername, emailDomain) and
full allowlist contents (allowedUsers, allowedDomains); update the signIn
logging to avoid raw identifiers: replace githubUsername/emailDomain with either
boolean flags (e.g., usernameMatch, domainMatch) or hashed/masked values and
replace allowedUsers/allowedDomains with counts (e.g., allowedUsersCount,
allowedDomainsCount) or presence flags; apply the same change to the warning
block that logs sign-in denial and keep other diagnostic fields
(unsafeAllowAllUsers, allowAllMode, isAllowed) intact. Use the variables seen in
this diff (githubUsername, emailDomain, allowedUsers, allowedDomains,
unsafeAllowAllUsers, usernameMatch, domainMatch, allowAllMode, isAllowed) as the
targets to modify the logged payloads.

In `@scripts/clear-cloudflare-repos-cache.sh`:
- Around line 88-91: The script currently builds WRANGLER using unpinned runtime
calls (WRANGLER=(npx wrangler kv key)), which can pull an unexpected version;
change the invocation to use npx --no-install so the CI fails if the pinned
wrangler isn't available (update the WRANGLER definition symbol to use npx
--no-install wrangler kv key), and keep require_command npx to ensure npx is
present; no other logic changes required.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ef7359b9-edf2-4306-8aca-be4e2d464cae

📥 Commits

Reviewing files that changed from the base of the PR and between 6f9ee21 and c394d02.

📒 Files selected for processing (6)

packages/control-plane/src/auth/github-app.ts
packages/control-plane/src/routes/repos.ts
packages/web/src/app/api/repos/[owner]/[name]/branches/route.ts
packages/web/src/app/api/repos/route.ts
packages/web/src/lib/auth.ts
scripts/clear-cloudflare-repos-cache.sh

coderabbitai · 2026-04-22T22:41:39Z

+  const roundedTokenGenerationMs = Math.round(tokenGenerationMs * 100) / 100;
+  const sampleRepos = allRepos.slice(0, 5).map((repo) => repo.fullName);
+  const installationContext = {
+    app_id: config.appId,
+    installation_id: config.installationId,
+  };
+
+  logger.info("github.installation_repos.listed", {
+    ...installationContext,
+    repository_selection: first.data.repository_selection,
+    total_count: totalCount,
+    total_pages: totalPages,
+    total_repos: allRepos.length,
+    token_generation_ms: roundedTokenGenerationMs,
+    page_timings: pageTiming,
+    sample_repos: sampleRepos,


⚠️ Potential issue | 🟠 Major

Do not log private repo or branch names by default.

sample_repos and sample_branches can contain confidential names from private GitHub installations. Counts/timings are useful without exposing names; gate samples behind an explicit debug flag or emit hashes/counts only.

🔒 Proposed safer diagnostics

- const sampleRepos = allRepos.slice(0, 5).map((repo) => repo.fullName); + const sampleRepoCount = Math.min(allRepos.length, 5); @@ - sample_repos: sampleRepos, + sample_repo_count: sampleRepoCount, @@ - const branchSample = branches.slice(0, 10).map((branch) => branch.name); + const sampleBranchCount = Math.min(branches.length, 10); @@ - sample_branches: branchSample, + sample_branch_count: sampleBranchCount,

Also applies to: 646-655

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/auth/github-app.ts` around lines 495 - 510, The log currently emits plain sample_repos (and similarly sample_branches elsewhere) which can leak private names; change the logger.info call in github.installation_repos.listed to avoid raw names by default: remove sample_repos/sample_branches and instead emit counts (e.g., total_repos_sampled: sampleRepos.length) and/or deterministic non-reversible fingerprints (e.g., sha256 hashes) only when an explicit opt-in debug flag is set (e.g., config.emitRepoSamples or logger.isDebugEnabled). Update the code that builds sampleRepos/sampleBranches (used around the logger.info block and the block at lines ~646-655) to compute either hashed values or skip populating them unless the debug flag is true, and ensure the public keys in installationContext remain unchanged.

coderabbitai · 2026-04-22T22:41:39Z

    logger.info("Repo fetch completed", {
      trace_id: traceId,
      total_repos: repos.length,
+      sample_repos: repos.slice(0, 5).map((repo) => repo.fullName),
    });
+    if (repos.length === 0) {
+      logger.warn("Repo fetch returned no repositories", {
+        trace_id: traceId,
+      });
+    }


⚠️ Potential issue | 🟠 Major

Avoid duplicating private repo and branch names in route logs.

The new sample_repos / sample_branches fields can leak private repository and branch names to centralized logs. Since counts and cache state already diagnose empty/stale behavior, prefer sample counts or hashes unless explicit debug logging is enabled.

🔒 Proposed safer fields

- sample_repos: repos.slice(0, 5).map((repo) => repo.fullName), + sample_repo_count: Math.min(repos.length, 5), @@ - sample_repos: cached.repos.slice(0, 5).map((repo) => repo.fullName), + sample_repo_count: Math.min(cached.repos.length, 5), @@ - sample_repos: repos.slice(0, 5).map((repo) => repo.fullName), + sample_repo_count: Math.min(repos.length, 5), @@ - sample_repos: enrichedRepos.slice(0, 5).map((repo) => repo.fullName), + sample_repo_count: Math.min(enrichedRepos.length, 5), @@ - sample_branches: branches.slice(0, 10).map((branch) => branch.name), + sample_branch_count: Math.min(branches.length, 10),

Also applies to: 145-158, 192-201, 236-247, 352-365

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/control-plane/src/routes/repos.ts` around lines 51 - 60, The logs currently include sensitive repo and branch names via the sample_repos and sample_branches fields (see logger.info usages that map repos.slice(0,5).map(repo => repo.fullName) and similar branch mappings); change these logger payloads to avoid plain names by emitting only counts, boolean cache/stale flags, or non-reversible hashes (or indices) instead, and ensure any name-level sampling is gated behind an explicit debug flag (e.g., if (isDebug) { ... }) so production logging of logger.info with traceId and repos/branches uses safe fields like total_repos, sample_count, or hashed_sample_ids rather than repo.fullName or branch names.

coderabbitai · 2026-04-22T22:41:39Z

+  const requester = session.user.login ?? session.user.email ?? "unknown";
+
+  console.info("[branches] fetching branches from control plane", {
+    requester,
+    repo: `${owner}/${name}`,
+  });


⚠️ Potential issue | 🟠 Major

Redact requester and branch-name samples from web logs.

requester can fall back to a full email, and sampleBranches exposes private branch names. Keep the useful diagnostics while logging requester type/counts instead of raw identifiers and names.

🔒 Proposed safer payloads

- const requester = session.user.login ?? session.user.email ?? "unknown"; + const requesterType = session.user.login ? "github_login" : session.user.email ? "email" : "unknown"; @@ - requester, + requesterType, repo: `${owner}/${name}`, @@ - requester, + requesterType, repo: `${owner}/${name}`, branchCount: branches.length, - sampleBranches: branches.slice(0, 10).map((branch) => branch.name), + sampleBranchCount: Math.min(branches.length, 10), @@ - requester, + requesterType, repo: `${owner}/${name}`, @@ - requester, + requesterType, repo: `${owner}/${name}`,

Also applies to: 41-59

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/web/src/app/api/repos/`[owner]/[name]/branches/route.ts around lines 18 - 23, The logs currently emit raw identifiers (requester) and branch names (sampleBranches); update the logging in the branches route to redact those values: derive a requester type label from session.user (e.g., "github_user", "email_user", "unknown") instead of logging the actual login/email, and when logging branch info from sampleBranches/log payloads only emit safe diagnostics such as branch count and either masked branch-name samples or a hash/first-character-and-length summary (no full names). Locate and change the logging around the requester variable and any uses of sampleBranches in the route handler (the requester variable and the branch-sampling/logging block referenced in the file) to implement these redactions while preserving counts and context.

coderabbitai · 2026-04-22T22:41:40Z

+  const requester = session.user?.login ?? session.user?.email ?? "unknown";
+
+  console.info("[repos] fetching repositories from control plane", {
+    requester,
+  });


⚠️ Potential issue | 🟠 Major

Avoid logging requester identifiers and private repo samples.

requester can be a full email, and sampleRepos can expose private repository names. Use requester type plus counts/cache metadata, or hash identifiers if correlation is required.

🔒 Proposed safer payloads

- const requester = session.user?.login ?? session.user?.email ?? "unknown"; + const requesterType = session.user?.login ? "github_login" : session.user?.email ? "email" : "unknown"; @@ - requester, + requesterType, @@ - requester, + requesterType, repoCount: data.repos.length, cached: data.cached, cachedAt: data.cachedAt, - sampleRepos: data.repos.slice(0, 5).map((repo) => repo.fullName), + sampleRepoCount: Math.min(data.repos.length, 5), @@ - requester, + requesterType, cached: data.cached,

Also applies to: 52-65

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/web/src/app/api/repos/route.ts` around lines 20 - 24, The current console.info calls log sensitive identifiers (requester) and repository names (sampleRepos); update the logging in the route to avoid PII by replacing direct requester values with a non-identifying attribute (e.g., requesterType or a hashed/salted identifier) and by replacing sampleRepos payloads with safe metadata (e.g., sampleRepos.length, cache status, and optionally anonymized or hashed repo IDs) in the console.info statements that reference requester and sampleRepos; locate and change the usages of the variables requester and sampleRepos and any console.info/console.log that emits them so logs contain only counts, cache metadata, or hashed identifiers instead of raw emails/repo names.

coderabbitai · 2026-04-22T22:41:40Z

+      console.info("[auth] signIn access check", {
+        githubUsername: githubUsername ?? null,
+        emailDomain: emailDomain ?? null,
+        allowedUsers: config.allowedUsers,
+        allowedDomains: config.allowedDomains,
+        unsafeAllowAllUsers: config.unsafeAllowAllUsers,
+        usernameMatch,
+        domainMatch,
+        allowAllMode,
+        isAllowed,
+      });
+
      if (!isAllowed) {
+        console.warn("[auth] signIn denied", {
+          githubUsername: githubUsername ?? null,
+          emailDomain: emailDomain ?? null,
+          allowedUsers: config.allowedUsers,
+          allowedDomains: config.allowedDomains,
+          unsafeAllowAllUsers: config.unsafeAllowAllUsers,
+        });


⚠️ Potential issue | 🟠 Major

Avoid logging user identifiers and allowlist contents.

These logs emit githubUsername, allowedUsers, and allowedDomains on every sign-in attempt. That exposes user identifiers and access-policy details to log sinks; keep the diagnostic booleans/counts instead, or hash identifiers before logging.

🔒 Proposed safer log payload

console.info("[auth] signIn access check", { - githubUsername: githubUsername ?? null, - emailDomain: emailDomain ?? null, - allowedUsers: config.allowedUsers, - allowedDomains: config.allowedDomains, + hasGithubUsername: Boolean(githubUsername), + hasEmailDomain: Boolean(emailDomain), + allowedUserCount: config.allowedUsers.length, + allowedDomainCount: config.allowedDomains.length, unsafeAllowAllUsers: config.unsafeAllowAllUsers, usernameMatch, domainMatch, @@ if (!isAllowed) { console.warn("[auth] signIn denied", { - githubUsername: githubUsername ?? null, - emailDomain: emailDomain ?? null, - allowedUsers: config.allowedUsers, - allowedDomains: config.allowedDomains, + hasGithubUsername: Boolean(githubUsername), + hasEmailDomain: Boolean(emailDomain), + allowedUserCount: config.allowedUsers.length, + allowedDomainCount: config.allowedDomains.length, unsafeAllowAllUsers: config.unsafeAllowAllUsers, });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/web/src/lib/auth.ts` around lines 64 - 83, The logs in the sign-in access check and denial paths in auth.ts are emitting sensitive identifiers (githubUsername, emailDomain) and full allowlist contents (allowedUsers, allowedDomains); update the signIn logging to avoid raw identifiers: replace githubUsername/emailDomain with either boolean flags (e.g., usernameMatch, domainMatch) or hashed/masked values and replace allowedUsers/allowedDomains with counts (e.g., allowedUsersCount, allowedDomainsCount) or presence flags; apply the same change to the warning block that logs sign-in denial and keep other diagnostic fields (unsafeAllowAllUsers, allowAllMode, isAllowed) intact. Use the variables seen in this diff (githubUsername, emailDomain, allowedUsers, allowedDomains, unsafeAllowAllUsers, usernameMatch, domainMatch, allowAllMode, isAllowed) as the targets to modify the logged payloads.

coderabbitai · 2026-04-22T22:41:40Z

+require_command jq
+require_command npx
+
+WRANGLER=(npx wrangler kv key)


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Package manager files:" fd -d 1 '^(package.json|pnpm-lock.yaml|package-lock.json|yarn.lock)$' -x ls -l {} echo echo "Wrangler declarations:" rg -n -C2 '"wrangler"\s*:' package.json pnpm-lock.yaml package-lock.json yarn.lock 2>/dev/null || true

Repository: ColeMurray/background-agents

Length of output: 2534

Use a repo-pinned Wrangler for destructive KV deletes.

Wrangler is pinned to ^4.0.0 in the project, but using npx wrangler can still download an unreviewed version in CI or different environments. For this cache-clearing script, use npx --no-install wrangler to ensure the command fails rather than silently using a different version.

🔧 Proposed fix

require_command jq require_command npx -WRANGLER=(npx wrangler kv key) +WRANGLER=(npx --no-install wrangler kv key)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

require_command jq

require_command npx

WRANGLER=(npx wrangler kv key)

require_command jq

require_command npx

WRANGLER=(npx --no-install wrangler kv key)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/clear-cloudflare-repos-cache.sh` around lines 88 - 91, The script currently builds WRANGLER using unpinned runtime calls (WRANGLER=(npx wrangler kv key)), which can pull an unexpected version; change the invocation to use npx --no-install so the CI fails if the pinned wrangler isn't available (update the WRANGLER definition symbol to use npx --no-install wrangler kv key), and keep require_command npx to ensure npx is present; no other logic changes required.

chore: add github repo diagnostics

c394d02

coderabbitai Bot reviewed Apr 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: add GitHub repo diagnostics#543

chore: add GitHub repo diagnostics#543
Zeyuzhao wants to merge 1 commit intoColeMurray:mainfrom
Zeyuzhao:github-repo-diagnostics

Zeyuzhao commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

coderabbitai Bot Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Zeyuzhao commented Apr 22, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Zeyuzhao commented Apr 22, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 22, 2026 •

edited

Loading