fix(metrics): include all-branch commits in contributor table #73
Copilot review summary:
No blocking issues found.
Validated behavior:
- Contributor counting now traverses all branches per repository.
- Commit SHA de-duplication prevents double-counting commits visible on multiple branches.
- README generation now explicitly enables all-branch counting for TOP contributors.
Residual risk:
- API usage is higher than default-branch-only mode and may increase rate-limit pressure on large orgs.
Pull request overview
Updates the contributor metrics generation so the “TOP contributors” table counts commits across all branches (with SHA de-duplication), and wires that mode into the profile README rendering.
Changes:
- Add branch enumeration plus per-branch commit collection and cross-branch SHA de-duplication for contributor counts.
- Introduce a JSON state/cache mechanism to reuse previously collected branch commit SHAs and metadata.
- Enable `include_all_branches=True` when rendering the TOP contributors table in `profile/README.qmd`.
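
The cross-branch SHA de-duplication listed above can be sketched roughly as follows. This is a minimal illustration over in-memory data, not the PR's actual code: the function name `count_commits_across_branches` and the input shape (branch name mapped to a list of `(sha, author)` pairs) are assumptions made for the example.

```python
from collections import Counter


def count_commits_across_branches(branch_commits):
    """Count commits per author, counting each SHA once across all branches.

    `branch_commits` maps a branch name to a list of (sha, author) pairs.
    This input shape is hypothetical, chosen only for illustration.
    """
    seen_shas = set()
    counts = Counter()
    for commits in branch_commits.values():
        for sha, author in commits:
            if sha in seen_shas:
                continue  # already counted via another branch
            seen_shas.add(sha)
            counts[author] += 1
    return counts


example = {
    "main": [("a1", "alice"), ("b2", "bob")],
    "feature": [("a1", "alice"), ("c3", "alice")],
}
print(count_commits_across_branches(example))
```

Commit `a1` is reachable from both branches but contributes only once, so a merged feature branch does not inflate the author's total.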
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/make_readme/get_per_user_commits.py` | Implements all-branch commit aggregation with SHA de-duplication and adds a persisted cache/state file to speed subsequent runs. |
| `profile/README.qmd` | Switches TOP contributors rendering to `include_all_branches=True`. |

    STATE_SCHEMA_VERSION = 1
    STATE_FILE_PATH = (
        Path(__file__).resolve().parents[2]
        / "profile"
        / "activity_data"
        / "per_user_commits_state.json"
    )
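
A versioned state file like the one referenced in this fragment might be loaded and saved along these lines. This is a hedged sketch, not the PR's implementation: `load_state`, `save_state`, and the top-level `"repos"` key are hypothetical names, and the fallback-to-empty behavior on a schema mismatch is an assumption about how such a cache would typically be handled.

```python
import json
from pathlib import Path

STATE_SCHEMA_VERSION = 1


def load_state(path):
    """Return cached state, or an empty state if the file is missing,
    unreadable, or written with a different schema version."""
    empty = {"schema_version": STATE_SCHEMA_VERSION, "repos": {}}
    try:
        state = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, permission problem, or invalid JSON/encoding.
        return empty
    if not isinstance(state, dict):
        return empty
    if state.get("schema_version") != STATE_SCHEMA_VERSION:
        return empty
    return state


def save_state(path, state):
    """Write the state as JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2, sort_keys=True), encoding="utf-8")
```

Discarding state on a version mismatch trades one slow full run for not having to migrate old cache layouts.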

    # Default to cached values when branch tip has not changed.
    branch_shas = cached_shas
    branch_tip_to_store = branch_tip_sha

    if not cached_tip_sha or cached_tip_sha != branch_tip_sha:
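
The tip comparison in this fragment amounts to a cache-hit check per branch. A self-contained sketch of that decision, under stated assumptions, is below: `resolve_branch_shas`, the `{"tip_sha": ..., "shas": [...]}` entry shape, and the `fetch_all_shas` callable are all hypothetical stand-ins for whatever the PR actually uses.

```python
def resolve_branch_shas(cached_entry, branch_tip_sha, fetch_all_shas):
    """Reuse cached SHAs when the branch tip is unchanged; otherwise refetch.

    `cached_entry` is a dict like {"tip_sha": ..., "shas": [...]} or None.
    `fetch_all_shas` is a zero-argument callable standing in for the API call.
    Both shapes are assumptions made for this sketch.

    Returns (shas, cache_hit).
    """
    cached_entry = cached_entry or {}
    cached_tip_sha = cached_entry.get("tip_sha")
    if cached_tip_sha and cached_tip_sha == branch_tip_sha:
        return list(cached_entry.get("shas", [])), True  # cache hit
    return list(fetch_all_shas()), False  # no cache, or tip moved
```

Returning the hit flag alongside the SHAs makes it easy to tally cache hits for per-run summary logging.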

    if author not in eligible_members:
        continue

    commit_date = parse_commit_date(commit_date_str)
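
Date strings read back from a cache may be malformed, so the parse step benefits from a guard that warns and skips instead of raising. The sketch below illustrates that pattern only; `safe_parse_commit_date` is a hypothetical wrapper, and the real `parse_commit_date` in the PR may differ in format handling.

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)


def safe_parse_commit_date(commit_date_str):
    """Parse an ISO 8601 timestamp; return None (after a warning) on bad input."""
    try:
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        return datetime.fromisoformat(commit_date_str.replace("Z", "+00:00"))
    except (AttributeError, TypeError, ValueError):
        logger.warning("Skipping commit with unparseable date: %r", commit_date_str)
        return None


cached_dates = ["2024-05-01T12:00:00Z", "not-a-date", None]
# Keep only the entries that parsed; bad entries were logged and dropped.
parsed = [d for d in map(safe_parse_commit_date, cached_dates) if d is not None]
```

One bad cached entry then costs a single skipped commit and a log line, not a failed README build.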

    f"https://api.github.com/repos/{repo_full_name}/commits"
    f"?sha={branch_name}&per_page=100&page={page}",
    headers=headers,
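
Interpolating `branch_name` directly into the query string breaks for branch names containing characters such as `&` or `#`. One way to build the same URL safely with the standard library is sketched below; `build_commits_url` is a hypothetical helper, not a function from the PR.

```python
from urllib.parse import urlencode


def build_commits_url(repo_full_name, branch_name, page, per_page=100):
    """Build the commits-list URL with a percent-encoded query string."""
    query = urlencode({"sha": branch_name, "per_page": per_page, "page": page})
    return f"https://api.github.com/repos/{repo_full_name}/commits?{query}"


print(build_commits_url("org/repo", "feature/a&b#1", 2))
# https://api.github.com/repos/org/repo/commits?sha=feature%2Fa%26b%231&per_page=100&page=2
```

If the code uses the `requests` library, passing the same dictionary via the `params=` argument achieves equivalent encoding without building the string by hand.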
@copilot please re-review this PR. Addressed all 4 inline findings:

1. Cache/state path moved outside published profile artifacts.
2. Commit metadata caching now stores all authors to avoid member-change undercount on cache hits.
3. Cached date parsing is now guarded with warning+skip behavior.
4. Branch commit requests now use URL-safe params for branch names.

Please take another pass.
Re-reviewed. I confirmed all 4 inline findings are addressed in the current branch and did not find additional blocking issues in this pass.
Update contributor counting to traverse all repository branches and deduplicate by commit SHA so merged commits are not double-counted. Enable all-branch counting in the profile README generation call for TOP contributors. _commit message is ai-generated_
for more information, see https://pre-commit.ci
Implement persistent state-backed contributor counting to reduce API load while keeping counts accurate. Adds branch tip tracking, compare-based delta processing, full-resync fallback for non-fast-forward histories, and configurable archived/fork repository filters. _commit message is ai-generated_
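
The compare-based delta with full-resync fallback described in this commit could be decided from the `status` field of a GitHub compare response (cached tip as base, current tip as head). The sketch below assumes that approach; `plan_branch_update` and its return labels are hypothetical, and the PR's actual logic may differ.

```python
def plan_branch_update(compare_payload):
    """Pick an update strategy from a compare response (base=cached tip, head=new tip).

    Returns (strategy, new_shas). Strategy labels are illustrative only.
    """
    status = compare_payload.get("status")
    if status == "identical":
        return "cache_hit", []
    if status == "ahead":
        # Fast-forward: only the commits listed in the delta are new.
        return "delta", [c["sha"] for c in compare_payload.get("commits", [])]
    # "behind", "diverged", or an unexpected payload: history was rewritten
    # or is otherwise not a fast-forward, so recollect the whole branch.
    return "full_resync", []
```

Treating anything other than `identical` or `ahead` as a resync keeps counts correct after force-pushes at the cost of extra API calls.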
for more information, see https://pre-commit.ci
Add per-run summary logging for contributor counting so CI output shows cache hits, compare deltas, and full resync paths. This makes API-pressure behavior observable after enabling all-branch counting. _commit message is ai-generated_
Move cache state outside published profile artifacts, store full commit metadata for membership-change correctness, harden cached date parsing, and use URL-safe query params for branch commit requests. _commit message is ai-generated_
for more information, see https://pre-commit.ci
Changes
Issues
PR Checklist
(Strikethrough any points that are not applicable.)

- Write unit tests for any new features, bug fixes, or other code changes.
- Update docs if there are any API changes.
- Update `CHANGELOG.md` with a short description of any user-facing changes and reference the PR number. Guidelines: https://keepachangelog.com/en/1.1.0/

Generated using AI