Add scan command for multi-repo discovery and consolidated reporting#10
Add scan command for multi-repo discovery and consolidated reporting#10
Conversation
Walks one or more roots, finds every git repo, runs extract per repo in parallel, and emits a manifest plus optional consolidated HTML report. The report grows a per-repository breakdown surfacing how work distributes across repos — both in the team report and in the --email developer profile, where Files/commits/churn are filtered to that dev's contributions. - internal/scan: discovery, gitignore-style ignore matcher, worker pool. Two-pass slug naming makes <slug>.state stable across runs so resume keeps working when sibling repos share basenames. - internal/stats: commitEntry carries the LoadMultiJSONL pathPrefix; RepoBreakdown groups commits + files by repo with optional email filter (per-dev file count uses devCommits, not the repo total). - internal/report: Per-Repository Breakdown section in both the main and profile templates, gated on len(Repos) > 1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bare clones (foo.git dirs with HEAD/objects/refs at the root rather than inside .git/) weren't picked up — discovery only accepted working-tree layouts. Since mirrors, fixture clones, and GitHub-style server dirs all use this shape, the omission blocked the most common "point gitcortex scan at a folder of .git clones" validation path. isBareRepo requires all three canonical entries to avoid false positives from stray HEAD/ or refs/ directories. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
LoadMultiJSONL prefixes file paths with '<repo>:' so cross-repo aggregates can attribute files back to their source. The suspect detector's directory-glob suggestions inherited that prefix, which made the warning's remediation command a no-op: extract --ignore and scan --extract-ignore both expect bare, repo-relative globs. Strip the prefix in CollectAllSuggestions before dedup — same shape from different repos (dist/*, node_modules/*) now collapses to one canonical suggestion that works across every repo in the scan. The stripping is conservative: a colon only counts as a prefix delimiter when the preceding segment contains no slash or glob metachars, preserving the rare but valid case of a colon embedded in a real path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02100f5c0a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The unconditional SkipDir on an ignored directory short-circuited the matcher's last-match-wins semantics: patterns like `vendor/` + `!vendor/keep` would prune vendor before vendor/keep could be examined, so intentionally re-included repositories were silently dropped from the scan. Matcher now exposes CouldReinclude(dir), which returns true iff any negation rule could fire on a descendant of dir — explicit child references (!vendor/keep) and basename-only negations (!keep) both force descent. Discovery descends instead of pruning whenever that check is true, letting per-path Match() decide. Unrelated exclusions keep pruning effective for the common case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ds.commits is keyed by SHA; when a scan ingests repos that share a commit (fork, mirror, cherry-pick) the later ingest overwrote the earlier one. RepoBreakdown iterated ds.commits directly, so the consolidated report silently under-counted commits and churn for one side of every collision — materially wrong percentages in exactly the multi-repo scenarios scan was built for. Track commits in a parallel map[repo][]*commitEntry populated at ingest. File-side lookups keep using ds.commits (one record per SHA is enough there); RepoBreakdown iterates the per-repo slice, which preserves every commit regardless of collisions. Regression test uses LoadMultiJSONL with two JSONLs that share a SHA — without the fix, one of the two repos disappears from the breakdown entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
gitcortex/internal/stats/suspect.go
Line 149 in 5950a26
Multi-repo loads prefix each file path as <repo>:; matching the raw path here causes root-level heuristics to miss common cases like alpha:vendor/x.go and alpha:package-lock.json. In stats runs with multiple --input files, this can suppress suspect-path warnings and --ignore suggestions even when generated/vendor content dominates churn, reducing the accuracy of the guidance users rely on to clean noisy metrics.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
DetectSuspectFiles fed ds.files keys straight into the pattern predicates, but multi-repo loads prefix every path with <repo>:. Root-level vendor, package-lock.json, go.sum, and friends missed detection — `strings.HasPrefix(p, "vendor/")` and `p == "package-lock.json"` both fail when the string actually begins with `alpha:vendor/x.go` or equals `alpha:package-lock.json`. Nested occurrences like `repo:pkg/vendor/x.go` still matched via the `/vendor/` substring check, so the bug looked partial in the wild: heavy repos with nested generated content triggered the warning, while a Go project with vendor/ at the root under the same scan went silently unflagged and users kept chasing bus-factor ghosts from vendored code. Strip the `<repo>:` prefix once per file and feed the normalized path to both Match and Suggest. Nested paths are unaffected; single- repo loads (no prefix) pass through unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Per-Repository Breakdown section fed pctFloat's output (a string) into printf "%.0f". html/template's CSS sanitizer saw the resulting %!f(string=...) as unsafe and replaced every bar width with ZgotmplZ — the commit-share visualization rendered at zero width in every multi-repo report. PctOfTotalCommits is already a float64 on RepoStat, so printf it directly. Drop the $maxRepoCommits variable that was only guarding a zero-churn case already covered by PctOfTotalCommits returning 0. Regression test asserts no ZgotmplZ or %!f(string= markers appear in a multi-repo render. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 98efb10989
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Discover descended into ignored directories only to give negation rules a chance to re-include descendants, but after the descent decision it still ran the .git detection on the ignored parent. When the parent itself held a .git (e.g. `vendor/` with vendor/.git plus a sibling !vendor/keep rule), the walker wrongly recorded the ignored parent as a repo and then returned filepath.SkipDir from the repo branch — which cut off the very descent the negation relied on. Both outcomes violated the documented last-match-wins semantics: the ignored dir leaked into scan output, and the re-included child never got a chance to register. Return early as soon as an ignored dir is walked-into-for-negation. The .git detection is now reached only for dirs the matcher did not ignore; re-included descendants are examined in their own callback invocation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CouldReinclude only recognized negation rules whose pattern began with a literal `dir/` prefix. Globbed variants that can legitimately match descendants of an ignored directory — `!vendor*/keep`, `!*/keep`, `!*/vendor/keep` — returned false, so the walker pruned the parent and the re-included repos silently disappeared. Per-segment path.Match against the dir's segments handles both literal and glob first segments uniformly. A pattern is a candidate reinclude iff it has strictly more segments than dir and every leading segment matches the corresponding dir segment (path.Match treats `vendor*`, `*`, literals, etc. the same way). Basename-only and `**/`-prefixed negations keep their fire-anywhere semantics. Discovery test uses `vendor*/` + `!vendor*/keep` across vendor/ and vendor-old/ to prove both re-included repos are now kept while vendor/garbage stays out. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26ac75e762
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The 6-hex (24-bit) hash suffix used for colliding basenames admits a tiny but real birthday-collision probability: two absolute paths whose SHA-1 digests share the first six hex chars would land on the same slug, and scan would silently write both repos' data to the same <slug>.jsonl and <slug>.state. Corruption of the exact kind the hash was added to prevent. Retry with a longer hash (up to the full 40 hex) whenever the proposed slug set still has duplicates. Happy path remains readable six-char suffixes; the rare collision grows until unique or panics past SHA-1 — the latter would be a cryptographic event, not a scan-time bug. Tests: 10k-duplicate property test (24-bit birthday expects ~3 collisions, forces retry path to fire naturally), determinism across re-runs of the same input, and a targeted test that builds two paths colliding at 6 hex and asserts the retry separates them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discover took no context, so a ctx-cancelled scan still walked the entire tree before returning. On home-directory scans where the walk phase can be the longest leg of the whole command, Ctrl+C felt unresponsive and anyone running interactive scans saw seconds of post-cancel filesystem churn. Thread ctx into Discover; each filepath.WalkDir callback checks ctx.Err() before doing filesystem work and returns it to short-circuit the walk. A pre-cancelled context test asserts Discover returns context.Canceled with no repos instead of draining the tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions that fill coverage holes called out in the self- assessment of this branch: 1. TestGenerate_PerRepoBreakdownWidthsAreNumeric now asserts at least one bar width renders above zero. The absence-of-error- markers check would previously pass a regression where every width silently collapsed to 0% (bars invisible but template still valid). 2. TestDiscover_AbortsMidWalk measures a baseline walk time, cancels at 25% of that interval, and asserts the cancelled walk returned ctx.Canceled, cut short before baseline, and produced fewer than the full repo count. Complements the existing pre- cancelled test, which only proved the first callback checks ctx. 3. TestGenerateProfile_MultiRepoBreakdownFiltersByEmail covers the scan --email --report flow end-to-end: two repos, one with a colleague-exclusive commit + file, assert the breakdown section renders, commit counts per repo reflect only the filtered dev, and colleague-only files do not bleed into the report. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README gains a full `Scan: discover and aggregate every repo under a root` section covering the headline flows: consolidated team report, `--email` personal cross-repo profile, multi-root workspaces. Covers the output directory layout (slug.jsonl, slug.state, manifest.json), the `.gitcortex-ignore` syntax with negation examples, and the tuning flags forwarded to the consolidated report. HTML report section notes the new Per-Repository Breakdown that appears on multi-repo runs. Architecture tree picks up internal/scan and internal/stats/repo_breakdown. METRICS.md adds a Per-Repository Breakdown entry describing columns, how the repo label is derived from the LoadMultiJSONL path prefix, and the divergence-between-commits-and-churn signal. Vendor/generated warning section notes the prefix-stripping for multi-JSONL inputs so readers understand why suggestions come out repo-relative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0947705215
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
scanCmd parsed --since only after scan.Run had already walked every root and extracted every discovered repo. An invalid value (`--since 1yy`, `--since nope`) cost minutes-to-hours of discovery and extraction on large workspaces before the CLI finally rejected the flag. Move parseSince up alongside the other flag validation so obvious typos fail instantly without doing any filesystem work. Test runs scanCmd against an empty TempDir with --since bogus and asserts two things: the error mentions --since (early validation fired), and it does NOT contain "no git repositories found" (the error scan.Run would have produced had it been reached). Together these prove scan.Run was not called. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
filepath.WalkDir has a well-known quirk: when the ROOT argument is a symlink it visits only the link entry (reported as ModeSymlink) and refuses to descend into the target. Users whose `~/work` is a symlink to `/mnt/data/work` — a common setup on machines with a separate data disk — would see discovery return zero repos and scan conclude "no git repositories found" despite the target being full of them. os.Stat already followed the link for the is-directory check, so the error path silently branched at WalkDir instead of surfacing a clear problem. Canonicalize the root with filepath.EvalSymlinks before stat'ing; the walk then starts at a real directory path. Symlinks ENCOUNTERED during the walk (as opposed to the root itself) are left as-is — we don't chase every link on disk, only the one the user explicitly named as a root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8142695880
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
On macOS (APFS/HFS+ default) and Windows (NTFS default), Repo.jsonl and repo.jsonl resolve to the same file on disk. The case-sensitive counts[base] / seen[slug] checks treated two repos named Repo and repo as distinct, handed each the bare basename, and let the filesystem silently merge their JSONL and state files — one scan's output overwriting the other with no warning. Fold to lower case for both the duplicate count and the seen-set so case-only basename differences now trigger the hash-suffix branch. Emitted slugs keep their original case for readability; the collision check that decides filesystem safety runs on the lower-cased form, which is what the OS compares against. Test pairs /a/Team/Repo with /b/OSS/repo and asserts the slugs differ even under strings.EqualFold and both carry a hash suffix — without the fix, both would land on bare "Repo"/"repo" and collide on disk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5bd94ab0d9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Codex Review: Didn't find any major issues. Bravo. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
If extract.Run panicked inside a worker goroutine (unexpected git binary output, nil deref from a future refactor, etc.) the goroutine would crash the entire scan process before its manifest.Repos[idx] slot was written. At best the repo stayed forever at Status="pending" — indistinguishable from "worker never dispatched this repo", which misled both operators and downstream automation. At worst, the process died mid-batch and the other in-flight extracts were lost. Wrap the per-job call in runJobSafely: defer-recover converts any panic into a ManifestRepo with Status="failed" and an Error prefixed with "panic:". The sibling workers keep pulling from the jobs channel, so the rest of the batch completes. extract.Run is not expected to panic under normal inputs — this is a defensive seam for the "impossible" cases the branch review flagged as the only remaining untested failure mode. Package-level runJobFn indirection lets the test override the worker body to force a deterministic panic without needing a broken git binary or a real extract bug to reproduce against. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The breakdown was rendered on both the team report and the --email profile report whenever the dataset spanned multiple repos. On reflection the team view adds little: "how is git history distributed across the repos in this scan" is nearly tautological and usually better answered by manifest.json or running stats per-repo. The section earns its place only when filtered by developer, where the same numbers turn into a real consolidation story: "where did this person spend their time across projects". Remove the section from the team template; keep it on the profile template where it was already filtered by email. Drop the unused ReportData.Repos field and the stats.RepoBreakdown call from Generate; ProfileReportData.Repos remains the single populated home for the metric. Team-report regression test now asserts the section does NOT render even on a multi-repo dataset; the profile test (unchanged) continues to cover the positive case with email filtering. README and METRICS.md rewritten to frame the metric as profile-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old `scan --report <file>` rendered one consolidated team report across every scanned repo, which mixed hotspots, bus factor, and coupling signal from unrelated codebases. A small `main.go` in a 100-commit repo could never surface beside a 50k-commit `wp-includes/` — the ranking was dominated by history size, not architectural significance. Replace the team-consolidation flow with --report-dir: for each successful repo in the manifest, load its JSONL as a standalone Dataset and render <dir>/<slug>.html via the existing report pipeline. Each HTML is equivalent to running `gitcortex report` against that one repo; no cross-repo mixing. Afterward emit <dir>/index.html with per-repo summary cards (commits, churn, devs, first→last date, status pill, relative-volume bar) linking into every successful report. Failed and pending entries surface inline so operators can spot them without opening the manifest. The only report flag that still takes a single file path is --report <file> combined with --email, which renders the consolidated profile (genuinely cross-repo signal: "where did this person spend their time"). --report alone is now a CLI error pointing at both alternatives. Tests cover both the validation branch and the end-to-end --report-dir flow: two real git repos produce two standalone HTMLs (asserted to carry no `<repo>:` path prefix, proving LoadJSONL and not LoadMultiJSONL was used) and an index with hrefs into each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndex slug Three MAJORs from the review of commit a0a6251: 1. Pending/skipped entries were counted as failed in both the index summary card and the CLI log line. A user-cancelled scan read as "N failed" — misleading because nothing actually broke. Separate pendingCount bucket, new ScanIndexData.PendingRepos field, and the summary card shows "(M failed) (K pending)" as distinct chips. Log line drops zero buckets so a clean run still says "N ok" instead of "N ok, 0 failed, 0 pending". 2. os.Create and reportpkg.Generate errors aborted the entire batch, discarding every HTML already written AND skipping the index. Policy now matches LoadJSONL: the entry is downgraded to Status=failed with an explanatory Error, best-effort os.Remove cleans up any partial file, and the loop keeps going. Operators get an index listing exactly which repos rendered and which didn't. 3. A repo literally named `index` would emit <dir>/index.html and then be overwritten by the landing page. Add a reservedSlugs registry to scan.discovery; isReservedSlug triggers the same hash branch as a duplicate-basename collision. Case-insensitive per the existing fold (macOS/Windows can't tell `Index` from `index` on default filesystems). Tests cover all three: mixed ok+failed+pending manifest asserts the correct counts appear in the index, slug collisions force the hash, and all-failed still emits index.html without tripping the MaxCommits==0 guard in the bar-width template expression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both pages now render the identical 'Generated by gitcortex · <date>' footer — same link attributes, same color and text-decoration, same border-top spacing. The duplicate date in the subtitle (removed) kept the top strip uncluttered for the only stable identifier that varied between index and repo pages: the scanned roots. Visually the index now reads as the sibling of every per-repo report it links to instead of a one-off landing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shorten the H1 from "Scan Index" to "Index" and drop the "roots: ..." subtitle line. The roots were borderline-useful — the user picked them, so echoing them back added chrome without adding information. With the data summary cards carrying the interesting counts below, a terser heading reads cleaner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small subtractions driven by the same principle — don't repeat signal the surrounding layout already conveys: - Drop the "ok" status pill. The link itself plus the populated metric cells make status obvious for successful entries; showing a neutral pill on every card added noise. Failed and pending pills stay because those statuses swap the card body for an italic "No report available." and the pill is the only affordance marking which failure mode it was. - Drop the "N / M max" caption next to the relative-volume bar. The bar already visualizes the ratio, the "commits" metric cell already shows N in the same row, and the max is inferred by eyeballing the widest bar. Keeping the bar alone preserves comparison-at-a-glance without a third copy of the same number. Test updated to assert `status-ok` pill is absent while failed / pending pills remain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The green .repo.ok border competed with the layout's default border-radius and made a table of healthy repos look noisier than it needed to. Red (failed) and amber (pending) borders stay as attention cues for the genuinely actionable states; the absence of a stripe on ok cards now reads as "nothing to flag" by contrast. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dates cell used to show "2024-01-15 → 2026-04-18" raw, forcing operators scanning a list of 30 repos to do date math to pick out the abandoned ones. Add a recency chip above the date range: "today" / "Nd ago" / "Nmo ago" / "Ny ago", colored by freshness bucket — green for fresh (≤30d), gray for stable (≤1y), amber for stale (>1y). The raw dates stay as supporting detail for anyone who wants the exact range. HumanizeAgo is the caller entry point (uses time.Now); humanizeAgoAt takes an explicit reference time for deterministic tests. Table test pins the day/month/year transitions and the 365-day stable/stale boundary; render test asserts the chip class reaches the HTML and is absent from failed entries that have no dates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven small polish items surfaced by the post-recency review. All
live in or around the new scan-index landing page:
- Recency label at 360-364 days read as "12mo ago" under the
stable color. Clamp months at 11 so the label progression never
crosses into two-digit territory inside the still-stable band.
- Ok entry whose dataset had no dates (empty JSONL, rename-only
history) rendered an orphan arrow `<br>→ ` in the dates cell.
Guard the date range behind `{{if .LastCommitDate}}` so the cell
emits nothing instead of a lone glyph.
- Log line started every run with `0 ok` even when the count was
zero, drifting from the summary card's zero-suppression
convention. Build the log bucket list the same way — only
non-zero counts, and open with the first non-zero bucket.
- `Data.Roots` field and `.subtitle` CSS rule were dead after the
roots-subtitle removal commit. Drop both plus the now-unused
Roots assignment at the call site.
- Convert `reservedSlugs` from a package-level mutable map to a
`isReservedSlug` switch. Keeps the reserved set immutable — no
risk of a future test silently leaking an entry across the
process.
- Strengthen the recency render test: anchor the chip inside the
`.dates` wrapper so a refactor that moves the chip outside its
structural context fails the test instead of passing on the
CSS class alone.
- Extend the humanizeAgoAt table with 360/364-day "stable" cases
(pinning the new month clamp) and a sub-day-future case that
documents the "today" fallback for intra-day committer-date
skew.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Index · N repositories" heading restated information the summary strip's "Repositories" card already surfaces one line below. Removing it lets the three-card summary (Repositories, Total commits, Unique devs) open the page — the counts users actually came to see. Dead `h1` CSS rule dropped at the same time. Test now anchors on the "Repositories" label in the summary strip, which survives layout tweaks better than a specific H1 string. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The index template shipped its own `.summary` / `.summary-card` classes with rules byte-for-byte identical to `.cards` / `.card` in the team and profile templates. Two copies of the same definition drift — a future restyle of "our card look" would land in one place and quietly skip the other. Rename both the CSS rules and the HTML usage to the shared `.cards` / `.card` names. No visual change; the three pages now share one vocabulary for the summary-strip pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The index inherited discovery's case-sensitive slug sort, which on
a real scan surfaced WordPress.git before gitcortex.git purely
because the uppercase W sorts before lowercase g. The order didn't
reflect anything an operator cares about.
Resort before rendering: failed first (actionable — and the red-
bordered "No report available" cards need to be impossible to miss),
then pending/skipped, then ok entries ranked by commit count desc
so the heaviest healthy repos open the tail. Slug asc breaks ties
in every bucket so identical-shaped scans still render identically.
Test with a mixed manifest {big=50c, broken=failed, pending, small=5c}
asserts the rendered order is broken → pending → big → small, i.e.
status bucket wins over alphabetical, commits win within the ok
bucket.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7666e57eb5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
--email alone used to fall through validation, reach the profile
branch after scan.Run completed, and call os.Create("") — surfacing
a cryptic `open : no such file or directory` error only AFTER the
slow multi-repo extract phase had already finished. On a home-dir
scan that's minutes to hours of work thrown away because of a
flag pair that was always going to fail.
Add the symmetric up-front check alongside the existing
--report-without-email rejection: --email requires --report <file>,
refuse with a clear message before scan.Run. Test uses an empty
TempDir to confirm the error surfaces without the "no git
repositories found" wrapper that would indicate discovery ran.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
humanizeAgoAt computed `days = int(now.Sub(t).Hours() / 24)`. Go's int conversion truncates toward zero for negative values, so a last-commit date 12 hours in the future (tomorrow's YYYY-MM-DD relative to a noon `now`) produced int(-0.5) == 0 — slipping past the `case days < 0` future-date guard and rendering as "today" / fresh. Quietly mislabeled clock-skewed commits as today's activity. Compare UTC date midnights directly: parse `lastDate` as midnight UTC, reduce `now` to its UTC midnight, return empty whenever `t.After(today)`. Same-day sub-day skew still falls into days==0 → "today" (documented day-granularity fallback). The "case days < 0" branch is kept defensively for refactor safety. Tests add tomorrow and two-days-future cases that trip the gap under the old arithmetic and now correctly produce empty labels. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 049cdb5463
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The working-tree check accepted any non-symlink `.git` entry, which
meant a random regular file called `.git` (source dump, build
artifact, a user accidentally creating the wrong thing) made the
parent directory register as a repo. The walker then SkipDir'd the
entire subtree, which had two bad outcomes:
1. Any real repo nested under that parent silently disappeared
from the scan — the landing index never listed it and
operators had no hint it was being skipped.
2. The fake parent hit the extract phase, failed with the
cryptic "not a git repository", and inflated the failed
count in the manifest with a bug that wasn't a repo problem
at all.
Factor the detection into isWorkingTreeEntry: accept directories
and regular files whose first 8 bytes are "gitdir: " (the linked-
worktree / submodule pointer format git actually writes), reject
everything else including irregular files and symlinks. The
symlink policy is unchanged — this just widens the validation to
catch the "arbitrary file named .git" case too.
Tests cover all four paths: dir (existing), gitdir pointer (new
positive), arbitrary file (new negative — also asserts the nested
real repo underneath IS discovered after the fix), and symlink
(existing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 302d13f7ef
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
scan.runOne emits Status="skipped" when the worker picks up a job after ctx has already cancelled. The summary strip already folded skipped into PendingRepos, but the per-row renderer forwarded the raw status into `class="repo skipped"` and `status-skipped` — CSS selectors the template never defined, so skipped rows lost the amber border and status pill. A cancelled scan would show a "1 pending" card in the summary but N unstyled white rows below, hiding exactly the cancellation fallout that most needs triage. Fold skipped into pending at the entry boundary in renderScanReportDir so the row class names align with the summary counts. Rather than widen the template's CSS vocabulary to two near-identical rules, collapse the synonym at the one place where the internal status becomes user-visible chrome. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scan --report --email hardcoded the profile's RepoName to
filepath.Base(roots[0]) — fine for single-root scans, misleading
for multi-root runs where the dataset actually spans every --root.
In practice: `gitcortex scan --root ~/work --root ~/oss ... --email
me@x.com` produced a report titled "· work", which readers
interpreted as belonging only to the first root when the stats
covered both.
profileScanLabel picks a label based on root cardinality:
- 1 root → basename (unchanged common case)
- 2-3 roots → joined with " + " so the scope is readable
- 4+ roots → "N roots" (joining many basenames bloats the H1
and the exact set lives in the scan log and manifest)
Smoke-tested on a 50-root microservice workspace: H1 now reads
"<dev> · 50 roots" instead of "<dev> · <first-ms-name>".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Two small perf wins bundled together — both touch the scan landing code path. --parallel default was hardcoded to 4. Fine for 2008-era laptops, conservative on anything modern — a 12-thread dev box was leaving 8 workers on the table whenever the operator forgot to pass the flag. Swap in defaultScanParallel(): runtime.NumCPU() floored at 2 (single-core CI still makes progress) and capped at 16 (git I/O stops scaling past ~8-16 on SSDs; avoids over-subscription on 64-core servers). renderScanReportDir's per-repo dev-set aggregation called stats.TopContributors(ds, 0) just to walk email strings. That function copies every ContributorStat by value and runs a full sort by commit count — both wasted work when the caller discards the struct and ordering. New stats.DevEmails(ds) returns only the email slice, unordered. Per-repo savings are ms-scale but cumulative on large scans. Tests: DefaultScanParallel asserts the [2, 16] invariant; DevEmails pins the set for the standard makeDataset fixture. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks one or more roots, finds every git repo, runs extract per repo in parallel, and emits a manifest plus optional consolidated HTML report. The report grows a per-repository breakdown surfacing how work distributes across repos — both in the team report and in the --email developer profile, where Files/commits/churn are filtered to that dev's contributions.