Skip to content

Add scan command for multi-repo discovery and consolidated reporting#10

Merged
lex0c merged 37 commits intomainfrom
feat/scan-multi-repo
Apr 21, 2026
Merged

Add scan command for multi-repo discovery and consolidated reporting#10
lex0c merged 37 commits intomainfrom
feat/scan-multi-repo

Conversation

@lex0c
Copy link
Copy Markdown
Owner

@lex0c lex0c commented Apr 20, 2026

Walks one or more roots, finds every git repo, runs extract per repo in parallel, and emits a manifest plus optional consolidated HTML report. The report grows a per-repository breakdown surfacing how work distributes across repos — both in the team report and in the --email developer profile, where Files/commits/churn are filtered to that dev's contributions.

  • internal/scan: discovery, gitignore-style ignore matcher, worker pool. Two-pass slug naming makes .state stable across runs so resume keeps working when sibling repos share basenames.
  • internal/stats: commitEntry carries the LoadMultiJSONL pathPrefix; RepoBreakdown groups commits + files by repo with optional email filter (per-dev file count uses devCommits, not the repo total).
  • internal/report: Per-Repository Breakdown section in both the main and profile templates, gated on len(Repos) > 1.

lex0c and others added 2 commits April 20, 2026 11:49
Walks one or more roots, finds every git repo, runs extract per repo
in parallel, and emits a manifest plus optional consolidated HTML
report. The report grows a per-repository breakdown surfacing how
work distributes across repos — both in the team report and in the
--email developer profile, where Files/commits/churn are filtered
to that dev's contributions.

- internal/scan: discovery, gitignore-style ignore matcher, worker
  pool. Two-pass slug naming makes <slug>.state stable across runs
  so resume keeps working when sibling repos share basenames.
- internal/stats: commitEntry carries the LoadMultiJSONL pathPrefix;
  RepoBreakdown groups commits + files by repo with optional email
  filter (per-dev file count uses devCommits, not the repo total).
- internal/report: Per-Repository Breakdown section in both the main
  and profile templates, gated on len(Repos) > 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bare clones (foo.git dirs with HEAD/objects/refs at the root rather
than inside .git/) weren't picked up — discovery only accepted
working-tree layouts. Since mirrors, fixture clones, and
GitHub-style server dirs all use this shape, the omission blocked
the most common "point gitcortex scan at a folder of .git clones"
validation path.

isBareRepo requires all three canonical entries to avoid false
positives from stray HEAD/ or refs/ directories.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

LoadMultiJSONL prefixes file paths with '<repo>:' so cross-repo
aggregates can attribute files back to their source. The suspect
detector's directory-glob suggestions inherited that prefix, which
made the warning's remediation command a no-op: extract --ignore
and scan --extract-ignore both expect bare, repo-relative globs.

Strip the prefix in CollectAllSuggestions before dedup — same
shape from different repos (dist/*, node_modules/*) now collapses
to one canonical suggestion that works across every repo in the
scan. The stripping is conservative: a colon only counts as a
prefix delimiter when the preceding segment contains no slash or
glob metachars, preserving the rare but valid case of a colon
embedded in a real path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 02100f5c0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/scan/discovery.go Outdated
Comment thread internal/stats/repo_breakdown.go Outdated
lex0c and others added 2 commits April 20, 2026 12:13
The unconditional SkipDir on an ignored directory short-circuited
the matcher's last-match-wins semantics: patterns like `vendor/` +
`!vendor/keep` would prune vendor before vendor/keep could be
examined, so intentionally re-included repositories were silently
dropped from the scan.

Matcher now exposes CouldReinclude(dir), which returns true iff any
negation rule could fire on a descendant of dir — explicit child
references (!vendor/keep) and basename-only negations (!keep) both
force descent. Discovery descends instead of pruning whenever that
check is true, letting per-path Match() decide. Unrelated
exclusions keep pruning effective for the common case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ds.commits is keyed by SHA; when a scan ingests repos that share a
commit (fork, mirror, cherry-pick) the later ingest overwrote the
earlier one. RepoBreakdown iterated ds.commits directly, so the
consolidated report silently under-counted commits and churn for one
side of every collision — materially wrong percentages in exactly
the multi-repo scenarios scan was built for.

Track commits in a parallel map[repo][]*commitEntry populated at
ingest. File-side lookups keep using ds.commits (one record per SHA
is enough there); RepoBreakdown iterates the per-repo slice, which
preserves every commit regardless of collisions.

Regression test uses LoadMultiJSONL with two JSONLs that share a
SHA — without the fix, one of the two repos disappears from the
breakdown entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

if !pat.Match(path) {

P2 Badge Normalize repo prefixes before suspect-pattern matching

Multi-repo loads prefix each file path as <repo>:; matching the raw path here causes root-level heuristics to miss common cases like alpha:vendor/x.go and alpha:package-lock.json. In stats runs with multiple --input files, this can suppress suspect-path warnings and --ignore suggestions even when generated/vendor content dominates churn, reducing the accuracy of the guidance users rely on to clean noisy metrics.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/report/template.go Outdated
lex0c and others added 2 commits April 20, 2026 12:29
DetectSuspectFiles fed ds.files keys straight into the pattern
predicates, but multi-repo loads prefix every path with <repo>:.
Root-level vendor, package-lock.json, go.sum, and friends missed
detection — `strings.HasPrefix(p, "vendor/")` and
`p == "package-lock.json"` both fail when the string actually begins
with `alpha:vendor/x.go` or equals `alpha:package-lock.json`.

Nested occurrences like `repo:pkg/vendor/x.go` still matched via the
`/vendor/` substring check, so the bug looked partial in the wild:
heavy repos with nested generated content triggered the warning,
while a Go project with vendor/ at the root under the same scan went
silently unflagged and users kept chasing bus-factor ghosts from
vendored code.

Strip the `<repo>:` prefix once per file and feed the normalized
path to both Match and Suggest. Nested paths are unaffected; single-
repo loads (no prefix) pass through unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Per-Repository Breakdown section fed pctFloat's output (a
string) into printf "%.0f". html/template's CSS sanitizer saw the
resulting %!f(string=...) as unsafe and replaced every bar width
with ZgotmplZ — the commit-share visualization rendered at zero
width in every multi-repo report.

PctOfTotalCommits is already a float64 on RepoStat, so printf it
directly. Drop the $maxRepoCommits variable that was only guarding
a zero-churn case already covered by PctOfTotalCommits returning 0.

Regression test asserts no ZgotmplZ or %!f(string= markers appear
in a multi-repo render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 98efb10989

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/scan/discovery.go Outdated
Comment thread internal/scan/ignore.go
lex0c and others added 2 commits April 20, 2026 12:48
Discover descended into ignored directories only to give negation
rules a chance to re-include descendants, but after the descent
decision it still ran the .git detection on the ignored parent.
When the parent itself held a .git (e.g. `vendor/` with vendor/.git
plus a sibling !vendor/keep rule), the walker wrongly recorded the
ignored parent as a repo and then returned filepath.SkipDir from
the repo branch — which cut off the very descent the negation
relied on. Both outcomes violated the documented last-match-wins
semantics: the ignored dir leaked into scan output, and the
re-included child never got a chance to register.

Return early as soon as an ignored dir is walked-into-for-negation.
The .git detection is now reached only for dirs the matcher did not
ignore; re-included descendants are examined in their own callback
invocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CouldReinclude only recognized negation rules whose pattern began
with a literal `dir/` prefix. Globbed variants that can legitimately
match descendants of an ignored directory — `!vendor*/keep`,
`!*/keep`, `!*/vendor/keep` — returned false, so the walker pruned
the parent and the re-included repos silently disappeared.

Per-segment path.Match against the dir's segments handles both
literal and glob first segments uniformly. A pattern is a candidate
reinclude iff it has strictly more segments than dir and every
leading segment matches the corresponding dir segment (path.Match
treats `vendor*`, `*`, literals, etc. the same way). Basename-only
and `**/`-prefixed negations keep their fire-anywhere semantics.

Discovery test uses `vendor*/` + `!vendor*/keep` across vendor/ and
vendor-old/ to prove both re-included repos are now kept while
vendor/garbage stays out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26ac75e762

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/scan/discovery.go Outdated
Comment thread internal/scan/scan.go Outdated
lex0c and others added 4 commits April 20, 2026 13:35
The 6-hex (24-bit) hash suffix used for colliding basenames admits
a tiny but real birthday-collision probability: two absolute paths
whose SHA-1 digests share the first six hex chars would land on the
same slug, and scan would silently write both repos' data to the
same <slug>.jsonl and <slug>.state. Corruption of the exact kind
the hash was added to prevent.

Retry with a longer hash (up to the full 40 hex) whenever the
proposed slug set still has duplicates. Happy path remains
readable six-char suffixes; the rare collision grows until unique
or panics past SHA-1 — the latter would be a cryptographic event,
not a scan-time bug.

Tests: 10k-duplicate property test (24-bit birthday expects ~3
collisions, forces retry path to fire naturally), determinism
across re-runs of the same input, and a targeted test that builds
two paths colliding at 6 hex and asserts the retry separates them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Discover took no context, so a ctx-cancelled scan still walked the
entire tree before returning. On home-directory scans where the walk
phase can be the longest leg of the whole command, Ctrl+C felt
unresponsive and anyone running interactive scans saw seconds of
post-cancel filesystem churn.

Thread ctx into Discover; each filepath.WalkDir callback checks
ctx.Err() before doing filesystem work and returns it to
short-circuit the walk. A pre-cancelled context test asserts
Discover returns context.Canceled with no repos instead of draining
the tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three additions that fill coverage holes called out in the self-
assessment of this branch:

1. TestGenerate_PerRepoBreakdownWidthsAreNumeric now asserts at
   least one bar width renders above zero. The absence-of-error-
   markers check would previously pass a regression where every
   width silently collapsed to 0% (bars invisible but template
   still valid).

2. TestDiscover_AbortsMidWalk measures a baseline walk time,
   cancels at 25% of that interval, and asserts the cancelled walk
   returned ctx.Canceled, cut short before baseline, and produced
   fewer than the full repo count. Complements the existing pre-
   cancelled test, which only proved the first callback checks
   ctx.

3. TestGenerateProfile_MultiRepoBreakdownFiltersByEmail covers the
   scan --email --report flow end-to-end: two repos, one with a
   colleague-exclusive commit + file, assert the breakdown section
   renders, commit counts per repo reflect only the filtered
   dev, and colleague-only files do not bleed into the report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
README gains a full `Scan: discover and aggregate every repo under a
root` section covering the headline flows: consolidated team report,
`--email` personal cross-repo profile, multi-root workspaces. Covers
the output directory layout (slug.jsonl, slug.state, manifest.json),
the `.gitcortex-ignore` syntax with negation examples, and the
tuning flags forwarded to the consolidated report. HTML report
section notes the new Per-Repository Breakdown that appears on
multi-repo runs. Architecture tree picks up internal/scan and
internal/stats/repo_breakdown.

METRICS.md adds a Per-Repository Breakdown entry describing
columns, how the repo label is derived from the LoadMultiJSONL
path prefix, and the divergence-between-commits-and-churn signal.
Vendor/generated warning section notes the prefix-stripping for
multi-JSONL inputs so readers understand why suggestions come out
repo-relative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0947705215

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/gitcortex/main.go Outdated
Comment thread internal/scan/discovery.go
lex0c and others added 2 commits April 20, 2026 14:00
scanCmd parsed --since only after scan.Run had already walked every
root and extracted every discovered repo. An invalid value
(`--since 1yy`, `--since nope`) cost minutes-to-hours of discovery
and extraction on large workspaces before the CLI finally rejected
the flag. Move parseSince up alongside the other flag validation so
obvious typos fail instantly without doing any filesystem work.

Test runs scanCmd against an empty TempDir with --since bogus and
asserts two things: the error mentions --since (early validation
fired), and it does NOT contain "no git repositories found" (the
error scan.Run would have produced had it been reached). Together
these prove scan.Run was not called.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
filepath.WalkDir has a well-known quirk: when the ROOT argument is
a symlink it visits only the link entry (reported as ModeSymlink)
and refuses to descend into the target. Users whose `~/work` is a
symlink to `/mnt/data/work` — a common setup on machines with a
separate data disk — would see discovery return zero repos and
scan conclude "no git repositories found" despite the target being
full of them.

os.Stat already followed the link for the is-directory check, so
the error path silently branched at WalkDir instead of surfacing
a clear problem. Canonicalize the root with filepath.EvalSymlinks
before stat'ing; the walk then starts at a real directory path.
Symlinks ENCOUNTERED during the walk (as opposed to the root
itself) are left as-is — we don't chase every link on disk, only
the one the user explicitly named as a root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8142695880

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/scan/discovery.go Outdated
On macOS (APFS/HFS+ default) and Windows (NTFS default),
Repo.jsonl and repo.jsonl resolve to the same file on disk. The
case-sensitive counts[base] / seen[slug] checks treated two repos
named Repo and repo as distinct, handed each the bare basename,
and let the filesystem silently merge their JSONL and state files
— one scan's output overwriting the other with no warning.

Fold to lower case for both the duplicate count and the seen-set so
case-only basename differences now trigger the hash-suffix branch.
Emitted slugs keep their original case for readability; the
collision check that decides filesystem safety runs on the
lower-cased form, which is what the OS compares against.

Test pairs /a/Team/Repo with /b/OSS/repo and asserts the slugs
differ even under strings.EqualFold and both carry a hash suffix —
without the fix, both would land on bare "Repo"/"repo" and collide
on disk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bd94ab0d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/gitcortex/main.go Outdated
Comment thread internal/scan/ignore.go Outdated
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

lex0c and others added 13 commits April 20, 2026 17:43
If extract.Run panicked inside a worker goroutine (unexpected git
binary output, nil deref from a future refactor, etc.) the
goroutine would crash the entire scan process before its
manifest.Repos[idx] slot was written. At best the repo stayed
forever at Status="pending" — indistinguishable from "worker
never dispatched this repo", which misled both operators and
downstream automation. At worst, the process died mid-batch and
the other in-flight extracts were lost.

Wrap the per-job call in runJobSafely: defer-recover converts any
panic into a ManifestRepo with Status="failed" and an Error
prefixed with "panic:". The sibling workers keep pulling from the
jobs channel, so the rest of the batch completes. extract.Run is
not expected to panic under normal inputs — this is a defensive
seam for the "impossible" cases the branch review flagged as the
only remaining untested failure mode.

Package-level runJobFn indirection lets the test override the
worker body to force a deterministic panic without needing a
broken git binary or a real extract bug to reproduce against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The breakdown was rendered on both the team report and the --email
profile report whenever the dataset spanned multiple repos. On
reflection the team view adds little: "how is git history distributed
across the repos in this scan" is nearly tautological and usually
better answered by manifest.json or running stats per-repo. The
section earns its place only when filtered by developer, where the
same numbers turn into a real consolidation story: "where did this
person spend their time across projects".

Remove the section from the team template; keep it on the profile
template where it was already filtered by email. Drop the unused
ReportData.Repos field and the stats.RepoBreakdown call from
Generate; ProfileReportData.Repos remains the single populated home
for the metric. Team-report regression test now asserts the section
does NOT render even on a multi-repo dataset; the profile test
(unchanged) continues to cover the positive case with email
filtering. README and METRICS.md rewritten to frame the metric as
profile-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The old `scan --report <file>` rendered one consolidated team report
across every scanned repo, which mixed hotspots, bus factor, and
coupling signal from unrelated codebases. A small `main.go` in a
100-commit repo could never surface beside a 50k-commit `wp-includes/`
— the ranking was dominated by history size, not architectural
significance.

Replace the team-consolidation flow with --report-dir: for each
successful repo in the manifest, load its JSONL as a standalone
Dataset and render <dir>/<slug>.html via the existing report
pipeline. Each HTML is equivalent to running `gitcortex report`
against that one repo; no cross-repo mixing. Afterward emit
<dir>/index.html with per-repo summary cards (commits, churn, devs,
first→last date, status pill, relative-volume bar) linking into
every successful report. Failed and pending entries surface inline
so operators can spot them without opening the manifest.

The only report flag that still takes a single file path is
--report <file> combined with --email, which renders the consolidated
profile (genuinely cross-repo signal: "where did this person spend
their time"). --report alone is now a CLI error pointing at both
alternatives.

Tests cover both the validation branch and the end-to-end --report-dir
flow: two real git repos produce two standalone HTMLs (asserted to
carry no `<repo>:` path prefix, proving LoadJSONL and not
LoadMultiJSONL was used) and an index with hrefs into each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndex slug

Three MAJORs from the review of commit a0a6251:

1. Pending/skipped entries were counted as failed in both the index
   summary card and the CLI log line. A user-cancelled scan read as
   "N failed" — misleading because nothing actually broke. Separate
   pendingCount bucket, new ScanIndexData.PendingRepos field, and
   the summary card shows "(M failed) (K pending)" as distinct
   chips. Log line drops zero buckets so a clean run still says
   "N ok" instead of "N ok, 0 failed, 0 pending".

2. os.Create and reportpkg.Generate errors aborted the entire
   batch, discarding every HTML already written AND skipping the
   index. Policy now matches LoadJSONL: the entry is downgraded to
   Status=failed with an explanatory Error, best-effort os.Remove
   cleans up any partial file, and the loop keeps going. Operators
   get an index listing exactly which repos rendered and which
   didn't.

3. A repo literally named `index` would emit <dir>/index.html and
   then be overwritten by the landing page. Add a reservedSlugs
   registry to scan.discovery; isReservedSlug triggers the same
   hash branch as a duplicate-basename collision. Case-insensitive
   per the existing fold (macOS/Windows can't tell `Index` from
   `index` on default filesystems).

Tests cover all three: mixed ok+failed+pending manifest asserts the
correct counts appear in the index, slug collisions force the hash,
and all-failed still emits index.html without tripping the
MaxCommits==0 guard in the bar-width template expression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both pages now render the identical 'Generated by gitcortex · <date>'
footer — same link attributes, same color and text-decoration, same
border-top spacing. The duplicate date in the subtitle (removed)
kept the top strip uncluttered for the only stable identifier that
varied between index and repo pages: the scanned roots. Visually
the index now reads as the sibling of every per-repo report it
links to instead of a one-off landing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shorten the H1 from "Scan Index" to "Index" and drop the
"roots: ..." subtitle line. The roots were borderline-useful — the
user picked them, so echoing them back added chrome without
adding information. With the data summary cards carrying the
interesting counts below, a terser heading reads cleaner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small subtractions driven by the same principle — don't repeat
signal the surrounding layout already conveys:

- Drop the "ok" status pill. The link itself plus the populated
  metric cells make status obvious for successful entries;
  showing a neutral pill on every card added noise. Failed and
  pending pills stay because those statuses swap the card body
  for an italic "No report available." and the pill is the only
  affordance marking which failure mode it was.

- Drop the "N / M max" caption next to the relative-volume bar.
  The bar already visualizes the ratio, the "commits" metric cell
  already shows N in the same row, and the max is inferred by
  eyeballing the widest bar. Keeping the bar alone preserves
  comparison-at-a-glance without a third copy of the same
  number.

Test updated to assert `status-ok` pill is absent while failed /
pending pills remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The green .repo.ok border competed with the layout's default
border-radius and made a table of healthy repos look noisier than
it needed to. Red (failed) and amber (pending) borders stay as
attention cues for the genuinely actionable states; the absence of
a stripe on ok cards now reads as "nothing to flag" by contrast.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The dates cell used to show "2024-01-15 → 2026-04-18" raw, forcing
operators scanning a list of 30 repos to do date math to pick out
the abandoned ones. Add a recency chip above the date range:
"today" / "Nd ago" / "Nmo ago" / "Ny ago", colored by freshness
bucket — green for fresh (≤30d), gray for stable (≤1y), amber for
stale (>1y). The raw dates stay as supporting detail for anyone who
wants the exact range.

HumanizeAgo is the caller entry point (uses time.Now); humanizeAgoAt
takes an explicit reference time for deterministic tests. Table test
pins the day/month/year transitions and the 365-day stable/stale
boundary; render test asserts the chip class reaches the HTML and
is absent from failed entries that have no dates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven small polish items surfaced by the post-recency review. All
live in or around the new scan-index landing page:

- Recency label at 360-364 days read as "12mo ago" under the
  stable color. Clamp months at 11 so the label progression never
  crosses into two-digit territory inside the still-stable band.

- Ok entry whose dataset had no dates (empty JSONL, rename-only
  history) rendered an orphan arrow `<br>→ ` in the dates cell.
  Guard the date range behind `{{if .LastCommitDate}}` so the cell
  emits nothing instead of a lone glyph.

- Log line started every run with `0 ok` even when the count was
  zero, drifting from the summary card's zero-suppression
  convention. Build the log bucket list the same way — only
  non-zero counts, and open with the first non-zero bucket.

- `Data.Roots` field and `.subtitle` CSS rule were dead after the
  roots-subtitle removal commit. Drop both plus the now-unused
  Roots assignment at the call site.

- Convert `reservedSlugs` from a package-level mutable map to a
  `isReservedSlug` switch. Keeps the reserved set immutable — no
  risk of a future test silently leaking an entry across the
  process.

- Strengthen the recency render test: anchor the chip inside the
  `.dates` wrapper so a refactor that moves the chip outside its
  structural context fails the test instead of passing on the
  CSS class alone.

- Extend the humanizeAgoAt table with 360/364-day "stable" cases
  (pinning the new month clamp) and a sub-day-future case that
  documents the "today" fallback for intra-day committer-date
  skew.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Index · N repositories" heading restated information the
summary strip's "Repositories" card already surfaces one line
below. Removing it lets the three-card summary (Repositories,
Total commits, Unique devs) open the page — the counts users
actually came to see. Dead `h1` CSS rule dropped at the same time.

Test now anchors on the "Repositories" label in the summary strip,
which survives layout tweaks better than a specific H1 string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The index template shipped its own `.summary` / `.summary-card`
classes with rules byte-for-byte identical to `.cards` / `.card`
in the team and profile templates. Two copies of the same
definition drift — a future restyle of "our card look" would land
in one place and quietly skip the other.

Rename both the CSS rules and the HTML usage to the shared
`.cards` / `.card` names. No visual change; the three pages now
share one vocabulary for the summary-strip pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The index inherited discovery's case-sensitive slug sort, which on
a real scan surfaced WordPress.git before gitcortex.git purely
because the uppercase W sorts before lowercase g. The order didn't
reflect anything an operator cares about.

Resort before rendering: failed first (actionable — and the red-
bordered "No report available" cards need to be impossible to miss),
then pending/skipped, then ok entries ranked by commit count desc
so the heaviest healthy repos open the tail. Slug asc breaks ties
in every bucket so identical-shaped scans still render identically.

Test with a mixed manifest {big=50c, broken=failed, pending, small=5c}
asserts the rendered order is broken → pending → big → small, i.e.
status bucket wins over alphabetical, commits win within the ok
bucket.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7666e57eb5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/gitcortex/main.go
Comment thread internal/report/scan_index.go Outdated
lex0c and others added 2 commits April 20, 2026 20:05
--email alone used to fall through validation, reach the profile
branch after scan.Run completed, and call os.Create("") — surfacing
a cryptic `open : no such file or directory` error only AFTER the
slow multi-repo extract phase had already finished. On a home-dir
scan that's minutes to hours of work thrown away because of a
flag pair that was always going to fail.

Add the symmetric up-front check alongside the existing
--report-without-email rejection: --email requires --report <file>,
refuse with a clear message before scan.Run. Test uses an empty
TempDir to confirm the error surfaces without the "no git
repositories found" wrapper that would indicate discovery ran.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
humanizeAgoAt computed `days = int(now.Sub(t).Hours() / 24)`. Go's
int conversion truncates toward zero for negative values, so a
last-commit date 12 hours in the future (tomorrow's YYYY-MM-DD
relative to a noon `now`) produced int(-0.5) == 0 — slipping past
the `case days < 0` future-date guard and rendering as "today" /
fresh. Quietly mislabeled clock-skewed commits as today's activity.

Compare UTC date midnights directly: parse `lastDate` as midnight
UTC, reduce `now` to its UTC midnight, return empty whenever
`t.After(today)`. Same-day sub-day skew still falls into days==0
→ "today" (documented day-granularity fallback). The "case days <
0" branch is kept defensively for refactor safety.

Tests add tomorrow and two-days-future cases that trip the gap
under the old arithmetic and now correctly produce empty labels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 049cdb5463

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/scan/discovery.go Outdated
The working-tree check accepted any non-symlink `.git` entry, which
meant a random regular file called `.git` (source dump, build
artifact, a user accidentally creating the wrong thing) made the
parent directory register as a repo. The walker then SkipDir'd the
entire subtree, which had two bad outcomes:

  1. Any real repo nested under that parent silently disappeared
     from the scan — the landing index never listed it and
     operators had no hint it was being skipped.
  2. The fake parent hit the extract phase, failed with the
     cryptic "not a git repository", and inflated the failed
     count in the manifest with a bug that wasn't a repo problem
     at all.

Factor the detection into isWorkingTreeEntry: accept directories
and regular files whose first 8 bytes are "gitdir: " (the linked-
worktree / submodule pointer format git actually writes), reject
everything else including irregular files and symlinks. The
symlink policy is unchanged — this just widens the validation to
catch the "arbitrary file named .git" case too.

Tests cover all four paths: dir (existing), gitdir pointer (new
positive), arbitrary file (new negative — also asserts the nested
real repo underneath IS discovered after the fix), and symlink
(existing).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 20, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 302d13f7ef

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/report/scan_index.go
Comment thread cmd/gitcortex/main.go Outdated
lex0c and others added 2 commits April 20, 2026 21:00
scan.runOne emits Status="skipped" when the worker picks up a job
after ctx has already cancelled. The summary strip already folded
skipped into PendingRepos, but the per-row renderer forwarded the
raw status into `class="repo skipped"` and `status-skipped` — CSS
selectors the template never defined, so skipped rows lost the
amber border and status pill. A cancelled scan would show a
"1 pending" card in the summary but N unstyled white rows below,
hiding exactly the cancellation fallout that most needs triage.

Fold skipped into pending at the entry boundary in
renderScanReportDir so the row class names align with the summary
counts. Rather than widen the template's CSS vocabulary to two
near-identical rules, collapse the synonym at the one place where
the internal status becomes user-visible chrome.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
scan --report --email hardcoded the profile's RepoName to
filepath.Base(roots[0]) — fine for single-root scans, misleading
for multi-root runs where the dataset actually spans every --root.
In practice: `gitcortex scan --root ~/work --root ~/oss ... --email
me@x.com` produced a report titled "· work", which readers
interpreted as belonging only to the first root when the stats
covered both.

profileScanLabel picks a label based on root cardinality:

  - 1 root → basename (unchanged common case)
  - 2-3 roots → joined with " + " so the scope is readable
  - 4+ roots → "N roots" (joining many basenames bloats the H1
    and the exact set lives in the scan log and manifest)

Smoke-tested on a 50-root microservice workspace: H1 now reads
"<dev> · 50 roots" instead of "<dev> · <first-ms-name>".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Copy link
Copy Markdown
Owner Author

lex0c commented Apr 21, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Keep it up!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Two small perf wins bundled together — both touch the scan landing
code path.

--parallel default was hardcoded to 4. Fine for 2008-era laptops,
conservative on anything modern — a 12-thread dev box was leaving
8 workers on the table whenever the operator forgot to pass the
flag. Swap in defaultScanParallel(): runtime.NumCPU() floored at 2
(single-core CI still makes progress) and capped at 16 (git I/O
stops scaling past ~8-16 on SSDs; avoids over-subscription on
64-core servers).

renderScanReportDir's per-repo dev-set aggregation called
stats.TopContributors(ds, 0) just to walk email strings. That
function copies every ContributorStat by value and runs a full
sort by commit count — both wasted work when the caller discards
the struct and ordering. New stats.DevEmails(ds) returns only the
email slice, unordered. Per-repo savings are ms-scale but
cumulative on large scans.

Tests: DefaultScanParallel asserts the [2, 16] invariant; DevEmails
pins the set for the standard makeDataset fixture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c lex0c merged commit 15cf2cf into main Apr 21, 2026
4 checks passed
@lex0c lex0c deleted the feat/scan-multi-repo branch April 21, 2026 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant