Merged
1 change: 1 addition & 0 deletions README.md
@@ -296,6 +296,7 @@ Each profile includes:
- **Collaboration**: top devs sharing the same files (ranked by `shared_lines` = Σ min(linesA, linesB))
- **Weekend %**: off-hours work ratio
- **Top files**: most impacted files by churn
- **Top commits**: the dev's largest individual commits by lines changed (additions + deletions); surfaces vendored drops and bulk rewrites that can skew the totals

### Coupling analysis

3 changes: 2 additions & 1 deletion docs/METRICS.md
@@ -219,6 +219,7 @@ Per-developer report combining multiple metrics.
| Specialization | Herfindahl index over the **full** per-directory file-count distribution: Σ pᵢ² where pᵢ is the share of the dev's files in directory i. 1 = all files in one directory (narrow specialist); 1/N for a uniform spread across N directories; approaches 0 as the distribution widens. Computed before the top-5 Scope truncation so it reflects actual breadth. Labels (see `specBroadGeneralistMax`, `specBalancedMax`, `specFocusedMax` constants): `< 0.15` broad generalist, `< 0.35` balanced, `< 0.7` focused specialist, `≥ 0.7` narrow specialist. Herfindahl, not Gini, because Gini would collapse "1 file in 1 dir" and "1 file in each of 5 dirs" to the same value (both have zero inequality among buckets), which misses the specialization distinction. **Measures file distribution, not domain expertise** (see caveat below). **Display vs raw:** CLI and HTML show the value rounded to 3 decimals (`%.3f`) for readability; JSON output preserves the full float64. Band classification runs against the raw float, so a value like 0.1496 lands in `broad generalist` even though `%.3f` displays it as `0.150`. JSON consumers that reproduce the banding must use the raw value, not a rounded version. |
| Contribution type | Based on del/add ratio: growth (<0.4), balanced (0.4-0.8), refactor (>0.8) |
| Collaborators | Top 5 devs sharing code with this dev. Ranked by `shared_lines` (Σ min(linesA, linesB) across shared files), tiebreak `shared_files`, then email. Same `shared_lines` semantics as the Developer Network metric — discounts trivial one-line touches so "collaborator" reflects real overlap. |
| Top commits | The dev's top 10 commits by `lines_changed` (additions + deletions), tiebreak `sha asc`. Same ranking key and tiebreak as the dataset-level Top Commits section so the two read consistently side by side. Messages follow the same 80-character truncation rule and are only populated when `extract` ran with `--include-commit-messages`. Rendered in the CLI `profile` stat and in the standalone `--email` HTML profile page; intentionally omitted from the main report's Developer Profiles cards to keep those compact. **Divergence from dataset-level Top Commits:** commits with a zero `author_date` are dropped from the per-dev list (they share the guard that protects grid/monthly bucketing); the dataset-level section renders them as `0001-01-01`. Negligible in practice — the JSONL extract always emits `author_date` — but worth knowing if you compare the two views. |
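
The Specialization row in the table above can be illustrated with a minimal Go sketch. The function names `herfindahl` and `specLabel` are hypothetical and are not taken from the codebase; the thresholds are the literal values quoted in the table rather than the real constants, and returning 0 for an empty distribution is an assumption made only for this example.

```go
package main

import "fmt"

// herfindahl returns the sum of squared shares over per-directory file
// counts. Hypothetical helper; returning 0 for an empty input is an
// assumption of this sketch.
func herfindahl(filesPerDir map[string]int) float64 {
	total := 0
	for _, n := range filesPerDir {
		total += n
	}
	if total == 0 {
		return 0
	}
	var h float64
	for _, n := range filesPerDir {
		p := float64(n) / float64(total)
		h += p * p
	}
	return h
}

// specLabel bands the raw (unrounded) value using the thresholds
// listed in the Specialization row.
func specLabel(h float64) string {
	switch {
	case h < 0.15:
		return "broad generalist"
	case h < 0.35:
		return "balanced"
	case h < 0.7:
		return "focused specialist"
	default:
		return "narrow specialist"
	}
}

func main() {
	oneDir := map[string]int{"cmd": 1}
	fiveDirs := map[string]int{"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
	for _, dist := range []map[string]int{oneDir, fiveDirs} {
		h := herfindahl(dist)
		fmt.Printf("%.3f %s\n", h, specLabel(h))
	}
}
```

The two inputs are the "1 file in 1 dir" and "1 file in each of 5 dirs" cases from the table: the index separates them (1 versus roughly 1/5), which is the distinction the row says Gini would lose.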

## Top Commits

@@ -373,7 +374,7 @@ Every ranking function has an explicit tiebreaker so the same input produces the
| `dev-network` | shared_lines | shared_files |
| `profile` | commits | email asc |

A third-level tiebreaker on path/sha/email asc is applied where primary and secondary can both tie (`churn-risk`, `coupling`, `dev-network`) so ordering is stable even with exact equality on the first two keys. Inside each profile, the `TopFiles`, `Scope`, and `Collaborators` sub-lists are also sorted with explicit tiebreakers (path / dir / email asc) so their internal ordering is deterministic too.
A third-level tiebreaker on path/sha/email asc is applied where primary and secondary can both tie (`churn-risk`, `coupling`, `dev-network`) so ordering is stable even with exact equality on the first two keys. Inside each profile, the `TopFiles`, `TopCommits`, `Scope`, and `Collaborators` sub-lists are also sorted with explicit tiebreakers (path / sha / dir / email asc) so their internal ordering is deterministic too.

Inside `busfactor`, the per-file `TopDevs` list is sorted by lines desc with an email asc tiebreaker. Without it, binary assets and small files where two devs contribute equal lines (e.g. `.gif`, `.png`, one-line configs) produced a different `TopDevs` email order on every run.
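
As a rough illustration of the deterministic ordering described above, the following Go sketch ranks commits by lines changed descending with a SHA-ascending tiebreak, then caps the list and reports how many entries were cut. The `commit` type and `topN` function are hypothetical stand-ins, not the project's types; copying the input slice before sorting is a choice made for this example only.

```go
package main

import (
	"fmt"
	"sort"
)

// commit is a hypothetical stand-in for a per-dev commit record.
type commit struct {
	SHA          string
	LinesChanged int64
}

// topN sorts by LinesChanged desc, then SHA asc, and returns the first
// n entries plus the number of entries dropped by the cap.
func topN(cs []commit, n int) ([]commit, int) {
	out := append([]commit(nil), cs...) // copy so the caller's slice keeps its order
	sort.Slice(out, func(i, j int) bool {
		if out[i].LinesChanged != out[j].LinesChanged {
			return out[i].LinesChanged > out[j].LinesChanged
		}
		return out[i].SHA < out[j].SHA
	})
	hidden := 0
	if len(out) > n {
		hidden = len(out) - n
		out = out[:n]
	}
	return out, hidden
}

func main() {
	cs := []commit{
		{SHA: "bbb", LinesChanged: 20},
		{SHA: "ccc", LinesChanged: 50},
		{SHA: "aaa", LinesChanged: 20},
	}
	top, hidden := topN(cs, 2)
	for _, c := range top {
		fmt.Println(c.SHA, c.LinesChanged)
	}
	fmt.Println("hidden:", hidden)
}
```

Without the SHA comparison, `sort.Slice` gives no ordering guarantee between the two 20-line commits, so which one survives the cap could differ between runs when the input comes from map iteration.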

21 changes: 21 additions & 0 deletions internal/report/profile_template.go
@@ -190,6 +190,27 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
</table>
{{end}}

{{if .Profile.TopCommits}}
{{$hasMsg := (index .Profile.TopCommits 0).Message}}
<h2>Top Commits</h2>
<p class="hint">This developer's largest individual commits by lines changed (additions + deletions). A handful of outsized commits (vendored drops, bulk renames, generated code) reads very differently from a steady stream of medium-sized ones, even when the totals match.</p>
<table>
<tr><th>SHA</th><th>Date</th><th>Lines</th><th>Files</th>{{if $hasMsg}}<th>Message</th>{{end}}</tr>
{{range .Profile.TopCommits}}
<tr>
<td class="mono">{{printf "%.12s" .SHA}}</td>
<td class="mono" style="font-size:11px;">{{.Date}}</td>
<td>{{thousands .LinesChanged}}</td>
<td>{{thousands .FilesChanged}}</td>
{{if $hasMsg}}<td class="truncate">{{.Message}}</td>{{end}}
</tr>
{{end}}
{{if gt .Profile.TopCommitsHidden 0}}
<tr><td colspan="{{if $hasMsg}}5{{else}}4{{end}}" style="color:#656d76; font-style:italic; text-align:center;">+{{.Profile.TopCommitsHidden}} more commits not shown</td></tr>
{{end}}
</table>
{{end}}

{{if .ActivityYears}}
<h2 style="display:flex; justify-content:space-between; align-items:center;">Activity <button onclick="var h=document.getElementById('prof-act-heatmap'),t=document.getElementById('prof-act-table');h.hidden=!h.hidden;t.hidden=!t.hidden;this.textContent=h.hidden?'heatmap':'table'" style="font-size:11px; font-weight:normal; padding:2px 10px; border:1px solid #d0d7de; border-radius:4px; background:#f6f8fa; color:#24292f; cursor:pointer;">table</button></h2>
<p class="hint">Monthly commit heatmap. Darker = more commits. Gaps = inactive periods; steady cadence signals healthy pace. Hover for details; toggle to table for exact numbers. · {{docRef "activity"}}</p>
52 changes: 52 additions & 0 deletions internal/report/report_test.go
@@ -256,6 +256,58 @@ func TestGenerateProfile_SmokeRender(t *testing.T) {
}
}

func TestProfileTmpl_TopCommitsShortSHAAndMessageGate(t *testing.T) {
// Two invariants on the per-dev Top Commits block:
// 1. SHAs shorter than 12 chars must not crash template execution.
// LoadJSONL does not enforce SHA length, and the previous
// {{slice .SHA 0 12}} raised "index out of range" on any short
// input, aborting profile generation for the whole page.
// 2. The Message column header and cells must drop out when no
// commit carries a message — mirrors the dataset-level Top
// Commits convention so `extract --include-commit-messages` is
// a strict opt-in, not a silent empty-column penalty.
data := ProfileReportData{
GeneratedAt: "2024-01-01 00:00",
RepoName: "t",
Profile: stats.DevProfile{
Name: "N", Email: "n@x",
Commits: 1, ActiveDays: 1,
FirstDate: "2024-01-01", LastDate: "2024-01-01",
TopCommits: []stats.DevCommit{
{SHA: "c1", Date: "2024-01-01", LinesChanged: 10, FilesChanged: 1},
{SHA: "abcdef1234567890", Date: "2024-01-02", LinesChanged: 20, FilesChanged: 2},
},
},
}
var buf bytes.Buffer
if err := profileTmpl.Execute(&buf, data); err != nil {
t.Fatalf("profileTmpl.Execute: %v", err)
}
out := buf.String()

if !strings.Contains(out, ">c1<") {
t.Errorf("short 2-char SHA should render intact, got:\n%s", out)
}
if !strings.Contains(out, ">abcdef123456<") {
t.Errorf("16-char SHA should truncate to 12, got:\n%s", out)
}
if strings.Contains(out, "abcdef1234567890") {
t.Errorf("16-char SHA leaked past the 12-char cap")
}

// Message column must be absent when all TopCommits have empty messages.
topBlock := out
if idx := strings.Index(out, "<h2>Top Commits"); idx >= 0 {
topBlock = out[idx:]
}
if end := strings.Index(topBlock, "</table>"); end >= 0 {
topBlock = topBlock[:end]
}
if strings.Contains(topBlock, "<th>Message</th>") {
t.Errorf("Message column should not render when no commit has a message, got:\n%s", topBlock)
}
}
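
The first invariant in the test above can be reproduced in isolation. This is a minimal sketch using `text/template` from the standard library; that the report package uses `html/template` or `text/template` is not shown in this diff, so the package choice here is an assumption, and `render` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses a one-line template and executes it against a value
// exposing a single SHA field. Hypothetical helper for this sketch.
func render(tmplText, sha string) (string, error) {
	t, err := template.New("sha").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	err = t.Execute(&buf, struct{ SHA string }{SHA: sha})
	return buf.String(), err
}

func main() {
	for _, sha := range []string{"c1", "abcdef1234567890"} {
		// printf with a precision truncates long strings and leaves
		// short ones untouched.
		out, err := render(`{{printf "%.12s" .SHA}}`, sha)
		fmt.Printf("printf %q -> %q (err: %v)\n", sha, out, err)

		// slice with a fixed upper bound fails at execution time when
		// the string is shorter than that bound.
		out, err = render(`{{slice .SHA 0 12}}`, sha)
		fmt.Printf("slice  %q -> %q (err: %v)\n", sha, out, err)
	}
}
```

Both forms agree on the 16-character SHA; only the `slice` form returns an error for the 2-character one, which is the failure the test guards against.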

func TestGenerateProfile_UnknownEmail(t *testing.T) {
ds := loadFixture(t)
var buf bytes.Buffer
4 changes: 2 additions & 2 deletions internal/report/template.go
@@ -343,7 +343,7 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
<tr><th>SHA</th><th>Author</th><th>Date</th><th>Lines</th><th>Files</th>{{if and (gt (len .TopCommits) 0) (index .TopCommits 0).Message}}<th>Message</th>{{end}}</tr>
{{range .TopCommits}}
<tr>
<td class="mono">{{slice .SHA 0 12}}</td>
<td class="mono">{{printf "%.12s" .SHA}}</td>
<td>{{.AuthorName}}</td>
<td class="mono">{{.Date}}</td>
<td>{{thousands .LinesChanged}}</td>
@@ -372,7 +372,7 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
{{end}}

{{if .Profiles}}
<h2>Developer Profiles</h2>
<h2>Developer Profiles{{if lt (len .Profiles) .Summary.TotalDevs}} <span style="font-size:13px; color:#656d76; font-weight:normal;">{{thousands (len .Profiles)}} of {{thousands .Summary.TotalDevs}}</span>{{end}}</h2>
<p class="hint">Per-developer view. Use to spot silos (narrow scope + few collaborators), knowledge concentration (high pace on few directories), and cultural patterns (weekend or refactor-heavy work). · {{docRef "profile"}}</p>
{{range .Profiles}}
<div style="background:#fff; border:1px solid #d0d7de; border-radius:6px; padding:16px; margin-bottom:16px;">
23 changes: 23 additions & 0 deletions internal/stats/format.go
@@ -520,6 +520,29 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error {
}
}

if len(p.TopCommits) > 0 {
fmt.Fprintln(f.w)
fmt.Fprintln(f.w, " Top commits:")
for _, tc := range p.TopCommits {
// Defensive slice: LoadJSONL does not validate SHA
// length, so hand-built fixtures (e.g. "c1") or a
// future ingest path that emits abbreviated SHAs
// would panic on a fixed tc.SHA[:12]. The other SHA
// slice sites in this file (TopCommits / LatestCommits)
// carry the same latent risk and are left as-is so
// this change stays scoped to the new Top-commits block.
sha := tc.SHA
if len(sha) > 12 {
sha = sha[:12]
}
fmt.Fprintf(f.w, " %s %s %6d lines %3d files %s\n",
sha, tc.Date, tc.LinesChanged, tc.FilesChanged, tc.Message)
}
if p.TopCommitsHidden > 0 {
fmt.Fprintf(f.w, " ... (+%d more commits not shown)\n", p.TopCommitsHidden)
}
}

if len(p.MonthlyActivity) > 0 {
fmt.Fprintln(f.w, " Activity:")
maxCommits := 0
82 changes: 81 additions & 1 deletion internal/stats/stats.go
@@ -1328,6 +1328,14 @@ type DevProfile struct {
// whole footprint or just a sample. Zero when the dev's touched
// file count fits in 10.
TopFilesHidden int
// TopCommits is the dev's largest commits by LinesChanged (add+del),
// capped at 10. Mirrors the dataset-level TopCommits metric so a
// reader can see which individual commits drive this dev's churn
// footprint — a handful of huge vendored-drop commits reads very
// differently from a steady stream of medium ones, even when the
// totals match. TopCommitsHidden follows the TopFilesHidden pattern.
TopCommits []DevCommit
TopCommitsHidden int
Scope []DirScope
// ScopeHidden / ExtensionsHidden count the buckets dropped by the
// top-5 truncation so CLI and HTML can surface "+N more" — without
@@ -1369,6 +1377,22 @@ type DevFileContrib struct {
Churn int64
}

// DevCommit is a single commit attributed to the dev, carrying the
// fields needed to render the per-dev "top commits" list. Mirrors the
// shape of BigCommit (the dataset-level TopCommits type) minus the
// AuthorName/AuthorEmail fields — those are redundant in a per-dev view
// where every entry belongs to the same author. Message is truncated
// at 80 chars (same as TopCommits) to keep the CLI/HTML table narrow.
type DevCommit struct {
SHA string
Date string
Message string
Additions int64
Deletions int64
LinesChanged int64
FilesChanged int
}

// DevExtContrib is a dev's footprint in a single extension bucket.
// Churn is the summed per-file dev-lines (from fe.devLines), so it
// reflects lines the dev personally added/removed across files that
@@ -1525,16 +1549,44 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
// Per-dev work grid + monthly activity
devGrid := make(map[string]*[7][24]int)
devMonthly := make(map[string]map[string]*ActivityBucket)
// Per-dev commit list for TopCommits ranking. Collected in the same
// ds.commits pass as devGrid/devMonthly so we don't iterate the full
// commit map twice; actual sort + top-10 truncation happens in the
// per-dev assembly loop below.
devCommits := make(map[string][]DevCommit)
dayIdx := [7]int{6, 0, 1, 2, 3, 4, 5} // Sunday=6, Monday=0, ...

for _, cm := range ds.commits {
for sha, cm := range ds.commits {
if !inTarget(cm.email) {
continue
}
if cm.date.IsZero() {
// Note: dataset-level TopCommits() renders zero-date commits
// as "0001-01-01"; we drop them here because grid/monthly below
// share this guard and malformed-date commits are rare enough
// in practice (JSONL extract always emits author_date) that
// the divergence is not worth branching the loop for.
continue
}

// Message is stored un-truncated on purpose: the 80-char
// truncation is deferred to the per-dev assembly loop below,
// which runs after sort + top-10 cap. A dev with thousands of
// commits would otherwise pay N small string allocations here
// just to throw away all but 10. Dataset-level TopCommits()
// truncates inline because it builds BigCommits in one pass;
// the per-dev path splits collection from projection so we can
// avoid that cost.
devCommits[cm.email] = append(devCommits[cm.email], DevCommit{
SHA: sha,
Date: cm.date.UTC().Format("2006-01-02"),
Message: cm.message,
Additions: cm.add,
Deletions: cm.del,
LinesChanged: cm.add + cm.del,
FilesChanged: cm.files,
})

if devGrid[cm.email] == nil {
devGrid[cm.email] = &[7][24]int{}
}
@@ -1585,6 +1637,33 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
}
}

// Top commits: rank this dev's commits by lines changed, mirroring
// the dataset-level TopCommits semantics. Deterministic tiebreak on
// SHA asc so the displayed top-10 is stable across runs when a dev
// has several same-sized commits (e.g. a series of formatting
// passes each touching the same LOC count). Message truncation is
// done here, post-cap, so we pay the string-copy cost for at most
// 10 entries per dev instead of the full commit count.
topCommits := devCommits[email]
topCommitsHidden := 0
if len(topCommits) > 0 {
sort.Slice(topCommits, func(i, j int) bool {
if topCommits[i].LinesChanged != topCommits[j].LinesChanged {
return topCommits[i].LinesChanged > topCommits[j].LinesChanged
}
return topCommits[i].SHA < topCommits[j].SHA
})
if len(topCommits) > 10 {
topCommitsHidden = len(topCommits) - 10
topCommits = topCommits[:10]
}
for i := range topCommits {
if len(topCommits[i].Message) > 80 {
topCommits[i].Message = topCommits[i].Message[:77] + "..."
}
}
}

var monthly []ActivityBucket
if months, ok := devMonthly[email]; ok {
var order []string
@@ -1805,6 +1884,7 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
LinesChanged: cs.Additions + cs.Deletions, FilesTouched: cs.FilesTouched,
ActiveDays: cs.ActiveDays, FirstDate: cs.FirstDate, LastDate: cs.LastDate,
TopFiles: topFiles, TopFilesHidden: topFilesHidden,
TopCommits: topCommits, TopCommitsHidden: topCommitsHidden,
Scope: scope, ScopeHidden: scopeHidden,
Extensions: extensions, ExtensionsHidden: extensionsHidden,
Specialization: specialization,