The go generate step in internal/generate/knownfiles/main.go fetches languages.yml from Linguist's main branch:
https://raw.githubusercontent.com/github-linguist/linguist/refs/heads/main/lib/linguist/languages.yml
This means the output of go generate changes whenever Linguist merges a PR that adds or renames filenames — which happens frequently. The CI lint check ("generated files up to date") then fails for any contributor whose branch doesn't include the latest generated output, even if their change has nothing to do with known files.
This just happened on #465 — Linguist added mise.lock and mise.local.lock as TOML filenames between when the contributor forked and when CI ran.
Proposed fix:
Pin the Linguist URL to a specific commit or tag instead of main:
const linguistURL = "https://raw.githubusercontent.com/github-linguist/linguist/<commit-sha>/lib/linguist/languages.yml"
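As a rough illustration of the pinning idea, the generator could build the URL from a single ref constant and refuse to run when that ref is not a full commit SHA. This is only a sketch: the names pinnedLinguistRef, languagesURL, and isCommitSHA are hypothetical and do not come from main.go, and the SHA shown is a placeholder rather than a real Linguist commit.

```go
package main

import (
	"fmt"
	"regexp"
)

// pinnedLinguistRef is a hypothetical constant holding the pinned commit.
// The value is a placeholder, not a real Linguist commit SHA.
const pinnedLinguistRef = "0123456789abcdef0123456789abcdef01234567"

// fullSHA matches a full 40-character lowercase hex commit SHA.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// languagesURL builds the raw URL for languages.yml at the given ref.
func languagesURL(ref string) string {
	return fmt.Sprintf(
		"https://raw.githubusercontent.com/github-linguist/linguist/%s/lib/linguist/languages.yml",
		ref,
	)
}

// isCommitSHA reports whether ref looks like a full commit SHA, so a
// moving ref such as "main" cannot slip back in unnoticed.
func isCommitSHA(ref string) bool {
	return fullSHA.MatchString(ref)
}

func main() {
	if !isCommitSHA(pinnedLinguistRef) {
		fmt.Println("refusing to fetch: ref is not a full commit SHA")
		return
	}
	fmt.Println(languagesURL(pinnedLinguistRef))
}
```

If tags should also be allowed, the SHA check would need loosening; it is strict here only to keep the example small.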
Update it on a regular cadence (e.g., monthly) via a scheduled GitHub Actions workflow that:
- Fetches the latest Linguist commit SHA
- Updates the pinned SHA in main.go
- Runs go generate ./pkg/filetype/...
- Opens a PR if the output changed
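The "update the pinned SHA" step above could be a small text rewrite over main.go. The sketch below assumes the URL appears in the source as a string literal. The repin helper and the sample SHA are hypothetical. How the workflow gets the latest SHA is left out; git ls-remote against the Linguist repository would be one option.

```go
package main

import (
	"fmt"
	"regexp"
)

// pinnedURL matches the Linguist raw URL with any ref between the repository
// and the file path: a commit SHA, a tag, or a branch form like refs/heads/main.
var pinnedURL = regexp.MustCompile(
	`(https://raw\.githubusercontent\.com/github-linguist/linguist/)([^"\s]+?)(/lib/linguist/languages\.yml)`,
)

// repin replaces the ref in every matching URL in src with newSHA and reports
// whether the text changed. newSHA is assumed to be a validated hex SHA, so it
// contains no "$" that the replacement template could misinterpret.
func repin(src, newSHA string) (string, bool) {
	out := pinnedURL.ReplaceAllString(src, "${1}"+newSHA+"${3}")
	return out, out != src
}

func main() {
	src := `const linguistURL = "https://raw.githubusercontent.com/github-linguist/linguist/refs/heads/main/lib/linguist/languages.yml"`
	out, changed := repin(src, "0123456789abcdef0123456789abcdef01234567")
	fmt.Println(changed)
	fmt.Println(out)
}
```

The changed flag lets the workflow stop early when the pin is already current, so it only opens a PR when there is something to review.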
This makes go generate deterministic — same input, same output — so contributors stop getting blocked by unrelated Linguist updates.