Pin Linguist version in go generate to make builds reproducible #466

@kehoecj

Description

The go generate step in internal/generate/knownfiles/main.go fetches languages.yml from Linguist's main branch:

https://raw.githubusercontent.com/github-linguist/linguist/refs/heads/main/lib/linguist/languages.yml

As a result, the output of go generate changes whenever Linguist merges a PR that adds or renames filenames, which happens frequently. The CI lint check ("generated files up to date") then fails for any contributor whose branch doesn't include the latest generated output, even if their change has nothing to do with known files.

This just happened on #465 — Linguist added mise.lock and mise.local.lock as TOML filenames between when the contributor forked and when CI ran.

Proposed fix:

Pin the Linguist URL to a specific commit or tag instead of main:

const linguistURL = "https://raw.githubusercontent.com/github-linguist/linguist/<commit-sha>/lib/linguist/languages.yml"
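One way the pinned constant could be written is sketched below. The names linguistRef and linguistURLFor are hypothetical (they are not in the current main.go), and the all-zero SHA is a placeholder rather than a real Linguist commit. The sketch accepts only a full 40-character commit SHA, which is a stricter choice than the "commit or tag" wording above: a tag can be moved, a commit SHA cannot.

```go
package main

import (
	"fmt"
	"regexp"
)

// linguistRef is the pinned Linguist commit (hypothetical name).
// The all-zero value is a placeholder, not a real commit.
const linguistRef = "0000000000000000000000000000000000000000"

// fullSHA matches a full, lowercase 40-character Git commit SHA.
var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// linguistURLFor builds the raw URL of languages.yml at the given commit.
// It rejects anything that is not a full SHA, so a branch name such as
// "main" cannot be reintroduced by accident.
func linguistURLFor(ref string) (string, error) {
	if !fullSHA.MatchString(ref) {
		return "", fmt.Errorf("linguist ref %q is not a full commit SHA", ref)
	}
	return "https://raw.githubusercontent.com/github-linguist/linguist/" +
		ref + "/lib/linguist/languages.yml", nil
}

func main() {
	url, err := linguistURLFor(linguistRef)
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```

Keeping the SHA in its own constant means the scheduled update only has to touch one short line, which keeps the resulting PR diff easy to review.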

Update the pin on a regular cadence (e.g., monthly) via a scheduled GitHub Actions workflow that:

  1. Fetches the latest Linguist commit SHA
  2. Updates the pinned SHA in internal/generate/knownfiles/main.go
  3. Runs go generate ./pkg/filetype/...
  4. Opens a PR if the output changed
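Steps 1 and 2 could be handled by a small helper that the workflow runs before go generate. The following is a minimal sketch, not the project's implementation: the helper itself, and the names latestLinguistSHA, updatePin, and pinPattern, are hypothetical. It assumes main.go already contains a URL pinned to a full 40-character SHA, and it reads the `sha` field from the GitHub REST API's commit endpoint. Steps 3 and 4 are left to the workflow.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"regexp"
)

// pinPattern matches the pinned-SHA segment of the Linguist raw URL.
var pinPattern = regexp.MustCompile(
	`(raw\.githubusercontent\.com/github-linguist/linguist/)[0-9a-f]{40}(/lib/linguist/languages\.yml)`)

var fullSHA = regexp.MustCompile(`^[0-9a-f]{40}$`)

// latestLinguistSHA asks the GitHub REST API for the commit at the tip of
// Linguist's main branch and returns its SHA.
func latestLinguistSHA(client *http.Client) (string, error) {
	req, err := http.NewRequest(http.MethodGet,
		"https://api.github.com/repos/github-linguist/linguist/commits/main", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Accept", "application/vnd.github+json")
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	var body struct {
		SHA string `json:"sha"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	if !fullSHA.MatchString(body.SHA) {
		return "", fmt.Errorf("API returned unexpected sha %q", body.SHA)
	}
	return body.SHA, nil
}

// updatePin rewrites the pinned SHA in src and reports whether src changed.
// The caller must pass a SHA that has already been validated as hex.
func updatePin(src []byte, sha string) ([]byte, bool) {
	out := pinPattern.ReplaceAll(src, []byte("${1}"+sha+"${2}"))
	return out, !bytes.Equal(out, src)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: updatepin <path-to-main.go>")
		os.Exit(2)
	}
	path := os.Args[1]
	src, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !pinPattern.Match(src) {
		fmt.Fprintln(os.Stderr, "no SHA-pinned Linguist URL found in", path)
		os.Exit(1)
	}
	sha, err := latestLinguistSHA(http.DefaultClient)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	out, changed := updatePin(src, sha)
	if !changed {
		fmt.Println("pin already current")
		return
	}
	if err := os.WriteFile(path, out, 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("pinned Linguist to", sha)
}
```

Because updatePin is a pure function over the file contents, it can be unit tested without network access. Unauthenticated API requests are rate limited, so a real workflow would likely pass a token.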

This makes go generate deterministic (same input, same output), so contributors stop getting blocked by unrelated Linguist updates.

Metadata

Labels: CI/CD, Tech Debt, good first issue, has-pr, help wanted