Research: ingest Go stdlib + pkg.go.dev docs into the deadzone registry

**Parent:** none
**Depends on:** none (research, no implementation gate)

## Problem

Go is the language deadzone itself is written in, and the natural candidate for the first "fully self-referential" lib in the registry. But **none of the three existing source kinds fit the Go stdlib cleanly**:

- `github-md` (raw markdown HTTP): `golang/go` ships only 2 `.md` files in `/doc/` (confirmed 2026-04-17: `README.md` + `godebug.md`). The Go language spec is `/doc/go_spec.html`, not markdown. Thin coverage
- `github-rst` (ReST): Go doesn't ship RST. Miss
- `scrape-via-agent` (LLM extraction on HTML): works against `go.dev/doc/` but burns LLM tokens on every re-scrape, and the structure of pkg.go.dev pages is highly redundant (sidebar, nav, code blocks → token bloat)

The Go stdlib's real documentation source is **godoc**: comments embedded in `.go` source files in `src/**`, rendered by `pkg.go.dev`. Neither `github-md` nor `scrape-via-agent` maps onto that cleanly.

## Goal

Pick **one** ingestion approach for Go stdlib + popular third-party Go libraries (context7-scale target: hundreds of Go modules eventually), document the decision in `docs/research/`, and file follow-up feature issues for implementation.

## Areas to investigate

### 1. New `kind: godoc` that walks source

Parses `.go` files directly, extracts package/type/func comments via `go/doc` stdlib, produces one chunk per exported identifier (+ one for the package overview). Pros: precise, semantic chunks; doesn't depend on any external service; stable across Go versions. Cons: a new scraper code path, probably larger than #27 scrape-via-agent was; vendoring `go/doc` is trivial but the orchestration of "which packages" is not.

### 2. `scrape-via-agent` against `pkg.go.dev` HTML

Pros: no new code, reuses #27's infrastructure, agent can summarize or re-chunk. Cons: each re-scrape = LLM tokens × N pages; `pkg.go.dev` has nav/sidebar noise that the agent needs to strip; the page-per-package model inflates URL count for the stdlib (150+ packages)

### 3. Fetch raw godoc JSON from `pkg.go.dev/<pkg>?format=json` if such a thing exists

Short spike: does `pkg.go.dev` expose a structured data endpoint? If yes, zero-LLM, zero-HTML-scraping. If no, dead option.

### 4. Hybrid: `github-md` for the handful of Go-team narrative docs + `godoc` kind for packages

Two sources per lib_id. Keeps narrative content (language-spec.md, effective-go.md — if they ever land in .md form) separable from API docs.

## Deliverables

- [ ] `docs/research/go-stdlib-ingestion.md` — decision log following the pattern of `batch-scrape-actions.md` and `embedder-choice.md`: options-considered table, sanity-check-at-scale section, picked option with "holds at 200 Go modules" and "holds at 2000 Go modules" columns
- [ ] Follow-up feature issue filed against the picked option (new kind? new scraper flag? config bump?)
- [ ] If the picked option is option 3 (pkg.go.dev JSON spike) and the spike returns "yes", file a trivial feature issue for the config addition; if "no", file whichever option 1/2/4 was second-best
- [ ] One-line addition to `docs/research/ingestion-architecture.md` decision 3 (source kinds) noting that Go stdlib is tracked separately and linking to the new research doc

## Acceptance criteria

- [ ] All 4 areas above evaluated in the research doc (table form, not prose)
- [ ] Explicit "picked option" with reasoning, dated-locked
- [ ] Sanity checks: "holds at 1 Go module", "holds at 150 stdlib packages", "holds at 2000 third-party Go modules (context7-scale)" — each column ✅ / ⚠️ / ❌
- [ ] At least one follow-up issue filed

## Out of scope (fenced)

- **No implementation** — this is research-only. The follow-up issue does implementation
- **No single-lib scope** — if the picked option requires new scraper infrastructure, it targets Go stdlib **AND** third-party Go libs with the same shape, not just golang/go
- **No pkg.go.dev scraping heuristics in this issue** — that's the follow-up
- **No comparison with context7's Go coverage** — if relevant, reference it in prose but don't benchmark
- **No embedder changes** — the embedder is orthogonal; Go docs are regular text

## Related

- #27 (scrape-via-agent) — defines option 2's infrastructure
- #46 (github-glob) — related pattern but different (glob is on GitHub, godoc is on Go source tree)
- `docs/research/ingestion-architecture.md` decision 3 — the three existing kinds
- The tursodatabase registry issue (filed same day) — this issue is its Go-flavored sibling

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Research: ingest Go stdlib + pkg.go.dev docs into the deadzone registry #133

Problem

Goal

Areas to investigate

1. New `kind: godoc` that walks source

2. `scrape-via-agent` against `pkg.go.dev` HTML

3. Fetch raw godoc JSON from `pkg.go.dev/<pkg>?format=json` if such a thing exists

4. Hybrid: `github-md` for the handful of Go-team narrative docs + `godoc` kind for packages

Deliverables

Acceptance criteria

Out of scope (fenced)

Related

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Research: ingest Go stdlib + pkg.go.dev docs into the deadzone registry #133

Description

Problem

Goal

Areas to investigate

1. New kind: godoc that walks source

2. scrape-via-agent against pkg.go.dev HTML

3. Fetch raw godoc JSON from pkg.go.dev/<pkg>?format=json if such a thing exists

4. Hybrid: github-md for the handful of Go-team narrative docs + godoc kind for packages

Deliverables

Acceptance criteria

Out of scope (fenced)

Related

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. New `kind: godoc` that walks source

2. `scrape-via-agent` against `pkg.go.dev` HTML

3. Fetch raw godoc JSON from `pkg.go.dev/<pkg>?format=json` if such a thing exists

4. Hybrid: `github-md` for the handful of Go-team narrative docs + `godoc` kind for packages