feat(registry): add tursodatabase docs (tursogo, Limbo, docs.turso.tech) #132

@laradji

Parent: none
Depends on: none — all three additions use the existing github-md kind

Decision (locked 2026-04-17)

Add three libs under the tursodatabase umbrella to libraries_sources.yaml. All three use kind: github-md because the scraper's github-md path (scraper.FetchOne in internal/scraper/scraper.go:136-146) does not validate the URL host. It is in effect a generic "raw markdown over HTTP" fetcher, which we confirmed works against docs.turso.tech/<path>.md: Mintlify serves raw markdown when .md is appended to any doc URL (verified 200 on 2026-04-17).
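
As a rough illustration of why a host-agnostic fetcher is enough here, the Go sketch below builds a <path>.md URL and retrieves it with a plain GET. It is not the code in internal/scraper/scraper.go: markdownURL and fetchMarkdown are hypothetical names, and a local httptest server stands in for docs.turso.tech so the example runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// markdownURL is a hypothetical helper: it turns a docs path such as
// "/introduction" into the raw-markdown form "<base>/introduction.md".
func markdownURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.Trim(path, "/") + ".md"
}

// fetchMarkdown performs a plain GET and accepts only a 200 response.
// It never inspects the URL host, which is the property the decision relies on.
func fetchMarkdown(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %d", url, resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	// Assumption: this stub mimics a site that serves markdown for "*.md" paths.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.HasSuffix(r.URL.Path, ".md") {
			fmt.Fprint(w, "# Introduction\n")
			return
		}
		http.NotFound(w, r)
	}))
	defer srv.Close()

	body, err := fetchMarkdown(srv.Client(), markdownURL(srv.URL, "/introduction"))
	fmt.Printf("%q %v\n", body, err)
}
```

Nothing in the sketch depends on the host being github.com, so the same path serves raw.githubusercontent.com-style URLs and Mintlify .md URLs alike.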

Why

Deadzone's storage layer runs on tursogo (Go driver) against the Turso/Limbo SQLite-compatible engine. Our own CLAUDE.md instructs agents to check context7 for library docs, but having these three directly in deadzone.db makes them searchable via the MCP server we ship — dogfooding and reducing the number of MCP servers a user needs.

Acceptance criteria

  • New entry lib_id: /tursodatabase/turso-go in libraries_sources.yaml:
    • kind: github-md
    • No versions: block (tursogo has no releases — https://api.github.com/repos/tursodatabase/turso-go/releases/latest returns no releases as of 2026-04-17). Pin to main via a bare URL list
    • URLs: 3 markdown files at repo root — README.md, CONTRIBUTING.md, LICENSE.md
  • New entry lib_id: /tursodatabase/turso in libraries_sources.yaml:
    • kind: github-md
    • Pin the most recent stable tag if one emerges; until then, use the latest prerelease tag (e.g. v0.6.0-pre.18 on 2026-04-17) with versions: { "0.6": { ref: v0.6.0-pre.18 } }. Implementer re-verifies latest at impl time
    • URLs: all docs/*.md at the pinned ref — inventory at impl time via gh api repos/tursodatabase/turso/contents/docs?ref=<tag>. On 2026-04-17 that surfaces fts.md, manual.md, testing.md, javascript-api-reference.md, plus subdirs language-reference/, sql-reference/, internals/, agent-guides/, contributing/ — walk these subdirs too
  • New entry lib_id: /tursodatabase/turso-docs in libraries_sources.yaml:
    • kind: github-md (yes, despite the non-github URLs — see Decision above)
    • No versions: block (docs.turso.tech is unversioned, always current)
    • URLs: start with a curated subset, NOT the full sitemap. At impl time, fetch https://docs.turso.tech/sitemap.xml, pick ~20–40 canonical paths covering /introduction, /sdk/*, /cli/*, /concepts/*, /features/* (whichever sections exist in the current sitemap). Each URL takes the form https://docs.turso.tech/<path>.md
    • Add a YAML comment above the URL list: # Mintlify serves raw markdown when '.md' is appended to any doc path. kind is github-md because scraper.FetchOne is host-agnostic — see issue <THIS>.
  • After each addition, run just scrape lib=/tursodatabase/<name> locally (or dispatch scrape-pack.yml -f lib=/tursodatabase/<name>) and confirm the artifact DB builds without 404s or agent errors
  • No change to internal/scraper/config.go or any Go code — this is pure registry data
  • README / CLAUDE.md — no change; the registry is self-documenting through the YAML comments
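
The sitemap step for /tursodatabase/turso-docs could be approached along these lines. This is a sketch under assumptions, not part of the registry tooling: curate is a hypothetical helper, the sample paths in main are made up for illustration, and the sorted cap is only a stand-in for the manual choice of canonical pages.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"sort"
	"strings"
)

// sitemap mirrors the sitemaps.org layout: <urlset><url><loc>...</loc></url></urlset>.
type sitemap struct {
	URLs []struct {
		Loc string `xml:"loc"`
	} `xml:"url"`
}

// curate keeps sitemap locations under base whose path falls in one of the
// given sections, appends ".md", sorts the result, and caps it at limit.
func curate(sitemapXML []byte, base string, sections []string, limit int) ([]string, error) {
	var sm sitemap
	if err := xml.Unmarshal(sitemapXML, &sm); err != nil {
		return nil, err
	}
	var out []string
	for _, u := range sm.URLs {
		loc := strings.TrimRight(strings.TrimSpace(u.Loc), "/")
		path := strings.TrimPrefix(loc, base)
		if path == loc { // loc does not start with base: skip it
			continue
		}
		for _, s := range sections {
			if path == s || strings.HasPrefix(path, s+"/") {
				out = append(out, loc+".md")
				break
			}
		}
	}
	sort.Strings(out)
	if len(out) > limit {
		out = out[:limit]
	}
	return out, nil
}

func main() {
	// Illustrative sample only; the real sitemap must be fetched at impl time.
	sample := []byte(`<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.turso.tech/introduction</loc></url>
  <url><loc>https://docs.turso.tech/sdk/example-page</loc></url>
  <url><loc>https://docs.turso.tech/unrelated</loc></url>
</urlset>`)

	urls, err := curate(sample, "https://docs.turso.tech",
		[]string{"/introduction", "/sdk", "/cli", "/concepts", "/features"}, 40)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

The output is a candidate list to review by hand before pasting into libraries_sources.yaml; sections missing from the current sitemap simply contribute nothing.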

Concrete file pointers

Files to modify:

  • libraries_sources.yaml — 3 new libraries[] entries

Files to read as reference — do NOT refactor:

  • libraries_sources.yaml existing entries (e.g. /fastapi/fastapi, /modelcontextprotocol/go-sdk) as format templates
  • internal/scraper/config.go: KindGithubMD constant (declared line 32, accepted by validation at line 50)
  • internal/scraper/scraper.go:136-146: FetchOne implementation confirming host-agnostic HTTP fetch

Test commands (literal, for agent self-check)

  • mise exec -- go run -tags ORT ./cmd/deadzone scrape -list -config libraries_sources.yaml — each new lib_id appears in the JSON output
  • just scrape lib=/tursodatabase/turso-go — succeeds, produces artifacts/tursodatabase_turso-go/artifact.db
  • just scrape lib=/tursodatabase/turso — succeeds
  • just scrape lib=/tursodatabase/turso-docs — succeeds; curl 3 random URLs from the list against docs.turso.tech to verify all return 200 before pushing
  • just test -short — config tests still pass (there is existing coverage in internal/scraper/config_test.go that loads the whole YAML)

Out of scope (fenced)

  • No new kind like raw-md — the existing github-md kind is misleadingly named but works fine. Renaming the kind is a separate breaking-change issue
  • No scraper code changes — pure data addition
  • No Go stdlib / pkg.go.dev ingestion — tracked separately as a research issue (filed alongside this one)
  • No full sitemap import for docs.turso.tech — curated subset only; a later issue can broaden coverage once we see query patterns
  • No pinning of docs.turso.tech to a specific revision — that's hosted content, there is no revision knob
  • No cache invalidation concerns: libraries_sources.yaml is hashed into the artifact cache key (per "feat: .github/workflows/scrape-pack.yml — matrix scrape + cache + consolidate producing deadzone.db" #126), so any edit to it re-scrapes all affected libs on the next scrape-pack run
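
The cache point can be illustrated with a content-hash key. The sketch below is an assumption about the general technique, not the key derivation actually used in scrape-pack.yml: cacheKey, the "scrape-" prefix, and the choice of SHA-256 are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey is a hypothetical stand-in for the workflow's key derivation:
// a fixed prefix plus the first 8 bytes of the SHA-256 of the registry file.
func cacheKey(registry []byte) string {
	sum := sha256.Sum256(registry)
	return "scrape-" + hex.EncodeToString(sum[:8])
}

func main() {
	before := "libraries:\n  - lib_id: /fastapi/fastapi\n"
	after := before + "  - lib_id: /tursodatabase/turso-go\n"

	// Any byte-level edit to the registry yields a different key,
	// so the old cache entry is no longer matched.
	fmt.Println(cacheKey([]byte(before)))
	fmt.Println(cacheKey([]byte(after)))
}
```

Because the key depends on file content rather than timestamps, adding the three entries needs no manual cache purge.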

Labels

P2: Normal (clear value, not urgent)
feature: New feature
