Find and classify the top GitHub repositories in any category, with parallel sub-agents and a slop-aware rubric. Picks the right query type per category — and surfaces the answer that pure repo-name search misses, even when twenty real candidates exist.
A production-ready Claude Code skill that takes a category description, picks the right gh search strategy, dispatches three parallel sub-agents to fan out across established / rising / niche lanes, scores every candidate with a 16-signal composite rubric, and presents verified top results with rationale and caveats — grounded in your actual gh output, not training-data recall.
GitHub is huge. Finding the right repo for a given category is harder than it looks, for two reasons that compound each other:
Problem 1 — The query type matters more than the ranker. For niche capabilities, a naive gh search repos "<natural-language description>" often returns zero hits because no repo's name contains the user's phrasing. The same intent expressed as gh search code "<keyword>" --filename SKILL.md (or --filename plugin.json, or --filename Cargo.toml, etc.) typically returns dozens of real candidates because tool-specific files follow predictable filenames. For Claude Code capabilities, plugins, MCP servers, and any tooling that lives inside specific filenames, repo-name search is structurally the wrong tool. Most people never figure this out and conclude "the thing I want doesn't exist."
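For a concrete feel of the gap, here is the same intent expressed both ways; the keywords are illustrative, not a prescribed query:

```bash
# Repo-name/description search: niche phrasing often returns nothing
gh search repos "changelog generator skill" --limit 20

# Filename-scoped code search: the same intent finds real candidates
gh search code "changelog generator" --filename SKILL.md --limit 20
```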
Problem 2 — Stars are gameable, and not by a little. A 2024 Carnegie Mellon study counted ~6 million suspected fake stars across 18,617 repos. By July 2024, 16.66% of repos with 50+ stars showed fake-star activity. Premium stargazer accounts now sell for up to $5,000. AI/LLM repos are the largest non-malicious category receiving fake stars (~177,000 suspected). The strongest single fake-star tell — fork-to-star ratio under 5% on a thousand-star repo — is something no UI surfaces by default.
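That fork-to-star tell is easy to check yourself with one `gh api` call; a minimal sketch (OWNER/REPO is a placeholder, and a ratio under roughly 0.05 on a high-star repo is the warning sign described above):

```bash
# forks_count / stargazers_count straight from the repos endpoint
gh api repos/OWNER/REPO --jq '(.forks_count / .stargazers_count)'
```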
This skill encodes both lessons. The category router picks the right query type before searching. The composite scorer log-scales stars (capped) and applies a hard penalty for AI-slop signatures. You get answers that survive contact with reality.
When you trigger this skill in Claude Code, by the end of the conversation you have:
| Outcome | Detail |
|---|---|
| 🎯 A ranked top-N list | Each repo with score (0-100), stars, last push, license, one-liner, and a one-sentence "why it qualifies" |
| 🧪 Per-repo evidence | Score breakdown across activity / health / relevance / trust / popularity / scorecard / slop |
| 🚩 Caveats per repo | Deprecations, abandoned-but-popular flags, fork-of-X notes, suspicious growth patterns |
| 🪵 A search log | The strategy used, queries fired, candidates considered, filtered count with reasons, URLs spot-checked — so you can re-run with adjusted thresholds |
| 🔍 A classified verdict per candidate | Real evidence, not "based on common knowledge" — every repo went through a gh call in the current session |
A typical run is 60-180 seconds end-to-end and costs roughly $0.05-$0.15 in agent token usage. The gh calls and the OpenSSF Scorecard call are free. Compare to your time spent doing this manually.
- `git clone https://github.com/MJWNA/github-repo-discovery.git ~/.claude/skills/github-repo-discovery`
- `gh auth status` (should show "Logged in to github.com")
- `python3 --version` (3.9 or newer)

If `gh` isn't installed: `brew install gh && gh auth login`.
The skill becomes available after restart. Claude Code reads ~/.claude/skills/ at session start.
In any project, just say things like:
- "What are the top Python orchestration libraries for AI agents?"
- "Is there an open-source vector database with good Rust support?"
- "Top GitHub repos for [niche topic]?"
- "Find me a tool to [do X]"
- "What's the best library for [Y]?"
- "Compare repos for [Z]"
- "Find me a Claude Code skill that does [X]"
- "Is there an MCP server for [Y]?"
The skill description has 17+ trigger phrases — it's deliberately easy to invoke.
Claude will:
- Restate the category and announce the chosen strategy (so you can redirect)
- Run the primary `gh` query
- Decide whether to dispatch parallel sub-agents (>10 candidates → yes; ≤10 → score directly)
- Score every candidate with `scripts/score_repo.py`
- Spot-check 2 of every 5 returned URLs
- Present the top N with rationale, caveats, and the search log
Total time: ~30-90 seconds for narrow categories; ~2-3 minutes for broad ones with parallel-agent dispatch.
It splits any "find me top GitHub repos for X" task into six steps:
Match the user's category against a five-row table in SKILL.md. Pick one primary strategy:
| Category signal | Primary strategy |
|---|---|
| Claude Code skill / capability | gh search code "<kw>" --filename SKILL.md --limit 20 |
| Claude Code plugin / marketplace | Anthropic plugin marketplace + --filename plugin.json |
| Language + tool type | gh search repos --topic ... --language ... --stars '>500' --pushed '>=YYYY-MM-DD' |
| Generic category | Topic search + awesome-list lookup |
| >1000 results | Partition on `stars:` ranges (binary-search the cutoff) |
The strategy is announced before searching so the user can correct it.
Shell out to the `gh` CLI. Always sort explicitly (never best-match, which is opaque and changes without notice).
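A query in that spirit might look like the following; the topic, language, and star floor are placeholders:

```bash
# Deterministic ordering via --sort/--order rather than best-match
gh search repos --topic agent-framework --language python --stars '>500' \
  --sort stars --order desc --limit 20
```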
If the primary query returned ≤10 high-quality candidates, score directly. If >10 OR the category has clear sub-lanes, dispatch N=3 parallel sub-agents:
- A — Established: ≥1000 stars, created ≥2y ago, pushed in last 90d
- B — Rising: ≥100 stars, created ≤12mo ago, pushed in last 30d, sort by stars-per-day
- C — Niche: lower star floor, broader topic match, code search inside `SKILL.md` / `plugin.json` (one illustrative query per lane is sketched below)
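One illustrative query per lane, assuming a hypothetical `mcp` topic; the star floors and dates are placeholders, and lane B's stars-per-day ordering is computed after the fetch because `gh` itself can only sort by raw counts:

```bash
# A: established (high star floor, older repos)
gh search repos --topic mcp --stars '>=1000' --created '<2024-06-01' --sort stars --limit 20
# B: rising (young repos, re-ranked by stars-per-day after fetching)
gh search repos --topic mcp --stars '>=100' --created '>=2025-06-01' --sort stars --limit 20
# C: niche (code search inside tool-specific filenames)
gh search code "mcp server" --filename plugin.json --limit 20
```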
Every brief contains a literal ## What other agents are covering — DO NOT DUPLICATE block. Anthropic's own multi-agent post-mortem identified vague-brief duplication as the #1 failure mode. The brief template is in references/sub-agent-brief-template.md.
`python3 scripts/score_repo.py owner/repo --keywords "kw1,kw2,kw3"`

Returns JSON with a 0-100 score plus a per-signal breakdown. The script implements the rubric in `references/scoring-rubric.md`:
- Stars are log-scaled and capped (weight 6 of 100) — they cannot dominate
- Slop penalty subtracts up to 15 after the weighted sum
- Free OpenSSF Scorecard call (no auth, ~1M repos pre-computed; sketched below)
- Spot-check 2 of every 5 returned URLs via `WebFetch` or `scripts/verify_urls.sh`
- Auto-flag repos with <10 stars OR a last commit more than 2 years old for manual review
- Reject zero-tool-call sub-agent outputs (training-data fabrications)
- Note duplicate repos across agents (signals brief leakage)
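The Scorecard signal above comes from the public OpenSSF API, which needs no auth; a minimal sketch of the call (using `ossf/scorecard` itself as the example project):

```bash
# Aggregate 0-10 score, pre-computed for ~1M popular repos
curl -s "https://api.securityscorecards.dev/projects/github.com/ossf/scorecard" | jq '.score'
```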
# 🔍 Top {N} repos for "{category}"
| # | Repo | Stars | Last push | Score | Why |
|---|---|---|---|---|---|
## Detail per repo
## Search log

The search log is the part most users skim past — but it's the part that lets you trust or distrust the result. If three agents each fired only one tool call, the answer is suspicious regardless of the score.
A typical run, in the abstract:
- You describe the category in natural language.
- The skill announces the chosen strategy (e.g. "routing to Claude-Code-skill strategy" or "routing to language+tool-type strategy") so you can redirect if it picked wrong.
- It runs the primary `gh` query and shows the candidate count.
- If the candidate set warrants it, the skill dispatches three parallel sub-agents (established / rising / niche).
- Every candidate is scored with `scripts/score_repo.py` against the rubric.
- The top N are returned in a markdown table — repo, stars, last push, score, one-line rationale — followed by per-repo detail and a search log showing what was searched, what was filtered, and what was verified.
A run from "I asked the question" to "I have a ranked, verified shortlist" is typically 60-180 seconds.
For readers who want to know exactly what's happening under the hood, see docs/HOW-IT-WORKS.md. Highlights:
- Routing is in SKILL.md, not a Python script. Category routing is a natural-language judgment call — the LLM is the right tool. Wrapping it in Python adds ceremony without value.
- Scoring is a Python script. The rubric has 16 signals across activity / health / relevance / trust / popularity / scorecard / slop. Math is fiddly enough to warrant a script.
- Sub-agent dispatch happens via Claude Code's native subagent system. No custom orchestration. The brief template is the only thing the skill enforces.
- Verification is a bash one-liner. `verify_urls.sh` shells out to `curl` to spot-check that URLs return 200/301/302. Anything else is flagged. (A sketch of the idea follows.)
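A sketch of that idea, not necessarily the script's exact flags:

```bash
# Read URLs from a file, flag anything that isn't 200/301/302
while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  case "$code" in
    200|301|302) echo "ok   $code  $url" ;;
    *)           echo "FLAG $code  $url" ;;
  esac
done < urls.txt
```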
The full design philosophy — why query type beats ranker, why N=3 is the sweet spot, why slop penalty is a hard subtract — is in docs/PHILOSOPHY.md.
The four parallel-agent research tracks that produced the design are in research/:
- `research/00-synthesis.md` — architecture + worked example
- `research/01-github-search-api.md` — REST + GraphQL + `gh` CLI mechanics
- `research/02-classification-heuristics.md` — quality signals + scoring rubric
- `research/03-claude-code-ecosystem.md` — Claude Code ecosystem field guide
- `research/04-prompting-parallel-agents.md` — sub-agent brief design
Read these if you want to understand the why behind every number in the rubric.
Yes. The category router picks a different strategy (topic + star floor + freshness) when the input doesn't smell like a Claude Code capability. The Claude-Code-specific path is one of five strategies, not the whole skill.
Only what gh indexes. There are real-world repos that exist but aren't searchable via gh for hours after creation (indexing lag), and topics that aren't applied to otherwise-relevant repos (false negatives). The skill mitigates both — running the primary search plus a niche-lane code search inside SKILL.md / plugin.json catches a lot of the gaps — but you cannot count on this finding a 1-star repo created 30 minutes ago.
The skill partitions on stars: ranges. Both REST and GraphQL search cap at 1000 results regardless of paging — the workaround is to slice the query into chunks of ≤1000 and union the results. The slicing logic is in references/search-api-cheatsheet.md.
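In practice the slicing looks something like this; the topic and boundaries are illustrative, chosen so each slice stays under the 1000-result cap:

```bash
gh search repos --topic llm --stars '>=2000'    --sort stars --limit 1000
gh search repos --topic llm --stars '500..1999' --sort stars --limit 1000
gh search repos --topic llm --stars '100..499'  --sort stars --limit 1000
# Union the three result sets and de-duplicate by full repo name
```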
Yes:
`python3 scripts/score_repo.py owner/repo --keywords "kw1,kw2,kw3"`

Returns JSON. Useful for one-off ranking or ad-hoc checks.
Read the breakdown. Every score has a per-signal table — if one signal is dragging it down (e.g. issue_health: 0.205 because the repo has many open issues), you decide whether that matters for your use case. The score is opinionated; the breakdown lets you override.
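If you only want the per-signal table for one repo, pipe the scorer through `jq`; the `breakdown` key here is an assumption, so check the script's actual JSON shape:

```bash
python3 scripts/score_repo.py OWNER/REPO --keywords "kw1,kw2" | jq '.breakdown'
```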
Marginally. Three sub-agents at ~10-15 tool calls each = ~$0.05–$0.15 per run on Sonnet/Opus. The gh and OpenSSF Scorecard calls are free. Compare to your time spent doing this manually.
The skill is opinionated about workflow but flexible about content. Quick pointers:
| Want to change | Edit |
|---|---|
| Category router rules | The strategy table in SKILL.md |
| Scoring weights | The composite formula in scripts/score_repo.py and the rubric in references/scoring-rubric.md |
| Sub-agent brief structure | references/sub-agent-brief-template.md |
| Slop detection patterns | The SUPERLATIVES and LLM_TICS constants in score_repo.py |
| Recency thresholds (90d / 30d / 12mo) | SKILL.md Step 3 — the lane definitions |
| OpenSSF Scorecard fallback (default 0.5) | The fetch_scorecard function in score_repo.py |
Possible future enhancements (not promises):
- GraphQL-first scoring path — single round-trip per candidate instead of 3-4 REST calls
- Awesome-list ingestion — automatically include curated entries from `hesreallyhim/awesome-claude-code` and friends as a fourth signal
- Embedding-based relevance — cosine similarity between README and category brief for the top-50 candidates (currently lexical only)
- Detect AI-generated commits — flag repos where most commits are from a bot account or have LLM-voice messages
- Time-window selector — let the user pick 30d / 90d / 6mo / 12mo activity windows at runtime
- JSON output mode — a `--json` flag for machine-readable output, for piping into dashboards
PRs welcome.
When proposing changes, include:
- A real run output showing the change in action (paste the markdown export from chat)
- Updated documentation if the workflow changed
- Reproduction recipe if you fixed a bug
PRs that:
- Improve the category router → likely accepted
- Improve the scoring rubric (with evidence) → likely accepted
- Add new query strategies → likely accepted
- Restructure the dispatch → discuss in an issue first
These are real things that bit early users:
- `gh` not authenticated — the script will warn, but `gh` calls will hit unauthenticated rate limits (10/min vs 30/min). Run `gh auth login` before first use.
- Indexing lag — newly pushed READMEs and freshly applied topics aren't searchable for minutes to hours. Very recent repos may not show up; fall back to a direct `gh api repos/{o}/{r}` if you have a candidate name.
- Star inflation on AI/LLM repos — these are the most-faked category. Even after the slop penalty, expect to be skeptical of repos that show 100k+ stars on a single-author project. Cross-check fork ratio and commit history.
- Topics are a precision filter, not a recall filter — many otherwise-relevant repos never apply topics. The skill widens recall via `in:name,description`, but if a repo has zero topics applied, the relevance score will lean entirely on description and README density.
- `pushed_at` is repo-level, not branch-level — a bot updating a stale branch counts as activity. The GraphQL `defaultBranchRef.target.committedDate` is more accurate but costs an extra call (see the sketch below).
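Two commands that help with the indexing-lag and `pushed_at` gotchas above; OWNER/REPO are placeholders:

```bash
# Direct lookup when search indexing lags behind a brand-new repo
gh api repos/OWNER/REPO --jq '{stars: .stargazers_count, pushed: .pushed_at}'

# Default-branch commit date via GraphQL (one extra call; unaffected by bot pushes to stale branches)
gh api graphql -f owner=OWNER -f name=REPO -f query='
  query($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) {
      defaultBranchRef { target { ... on Commit { committedDate } } }
    }
  }' --jq '.data.repository.defaultBranchRef.target.committedDate'
```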
Built by Ronnie Meagher from four parallel-agent research tracks (April 2026), grounded in:
- The Carnegie Mellon "Six Million Fake Stars" study for the slop-aware rubric
- Anthropic's How we built our multi-agent research system for the parallel-agent brief template
- The OpenSSF Scorecard project for the free quality-signal API
- A worked-example session that motivated the whole project: a category whose natural-language phrasing returned zero hits via repo-name search but twenty real candidates via code search inside `SKILL.md`
Built with Claude Code — the skill discovers Claude Code skills, including itself.
Sister project: claude-config-audit — same author, same pattern, but for cleaning up your Claude Code installation instead of finding new things to add to it.
MIT — use it, fork it, modify it, ship it. If you make improvements, PRs back to the main repo are appreciated but not required.
- Claude Code Skills documentation
- `gh` CLI documentation
- GitHub REST search API
- GitHub GraphQL search
- OpenSSF Scorecard API
- hesreallyhim/awesome-claude-code — canonical curated list of Claude Code skills/plugins/agents
- skill-creator — the official skill that scaffolded this one
The right query beats the smarter ranker.