
taffy-owo/quarry



🎯 Quarry

Public resource routing engine purpose-built for AI Agents.

Multi-source discovery -> intelligent ranking -> verified delivery.

Version 1.2.0 | Python 3.10+ | Benchmark: Pass | Zero Config | MIT-0 License


What is this?

Quarry is a resource discovery engine designed to be called by AI Agents (Hermes, OpenClaw, etc.).

It doesn't download files. It finds the best public routes (cloud drive links, magnet URIs, ebook pages) across 28 sources, ranks them by quality, verifies liveness, and returns structured JSON.

User: "Find me Oppenheimer 4K resources"

Agent translates -> hunt.py search "Oppenheimer 2023" --4k --json

Engine returns:
  OK Top 1: Oppenheimer.2023.2160p.BluRay.REMUX -> aliyun link (verified alive)
  OK Top 2: Oppenheimer.2023.1080p.WEB-DL -> magnet (42 seeders)
  Suppressed: Oppenheimer.CAM.720p -> risky quality

Features

🔍 Multi-Source Aggregation

28 source adapters across 3 channels:

Channel | Sources | What they cover
Cloud Drive | upyunso, pansou, ps.252035, panhunt | Aliyun, Quark, Baidu, 115, PikPak, Lanzou, etc.
Torrent | torznab, nyaa, dmhy, bangumi_moe, eztv, torrentgalaxy, bitsearch, tpb, yts, 1337x, limetorrents, torlock, fitgirl, torrentmac, ext_to | Movies, TV, anime, games, music, macOS apps
Book | annas (Anna's Archive), libgen (Library Genesis) | PDF, EPUB, MOBI, academic papers, fiction & non-fiction

📊 Intelligent Ranking

  • Title-family matching: canonical, phrase, token overlap scoring
  • Quality parsing: resolution, codec, HDR, source type, lossless audio
  • Category-aware: different scoring weights for movie/TV/anime/music/software/book
  • Confidence tiers: top -> related -> risky (suppressed by default)
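
As an illustration of the quality-parsing step, here is a minimal sketch that pulls resolution, source type, codec, HDR, and lossless-audio tags out of a release name with regular expressions. The patterns and the parse_release name are assumptions made for this example; they are not taken from Quarry's parsers.py, which may recognize more tags and weigh them differently.

```python
import re

# Hypothetical tag patterns for illustration only; not Quarry's real rules.
RESOLUTION = re.compile(r"\b(2160p|1080p|720p|480p)\b", re.IGNORECASE)
SOURCE = re.compile(r"\b(REMUX|BluRay|WEB-DL|WEBRip|HDTV|CAM)\b", re.IGNORECASE)
CODEC = re.compile(r"\b(HEVC|AV1|[xh] ?26[45])\b", re.IGNORECASE)
HDR = re.compile(r"\b(HDR10|HDR|DoVi)\b", re.IGNORECASE)
LOSSLESS = re.compile(r"\b(DTS-HD|TrueHD|FLAC)\b", re.IGNORECASE)


def parse_release(title: str) -> dict:
    """Extract coarse quality tags from a dotted release name."""
    # Dots and underscores act as separators in release names.
    text = re.sub(r"[._]+", " ", title)
    resolution = RESOLUTION.search(text)
    codec = CODEC.search(text)
    return {
        "resolution": resolution.group(1).lower() if resolution else None,
        "sources": [tag.upper() for tag in SOURCE.findall(text)],
        "codec": codec.group(1).upper() if codec else None,
        "hdr": bool(HDR.search(text)),
        "lossless_audio": bool(LOSSLESS.search(text)),
    }


if __name__ == "__main__":
    print(parse_release("Oppenheimer.2023.2160p.BluRay.REMUX.HEVC.DTS-HD"))
```

A ranking layer could then score the parsed tags per category, for example penalizing a CAM source for movies while ignoring video tags entirely for books.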

Pan Link Viability Probe

Cloud drive links die constantly. The engine auto-probes before delivery:

Provider | Method | Result
Aliyun (AliDrive) | Anonymous share API | alive / cancelled
Quark (Quark Drive) | Share token API | alive / expired
Baidu (Baidu Netdisk) | Page dead-signal detection | alive / removed

Dead links are auto-demoted to risky tier and never shown in text output.
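
The demotion rule can be sketched as follows. This is a simplified illustration, not the code in pan_probe.py or ranking.py: it assumes each result is a dict shaped like the JSON output (a tier field and source_health.link_alive), and the function names are hypothetical.

```python
def demote_dead_links(results):
    """Return copies of results with dead links moved to the risky tier."""
    demoted = []
    for result in results:
        health = result.get("source_health") or {}
        # Only an explicit False counts as dead; None means "not probed".
        if health.get("link_alive") is False:
            result = {**result, "tier": "risky"}
        demoted.append(result)
    return demoted


def visible_in_text(results):
    """Text output hides everything in the risky tier."""
    return [r for r in results if r.get("tier") != "risky"]


if __name__ == "__main__":
    sample = [
        {"title": "A", "tier": "top", "source_health": {"link_alive": True}},
        {"title": "B", "tier": "top", "source_health": {"link_alive": False}},
    ]
    print([r["title"] for r in visible_in_text(demote_dead_links(sample))])
```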

Anti-Bot Layer (Optional)

Priority chain:  httpx -> curl_cffi -> urllib

Install curl-cffi to bypass DDoS-Guard and similar TLS fingerprint checks. Zero config, auto-detected.
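
One way to read "auto-detected" is that the engine uses the first library in the chain that is installed. The sketch below shows that idea with importlib; it is an assumption about the behavior, not a copy of the HTTPClient in sources/base.py, and pick_backend is a hypothetical name.

```python
import importlib.util

# Order taken from the priority chain above; urllib ships with Python.
PRIORITY = ("httpx", "curl_cffi", "urllib")


def pick_backend(is_available=None):
    """Return the name of the first importable HTTP library in PRIORITY."""
    if is_available is None:
        def is_available(name):
            return importlib.util.find_spec(name) is not None
    for name in PRIORITY:
        if is_available(name):
            return name
    raise RuntimeError("no HTTP backend available")


if __name__ == "__main__":
    print("Selected backend:", pick_backend())
```

The chain could also describe a per-request fallback (retry with the next client when one is blocked); the README does not say which.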

🎬 Video Pipeline

Public video URL -> metadata extraction -> optional download:

hunt.py video probe "https://www.bilibili.com/video/BV..."
hunt.py video download "https://youtu.be/..." best

📖 Subtitle Search

On-demand subtitle discovery (user-initiated, not automatic):

hunt.py subtitle "Breaking Bad" --season 1 --episode 1 --lang zh,en --json

Sources: SubDL (multilingual), SubHD (Chinese), Jimaku (Japanese anime).


Quick Start

git clone https://github.com/taffy-owo/quarry.git
cd quarry

# Zero dependencies for basic search
python3 scripts/hunt.py search "Oppenheimer 2023" --4k

# Optional performance extras
pip install httpx                    # HTTP/2 + connection pooling
pip install pycryptodome             # Upyunso encrypted API
pip install curl-cffi                # TLS fingerprint impersonation

Search Examples

# Movies
python3 scripts/hunt.py search "Oppenheimer 2023" --4k --json
python3 scripts/hunt.py search "Oppenheimer 2023" --4k --json --explain

# TV Shows
python3 scripts/hunt.py search "Breaking Bad S05E16" --tv

# Anime
python3 scripts/hunt.py search "Kamiina Botan" --anime

# Music (lossless)
python3 scripts/hunt.py search "Jay Chou Fantasy FLAC" --music

# Software
python3 scripts/hunt.py search "Adobe Photoshop 2024" --software --channel pan

# Books
python3 scripts/hunt.py search "Clean Code epub" --book

# Skip pan link probing (faster, but may include dead links)
python3 scripts/hunt.py search "Interstellar 2014" --no-probe

Diagnostics

python3 scripts/hunt.py sources --probe --json    # Source health check
python3 scripts/hunt.py doctor --json              # System diagnostics + source_health metrics
python3 scripts/hunt.py benchmark                  # Offline precision benchmark
python3 scripts/hunt.py cache stats --json         # Cache statistics
python3 scripts/hunt.py source validate local/sources/my_source.py --json

Updating

Updating is safe regardless of how you installed Quarry:

# Git users: just pull
cd quarry && git pull

# ZIP users: download new ZIP, extract over the old folder
# (or delete and re-extract, both work)

Auto-cleanup: On first run after an update, the engine automatically detects and removes deprecated files from previous versions. No manual cleanup needed, even if you extract a ZIP on top of an old installation.

Customization

All user customizations go in the local/ directory, a safe zone that is never overwritten by updates:

local/
├── sources/          # Drop custom SourceAdapter .py files here (auto-discovered)
├── config.json       # Override ranking weights
└── .env              # Override environment variables (takes priority over root .env)

Validate custom adapters before relying on them:

python3 scripts/hunt.py source validate local/sources/my_tracker.py --json

doctor --json also exposes adaptive source-health rows such as success_rate_24h, median_latency_ms, result_yield, top_hit_rate, and recommended_query_budget. Quarry uses these cached metrics conservatively to prefer strong sources and reduce query budget for weak or recently failing ones.
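
To show how an Agent might use these rows itself, the sketch below orders sources by recent success rate and drops weak ones. The row shape (one dict per source with a "source" key), the 0.5 threshold, and the order_sources name are assumptions for this example; Quarry's own budgeting logic is not documented here and may work differently.

```python
def order_sources(rows, min_success_rate=0.5):
    """Order healthy sources by success rate, then by lower latency."""
    healthy = [
        row for row in rows
        if row.get("success_rate_24h", 0.0) >= min_success_rate
    ]
    ranked = sorted(
        healthy,
        key=lambda row: (
            -row.get("success_rate_24h", 0.0),
            row.get("median_latency_ms", float("inf")),
        ),
    )
    # "source" is a hypothetical key naming the adapter in each row.
    return [row["source"] for row in ranked]


if __name__ == "__main__":
    rows = [
        {"source": "nyaa", "success_rate_24h": 0.98, "median_latency_ms": 400},
        {"source": "tpb", "success_rate_24h": 0.30, "median_latency_ms": 900},
        {"source": "yts", "success_rate_24h": 0.98, "median_latency_ms": 250},
    ]
    print(order_sources(rows))
```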

Optional: Token-Based Sources

25 of 28 sources work out of the box. 3 optional sources need credentials for extra coverage:

Source | Env Variable | How to Get
ps.252035 / panhunt | PANSOU_TOKEN | Register at linux.do, log in at so.252035.xyz, copy the JWT from browser cookies
pansou (self-hosted) | PANSOU_API_URL | Deploy fish2018/pansou, set your instance URL
torznab (Jackett) | TORZNAB_URL + TORZNAB_APIKEY | Install Jackett, copy the API key from the dashboard

Add credentials to .env or local/.env:

PANSOU_TOKEN=eyJhbGciOiJIUzI1NiIs...
TORZNAB_URL=http://localhost:9117/api/v2.0/indexers/all/results/torznab
TORZNAB_APIKEY=your-api-key
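
The precedence between the two files (local/.env overrides the root .env) can be sketched as a layered load in which later files win. This is a minimal illustration with hypothetical function names; Quarry's actual loader may handle quoting, exports, or process environment variables differently.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_layered_env(root=Path(".env"), local=Path("local/.env")):
    """Merge both files; entries from local/.env override the root file."""
    merged = {}
    for path in (root, local):
        if path.is_file():
            merged.update(parse_env(path.read_text(encoding="utf-8")))
    return merged


if __name__ == "__main__":
    print(sorted(load_layered_env()))
```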

See references/sources.md for detailed step-by-step instructions. Custom source adapters, ranking tweaks, and env variables in local/ are update-proof. git pull and ZIP updates both leave this directory untouched.


Agent Integration

Quarry is designed as an AI Agent skill. It's meant to be called by Agents, not used directly by humans.
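
A minimal sketch of how an Agent-side wrapper might call the CLI and parse its output is shown below. It uses only the command shape documented in this README (search, a quoted query, optional flags, --json) and assumes that stdout contains nothing but the JSON document; build_search_command and run_search are hypothetical helper names.

```python
import json
import subprocess
import sys


def build_search_command(query, *flags, script="scripts/hunt.py"):
    """Assemble the argv list for a JSON search, e.g. with "--4k"."""
    return [sys.executable, script, "search", query, *flags, "--json"]


def run_search(query, *flags, timeout=120):
    """Run the search from the repository root and decode its JSON."""
    completed = subprocess.run(
        build_search_command(query, *flags),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: with --json, stdout is a single JSON document.
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(build_search_command("Oppenheimer 2023", "--4k"))
```

Passing the query as a single argv element avoids shell quoting problems with titles that contain spaces or CJK characters.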

For Hermes / OpenClaw

Agent config files are in agents/:

# agents/hermes.yaml - Agent instructions include:
# - Query translation workflow (CJK -> English)
# - Category-specific routing guidance
# - Result interpretation (link_alive, tiers, penalties)
# - Available command reference

Skill Definition

SKILL.md is the Agent-readable skill contract:

  • When to use: public resource discovery, release comparison, video probing
  • Query normalization: Agent should translate to English/romanized titles first; the engine has a best-effort CJK alias fallback for movie/TV/anime/general queries
  • Result interpretation: how to read link_alive, tier, penalties
  • Category routing: which sources fire first for each content type
  • 13 agent rules: ordering, fallback behavior, format hints

JSON v3 Output

python3 scripts/hunt.py search "Oppenheimer 2023" --json
python3 scripts/hunt.py search "Oppenheimer 2023" --json --explain

Example response:

{
  "schema_version": "3",
  "query": "Oppenheimer 2023",
  "results": [
    {
      "tier": "top",
      "title": "Oppenheimer.2023.2160p.BluRay.REMUX.HEVC.DTS-HD",
      "link_or_magnet": "https://alipan.com/s/...",
      "provider": "aliyun",
      "source": "upyunso",
      "source_health": {
        "link_alive": true,
        "link_probe_reason": "share active"
      },
      "quality": "2160p BluRay REMUX HDR",
      "confidence": 0.95,
      "match_bucket": "exact_title_family",
      "canonical_identity": "movie:oppenheimer:2023"
    }
  ]
}

With --explain, the response includes an agent-readable explanation:

{
  "explain": {
    "why_top": ["exact title-family match", "year matched 2023"],
    "why_not_others": ["candidate X demoted: dead pan link"]
  }
}

Key fields for Agents:

Field | Meaning
tier | top = high confidence, related = decent, risky = unreliable
source_health.link_alive | true = verified, false = dead (skip it), null = unknown
confidence | 0.0 to 1.0 match confidence score
match_bucket | exact_title_family, title_family_match, weak_context_match, etc.
canonical_identity | Deduplication key (e.g. movie:oppenheimer:2023)
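
On the consuming side, an Agent could combine these fields as in the sketch below: skip links known to be dead, group by canonical_identity, and keep the best-tier, highest-confidence candidate per work. The selection policy and the best_per_identity name are assumptions for illustration, not rules from SKILL.md.

```python
TIER_ORDER = {"top": 0, "related": 1, "risky": 2}


def best_per_identity(results):
    """Keep one live candidate per canonical_identity."""
    best = {}
    for result in results:
        health = result.get("source_health") or {}
        if health.get("link_alive") is False:
            continue  # verified dead: skip it
        key = result.get("canonical_identity")
        # Lower tuple sorts first: better tier, then higher confidence.
        rank = (
            TIER_ORDER.get(result.get("tier"), len(TIER_ORDER)),
            -result.get("confidence", 0.0),
        )
        if key not in best or rank < best[key][0]:
            best[key] = (rank, result)
    return [result for _, result in best.values()]


if __name__ == "__main__":
    sample = [
        {"title": "a", "tier": "related", "confidence": 0.7,
         "canonical_identity": "movie:oppenheimer:2023",
         "source_health": {"link_alive": None}},
        {"title": "b", "tier": "top", "confidence": 0.95,
         "canonical_identity": "movie:oppenheimer:2023",
         "source_health": {"link_alive": True}},
    ]
    print([r["title"] for r in best_per_identity(sample)])
```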

Architecture

flowchart LR
    Q["Query"] --> I["Intent\nParsing"]
    I --> A["Alias\nResolver"]
    A --> S["Multi-Source\nFan-out"]
    S --> N["Normalize"]
    N --> D["Dedup"]
    D --> P{"Pan Probe"}
    P --> R{"Ranking"}
    R -->|"Top / Related"| Out["JSON / Text"]
    R -->|"Risky"| Sup("Suppressed")
    
    style P fill:#d4af37,stroke:#aa7c11,color:#000,stroke-width:2px
    style R fill:#d4af37,stroke:#aa7c11,color:#000,stroke-width:2px
    style Out fill:#10b981,stroke:#059669,color:#fff,stroke-width:2px
    style Sup fill:#ef4444,stroke:#b91c1c,color:#fff

Routing Matrix

Category | Primary -> Fallback | Key Signal
Movie | Pan -> YTS/TorrentGalaxy/TPB -> 1337x | Year match
TV | EZTV/TorrentGalaxy/TPB -> Pan | S{XX}E{XX}
Anime | Nyaa/DMHY/Bangumi Moe -> Pan | Romanized title
Book | Anna's Archive -> Pan -> 1337x/TorLock | Format (pdf/epub)
Music | Pan -> DMHY/Nyaa (noise-filtered) | Lossless tags
Software | Pan -> FitGirl/TorrentMac/TorrentGalaxy | Platform hint
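
The matrix can be expressed as data plus a staged lookup, as in the sketch below. The stage lists are transcribed from the table, with "pan" standing for the cloud-drive channel as a whole. Stopping once a stage yields results is an assumption made for this example; the engine may instead query all stages in parallel and rely on ranking. The fetch callback and search_in_stages name are hypothetical.

```python
# Ordered stages per category, transcribed from the routing matrix.
ROUTES = {
    "movie": [["pan"], ["yts", "torrentgalaxy", "tpb"], ["1337x"]],
    "tv": [["eztv", "torrentgalaxy", "tpb"], ["pan"]],
    "anime": [["nyaa", "dmhy", "bangumi_moe"], ["pan"]],
    "book": [["annas"], ["pan"], ["1337x", "torlock"]],
    "music": [["pan"], ["dmhy", "nyaa"]],
    "software": [["pan"], ["fitgirl", "torrentmac", "torrentgalaxy"]],
}


def search_in_stages(category, query, fetch, min_results=1):
    """Query each stage in order; stop once enough results are collected.

    fetch(source, query) is a caller-supplied function returning a list.
    """
    results = []
    for stage in ROUTES[category]:
        for source in stage:
            results.extend(fetch(source, query))
        if len(results) >= min_results:
            break
    return results


if __name__ == "__main__":
    def fake_fetch(source, query):
        return [f"{source}:{query}"] if source == "yts" else []

    print(search_in_stages("movie", "Oppenheimer 2023", fake_fetch))
```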

Project Layout

quarry/
├── scripts/
│   ├── hunt.py                    # CLI entrypoint
│   └── quarry/
│       ├── engine.py              # Search orchestration
│       ├── intent.py              # Query -> Intent -> SearchPlan
│       ├── ranking.py             # Scoring, tiers, deduplication
│       ├── pan_probe.py           # Cloud drive link viability probe
│       ├── parsers.py             # Release tag parsing (resolution, codec, HDR)
│       ├── config.py              # RankingConfig weights
│       ├── cache.py               # SQLite WAL cache
│       ├── source_validation.py   # Custom SourceAdapter contract validator
│       ├── video_core.py          # Public video pipeline (yt-dlp)
│       ├── subdl.py / subhd.py / jimaku.py   # Subtitle sources
│       └── sources/               # 28 source adapters
│           ├── base.py            # HTTPClient (httpx -> curl_cffi -> urllib)
│           ├── upyunso.py         # Cloud drive aggregator (AES encrypted API)
│           ├── pansou.py          # PanSou self-hosted pan aggregation API
│           ├── nyaa.py            # Anime torrents (RSS)
│           ├── dmhy.py            # 動漫花園 Chinese anime community (RSS)
│           ├── bangumi_moe.py     # Bangumi Moe anime torrents (JSON API)
│           ├── torrentgalaxy.py   # TorrentGalaxy general tracker (RARBG alt)
│           ├── torlock.py         # TorLock verified torrents
│           ├── ext_to.py          # EXT.to modern magnet search
│           ├── annas.py           # Anna's Archive books (HTML scraper)
│           ├── torznab.py         # Jackett/Prowlarr meta-indexer
│           └── ...                # eztv, bitsearch, tpb, yts, 1337x, etc.
├── agents/
│   ├── hermes.yaml                # Hermes Agent skill config
│   └── openclaw.yaml              # OpenClaw Agent skill config
├── local/                         # User safe zone (gitignored contents)
│   ├── sources/                   # Custom source adapters (auto-discovered)
│   ├── config.json                # Ranking weight overrides
│   └── .env                       # Environment variable overrides
├── tests/                         # 39 unit, precision, CLI, video, and benchmark tests
├── references/                    # Architecture, usage, source docs
├── SKILL.md                       # Agent-readable skill contract
├── CHANGELOG.md
└── pyproject.toml

Scope

What this does | What this doesn't do
Find public download routes | Download files
Rank results by quality | Bypass DRM or logins
Verify cloud drive link liveness | Access private trackers
Provide structured JSON for Agents | Guarantee legality or permanence

Requirements

Component | Dependency | Required?
Core search | Python 3.10+ | Yes
HTTP acceleration | httpx | Optional
TLS impersonation | curl-cffi | Optional
Upyunso API | pycryptodome | Optional
Video pipeline | yt-dlp + ffmpeg | Optional

Contributing

AI coding agents: Read CONTRIBUTING.md before making any changes.
User customizations go in local/, not in scripts/.

# Run benchmark before PR
python3 scripts/hunt.py benchmark

# Run tests
python3 -m pytest tests/ -v

License

MIT-0, no attribution required.

Feedback and Issues

If you encounter any bugs, have feature requests, or need help with custom source adapters, please open an issue on GitHub. Pull requests are also highly welcome!

About

High-recall public resource discovery for AI agents, with tiered ranking, source health, and stable JSON output.
