A bare-bones, self-hosted RSS reader with a Claude-powered AI pipeline that scores, clusters, and curates news for a single user. Built with FastAPI + HTMX + SQLite. Over time, it is meant to grow into an agentic tool for turning news into newsletters, websites, podcasts, or other outputs.
Inspired by and partially derived from Leo Laporte's BeatCheck.
- Fetches RSS and Atom feeds on a schedule (and on demand).
- Scores every new article 0–1 for relevance to your interests, using Claude Haiku and a short free-form description of your geographic and subject-matter focus (plus an implicit signal from titles you recently starred or bookmarked).
- Clusters articles that cover the same story across multiple feeds so cross-source coverage is detectable at a glance.
- Curates the highest-scoring recent articles into AI-picked suggestions, each with a one-sentence "why" from the model.
- Surfaces everything in a single unified Top Stories tab, ranked by a blend of AI score, cross-feed popularity, and curator confidence.
- Summarizes any article on demand using the nut-graph structure, fetching the full article text via trafilatura when the RSS excerpt is too short.
- Resolves Google News RSS wrapper URLs to their real destinations at ingest so downstream scoring, clustering, and summarization all work on the actual source URL.
- Blocklist support — a comma-separated list of terms in the Scoring dialog filters matching articles out at ingest.
- Raindrop.io bookmark integration — one click stars an article and saves it with your default tags.
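The full-text fallback for summaries comes down to a length check on the RSS excerpt. A minimal sketch, assuming a hypothetical 600-character cutoff and hypothetical function names (`fetch_full_text`, `source_text_for_summary`); `trafilatura.fetch_url` and `trafilatura.extract` are the library's standard download-and-extract helpers. The fetcher is passed in as a parameter so the logic can be exercised without network access:

```python
from typing import Callable, Optional

# Hypothetical cutoff: excerpts shorter than this count as "too short".
MIN_EXCERPT_CHARS = 600


def fetch_full_text(url: str) -> Optional[str]:
    """Download a page and extract its main text with trafilatura (network call)."""
    import trafilatura  # imported lazily so the rest of the sketch works without it

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def source_text_for_summary(
    excerpt: str,
    url: str,
    fetch: Callable[[str], Optional[str]] = fetch_full_text,
    min_chars: int = MIN_EXCERPT_CHARS,
) -> str:
    """Return the RSS excerpt if it is long enough, otherwise the extracted article text."""
    excerpt = (excerpt or "").strip()
    if len(excerpt) >= min_chars:
        return excerpt
    full_text = fetch(url)
    # Keep the short excerpt if extraction failed or produced less text.
    if full_text and len(full_text.strip()) > len(excerpt):
        return full_text.strip()
    return excerpt
```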
To run it, you need:
- Python 3.11 or newer
- An Anthropic API key (required for scoring, curation, and summarization)
- Optionally, a Raindrop.io access token if you want the bookmark integration
- macOS or Linux (Windows is untested but should work)
```bash
git clone https://github.com/smbrownai/beatcheck_web.git
cd beatcheck_web
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
The first run creates `~/.config/beatcheck/config.toml` with sensible defaults. Add your Anthropic key there (or export `ANTHROPIC_API_KEY`) and start the server:
```bash
uvicorn main:app --reload
```
Open http://127.0.0.1:8000 and add feeds from the Feeds dialog. Tell the AI what matters to you in the Scoring dialog. Click Refresh to fetch new items immediately; the background scheduler also refreshes every `refresh_interval_minutes` (30 by default).
Configuration lives in `~/.config/beatcheck/config.toml`:
```toml
db_path = "~/.config/beatcheck/feeds.db"
raindrop_token = ""            # optional
anthropic_api_key = ""         # required for AI features
refresh_interval_minutes = 30
retention_days = 7
default_tags = ["rss"]
host = "127.0.0.1"
port = 8000
```
Secrets can also come from environment variables (`ANTHROPIC_API_KEY`, `RAINDROP_TOKEN`), which is useful for deployments where you don't want keys on disk.
The keyword blocklist is stored at ~/.config/beatcheck/blocklist.txt (one
term per line) but is also editable through the Scoring dialog. Matching
is whole-word and case-insensitive, applied at ingest before articles reach
the database.
The Top Stories tab blends three signals per article:
| Signal | Source | Weight |
|---|---|---|
| AI score | Claude Haiku scores each article 0–1 against your preferences | 0.55 |
| Popularity | Distinct feeds covering the same clustered story, capped at 3 | 0.30 |
| Curator boost | Confidence of the curator's pending "suggest" verdict | 0.15 |
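The table gives the weights but not how the feed count is normalized before blending. A sketch, assuming popularity is scaled to 0–1 by dividing the capped count by 3, an unscored article contributes 0 for the AI term, and the curator term is the pending suggestion's confidence (0 when there is none). The `(article_id, cluster_id, feed_id)` row shape and the names `distinct_feed_counts` and `top_story_score` are hypothetical:

```python
from collections import defaultdict
from typing import Optional

AI_WEIGHT = 0.55
POPULARITY_WEIGHT = 0.30
CURATOR_WEIGHT = 0.15
POPULARITY_CAP = 3  # distinct feeds beyond 3 add nothing


def distinct_feed_counts(rows: list[tuple[int, int, int]]) -> dict[int, int]:
    """Map article_id -> distinct feeds in its cluster, from (article, cluster, feed) rows."""
    feeds_per_cluster: dict[int, set[int]] = defaultdict(set)
    for _article_id, cluster_id, feed_id in rows:
        feeds_per_cluster[cluster_id].add(feed_id)
    return {
        article_id: len(feeds_per_cluster[cluster_id])
        for article_id, cluster_id, _feed_id in rows
    }


def top_story_score(
    ai_score: Optional[float],
    distinct_feeds: int,
    curator_confidence: Optional[float],
) -> float:
    """Blend the three Top Stories signals into one ranking value in [0, 1]."""
    ai = ai_score if ai_score is not None else 0.0  # assumption: unscored counts as 0
    # Assumed normalization: 1 feed -> 1/3, 3 or more feeds -> 1.0.
    popularity = min(max(distinct_feeds, 0), POPULARITY_CAP) / POPULARITY_CAP
    curator = curator_confidence if curator_confidence is not None else 0.0
    return AI_WEIGHT * ai + POPULARITY_WEIGHT * popularity + CURATOR_WEIGHT * curator
```

Sorting candidates by this value, highest first, would give the tab's order under these assumptions.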
An article surfaces in Top Stories only if it is starred or at least one signal clears its gate: AI score ≥ 0.35, popularity ≥ 2 feeds, or a pending curator pick. Unscored articles with no other signal are hidden until the scorer reaches them.
Small badges on each row indicate why a story surfaced: 🔥 for multi-source popularity, ✨ for curator picks.
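The gating rule and the row badges can be expressed as two small predicates. This sketch uses a hypothetical `Candidate` record; the 0.35 and 2-feed thresholds come from the text above, while showing 🔥 at exactly the 2-feed popularity gate is an assumption:

```python
from dataclasses import dataclass
from typing import Optional

AI_GATE = 0.35
POPULARITY_GATE = 2


@dataclass
class Candidate:
    """Hypothetical per-article view of the Top Stories signals."""

    ai_score: Optional[float]  # None until the scorer reaches the article
    distinct_feeds: int
    curator_pick_pending: bool
    starred: bool = False


def surfaces_in_top_stories(c: Candidate) -> bool:
    """Shown if starred, or if any single signal clears its gate."""
    return (
        c.starred
        or (c.ai_score is not None and c.ai_score >= AI_GATE)
        or c.distinct_feeds >= POPULARITY_GATE
        or c.curator_pick_pending
    )


def badges(c: Candidate) -> str:
    """Badges explaining why a row surfaced."""
    marks = []
    if c.distinct_feeds >= POPULARITY_GATE:
        marks.append("🔥")  # multi-source popularity
    if c.curator_pick_pending:
        marks.append("✨")  # curator pick
    return "".join(marks)
```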
- FastAPI for the server and HTMX endpoints — no SPA framework, just server-rendered HTML fragments swapped into the page.
- APScheduler for periodic RSS refresh and the post-fetch pipeline.
- SQLite (via aiosqlite) for all persistent state.
- feedparser + trafilatura for feed parsing and full-text extraction.
- Anthropic Python SDK with prompt caching for scoring, curation, and summarization on Claude Haiku 4.5.
The post-fetch pipeline (score → curate suggestions → retention → rebuild
clusters) lives in services/pipeline.py and is
called by both the scheduled job and the manual Refresh endpoint, so every
refresh — however triggered — does the same work.
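The single shared entry point can be sketched with `asyncio`. This is an illustration under assumptions: `run_post_fetch_pipeline`, `refresh`, and the step callables are hypothetical stand-ins for the code in `services/pipeline.py`; the point is only that both triggers await the same ordered sequence:

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical step signature: each stage is an async callable with no arguments.
Step = Callable[[], Awaitable[None]]


async def run_post_fetch_pipeline(steps: list[tuple[str, Step]]) -> list[str]:
    """Run the stages strictly in order and return the names that completed."""
    completed: list[str] = []
    for name, step in steps:
        await step()  # each stage can rely on the previous stage's results
        completed.append(name)
    return completed


async def refresh(fetch_feeds: Step, steps: list[tuple[str, Step]]) -> list[str]:
    """Shared by the scheduled job and the manual Refresh endpoint."""
    await fetch_feeds()
    return await run_post_fetch_pipeline(steps)
```

Keeping the order in one function means no trigger can, for example, skip cluster rebuilding. With APScheduler 3.x's `AsyncIOScheduler`, the scheduled side might be registered with `scheduler.add_job(refresh, "interval", minutes=30, args=[...])`, while the Refresh endpoint awaits the same coroutine directly.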
```
main.py       FastAPI app entrypoint
config.py     TOML + env config loader
database.py   SQLite schema + async helpers
scheduler.py  APScheduler setup
routers/      HTTP endpoints (articles, feeds, scoring, etc.)
services/     Fetching, scoring, curation, clustering, summarization
templates/    Jinja2 templates + HTMX partials
static/       CSS + minimal JS
```
This repo uses pre-commit with two scanners:
- gitleaks: blocks commits that contain an Anthropic or Raindrop key (or other common secrets).
- pip-audit: scans `requirements.txt` for known CVEs whenever that file changes.
After cloning, install them once:
```bash
pip install pre-commit
pre-commit install
```
Every `git commit` then scans the staged diff against `.gitleaks.toml`. To run the scan on demand:
```bash
pre-commit run --all-files
```
If gitleaks ever flags a false positive, extend the `[allowlist]` section of `.gitleaks.toml` rather than skipping the hook.
MIT — see LICENSE. Portions based on BeatCheck by Leo Laporte, also MIT-licensed.
- Leo Laporte for the original BeatCheck concept.
- Anthropic for Claude and the prompt-caching API.
- The FastAPI, HTMX, trafilatura, and feedparser communities.