BeatCheck Web

A bare-bones, self-hosted RSS reader with a Claude-powered AI pipeline that scores, clusters, and curates news for a single user. Built with FastAPI + HTMX + SQLite. Designed to become an agentic tool for turning news into newsletters, websites, podcasts, or other outputs over time.

Inspired by and partially derived from Leo Laporte's BeatCheck.

What it does

Fetches RSS and Atom feeds on a schedule (and on demand).
Scores every new article 0–1 for relevance to your interests, using Claude Haiku and a short free-form description of your geographic and subject-matter focus (plus an implicit signal from titles you recently starred or bookmarked).
Clusters articles that cover the same story across multiple feeds so cross-source coverage is detectable at a glance.
Curates the highest-scoring recent articles into AI-picked suggestions, each with a one-sentence "why" from the model.
Surfaces everything in a single unified Top Stories tab, ranked by a blend of AI score, cross-feed popularity, and curator confidence.
Summarizes any article on demand using the nut-graph structure, fetching the full article text via trafilatura when the RSS excerpt is too short.
Resolves Google News RSS wrapper URLs to their real destinations at ingest so downstream scoring, clustering, and summarization all work on the actual source URL.
Blocks unwanted topics: a comma-separated list of terms in the Scoring dialog filters matching articles out at ingest.
Integrates with Raindrop.io bookmarks: one click stars an article and saves it with your default tags.
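
The excerpt-or-full-text choice behind on-demand summaries can be sketched roughly as follows. This is an illustration, not the project's code: MIN_EXCERPT_CHARS, text_for_summary, and fetch_with_trafilatura are hypothetical names, and the 500-character threshold is an assumed value. The only library calls used are trafilatura.fetch_url and trafilatura.extract.

```python
MIN_EXCERPT_CHARS = 500  # assumed threshold; the real cutoff is not documented


def fetch_with_trafilatura(url):
    """Download a page and extract its main text; returns None on failure."""
    import trafilatura  # third-party, imported lazily so the sketch loads without it

    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None


def text_for_summary(excerpt, url, fetch_full_text=fetch_with_trafilatura):
    """Use the RSS excerpt when it is long enough, else try the full article."""
    if len(excerpt.strip()) >= MIN_EXCERPT_CHARS:
        return excerpt
    full_text = fetch_full_text(url)
    # Fall back to the short excerpt if extraction fails or returns nothing.
    return full_text or excerpt
```

Passing the fetcher as a parameter keeps the decision logic testable without network access.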

Requirements

Python 3.11 or newer
An Anthropic API key (required for scoring, curation, and summarization)
Optionally, a Raindrop.io access token if you want the bookmark integration
macOS or Linux (Windows is untested but should work)

Quick start

git clone https://github.com/smbrownai/beatcheck_web.git
cd beatcheck_web

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

The first run creates ~/.config/beatcheck/config.toml with sensible defaults. Add your Anthropic key there (or export ANTHROPIC_API_KEY), then start the server:

uvicorn main:app --reload

Open http://127.0.0.1:8000 and add feeds from the Feeds dialog. Tell the AI what matters to you in the Scoring dialog. Click Refresh to fetch new items immediately; the background scheduler also refreshes on an interval.

Configuration

Configuration lives in ~/.config/beatcheck/config.toml:

db_path = "~/.config/beatcheck/feeds.db"
raindrop_token = ""              # optional
anthropic_api_key = ""           # required for AI features
refresh_interval_minutes = 30
retention_days = 7
default_tags = ["rss"]
host = "127.0.0.1"
port = 8000

Secrets can also come from environment variables (ANTHROPIC_API_KEY, RAINDROP_TOKEN) — useful for deployments where you don't want keys on disk.

The keyword blocklist is stored at ~/.config/beatcheck/blocklist.txt (one term per line) but is also editable through the Scoring dialog. Matching is whole-word and case-insensitive, applied at ingest before articles reach the database.
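
Whole-word, case-insensitive matching can be sketched with a single compiled regular expression. The function names here are hypothetical, and checking only the article title is an assumption; the actual filter may also look at summaries or other fields.

```python
import re


def parse_terms(text):
    """Split blocklist.txt content (one term per line) into clean terms."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_blocklist_pattern(terms):
    """Compile one case-insensitive, whole-word pattern, or None if empty."""
    if not terms:
        return None
    alternatives = "|".join(re.escape(term) for term in terms)
    # The lookarounds require a non-word character (or the string edge) on
    # both sides, so the term "crypto" does not match inside "cryptography".
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


def is_blocked(title, pattern):
    """Return True if the title contains any blocklisted term."""
    return bool(pattern and pattern.search(title))
```

Lookarounds are used instead of \b so that terms beginning or ending with punctuation still match as whole units.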

How ranking works

The Top Stories tab blends three signals per article:

Signal	Source	Weight
AI score	Claude Haiku scores each article 0–1 against your preferences	0.55
Popularity	Distinct feeds covering the same clustered story, capped at 3	0.30
Curator boost	Confidence of the curator's pending "suggest" verdict	0.15

An article surfaces in Top Stories only if it clears at least one gate: an AI score ≥ 0.35, coverage by at least 2 feeds, a pending curator pick, or a star. Unscored articles with no other signal stay hidden until the scorer reaches them.

Small badges on each row indicate why a story surfaced: 🔥 for multi-source popularity, ✨ for curator picks.
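
The weights and gates above can be expressed as a short sketch. The names (Signals, passes_gate, blended_score) are hypothetical, and scaling popularity to 0-1 by dividing the capped feed count by the cap is an assumption; the README gives the weights and thresholds but not the normalization.

```python
from dataclasses import dataclass
from typing import Optional

W_AI, W_POPULARITY, W_CURATOR = 0.55, 0.30, 0.15
AI_GATE, POPULARITY_GATE, POPULARITY_CAP = 0.35, 2, 3


@dataclass
class Signals:
    ai_score: Optional[float]  # None until the scorer reaches the article
    feed_count: int = 1  # distinct feeds in the article's cluster
    curator_confidence: Optional[float] = None  # set only for a pending pick
    starred: bool = False


def passes_gate(s: Signals) -> bool:
    """An article is shown if any single signal clears its gate."""
    return (
        (s.ai_score is not None and s.ai_score >= AI_GATE)
        or s.feed_count >= POPULARITY_GATE
        or s.curator_confidence is not None
        or s.starred
    )


def blended_score(s: Signals) -> float:
    """Weighted blend of the three signals; missing signals count as 0."""
    # Assumption: capped feed count divided by the cap gives a 0-1 popularity.
    popularity = min(s.feed_count, POPULARITY_CAP) / POPULARITY_CAP
    return (
        W_AI * (s.ai_score or 0.0)
        + W_POPULARITY * popularity
        + W_CURATOR * (s.curator_confidence or 0.0)
    )
```

Under these assumptions, an article would be filtered with passes_gate first and then sorted by blended_score in descending order.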

Architecture

FastAPI for the server and HTMX endpoints — no SPA framework, just server-rendered HTML fragments swapped into the page.
APScheduler for periodic RSS refresh and the post-fetch pipeline.
SQLite (via aiosqlite) for all persistent state.
feedparser + trafilatura for feed parsing and full-text extraction.
Anthropic Python SDK with prompt caching for scoring, curation, and summarization on Claude Haiku 4.5.

The post-fetch pipeline (score → curate suggestions → retention → rebuild clusters) lives in services/pipeline.py and is called by both the scheduled job and the manual Refresh endpoint, so every refresh — however triggered — does the same work.
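
The shared-pipeline idea can be illustrated with the sketch below. Every function name is hypothetical and the stage bodies are stubs that only record their order; the real services/pipeline.py is not reproduced here.

```python
import asyncio


# Stub stages: each appends its name so the execution order is observable.
async def score_new_articles(log):
    log.append("score")


async def curate_suggestions(log):
    log.append("curate")


async def apply_retention(log):
    log.append("retention")


async def rebuild_clusters(log):
    log.append("cluster")


STAGES = (score_new_articles, curate_suggestions, apply_retention, rebuild_clusters)


async def run_post_fetch_pipeline(log):
    """Run every stage in the documented order."""
    for stage in STAGES:
        await stage(log)


async def scheduled_refresh(log):
    """Entry point a scheduler job might call."""
    log.append("fetch")
    await run_post_fetch_pipeline(log)


async def manual_refresh(log):
    """Entry point a Refresh endpoint might call; same pipeline, same order."""
    log.append("fetch")
    await run_post_fetch_pipeline(log)
```

Keeping the stage list in one place means a new stage is picked up by both triggers without touching either entry point.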

Directory layout

main.py              FastAPI app entrypoint
config.py            TOML + env config loader
database.py          SQLite schema + async helpers
scheduler.py         APScheduler setup
routers/             HTTP endpoints (articles, feeds, scoring, etc.)
services/            Fetching, scoring, curation, clustering, summarization
templates/           Jinja2 templates + HTMX partials
static/              CSS + minimal JS

Development

Pre-commit hooks

This repo uses pre-commit with two scanners:

gitleaks — blocks commits that contain an Anthropic or Raindrop key (or other common secrets).
pip-audit — scans requirements.txt for known CVEs whenever that file changes.

After cloning, install them once:

pip install pre-commit
pre-commit install

Each git commit then runs gitleaks on the staged diff using the rules in .gitleaks.toml. To run the hooks on demand:

pre-commit run --all-files

If gitleaks ever flags a false positive, extend the [allowlist] section of .gitleaks.toml rather than skipping the hook.

License

MIT — see LICENSE. Portions based on BeatCheck by Leo Laporte, also MIT-licensed.

Acknowledgments

Leo Laporte for the original BeatCheck concept.
Anthropic for Claude and the prompt-caching API.
The FastAPI, HTMX, trafilatura, and feedparser communities.
