Skip to content

smbrownai/beatcheck_web

Repository files navigation

BeatCheck Web

A bare-bones, self-hosted RSS reader with a Claude-powered AI pipeline that scores, clusters, and curates news for a single user. Built with FastAPI + HTMX + SQLite. Designed to become an agentic tool for turning news into newsletters, websites, podcasts, or other outputs over time.

Inspired by and partially derived from Leo Laporte's BeatCheck.

What it does

  • Fetches RSS and Atom feeds on a schedule (and on demand).
  • Scores every new article 0–1 for relevance to your interests, using Claude Haiku and a short free-form description of your geographic and subject-matter focus (plus an implicit signal from titles you recently starred or bookmarked).
  • Clusters articles that cover the same story across multiple feeds so cross-source coverage is detectable at a glance.
  • Curates the highest-score recent articles into AI-picked suggestions with a one-sentence "why" from the model.
  • Surfaces everything in a single unified Top Stories tab, ranked by a blend of AI score, cross-feed popularity, and curator confidence.
  • Summarizes any article on demand using the nut-graph structure, fetching the full article text via trafilatura when the RSS excerpt is too short.
  • Resolves Google News RSS wrapper URLs to their real destinations at ingest so downstream scoring, clustering, and summarization all work on the actual source URL.
  • Blocklist support — a comma-separated list of terms in the Scoring dialog filters matching articles out at ingest.
  • Raindrop.io bookmark integration — one click stars an article and saves it with your default tags.

Requirements

  • Python 3.11 or newer
  • An Anthropic API key (required for scoring, curation, and summarization)
  • Optionally, a Raindrop.io access token if you want the bookmark integration
  • macOS or Linux (Windows is untested but should work)

Quick start

git clone https://github.com/smbrownai/beatcheck_web.git
cd beatcheck_web

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

First run will create ~/.config/beatcheck/config.toml with sensible defaults. Add your Anthropic key there (or export ANTHROPIC_API_KEY) and start the server:

uvicorn main:app --reload

Open http://127.0.0.1:8000 and add feeds from the Feeds dialog. Tell the AI what matters to you in the Scoring dialog. Click Refresh to fetch new items immediately; the background scheduler also refreshes on an interval.

Configuration

Configuration lives in ~/.config/beatcheck/config.toml:

db_path = "~/.config/beatcheck/feeds.db"
raindrop_token = ""              # optional
anthropic_api_key = ""           # required for AI features
refresh_interval_minutes = 30
retention_days = 7
default_tags = ["rss"]
host = "127.0.0.1"
port = 8000

Secrets can also come from environment variables (ANTHROPIC_API_KEY, RAINDROP_TOKEN) — useful for deployments where you don't want keys on disk.

The keyword blocklist is stored at ~/.config/beatcheck/blocklist.txt (one term per line) but is also editable through the Scoring dialog. Matching is whole-word and case-insensitive, applied at ingest before articles reach the database.

How ranking works

The Top Stories tab blends three signals per article:

Signal Source Weight
AI score Claude Haiku scores each article 0–1 against your preferences 0.55
Popularity Distinct feeds covering the same clustered story, capped at 3 0.30
Curator boost Confidence of the curator's pending "suggest" verdict 0.15

An article surfaces in Top Stories only if at least one signal clears its gate: AI score ≥ 0.35, popularity ≥ 2 feeds, or a pending curator pick (or the article is starred). Unscored articles with no other signal are hidden until the scorer reaches them.

Small badges on each row indicate why a story surfaced: 🔥 for multi-source popularity, ✨ for curator picks.

Architecture

  • FastAPI for the server and HTMX endpoints — no SPA framework, just server-rendered HTML fragments swapped into the page.
  • APScheduler for periodic RSS refresh and the post-fetch pipeline.
  • SQLite (via aiosqlite) for all persistent state.
  • feedparser + trafilatura for feed parsing and full-text extraction.
  • Anthropic Python SDK with prompt caching for scoring, curation, and summarization on Claude Haiku 4.5.

The post-fetch pipeline (score → curate suggestions → retention → rebuild clusters) lives in services/pipeline.py and is called by both the scheduled job and the manual Refresh endpoint, so every refresh — however triggered — does the same work.

Directory layout

main.py              FastAPI app entrypoint
config.py            TOML + env config loader
database.py          SQLite schema + async helpers
scheduler.py         APScheduler setup
routers/             HTTP endpoints (articles, feeds, scoring, etc.)
services/            Fetching, scoring, curation, clustering, summarization
templates/           Jinja2 templates + HTMX partials
static/              CSS + minimal JS

Development

Pre-commit hooks

This repo uses pre-commit with two scanners:

  • gitleaks — blocks commits that contain an Anthropic or Raindrop key (or other common secrets).
  • pip-audit — scans requirements.txt for known CVEs whenever that file changes.

After cloning, install them once:

pip install pre-commit
pre-commit install

Every git commit then scans the staged diff against .gitleaks.toml. To run the scan on demand:

pre-commit run --all-files

If gitleaks ever flags a false positive, extend the [allowlist] section of .gitleaks.toml rather than skipping the hook.

License

MIT — see LICENSE. Portions based on BeatCheck by Leo Laporte, also MIT-licensed.

Acknowledgments

  • Leo Laporte for the original BeatCheck concept.
  • Anthropic for Claude and the prompt-caching API.
  • The FastAPI, HTMX, trafilatura, and feedparser communities.

About

A bare-bones, self-hosted RSS reader with a Claude-powered AI pipeline that scores, clusters, and curates news for a single user.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors