Personal AI news archiver. Runs on your laptop, summarizes with a local model, stores and serves through Git. Zero monthly cost.
Every morning I'd open Gmail, scroll Twitter, hop over to Hacker News — and somehow read the same OpenAI announcement over and over while missing the small Anthropic update that actually mattered. Tabs full of the same story, none of the signal.
Syndicate is the fix.
The constraint that shaped the architecture was simple: no server, no monthly bill. That meant rebuilding the usual "Postgres + cron + S3 + CDN + Vercel" stack out of things I already had at home:
| | Layer | What I used |
|---|---|---|
| 🧠 | Compute | My laptop, on a regular schedule via launchd. No daemon, no always-on box. A missed cycle catches up cleanly within the rolling ingest window; longer absences lose older items. |
| 🤖 | AI | A local small language model (Ollama-served gemma4) for summarization, a local embedding model for dedup. Zero API spend. |
| 📦 | Storage | A second Git repo (news-archive) holds the daily JSON output. Versioned for free, no DB to host. |
| 🌐 | Hosting | GitHub Pages serves the PWA. It fetches its data straight out of news-archive. No backend, no CDN bill. |
Total operational cost: electricity. Total infrastructure: my laptop and two Git repos.
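The catch-up rule in the Compute row can be made concrete with a small sketch. It assumes each run asks every source for items newer than now minus the window, regardless of when the last run happened. WINDOW, ingest_since, and lost_span are hypothetical names, and the 7-day length is a placeholder.

```python
from datetime import datetime, timedelta

# Hypothetical window length; the real rolling window is set per source.
WINDOW = timedelta(days=7)


def ingest_since(now: datetime, window: timedelta = WINDOW) -> datetime:
    """Lower bound for a run: every source is asked for items newer than this."""
    return now - window


def lost_span(last_run: datetime, now: datetime, window: timedelta = WINDOW) -> timedelta:
    """Time between the previous run and the start of this run's window.

    Items published in that span are never fetched. The result is zero whenever
    the laptop comes back within one window, which is the "catches up cleanly" case.
    """
    return max(ingest_since(now, window) - last_run, timedelta(0))


now = datetime(2026, 4, 20, 8, 0)
print(lost_span(now - timedelta(days=2), now))   # 0:00:00
print(lost_span(now - timedelta(days=10), now))  # 3 days, 0:00:00
```

Overlapping windows on back-to-back runs are harmless, because ingestion skips URLs that are already in the database.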
| How to use it | What you run |
|---|---|
| 🧑‍💻 Claude Code plugin: conversational, agent-driven | See INSTALL.md |
| ⚙️ Direct CLI: cron-friendly text output | uv run syndicate |
| 📱 Read on phone/desktop: PWA, no install | aadarshvelu.github.io/syndicate. Works offline. Add to Home Screen for a native-app feel. |
Long-form install walkthrough, env-loading mechanics, and publishing notes live in INSTALL.md.
aadarshvelu.github.io/syndicate — static React/Vite PWA on GitHub Pages. Reads per-day JSON straight from the news-archive repo, caches in IndexedDB, works offline once loaded. No accounts, no backend, no data leaves the device.
Install it as an app (takes about 10 seconds):
| Platform | Steps |
|---|---|
| 📱 iOS Safari | Open the link → Share → Add to Home Screen → Add |
| 🤖 Android Chrome | Open the link → ⋮ menu → Install app (or Add to Home screen) |
| 💻 Desktop Chrome / Edge | Open the link → install icon (⊕) on the right side of the address bar → Install |
After install, the PWA launches full-screen like a native app. The service worker caches the bundle, so later opens work without a network connection; only the day's feed JSON is fetched fresh.
Gmail newsletters, RSS feeds, and Twitter, all collected into one SQLite database.
The feed list lives in config/. Add a source and it shows up in the archive
after the next run. No service to redeploy.
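Stage 1 of the pipeline (canonicalize the URL, skip it if it already exists, otherwise insert it as a primary row) could look roughly like this. It is a sketch, not the code in pipeline/ingestion/: the items schema, the tracking-parameter list, and the function names are assumptions, and redirect unwrapping is left out because it needs network access.

```python
import sqlite3
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters to strip; the real list lives in pipeline/ingestion/.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid", "mc_cid", "mc_eid"}


def canonicalize(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and tracking params, trim a trailing slash."""
    parts = urlsplit(url.strip())
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))


def ingest(conn: sqlite3.Connection, items: list) -> int:
    """Insert (title, url) pairs whose canonical URL is new; return how many were inserted."""
    inserted = 0
    for title, url in items:
        canonical = canonicalize(url)
        if conn.execute("SELECT 1 FROM items WHERE url = ?", (canonical,)).fetchone():
            continue  # URL already ingested on this or an earlier run: skip
        conn.execute(
            "INSERT INTO items (url, title, is_primary) VALUES (?, ?, 1)",
            (canonical, title),
        )
        inserted += 1
    conn.commit()
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL,"
    " title TEXT, is_primary INTEGER NOT NULL)"
)
batch = [
    ("Model launch", "https://Example.com/post/?utm_source=newsletter"),
    ("Model launch (via RSS)", "https://example.com/post#comments"),
]
print(ingest(conn, batch))  # 1: both URLs canonicalize to https://example.com/post
```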
Cross-channel duplicates collapse into clusters before summarization sees them — exact URL → fuzzy text → simhash → semantic embedding. I only pay the LLM once per story, not once per source. (And with a local model, even "paying once" is near-free.)
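A sketch of the T1 to T4 cascade, with every threshold a placeholder rather than the value in pipeline/dedup/. It assumes embeddings are computed elsewhere and passed in; simhash here is the textbook 64-bit variant over whitespace tokens, and difflib's SequenceMatcher stands in for whatever fuzzy matcher the real code uses. Each tier is tried against every existing cluster before the next tier is consulted, and pick_primary is not shown.

```python
import hashlib
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher

# Placeholder thresholds for illustration only; the real values live in pipeline/dedup/.
FUZZY_MIN_RATIO = 0.9
RECENT = timedelta(days=3)
SIMHASH_MAX_BITS = 3
COSINE_MIN = 0.88


@dataclass
class Item:
    url: str
    title: str
    body: str
    published: datetime
    embedding: list


def simhash(text: str, bits: int = 64) -> int:
    """Each token's hash votes +1/-1 on every bit; positive totals become 1 bits."""
    votes = [0] * bits
    for token in text.lower().split():
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def t1_exact(new: Item, old: Item) -> bool:
    return new.url == old.url or new.title.casefold() == old.title.casefold()


def t2_fuzzy(new: Item, old: Item) -> bool:
    if abs(new.published - old.published) > RECENT:
        return False  # fuzzy title matches only count for recent items
    ratio = SequenceMatcher(None, new.title.casefold(), old.title.casefold()).ratio()
    return ratio >= FUZZY_MIN_RATIO


def t3_simhash(new: Item, old: Item) -> bool:
    return bin(simhash(new.body) ^ simhash(old.body)).count("1") <= SIMHASH_MAX_BITS


def t4_semantic(new: Item, old: Item) -> bool:
    return cosine(new.embedding, old.embedding) >= COSINE_MIN


TIERS = [("T1", t1_exact), ("T2", t2_fuzzy), ("T3", t3_simhash), ("T4", t4_semantic)]


def assign(new: Item, clusters: list) -> tuple:
    """Return (tier, cluster index); a tier runs over all clusters before the next one."""
    for tier, matches in TIERS:
        for index, members in enumerate(clusters):
            if any(matches(new, old) for old in members):
                members.append(new)
                return tier, index
    clusters.append([new])  # nothing matched: new singleton cluster
    return "new", len(clusters) - 1


t0 = datetime(2026, 4, 20)
clusters: list = []
a = Item("https://a.example/x", "Lab ships new model", "lab ships a new model today", t0, [1.0, 0.0, 0.0])
print(assign(a, clusters))  # ('new', 0)
```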
Provider is one env var (AI_PROVIDER=ollama|anthropic|openai|gemini|minimax).
Default is Ollama because it's free and runs locally. Adding any other
LiteLLM-supported provider takes one row in
pipeline/AI/lm.py; no other code changes.
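One way the provider switch could be wired, assuming lm.py keeps a table keyed by AI_PROVIDER and builds a LiteLLM-style "provider/model" string. The table contents, the AI_MODEL override variable, and the Ollama tag gemma4 are all hypothetical; check pipeline/AI/lm.py for the real rows.

```python
import os
from collections.abc import Mapping
from typing import Dict, Optional, Tuple

# Hypothetical table: AI_PROVIDER value -> (LiteLLM prefix, default model or None).
# Adding a provider means adding one row here.
PROVIDERS: Dict[str, Tuple[str, Optional[str]]] = {
    "ollama": ("ollama_chat", "gemma4"),  # assumed Ollama tag for the local model
    "anthropic": ("anthropic", None),
    "openai": ("openai", None),
    "gemini": ("gemini", None),
    "minimax": ("minimax", None),
}


def resolve_model(env: Mapping = os.environ) -> str:
    """Turn AI_PROVIDER (plus the hypothetical AI_MODEL override) into a model string."""
    provider = env.get("AI_PROVIDER", "ollama").strip().lower()
    if provider not in PROVIDERS:
        raise ValueError(f"unknown AI_PROVIDER {provider!r}; expected one of {sorted(PROVIDERS)}")
    prefix, default_model = PROVIDERS[provider]
    model = env.get("AI_MODEL") or default_model
    if model is None:
        raise ValueError(f"AI_PROVIDER={provider} needs AI_MODEL to be set")
    return f"{prefix}/{model}"


print(resolve_model({"AI_PROVIDER": "ollama"}))  # ollama_chat/gemma4
```

A string in this form is what LiteLLM-backed clients such as DSPy's dspy.LM accept as their model argument, which is why adding a provider can stay a one-row change.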
A React/Vite PWA hosted on GitHub Pages reads JSON from the news-archive repo, caches in IndexedDB, ranks by per-category preference with a 7-day decay. Likes are weighted (reactions count half) so a viral story doesn't pollute next week's feed.
Each stage maps to a real module under pipeline/, and the decision nodes
mirror the checks the code actually makes, not approximations.
```mermaid
flowchart TD
    subgraph SRC[Sources]
        S1[Gmail<br/>IMAP rolling window]
        S2[RSS<br/>HTTP fetch of configured feeds]
        S3[Twitter<br/>Playwright on configured handles]
    end
    SRC --> ING
    subgraph ING["Stage 1 · Ingest — pipeline/ingestion/"]
        I1[Fetch raw items]
        I2[URL canonicalize<br/>strip tracking params, unwrap redirects]
        I3{URL exists in items?}
        I3 -- yes --> I4[Skip]
        I3 -- no --> I5[Insert row as primary]
        I1 --> I2 --> I3
    end
    ING --> LINK
    subgraph LINK["Stage 2 · Relation linker — pipeline/relation/"]
        L1[Build embedding per news item]
        L2[For each tweet: nearest news by cosine]
        L3{Above similarity threshold?}
        L3 -- no --> L4[Standalone tweet]
        L3 -- yes --> L5{Tweet posted BEFORE matched news?}
        L5 -- yes --> L6[Scoop<br/>relation=standalone<br/>+ parent_cluster_id]
        L5 -- no --> L7[Reaction<br/>relation=reaction<br/>+ parent_cluster_id]
        L1 --> L2 --> L3
    end
    LINK --> DEDUP
    subgraph DEDUP["Stage 3 · Dedup T1–T4 — pipeline/dedup/"]
        D1{T1 exact URL or title?}
        D2{T2 fuzzy text + recent?}
        D3{T3 simhash near-match?}
        D4{T4 semantic embedding match?}
        D5[New singleton cluster]
        D6[Join existing cluster]
        D7[pick_primary<br/>official > aggregator > newsletter > unknown]
        D1 -- yes --> D6
        D1 -- no --> D2
        D2 -- yes --> D6
        D2 -- no --> D3
        D3 -- yes --> D6
        D3 -- no --> D4
        D4 -- yes --> D6
        D4 -- no --> D5
        D5 --> D7
        D6 --> D7
    end
    DEDUP --> SUM
    subgraph SUM["Stage 4 · Summarize — pipeline/AI/"]
        SM1[Pick primary items where summary IS NULL]
        SM2[Merge cluster content<br/>primary title + member bodies]
        SM3[DSPy ChainOfThought via configured provider]
        SM4[Emit key_facts + teaser + summary<br/>+ importance + category]
        SM5{Hot cluster?}
        SM5 -- yes --> SM6[Bump importance]
        SM5 -- no --> SM7[Importance unchanged]
        SM1 --> SM2 --> SM3 --> SM4 --> SM5
    end
    SUM --> EXP
    subgraph EXP["Stage 5 · Export — pipeline/git_export.py"]
        E1[Recent days from DB]
        E2[Write news-archive/<Year>/<Month>/<dd-Mon-yy>.json]
        E3[git add + commit + push]
        E1 --> E2 --> E3
    end
    EXP -- "git push HTTPS" --> ARC[(news-archive<br/>GitHub repo<br/>public, per-day JSON)]
```
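The Stage 2 decision (nearest news item by cosine, then timestamp order decides scoop versus reaction) can be sketched as a pure function. The similarity threshold, the dataclasses, and the assumption that embeddings arrive precomputed are illustrative, not taken from pipeline/relation/.

```python
import math
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

SIMILARITY_MIN = 0.8  # placeholder; the real threshold lives in pipeline/relation/


@dataclass
class News:
    cluster_id: int
    published: datetime
    embedding: list


@dataclass
class Tweet:
    posted: datetime
    embedding: list


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link(tweet: Tweet, news: list) -> Tuple[str, Optional[int]]:
    """Return (relation, parent_cluster_id) following Stage 2 of the flowchart."""
    if not news:
        return "standalone", None
    best = max(news, key=lambda n: cosine(tweet.embedding, n.embedding))
    if cosine(tweet.embedding, best.embedding) < SIMILARITY_MIN:
        return "standalone", None             # unrelated tweet
    if tweet.posted < best.published:
        return "standalone", best.cluster_id  # scoop: tweeted before the news landed
    return "reaction", best.cluster_id        # commentary on news that already exists


story = News(7, datetime(2026, 4, 20, 12, 0), [1.0, 0.0])
print(link(Tweet(datetime(2026, 4, 20, 9, 0), [0.9, 0.1]), [story]))  # ('standalone', 7)
```

Keeping the scoop as relation=standalone with a parent_cluster_id means it still ranks as its own item while remaining linked to the story it anticipated.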
The whole pipeline shares one SQLite at db/snapshot.db and emits a
JSON envelope per stage so any agent / cron / skill can drive it. Detailed
stage docs live alongside the code:
pipeline/dedup/doc.md,
pipeline/AI/doc.md,
pipeline/doc.md.
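What a JSON envelope per stage buys a driver can be shown with a small sketch. The field names stage, ok, counts, and error are hypothetical (check pipeline/doc.md for the real shape); the point is that a cron wrapper or agent only needs to parse one JSON object per stage and stop at the first failure.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, Optional

# Hypothetical envelope shape, for illustration only.


@dataclass
class Envelope:
    stage: str
    ok: bool
    counts: dict = field(default_factory=dict)
    error: Optional[str] = None


def emit(stage: str, counts: dict, error: Optional[str] = None) -> str:
    """One line of JSON per stage; ok is derived from whether an error was recorded."""
    return json.dumps({"stage": stage, "ok": error is None, "counts": counts, "error": error})


def parse(raw: str) -> Envelope:
    data = json.loads(raw)
    return Envelope(
        stage=data["stage"],
        ok=bool(data["ok"]),
        counts=dict(data.get("counts") or {}),
        error=data.get("error"),
    )


def first_failure(lines: Iterable[str]) -> Optional[Envelope]:
    """What a cron wrapper or agent might do: walk the envelopes, stop at the first failure."""
    for line in lines:
        envelope = parse(line)
        if not envelope.ok:
            return envelope
    return None


run_log = [
    emit("ingest", {"inserted": 12}),
    emit("dedup", {"clusters": 9}),
    emit("summarize", {}, error="model endpoint unreachable"),
]
failed = first_failure(run_log)
print(failed.stage, "->", failed.error)  # summarize -> model endpoint unreachable
```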
The frontend is a static bundle on GitHub Pages. It never talks to my
laptop — it only fetches per-day JSON files from news-archive, caches
them in the browser, and works offline once loaded. No backend, no
accounts, no server-side anything.
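A sketch of how a client could turn a date into one of those per-day files, following the news-archive/<Year>/<Month>/<dd-Mon-yy>.json layout from the Export stage. Three things are assumptions: the month folder is a two-digit number, the repo owner is aadarshvelu (inferred from the Pages URL), and files are read from the main branch via raw.githubusercontent.com. The real PWA runs in the browser, so this Python version only illustrates the path logic.

```python
from datetime import date, timedelta

OWNER = "aadarshvelu"  # assumed from the GitHub Pages URL
BRANCH = "main"        # assumed default branch of news-archive


def archive_path(day: date) -> str:
    """<Year>/<Month>/<dd-Mon-yy>.json inside news-archive; the %m month folder is a guess."""
    return f"{day:%Y}/{day:%m}/{day:%d-%b-%y}.json"


def raw_url(day: date) -> str:
    return f"https://raw.githubusercontent.com/{OWNER}/news-archive/{BRANCH}/{archive_path(day)}"


def recent_urls(today: date, days: int) -> list:
    """Newest first: the per-day files a client would fetch, or read from its cache."""
    return [raw_url(today - timedelta(days=n)) for n in range(days)]


print(archive_path(date(2026, 4, 7)))  # 2026/04/07-Apr-26.json
```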
Every like, every read, every swipe lives in the browser's local storage. Nothing leaves the device. The ranking model is small enough to explain in one paragraph:
- Each like contributes a weight toward the category and source it belongs to.
- Older likes decay smoothly, so a story that mattered last month doesn't permanently colour next week's feed.
- Reactions count at a lighter weight than primary news — a viral cluster with several reaction-likes shouldn't dominate the future feed as if they were independent signals.
- Total stored likes are capped; the oldest get evicted when new ones arrive, so the model can't grow unbounded.
- The final score for any unread item combines the AI's importance rating with the user's accumulated category and source preferences.
The result: a feed that re-orders itself around what someone actually reads, without an account, without a recommendation server, without their data ever leaving the browser tab.
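The ranking described above fits in a few dozen lines. This is a Python sketch for consistency with the other examples, not the PWA's code, and it assumes: the 7-day decay is a 7-day half-life, each like counts toward both its category and its source, the cap is a hypothetical 500 likes, and importance and taste are combined additively with a log1p damping term. Only the half weight for reactions comes straight from the text.

```python
import math
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timedelta

# 7-day decay read as a 7-day half-life (assumption); other constants are placeholders.
HALF_LIFE_DAYS = 7.0
REACTION_WEIGHT = 0.5    # reactions count half, as stated in the text
MAX_LIKES = 500          # hypothetical cap on stored likes
PREFERENCE_WEIGHT = 1.0  # hypothetical mix between AI importance and learned taste


@dataclass
class Like:
    category: str
    source: str
    is_reaction: bool
    liked_at: datetime


class Preferences:
    def __init__(self, max_likes: int = MAX_LIKES) -> None:
        # A bounded deque drops the oldest like once the cap is reached.
        self.likes: deque = deque(maxlen=max_likes)

    def add(self, like: Like) -> None:
        self.likes.append(like)  # assumes likes arrive in chronological order

    def affinity(self, attr: str, value: str, now: datetime) -> float:
        """Sum of decayed, reaction-discounted likes whose attr equals value."""
        total = 0.0
        for like in self.likes:
            if getattr(like, attr) != value:
                continue
            age_days = (now - like.liked_at).total_seconds() / 86400
            weight = REACTION_WEIGHT if like.is_reaction else 1.0
            total += weight * 0.5 ** (age_days / HALF_LIFE_DAYS)
        return total

    def score(self, importance: float, category: str, source: str, now: datetime) -> float:
        # log1p damps the taste term so it re-orders the feed instead of swamping importance.
        taste = self.affinity("category", category, now) + self.affinity("source", source, now)
        return importance + PREFERENCE_WEIGHT * math.log1p(taste)


now = datetime(2026, 4, 20)
prefs = Preferences()
prefs.add(Like("ai-labs", "anthropic-blog", False, now - timedelta(days=7)))  # half-decayed
prefs.add(Like("ai-labs", "anthropic-blog", True, now))                       # reaction: half weight
print(prefs.affinity("category", "ai-labs", now))  # 1.0
```

The category and source names in the usage lines are made up; the point is that a week-old like and a fresh reaction end up contributing the same amount.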
| Skill group | Skills |
|---|---|
| 🩺 Inspection (auto-invocable, read-only) | status · heal |
| 📥 Ingest (user-only, writes DB) | ingest-gmail · ingest-rss · ingest-twitter |
| ⚙️ Process (user-only, writes DB) | link-relations · dedup · summarize |
| 📤 Publish (user-only, external side-effects) | export (git push) · notify (Telegram) |
| 🚀 Run (chains all of the above) | run (parity with uv run syndicate) |
Side-effect skills carry disable-model-invocation: true, so Claude won't
fire them by accident. You invoke them explicitly. See
INSTALL.md for the per-skill env requirements.
- It's local. Skills read your .env, write to local SQLite, and talk to Ollama on localhost. Claude Code can reach all of those; Claude's chat web app can't, because that runtime is sandboxed off from your machine.
- Twitter scraping is fragile. It uses Playwright with a persistent Chrome profile. When X.com changes its DOM, the selectors break and I update them. Skip Twitter if you don't want that maintenance.
- Tuned for my reading. Categories, importance heuristics, and the feed
list reflect what I want to see. Easy to retune — see the category enum in
pipeline/AI/.
Issues and PRs welcome. Module-level docs live next to the code:
pipeline/*/doc.md. Start there before editing — they describe
what each module is and isn't responsible for.
MIT. Copyright (c) 2026 Aadarsh Velu.
