Core problem:
- People cannot read all the content from everyone they follow.
- People usually care about specific interests; most followed content is irrelevant to those interests.
- With follows spread across many platforms, people miss the few updates that are actually relevant.
PCCA solves this by:
- Finding truly relevant content for the user's specific subjects across X, YouTube, LinkedIn, Reddit, Apple Podcasts, Spotify, Substack, Medium, and other sources.
- Collecting posts, videos, transcripts, podcasts, and metadata into a local SQLite database.
- Delivering key ideas as compact Briefs to the user's Telegram.
scenarios.md is the product source of truth. This README explains how to run the current implementation today.
You need: macOS or Windows, Python 3.10+, a Telegram bot token from @BotFather, and Google Chrome (or any Chromium-family browser already logged in to the platforms you want to follow).
# 1. clone, isolate, install
git clone <this repo> pcca && cd pcca
python3 -m venv .venv && source .venv/bin/activate
pip install -U pip setuptools wheel
pip install -e ".[dev]"
playwright install chromium
# 2. seed config
cp .env.example .env
# open .env, paste your Telegram bot token after PCCA_TELEGRAM_BOT_TOKEN=
# 3. launch the wizard
pcca run-desktop
The wizard handles everything else through four tabs:
- Config: paste the Telegram bot token, set timezone and Brief time. Leaving the token field blank later preserves the saved token.
- Use: start the local agent, send `/start` to your Telegram bot, then describe your first subject in free-form English. Thin one-liners become drafts; the wizard asks for more detail before saving.
- Sources: choose a platform and click Get Sources. PCCA imports follows/subscriptions from your already logged-in normal browser session and asks for inline session repair only if needed.
- Sources: prune the list if needed, click Monitor Pending Sources, then click Get Content to collect fresh items.
- Use: click Get Brief next to a subject, or send `/briefs` in Telegram. Get Brief automatically rebuilds when new content or changed preferences require it.
That's it. Briefs arrive as separate Telegram messages with 👍 / 👎 / 🔖 / 🚫 / 📖 More buttons on each.
In Telegram, with the bot:
| You want to… | Do |
|---|---|
| Refresh all sources and all subjects | Tap Update Briefs or send /update_briefs |
| Get already-scored Briefs for one subject | Tap Get Briefs or send /briefs |
| React to a Brief | Tap 👍 / 👎 / 🔖 / 🚫 on the Brief message |
| Expand a Brief | Tap 📖 More |
| Give specific feedback | Reply to the Brief with text — "less hype like this", "no cursor content", etc. |
| Create another subject | Describe it in free form: "I want a separate stream for Ukrainian Sole Proprietor regulations." |
| Pause/rename/tune a subject | Tap Edit Subjects |
| Refine a subject | "Refine Vibe Coding: include release notes; exclude motivation" |
| List sources | "List sources for Vibe Coding" |
| See setup checklist | /setup |
If a session expires (you logged out somewhere), PCCA marks the source as
needs_reauth and the wizard surfaces it. As long as you stay logged in to
the platform in your normal browser, PCCA auto-refreshes its cookies before
each scrape. Use Get Sources again to trigger inline session repair when
needed.
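The cooldown that gates this auto-refresh (PCCA_SESSION_REFRESH_COOLDOWN_SECONDS, 1800 in the sample configuration) can be pictured with a small sketch. This is not PCCA's actual code: the function name `should_refresh_session` and its arguments are hypothetical, and the only assumption is that a refresh is skipped while the previous one is still inside the cooldown window.

```python
from __future__ import annotations

import os
import time


def should_refresh_session(
    last_refresh_ts: float | None,
    now: float | None = None,
    cooldown_seconds: int | None = None,
) -> bool:
    """Return True when cookies should be re-read before a scrape (hypothetical helper)."""
    if cooldown_seconds is None:
        # Fall back to the documented setting; 1800 mirrors the sample .env value.
        cooldown_seconds = int(
            os.environ.get("PCCA_SESSION_REFRESH_COOLDOWN_SECONDS", "1800")
        )
    if last_refresh_ts is None:
        # Never refreshed yet, so refresh now.
        return True
    if now is None:
        now = time.time()
    # Refresh only once the cooldown window has fully elapsed.
    return (now - last_refresh_ts) >= cooldown_seconds
```

Under this assumption, a scrape that starts 10 minutes after the last refresh reuses the existing cookies, while one that starts 30 minutes later re-reads them from the browser first.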
- Bot stopped responding. Check `.env` for `PCCA_TELEGRAM_BOT_TOKEN`. If it's empty, paste the token back and restart the agent. Logs at `.pcca/logs/pcca.log` will say `Telegram service will be disabled` if the token is missing.
- No items collected. Use the wizard's Sources → Get Content action, then check the Debug → Logs tab. Sources flagged `needs_reauth` need session repair from the Sources tab.
- Briefs feel stale after preference change. Use `/briefs`; it now rebuilds automatically when preferences changed since the last delivered Brief.
- A feature says a package is missing or silently falls back. Run `pcca doctor`; if anything is missing, run `pip install -e ".[dev]"` from the repo root and try again.
For deeper debugging, run pcca debug-bundle — it writes a redacted zip with
logs and DB summaries (no raw cookies).
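The first troubleshooting check (empty token, matching log line) can also be scripted. The sketch below is an illustration, not part of PCCA: `read_env_value` and `diagnose_bot` are hypothetical helpers, and the only facts taken from this README are the variable name, the default log path, and the quoted log text.

```python
from __future__ import annotations

from pathlib import Path

# Log text quoted in the troubleshooting list above.
DISABLED_MARKER = "Telegram service will be disabled"


def read_env_value(env_text: str, key: str) -> str:
    """Return the value of KEY from .env-style text, or '' if unset or empty."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() != key:
            continue
        value = value.strip()
        if value.startswith("#"):
            # Only an inline comment follows the '=' sign.
            return ""
        # Drop a trailing inline comment, then surrounding quotes.
        value = value.split(" #", 1)[0].strip()
        return value.strip("'\"")
    return ""


def diagnose_bot(
    env_path: Path = Path(".env"),
    log_path: Path = Path(".pcca/logs/pcca.log"),
) -> list[str]:
    """Collect human-readable findings about a silent Telegram bot."""
    findings: list[str] = []
    env_text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    if not read_env_value(env_text, "PCCA_TELEGRAM_BOT_TOKEN"):
        findings.append("PCCA_TELEGRAM_BOT_TOKEN is empty or missing in .env")
    if log_path.exists():
        log_text = log_path.read_text(encoding="utf-8", errors="replace")
        if DISABLED_MARKER in log_text:
            findings.append("log reports: " + DISABLED_MARKER)
    return findings
```

Calling `diagnose_bot()` from the repo root returns an empty list when neither symptom is present.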
The wizard wraps these; you only need them for headless / debug use.
pcca run-desktop # PyWebView wizard (default entry point)
pcca run-agent # long-lived agent (Telegram bot + non-nightly scheduler)
pcca nightly-once # scheduled one-shot collection (launchd entry point)
pcca run-nightly-once # manual/debug one-shot collection
pcca install-launchd # macOS: schedule nightly-once with wake support
pcca uninstall-launchd # macOS: remove nightly launchd schedule
pcca run-briefs-once # one-shot Brief delivery
pcca rebuild-briefs-once # force-recompute today's Briefs
pcca capture-session --platform x [--browser auto|chrome|arc|brave|edge]
pcca import-follows --subject "Subject Name" --platform x [--limit 150]
pcca youtube-rebackfill-transcripts --clean-livechat-junk
pcca youtube-rebackfill-published-at # fill missing YouTube dates from RSS
pcca audit-content-quality [--clean] # find/flag JS dumps, link lists, marketing spam
pcca audit-sources --platform linkedin # source crawl health and empty-result streaks
pcca doctor # verify installed runtime dependencies
pcca debug-bundle # redacted local support bundle
`pcca capture-session` and `pcca import-follows` accept any of these platforms:
x, linkedin, youtube, spotify, substack, medium, apple_podcasts.
# Telegram bot — required
PCCA_TELEGRAM_BOT_TOKEN= # from @BotFather
# Scheduling
PCCA_TIMEZONE=UTC
PCCA_NIGHTLY_CRON=0 1 * * * # nightly content collection
PCCA_MORNING_CRON=30 8 * * * # only used when DIGEST_AUTO_SEND=true
PCCA_IN_PROCESS_NIGHTLY=false # default off; use launchd on macOS instead
PCCA_DIGEST_AUTO_SEND=false # default off — Briefs are on-demand via /briefs
PCCA_MIN_BRIEF_RELEVANCE=0.55 # send no-Briefs notice below this top score
# Browser
PCCA_BROWSER_CHANNEL=chrome # or 'bundled' for Playwright Chromium
PCCA_BROWSER_HEADFUL_PLATFORMS=x,linkedin
# Session refresh (auto re-read cookies before scrape)
PCCA_SESSION_REFRESH_ENABLED=true
PCCA_SESSION_REFRESH_COOLDOWN_SECONDS=1800
PCCA_SESSION_REFRESH_BROWSER= # chrome|arc|brave|edge; empty = auto
# Pass-2 summaries / Brief quality
PCCA_GEMINI_API_KEY= # from https://aistudio.google.com/apikey
PCCA_LLM_PROVIDER= # empty = gemini if key exists, else ollama
PCCA_LLM_MODEL= # empty = gemini-2.5-flash or llama3.1:8b
# Ollama local fallback
PCCA_OLLAMA_ENABLED=false
PCCA_OLLAMA_MODEL=llama3.1:8b
PCCA_OLLAMA_BASE_URL=http://localhost:11434
# Logging
PCCA_LOG_LEVEL=INFO # DEBUG for verbose
PCCA_LOG_FILE= # default .pcca/logs/pcca.log; "off" to disable
PCCA_STRICT_DEPS=false # true = fail startup if pyproject deps are missing
The desktop app uses an in-process scheduler for lightweight/non-nightly jobs, but Python cannot wake a sleeping laptop. For reliable overnight collection on macOS, install the launchd schedule:
pcca install-launchd
launchctl list | grep com.pcca.nightly
This writes ~/Library/LaunchAgents/com.pcca.nightly.plist and schedules
pcca nightly-once using your PCCA_NIGHTLY_CRON hour/minute. The plist sets
Wake=true, so macOS may wake the machine for the job. In practice, keep the
laptop on AC power; closed-lid or battery-only standby can still skip wake
events depending on macOS power settings.
Dedicated launchd run logs are written under .pcca/logs/nightly-YYYY-MM-DD.log.
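To make the cron-to-launchd mapping concrete, here is a minimal sketch of how the hour and minute of PCCA_NIGHTLY_CRON could become a launchd `StartCalendarInterval`. It is an illustration under assumptions: the function names are hypothetical, the `ProgramArguments` value is a guess at the invocation, and the plist that `pcca install-launchd` actually writes may contain more keys (the wake setting described above is not modelled here).

```python
from __future__ import annotations

import plistlib


def cron_to_calendar_interval(cron: str) -> dict[str, int]:
    """Map the minute/hour fields of a 5-field cron string to launchd keys."""
    fields = cron.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {cron!r}")
    minute, hour = fields[0], fields[1]
    if not (minute.isdigit() and hour.isdigit()):
        # Ranges, steps, and wildcards are out of scope for this sketch.
        raise ValueError("only fixed numeric minute and hour are supported")
    minute_i, hour_i = int(minute), int(hour)
    if not (0 <= minute_i <= 59 and 0 <= hour_i <= 23):
        raise ValueError(f"minute or hour out of range: {cron!r}")
    return {"Hour": hour_i, "Minute": minute_i}


def build_nightly_plist(cron: str) -> bytes:
    """Serialize a launchd job description as XML plist bytes."""
    job = {
        "Label": "com.pcca.nightly",
        "ProgramArguments": ["pcca", "nightly-once"],  # assumed invocation
        "StartCalendarInterval": cron_to_calendar_interval(cron),
    }
    return plistlib.dumps(job)
```

With the sample value `0 1 * * *`, the interval is hour 1, minute 0, so the job would be scheduled for 01:00 local time.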
For best Brief quality, set PCCA_GEMINI_API_KEY from
Google AI Studio. When that key is
present and PCCA_LLM_PROVIDER is empty, PCCA uses gemini-2.5-flash for
Pass-2 summaries and falls back to Ollama if Gemini is unavailable.
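The provider defaults described above can be sketched as a small resolution function. This is a hedged illustration, not PCCA's implementation: `resolve_llm` is a hypothetical name, it only covers the "empty value" defaults documented in the configuration comments, and it does not model the runtime fallback to Ollama when Gemini is unavailable or the separate PCCA_OLLAMA_* settings.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Default models taken from the PCCA_LLM_MODEL comment in the sample .env.
DEFAULT_MODELS = {"gemini": "gemini-2.5-flash", "ollama": "llama3.1:8b"}


def resolve_llm(env: Mapping[str, str] | None = None) -> tuple[str, str]:
    """Return (provider, model) from environment-style settings."""
    env = os.environ if env is None else env
    provider = env.get("PCCA_LLM_PROVIDER", "").strip().lower()
    if not provider:
        # Empty provider: gemini if a key exists, else ollama.
        has_key = bool(env.get("PCCA_GEMINI_API_KEY", "").strip())
        provider = "gemini" if has_key else "ollama"
    # An explicit model wins; otherwise use the provider's documented default.
    model = env.get("PCCA_LLM_MODEL", "").strip() or DEFAULT_MODELS.get(provider, "")
    return provider, model
```

For example, `resolve_llm({"PCCA_GEMINI_API_KEY": "k"})` yields the Gemini provider with its default model, while an empty mapping resolves to Ollama.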
To remove the schedule:
pcca uninstall-launchd
.pcca/pcca.db SQLite database (subjects, items, scores, …)
.pcca/logs/pcca.log rotating app log
.pcca/browser_profiles/<platform>/ Playwright session profile per platform
.pcca/debug/browser/ screenshots + JSON breadcrumbs from failed scrapes
.pcca/debug/pcca-debug-*.zip redacted support bundles from `pcca debug-bundle`
.env runtime configuration (NOT committed)
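Because the database is a plain SQLite file, it can be inspected with standard tooling. The sketch below lists table names without assuming anything about PCCA's schema; `list_tables` is a hypothetical helper, and it opens the file read-only so that inspection cannot modify it.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def list_tables(db_path: str | Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # A file: URI with mode=ro makes SQLite refuse any writes.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Running `list_tables(".pcca/pcca.db")` from the repo root prints whatever tables the current version has created; a missing file raises `sqlite3.OperationalError` rather than creating an empty database.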
PCCA does not drive logins for X / LinkedIn / Google / etc. Instead it reads your real browser's session cookies and injects them into its own Playwright profile. Cookie lifetimes vary:
| Platform | Lifetime |
|---|---|
| X (`auth_token`) | ~30 days, sliding while you keep using X |
| LinkedIn (`li_at`) | ~1 year |
| Spotify / Substack / Medium | long-lived (months+) |
| YouTube / Google (SID family) | rotates aggressively; auto-refresh handles it |
| Apple Podcasts | best-effort (varies by region/account state) |
Supported on macOS today: Chrome, Arc, Brave, Microsoft Edge. Safari and Firefox support is tracked in tasks.md (T-38). Windows Chromium support is tracked in T-37D.
Failed browser scrapes save a screenshot + JSON metadata under
.pcca/debug/browser/. Treat these as private debug artifacts — they may
contain logged-in page content. pcca debug-bundle redacts them on export.
Heuristic scoring is Cyrillic-aware (English / Ukrainian / Russian). Gemini
Pass-2 summaries also handle Ukrainian/Russian well in current testing. If you
want a fully local fallback, set PCCA_OLLAMA_ENABLED=true and pull a
multilingual model:
ollama pull llama3.1:8b
The PyWebView desktop wizard is intentionally not yet shipped on Linux (tracked in T-35). On Linux, drive PCCA through the CLI commands above; they work identically.
Phase-1 foundation is in place: collectors for nine platforms, session capture and auto-refresh, conversational subject creation, per-Brief Telegram delivery, PyWebView wizard. Known gaps and follow-up work live in tasks.md. Notable open items: a pluggable Learning Strategy that reads button reactions and refinement replies (T-17), full rich-rule preference extraction including author-level conditionals (T-59), and Telegram as a source platform (T-60–T-65).