Vision
Parley's system-audio tap captures only meetings you are actively in on your Mac. This epic adds a complementary mode: a locally driven, anti-detection browser (Camoufox / Playwright stealth) joins meetings using your own real Google account and scrapes live captions into Parley's local analysis.
That unlocks things the tap fundamentally cannot do:
- Coach from the side: observe a meeting you're invited to but not actively watching, and coach the person in it (eval engine + next-move on the live transcript).
- AI holds your seat + "catch me up": step away while the AI keeps the meeting; when you jump back in, a live rolling summary with key points re-orients you in seconds.
- Juggle multiple meetings: manage several joined meetings at once and switch between them; because each has its own transcript, you can pick up immediately in whichever meeting you switch to.
All of it runs locally, with no central cloud and no extra STT cost (Google's captions do the transcription, and they carry speaker names). The browser automation just logs in with the user's real account and joins on their behalf, so it appears in the meeting as that user rather than as a separate bot participant.
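As a rough illustration of the "catch me up" and multi-meeting ideas above, here is a minimal sketch of a per-meeting transcript store. Everything in it (`CaptionLine`, `MeetingTranscript`, `MeetingHub`, the meeting ids) is a hypothetical name chosen for this example, not existing Parley code. It only assumes the scraper can deliver caption lines tagged with a speaker name.

```python
from dataclasses import dataclass, field


@dataclass
class CaptionLine:
    speaker: str  # speaker name as it appears in the caption
    text: str
    ts: float     # seconds since the meeting was joined


@dataclass
class MeetingTranscript:
    lines: list = field(default_factory=list)
    last_seen: int = 0  # index of the first line the user has not read yet

    def append(self, speaker: str, text: str, ts: float) -> None:
        self.lines.append(CaptionLine(speaker, text, ts))

    def catch_up(self) -> list:
        """Return the lines added since the last call and mark them as read."""
        missed = self.lines[self.last_seen:]
        self.last_seen = len(self.lines)
        return missed


class MeetingHub:
    """Keeps one transcript per joined meeting, keyed by a meeting id."""

    def __init__(self) -> None:
        self.transcripts = {}

    def _get(self, meeting_id: str) -> MeetingTranscript:
        return self.transcripts.setdefault(meeting_id, MeetingTranscript())

    def on_caption(self, meeting_id: str, speaker: str, text: str, ts: float) -> None:
        self._get(meeting_id).append(speaker, text, ts)

    def switch_to(self, meeting_id: str) -> list:
        """Switch focus to a meeting and return what was missed there."""
        return self._get(meeting_id).catch_up()
```

The lines returned by `switch_to` are what a rolling summary would be built from; how that summary is produced is outside this sketch.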
Why this shape
- It's the one scenario the local tap can't cover (you're not on the call).
- Caption-scraping is the simplest possible implementation — no audio plumbing, no STT.
- Stays true to Parley's north star: open-source, fast to onboard, no central cloud dependency.
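To show why caption scraping can stay simple, the sketch below folds polled caption snapshots into a transcript. It assumes, without confirmation from Meet's actual DOM, that the scraper yields `(speaker, text)` pairs and that the text of an in-progress caption grows between polls. The function name `merge_snapshot` and the prefix heuristic are illustrative choices, not a known Parley design.

```python
def merge_snapshot(transcript: list, speaker: str, text: str) -> None:
    """Fold one polled caption snapshot into the transcript, in place.

    Assumption: a caption that is still being spoken shows up as a
    growing string, so a snapshot that extends (or is a prefix of) the
    last line from the same speaker is treated as the same utterance.
    """
    text = text.strip()
    if not text:
        return
    if transcript:
        last_speaker, last_text = transcript[-1]
        same_utterance = last_speaker == speaker and (
            text.startswith(last_text) or last_text.startswith(text)
        )
        if same_utterance:
            # Keep the longer version of the utterance.
            transcript[-1] = (speaker, max(text, last_text, key=len))
            return
    transcript.append((speaker, text))


transcript = []
for speaker, text in [
    ("Ana", "Hello"),
    ("Ana", "Hello every"),
    ("Ana", "Hello everyone"),
    ("Ben", "Hi Ana"),
]:
    merge_snapshot(transcript, speaker, text)
```

If captions turn out to revise earlier words rather than only append, the prefix check would need to be replaced with a fuzzier match.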
Work breakdown
Risks / open questions
- Consent & recording law (two-party-consent states / GDPR): partly mitigated because the browser joins as a visible, real participant, but a clear notice should still be added.
- Admit-to-meeting / waiting room for external meetings.
- Selector maintenance as Meet's DOM changes.
- Dual presence if the user is also physically in the same meeting on the same account.
- Resource cost of N concurrent browsers; cap concurrency.
- Google Meet only initially; other platforms later.
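One way to cap concurrency, sketched with the standard library's `asyncio.Semaphore`. The cap value, `run_meeting`, and `join_all` are hypothetical names for illustration, and the `asyncio.sleep` call stands in for launching a browser and scraping captions, which this sketch does not attempt.

```python
import asyncio

MAX_CONCURRENT_MEETINGS = 3  # hypothetical cap; tune to the machine's resources


async def run_meeting(meeting_id: str, limiter: asyncio.Semaphore,
                      active: set, peak: list) -> None:
    # Only `limit` coroutines can be inside this block at once.
    async with limiter:
        active.add(meeting_id)
        peak.append(len(active))   # record how many meetings are live right now
        await asyncio.sleep(0.01)  # placeholder for browser launch + scraping
        active.discard(meeting_id)


async def join_all(meeting_ids: list, limit: int = MAX_CONCURRENT_MEETINGS) -> int:
    """Run all meetings under the cap and return the peak number live at once."""
    limiter = asyncio.Semaphore(limit)
    active = set()
    peak = []
    await asyncio.gather(
        *(run_meeting(m, limiter, active, peak) for m in meeting_ids)
    )
    return max(peak, default=0)
```

With this shape, meetings beyond the cap wait until a slot frees up rather than failing; whether queued meetings should instead be rejected is an open design choice.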