PageWhisper is a high-performance AI SaaS that leverages a cutting-edge RAG (Retrieval-Augmented Generation) pipeline to allow users to have real-time, human-like conversations with their books.
- 🚀 Key Features
- 🏗️ Project Architecture
- 🔄 Workflow Diagram
- 📊 Data Flow Diagram (RAG & Voice)
- 💻 Tech Stack
- 🛠️ Installation & Setup
- 🔑 Environment Variables
- ⚙️ Configuration Details
- 🔐 Security & Optimization
- 📈 Live Dashboard & SaaS Metrics
## 🚀 Key Features

- 🎙️ Real-time Voice Conversations: Powered by Vapi and 11 Labs for ultra-low latency, human-like dialogue.
- 📂 Intelligent PDF RAG: Custom pipeline that segments PDFs into 500-word chunks for highly accurate context retrieval.
- 💳 Tiered SaaS Subscriptions: Managed via Clerk Billing (Free, Standard, and Pro tiers).
- 🖼️ Automated Metadata: Auto-generates book covers from the first page of uploaded PDFs via `pdfjs-dist`.
- 🔍 Global Search: Optimized case-insensitive search across title and author using MongoDB text indexes.
- 📱 Cinematic UI: "Dark Mode" and "Beige Literary" aesthetics built with Tailwind CSS and Shadcn UI.
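The 500-word chunking behind the RAG feature can be sketched as a simple windowing function. This is a minimal illustration only; the function name and the no-overlap policy are assumptions, not the project's actual code:

```typescript
// Sketch: segment raw book text into ~500-word chunks for RAG retrieval.
// CHUNK_SIZE matches the 500-word windows described above; the lack of
// overlap between windows is an assumption — the real pipeline may differ.
const CHUNK_SIZE = 500;

function segmentIntoChunks(text: string, chunkSize: number = CHUNK_SIZE): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(" "));
  }
  return chunks;
}
```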
## 🏗️ Project Architecture

PageWhisper is built on a Decoupled Client-Server Architecture optimized for edge performance, security, and real-time audio streaming.
1. Frontend (Next.js 15+ App Router): React Server Components (RSC) manage SEO and initial data fetching. Client Components handle the interactive voice UI and audio streaming.
2. Voice Orchestration (Vapi AI): A managed service that handles the complexity of WebRTC audio streams, transcription (STT), and turn-taking logic.
3. Database (MongoDB Atlas): An indexed document store containing book metadata and, crucially, the segmented text chunks for RAG.
4. Object Store (Vercel Blob): Edge-optimized storage for raw PDF binaries and the automatically extracted cover images.
5. Identity & Billing (Clerk): Unified session management (JWT) and SaaS subscription enforcement (Free vs. Pro).
## 🔄 Workflow Diagram

This high-level diagram visualizes how data flows between the user, your Next.js application, and the crucial third-party AI/Data services.
```mermaid
graph TD
    %% User/Client Side
    subgraph Client ["<font color='#FFFFFF'>Browser / Client-Side</font>"]
        User["<font color='#000000'>User Interface</font>"]
        VoiceUI["<font color='#000000'>Vapi SDK Voice Controls</font>"]
        PDFUpload["<font color='#000000'>PDF/Cover Upload</font>"]
    end

    %% Next.js App
    subgraph NextJS ["<font color='#FFFFFF'>Next.js App Router (Vercel)</font>"]
        Middleware["<font color='#000000'>Clerk Middleware (Auth Proxy)</font>"]
        ServerActions["<font color='#000000'>Server Actions (Secure SSI)</font>"]
        API_RAG["<font color='#000000'>API Route (/api/vapi/search-book)</font>"]
    end

    %% Third-Party Services
    subgraph AIServices ["<font color='#FFFFFF'>AI & Media Stack</font>"]
        Vapi["<font color='#000000'>Vapi AI Orchestrator</font>"]
        ElevenLabs["<font color='#000000'>11 Labs (TTS Persona)</font>"]
    end

    subgraph DataStack ["<font color='#000000'>Data & Storage Stack</font>"]
        VercelBlob[("<font color='#000000'>Vercel Blob Storage</font>")]
        MongoDB[("<font color='#000000'>MongoDB Atlas</font>")]
        ClerkBilling["<font color='#000000'>Clerk Billing (SaaS Gating)</font>"]
    end

    %% Relationships
    User --> PDFUpload
    PDFUpload --> ServerActions
    ServerActions --> VercelBlob
    ServerActions --> MongoDB
    ServerActions --> ClerkBilling
    User --> VoiceUI
    VoiceUI <==> Vapi
    Vapi <--> ElevenLabs
    Vapi ==> API_RAG
    API_RAG ===> MongoDB

    %% Styling
    classDef client fill:#FF69B4,stroke:#333,stroke-width:2px;
    classDef app fill:#9370DB,stroke:#333,stroke-width:1px;
    classDef service fill:#E0FFE0,stroke:#333,stroke-width:1px;
    classDef data fill:#FFFFE0,stroke:#333,stroke-width:1px;
    class User,VoiceUI,PDFUpload client;
    class NextJS,Middleware,ServerActions,API_RAG app;
    class AIServices,Vapi,ElevenLabs service;
    class DataStack,VercelBlob,MongoDB,ClerkBilling data;
```
## 📊 Data Flow Diagram (RAG & Voice)

This diagram illustrates the lifecycle of a book from upload to conversation readiness.
```mermaid
graph TD
    A[User Uploads PDF] --> B{Server Action}
    B --> C[Parse PDF Content]
    C --> D[Extract Cover Image]
    D --> E[Segment Text into Chunks]
    E --> F[Store in MongoDB Atlas]
    F --> G[Upload Files to Vercel Blob]
    G --> H[Book Ready for Voice Chat]
```
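The segmentation step above ultimately produces documents for the `booksegments` collection. A pure helper for this step might look like the following sketch; the `{ bookId, index, content }` field names are assumptions inferred from the compound index shown later in this README:

```typescript
// Sketch: shape chunked text into MongoDB-ready segment documents.
// The { bookId, index, content } schema is an assumption based on the
// createIndex command in the Configuration section; `index` preserves
// reading order so retrieved chunks can be stitched back in sequence.
interface BookSegment {
  bookId: string;
  index: number;
  content: string;
}

function buildSegmentDocs(bookId: string, chunks: string[]): BookSegment[] {
  return chunks.map((content, index) => ({ bookId, index, content }));
}
```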
When a user speaks, the system performs a "Tool Call" to fetch relevant book segments before responding.
```mermaid
sequenceDiagram
    participant U as User (Browser)
    participant V as Vapi AI
    participant S as Next.js API (Search Tool)
    participant DB as MongoDB (Segments)
    U->>V: Voice Input (Audio)
    V->>V: Transcribe (STT)
    V->>S: Call 'search_book' (Query + BookID)
    S->>DB: Text Search (Regex/Index)
    DB-->>S: Return Relevant Chunks
    S-->>V: Return Contextual Text
    V->>V: LLM (Context + Persona)
    V-->>U: Synthesized Voice (TTS)
```
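On the server side, the `search_book` tool call resolves to a MongoDB query scoped to a single book. A sketch of the filter-building logic (the helper name is hypothetical; the actual route may combine `$text` search with a regex fallback, as the diagram's "Regex/Index" label suggests):

```typescript
// Sketch: build a MongoDB filter for the search_book tool call.
// Pairs a $text search on segment content with a bookId scope, matching
// the { content: "text", bookId: 1 } index described below.
function buildSearchFilter(query: string, bookId: string) {
  return {
    bookId,                      // restrict retrieval to the active book
    $text: { $search: query },   // uses the text index on `content`
  };
}
```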
## 💻 Tech Stack

| Layer | Technology | Usage |
|---|---|---|
| Framework | Next.js 15+ | App Router, Server Actions, and Turbopack for lightning-fast builds. |
| Language | TypeScript | Strict type safety across the RAG pipeline and API responses. |
| Voice Engine | Vapi AI | WebRTC orchestration for ultra-low latency (<500ms) voice loops. |
| Speech Synth | 11 Labs | Flash 2.5 model for high-fidelity, emotional human personas. |
| Database | MongoDB Atlas | Storing document metadata and indexed text segments for context retrieval. |
| Object Store | Vercel Blob | Edge-optimized storage for PDF binaries and generated assets. |
| Authentication | Clerk | Secure OIDC identity management and JWT session handling. |
| SaaS Billing | Clerk Billing | Tiered subscription enforcement (Free, Standard, Pro). |
| PDF Engine | PDF.js | Client-side parsing and cover image extraction from raw buffers. |
- Hybrid Rendering: Uses React Server Components (RSC) for library fetching and Client Components for the real-time audio visualizer.
- Smart Throttling: Next.js Middleware manages rate-limiting and session validation to prevent API abuse.
- Indexed Search: Implements MongoDB text and sparse indexes so book context is retrieved in under 100 ms for the AI.
## 🛠️ Installation & Setup

Follow these steps to get a local copy of PageWhisper up and running.
- Node.js 20+ (Recommended: Node 24 for the latest Turbopack features)
- npm or pnpm
- Vercel CLI (`npm i -g vercel`)
- Accounts for: Clerk, MongoDB Atlas, Vercel, and Vapi AI.
1. Clone the repository:

```bash
git clone https://github.com/salonyranjan/page-whisper.git
cd page-whisper
```

2. Install dependencies:

```bash
npm install
```

3. Authenticate and link Vercel (this project uses Vercel Blob and environment variables managed by Vercel):

```bash
vercel login
vercel link
```

4. Sync environment variables by pulling the production variables into your local `.env.local` file:

```bash
vercel env pull .env.local
```

5. Start the development server:

```bash
npm run dev
```

Your app should now be running at http://localhost:3000.
## 🔑 Environment Variables

To run this project, you will need to add the following variables to your `.env.local` file.
Note: For local development, it is highly recommended to use the Vercel CLI command `vercel env pull .env.local` to sync these securely from your dashboard.
```env
# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_...
CLERK_SECRET_KEY=sk_test_...
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

# Database
MONGODB_URI=mongodb+srv://...

# Vercel Storage (Blob)
BLOB_READ_WRITE_TOKEN=vercel_blob_rw_...

# Vapi AI (Voice)
NEXT_PUBLIC_VAPI_API_KEY=...
NEXT_PUBLIC_ASSISTANT_ID=...
```

## ⚙️ Configuration Details

This project integrates several industry-leading APIs. Below are the specific configuration requirements for each service to ensure the RAG and voice pipelines function correctly.
To enable the voice assistant, your Vapi Assistant must be configured with a specific Tool Call:
1. Assistant System Prompt: Use a prompt that instructs the AI to "act as the book" and use the `search_book` tool for context.
2. Custom Tool: Create a tool named `search_book` in the Vapi Dashboard:
* Method: POST
* URL: https://your-deployment.vercel.app/api/vapi/search-book
* Parameters:
* query (string): The search terms.
* bookId (string): The ID of the book to search within.
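Based on the parameters above, the arguments the API route receives from the tool call might be typed as follows. The exact envelope Vapi wraps around tool arguments depends on your configuration, so treat this shape as an assumption:

```typescript
// Sketch: assumed shape of the search_book tool arguments, mirroring the
// query/bookId parameters configured in the Vapi Dashboard.
interface SearchBookArgs {
  query: string;  // the search terms
  bookId: string; // the ID of the book to search within
}

// Example payload as the API route might parse it (illustrative values):
const example: SearchBookArgs = {
  query: "who is the narrator",
  bookId: "6650f0c2a1b2c3d4e5f6a7b8",
};
```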
Configure your Clerk dashboard to handle the SaaS lifecycle:
- Middleware: Ensure `middleware.ts` (or `proxy.ts`) is active to inject auth headers into API routes.
- Redirects: Set the following paths in the Clerk Dashboard:
  * Sign-In: `/sign-in`
  * Sign-Up: `/sign-up`
  * After Sign-In: `/`
- Billing (Pro Tier): Enable Clerk Billing and create three plans: `free`, `standard`, and `pro`. The app enforces limits based on these exact slugs.
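Plan enforcement keyed on those exact slugs can be sketched as a simple lookup. The quota numbers come from the feature-gating rules described later in this README; Pro is shown as 100, standing in for "100+":

```typescript
// Sketch: map Clerk plan slugs to book quotas.
// Slugs must match the plans created in the Clerk Dashboard exactly.
type Plan = "free" | "standard" | "pro";

const BOOK_QUOTAS: Record<Plan, number> = {
  free: 1,
  standard: 10,
  pro: 100, // the README lists "100+"; the exact Pro cap is an assumption
};

function canUploadBook(plan: Plan, currentCount: number): boolean {
  return currentCount < BOOK_QUOTAS[plan];
}
```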
For high-performance Retrieval-Augmented Generation (RAG), run the following command in your MongoDB Atlas shell to enable text searching:

```javascript
db.booksegments.createIndex({ content: "text", bookId: 1 })
```

## 🔐 Security & Optimization

Security is not an afterthought in PageWhisper. We implement a Zero-Trust Backend philosophy:
- Server-Side Identity (SSI): Unlike standard implementations, PageWhisper resolves user identity via Clerk's `auth()` within Next.js Server Actions. This prevents "ID spoofing," where a malicious user could attempt to write data to another user's library by modifying client-side payloads.
- JWT Handshake Validation: Every Vapi voice session is initialized with a short-lived OIDC token. The background "Tool Calls" between Vapi and our API are secured via shared secrets and session-bound validation.
- Clock Skew Mitigation: Custom middleware logic accounts for system clock discrepancies between the client and Clerk's global servers, preventing the "JWT not active yet" loop in high-latency environments.
- Blob Access Control: All uploaded PDFs are stored with randomized suffixes and accessed through signed or specific public patterns, ensuring that raw file paths are not easily guessable.
To provide a "conversational" feel, we optimized every layer of the RAG stack:
| Optimization | Technique | Impact |
|---|---|---|
| Vector-Lite Search | MongoDB Text & Sparse Indexing | Sub-100ms context retrieval without the overhead of a dedicated vector DB. |
| Smart Chunking | 500-Word Windowing | Optimizes the LLM's context window for more accurate "Persona" responses. |
| Edge Storage | Vercel Blob | Minimizes Time to First Byte (TTFB) for PDF parsing and cover rendering. |
| Streaming UI | Partial Handshake Listeners | Transcript messages are rendered via partial streams, so the UI updates as the AI thinks. |
| Dynamic Caching | `force-dynamic` fetching | Ensures that newly uploaded books appear instantly without stale cache issues. |
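The `force-dynamic` setting in the table above is a standard Next.js route segment config. In a library page it looks like this (the file path is illustrative):

```typescript
// app/library/page.tsx (illustrative path)
// Opt this route out of static caching so freshly uploaded books
// appear immediately instead of waiting for revalidation.
export const dynamic = "force-dynamic";
```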
The application uses a custom text-segmentation algorithm that processes raw PDF buffers into a searchable schema. By indexing bookId and content together, the system can scale to thousands of books while maintaining near-instant response times for the Vapi voice agent.
```javascript
// Optimized MongoDB indexing strategy
await db.collection('booksegments').createIndex({
  content: "text",
  bookId: 1
});
```

## 📈 Live Dashboard & SaaS Metrics

PageWhisper integrates real-time monitoring to track the health of the RAG pipeline and the growth of the subscription ecosystem.
The application's performance is monitored across four primary dimensions:
| Metric | Monitoring Provider | Target Benchmark |
|---|---|---|
| Voice Latency | Vapi Dashboard | < 500ms (Handshake to TTS) |
| Auth Success | Clerk Dashboard | 99.9% Uptime |
| Search Accuracy | MongoDB Profiler | < 100ms Query Execution |
| Storage I/O | Vercel Storage | Zero-fail PDF Buffer Stream |
We utilize Clerk Billing to manage the revenue engine. The dashboard allows for real-time tracking of:
- MRR (Monthly Recurring Revenue): Real-time tracking of Standard ($9.99) and Pro ($19.99) conversions.
- Subscription Churn: Monitored via Clerk's webhook listeners to automatically revoke access when a plan is cancelled.
- Feature Gating: Dynamic enforcement of:
  * Book Quotas: 1 (Free) vs. 10 (Standard) vs. 100+ (Pro).
  * Time Caps: Automated session termination via backend countdown hooks when plan minutes are exhausted.
Every interaction is logged to refine the AI's "Reading Comprehension":
- Segment Hit Rate: Tracking which book chunks are most frequently retrieved.
- Tool Call Success: Monitoring the `search_book` API endpoint to ensure the AI always has the context it needs to answer accurately.
- Database Scaling: MongoDB Atlas auto-scaling is enabled to handle spikes in book segment indexing.
- Cold Starts: Optimized via Next.js Edge Runtime to ensure the voice handshake is instant, even after periods of inactivity.
## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
Built with ❤️ by Salony Ranjan