PageWhisper is a high-performance AI SaaS that leverages a cutting-edge RAG (Retrieval-Augmented Generation) pipeline to allow users to have real-time, human-like conversations with their books.
- 🚀 Key Features
- 🏗️ Project Architecture
- 🔄 Workflow Diagram
- 📊 Data Flow Diagram (RAG & Voice)
- 💻 Tech Stack
- 🛠️ Installation & Setup
- 🔑 Environment Variables
- ⚙️ Configuration Details
- 🔐 Security & Optimization
- 📈 Live Dashboard & SaaS Metrics
## 🚀 Key Features

- 🎙️ Real-time Voice Conversations: Powered by Vapi and 11 Labs for ultra-low latency, human-like dialogue.
- 📂 Intelligent PDF RAG: Custom pipeline that segments PDFs into 500-word chunks for highly accurate context retrieval.
- 💳 Tiered SaaS Subscriptions: Managed via Clerk Billing (Free, Standard, and Pro tiers).
- 🖼️ Automated Metadata: Auto-generates book covers from the first page of uploaded PDFs via `pdfjs-dist`.
- 🔍 Global Search: Optimized case-insensitive search across title and author using MongoDB text indexes.
- 📱 Cinematic UI: "Dark Mode" and "Beige Literary" aesthetics built with Tailwind CSS and Shadcn UI.
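The 500-word chunking behind the RAG feature can be sketched as a simple windowing function. This is a minimal illustration only; the function name and the no-overlap policy are assumptions, not the project's actual code:

```typescript
// Sketch: segment raw book text into ~500-word chunks for RAG retrieval.
// CHUNK_SIZE matches the 500-word windows described above; the lack of
// overlap between windows is an assumption — the real pipeline may differ.
const CHUNK_SIZE = 500;

function segmentIntoChunks(text: string, chunkSize: number = CHUNK_SIZE): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize) {
    chunks.push(words.slice(i, i + chunkSize).join(" "));
  }
  return chunks;
}
```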
## 🏗️ Project Architecture

PageWhisper is built on a Decoupled Client-Server Architecture optimized for edge performance, security, and real-time audio streaming.
1. Frontend (Next.js 15+ App Router): React Server Components (RSC) manage SEO and initial data fetching. Client Components handle the interactive voice UI and audio streaming.
2. Voice Orchestration (Vapi AI): A managed service that handles the complexity of WebRTC audio streams, transcription (STT), and turn-taking logic.
3. Database (MongoDB Atlas): An indexed document store containing book metadata and, crucially, the segmented text chunks for RAG.
4. Object Store (Vercel Blob): Edge-optimized storage for raw PDF binaries and the automatically extracted cover images.
5. Identity & Billing (Clerk): Unified session management (JWT) and SaaS subscription enforcement (Free vs. Pro).
## 🔄 Workflow Diagram

This high-level diagram visualizes how data flows between the user, your Next.js application, and the crucial third-party AI/Data services.
```mermaid
graph TD
    %% User/Client Side
    subgraph Client ["<font color='#FFFFFF'>Browser / Client-Side</font>"]
        User["<font color='#000000'>User Interface</font>"]
        VoiceUI["<font color='#000000'>Vapi SDK Voice Controls</font>"]
        PDFUpload["<font color='#000000'>PDF/Cover Upload</font>"]
    end

    %% Next.js App
    subgraph NextJS ["<font color='#FFFFFF'>Next.js App Router (Vercel)</font>"]
        Middleware["<font color='#000000'>Clerk Middleware (Auth Proxy)</font>"]
        ServerActions["<font color='#000000'>Server Actions (Secure SSI)</font>"]
        API_RAG["<font color='#000000'>API Route (/api/vapi/search-book)</font>"]
    end

    %% Third-Party Services
    subgraph AIServices ["<font color='#FFFFFF'>AI & Media Stack</font>"]
        Vapi["<font color='#000000'>Vapi AI Orchestrator</font>"]
        ElevenLabs["<font color='#000000'>11 Labs (TTS Persona)</font>"]
    end

    subgraph DataStack ["<font color='#000000'>Data & Storage Stack</font>"]
        VercelBlob[("<font color='#000000'>Vercel Blob Storage</font>")]
        MongoDB[("<font color='#000000'>MongoDB Atlas</font>")]
        ClerkBilling["<font color='#000000'>Clerk Billing (SaaS Gating)</font>"]
    end

    %% Relationships
    User --> PDFUpload
    PDFUpload --> ServerActions
    ServerActions --> VercelBlob
    ServerActions --> MongoDB
    ServerActions --> ClerkBilling
    User --> VoiceUI
    VoiceUI <==> Vapi
    Vapi <--> ElevenLabs
    Vapi ==> API_RAG
    API_RAG ===> MongoDB

    %% Styling
    classDef client fill:#FF69B4,stroke:#333,stroke-width:2px;
    classDef app fill:#9370DB,stroke:#333,stroke-width:1px;
    classDef service fill:#E0FFE0,stroke:#333,stroke-width:1px;
    classDef data fill:#FFFFE0,stroke:#333,stroke-width:1px;
    class User,VoiceUI,PDFUpload client;
    class NextJS,Middleware,ServerActions,API_RAG app;
    class AIServices,Vapi,ElevenLabs service;
    class DataStack,VercelBlob,MongoDB,ClerkBilling data;
```
## 📊 Data Flow Diagram (RAG & Voice)

This diagram illustrates the lifecycle of a book from upload to conversation readiness.
```mermaid
graph TD
    A[User Uploads PDF] --> B{Server Action}
    B --> C[Parse PDF Content]
    C --> D[Extract Cover Image]
    D --> E[Segment Text into Chunks]
    E --> F[Store in MongoDB Atlas]
    F --> G[Upload Files to Vercel Blob]
    G --> H[Book Ready for Voice Chat]
```
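The segmentation step above ultimately produces documents for the `booksegments` collection. A pure helper for this step might look like the following sketch; the `{ bookId, index, content }` field names are assumptions inferred from the compound index shown later in this README:

```typescript
// Sketch: shape chunked text into MongoDB-ready segment documents.
// The { bookId, index, content } schema is an assumption based on the
// createIndex command in the Configuration section; `index` preserves
// reading order so retrieved chunks can be stitched back in sequence.
interface BookSegment {
  bookId: string;
  index: number;
  content: string;
}

function buildSegmentDocs(bookId: string, chunks: string[]): BookSegment[] {
  return chunks.map((content, index) => ({ bookId, index, content }));
}
```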
When a user speaks, the system performs a "Tool Call" to fetch relevant book segments before responding.
```mermaid
sequenceDiagram
    participant U as User (Browser)
    participant V as Vapi AI
    participant S as Next.js API (Search Tool)
    participant DB as MongoDB (Segments)
    U->>V: Voice Input (Audio)
    V->>V: Transcribe (STT)
    V->>S: Call 'search_book' (Query + BookID)
    S->>DB: Text Search (Regex/Index)
    DB-->>S: Return Relevant Chunks
    S-->>V: Return Contextual Text
    V->>V: LLM (Context + Persona)
    V-->>U: Synthesized Voice (TTS)
```
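On the server side, the `search_book` tool call resolves to a MongoDB query scoped to a single book. A sketch of the filter-building logic (the helper name is hypothetical; the actual route may combine `$text` search with a regex fallback, as the diagram's "Regex/Index" label suggests):

```typescript
// Sketch: build a MongoDB filter for the search_book tool call.
// Pairs a $text search on segment content with a bookId scope, matching
// the { content: "text", bookId: 1 } index described below.
function buildSearchFilter(query: string, bookId: string) {
  return {
    bookId,                      // restrict retrieval to the active book
    $text: { $search: query },   // uses the text index on `content`
  };
}
```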
## 💻 Tech Stack

| Layer | Technology | Usage |
|---|---|---|
| Framework | Next.js 15+ | App Router, Server Actions, and Turbopack for lightning-fast builds. |
| Language | TypeScript | Strict type safety across the RAG pipeline and API responses. |
| Voice Engine | Vapi AI | WebRTC orchestration for ultra-low latency (<500ms) voice loops. |
| Speech Synth | 11 Labs | Flash 2.5 model for high-fidelity, emotional human personas. |
| Database | MongoDB Atlas | Storing document metadata and indexed text segments for context retrieval. |
| Object Store | Vercel Blob | Edge-optimized storage for PDF binaries and generated assets. |
| Authentication | Clerk | Secure OIDC identity management and JWT session handling. |
| SaaS Billing | Clerk Billing | Tiered subscription enforcement (Free, Standard, Pro). |
| PDF Engine | PDF.js | Client-side parsing and cover image extraction from raw buffers. |
- Hybrid Rendering: Uses React Server Components (RSC) for library fetching and Client Components for the real-time audio visualizer.
- Smart Throttling: Next.js Middleware manages rate-limiting and session validation to prevent API abuse.
- Indexed Search: Implements MongoDB text and sparse indexes so book context is retrieved in under 100 ms for the AI.
## 🛠️ Installation & Setup

Follow these steps to get a local copy of PageWhisper up and running.
- Node.js 20+ (Recommended: Node 24 for the latest Turbopack features)
- npm or pnpm
- Vercel CLI (`npm i -g vercel`)
- Accounts for: Clerk, MongoDB Atlas, Vercel, and Vapi AI.
1. Clone the repository:

```bash
git clone https://github.com/salonyranjan/page-whisper.git
cd page-whisper
```

2. Install dependencies:

```bash
npm install
```

3. Authenticate and link Vercel (this project uses Vercel Blob and environment variables managed by Vercel):

```bash
vercel login
vercel link
```

4. Sync environment variables by pulling the production variables into your local `.env.local` file:

```bash
vercel env pull .env.local
```

5. Start the development server:

```bash
npm run dev
```

Your app should now be running at http://localhost:3000.
## 🔑 Environment Variables

To run this project, you will need to add the following variables to your `.env.local` file.
Note: For local development, it is highly recommended to use the Vercel CLI command `vercel env pull .env.local` to sync these securely from your dashboard.
```env
# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_test_...
CLERK_SECRET_KEY=sk_test_...
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up

# Database
MONGODB_URI=mongodb+srv://...

# Vercel Storage (Blob)
BLOB_READ_WRITE_TOKEN=vercel_blob_rw_...

# Vapi AI (Voice)
NEXT_PUBLIC_VAPI_API_KEY=...
NEXT_PUBLIC_ASSISTANT_ID=...
```

## ⚙️ Configuration Details

This project integrates several industry-leading APIs. Below are the specific configuration requirements for each service to ensure the RAG and voice pipelines function correctly.
To enable the voice assistant, your Vapi Assistant must be configured with a specific Tool Call:
1. Assistant System Prompt: Use a prompt that instructs the AI to "act as the book" and use the `search_book` tool for context.
2. Custom Tool: Create a tool named `search_book` in the Vapi Dashboard:
* Method: POST
* URL: https://your-deployment.vercel.app/api/vapi/search-book
* Parameters:
* query (string): The search terms.
* bookId (string): The ID of the book to search within.
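Based on the parameters above, the arguments the API route receives from the tool call might be typed as follows. The exact envelope Vapi wraps around tool arguments depends on your configuration, so treat this shape as an assumption:

```typescript
// Sketch: assumed shape of the search_book tool arguments, mirroring the
// query/bookId parameters configured in the Vapi Dashboard.
interface SearchBookArgs {
  query: string;  // the search terms
  bookId: string; // the ID of the book to search within
}

// Example payload as the API route might parse it (illustrative values):
const example: SearchBookArgs = {
  query: "who is the narrator",
  bookId: "6650f0c2a1b2c3d4e5f6a7b8",
};
```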
Configure your Clerk dashboard to handle the SaaS lifecycle:
- Middleware: Ensure `middleware.ts` (or `proxy.ts`) is active to inject auth headers into API routes.
- Redirects: Set the following paths in the Clerk Dashboard:
  * Sign-In: `/sign-in`
  * Sign-Up: `/sign-up`
  * After Sign-In: `/`
- Billing (Pro Tier): Enable Clerk Billing and create three plans: `free`, `standard`, and `pro`. The app enforces limits based on these exact slugs.
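Plan enforcement keyed on those exact slugs can be sketched as a simple lookup. The quota numbers come from the feature-gating rules described later in this README; Pro is shown as 100, standing in for "100+":

```typescript
// Sketch: map Clerk plan slugs to book quotas.
// Slugs must match the plans created in the Clerk Dashboard exactly.
type Plan = "free" | "standard" | "pro";

const BOOK_QUOTAS: Record<Plan, number> = {
  free: 1,
  standard: 10,
  pro: 100, // the README lists "100+"; the exact Pro cap is an assumption
};

function canUploadBook(plan: Plan, currentCount: number): boolean {
  return currentCount < BOOK_QUOTAS[plan];
}
```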
For high-performance Retrieval-Augmented Generation (RAG), run the following command in your MongoDB Atlas shell to enable text searching:

```javascript
db.booksegments.createIndex({ content: "text", bookId: 1 })
```

## 🔐 Security & Optimization

Security is not an afterthought in PageWhisper. We implement a Zero-Trust Backend philosophy:
- Server-Side Identity (SSI): Unlike standard implementations, PageWhisper resolves user identity via Clerk's `auth()` within Next.js Server Actions. This prevents "ID spoofing," where a malicious user could attempt to write data to another user's library by modifying client-side payloads.
- JWT Handshake Validation: Every Vapi voice session is initialized with a short-lived OIDC token. The background "Tool Calls" between Vapi and our API are secured via shared secrets and session-bound validation.
- Clock Skew Mitigation: Custom middleware logic accounts for system clock discrepancies between the client and Clerk's global servers, preventing the "JWT not active yet" loop in high-latency environments.
- Blob Access Control: All uploaded PDFs are stored with randomized suffixes and accessed through signed or specific public patterns, ensuring that raw file paths are not easily guessable.
To provide a "conversational" feel, we optimized every layer of the RAG stack:
| Optimization | Technique | Impact |
|---|---|---|
| Vector-Lite Search | MongoDB Text & Sparse Indexing | Sub-100ms context retrieval without the overhead of a dedicated vector DB. |
| Smart Chunking | 500-Word Windowing | Optimizes the LLM's context window for more accurate "Persona" responses. |
| Edge Storage | Vercel Blob | Minimizes Time to First Byte (TTFB) for PDF parsing and cover rendering. |
| Streaming UI | Partial Handshake Listeners | Transcript messages are rendered via partial streams, so the UI updates as the AI thinks. |
| Dynamic Caching | `force-dynamic` fetching | Ensures that newly uploaded books appear instantly without stale cache issues. |
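The `force-dynamic` setting in the table above is a standard Next.js route segment config. In a library page it looks like this (the file path is illustrative):

```typescript
// app/library/page.tsx (illustrative path)
// Opt this route out of static caching so freshly uploaded books
// appear immediately instead of waiting for revalidation.
export const dynamic = "force-dynamic";
```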
The application uses a custom text-segmentation algorithm that processes raw PDF buffers into a searchable schema. By indexing bookId and content together, the system can scale to thousands of books while maintaining near-instant response times for the Vapi voice agent.
```javascript
// Optimized MongoDB indexing strategy
await db.collection('booksegments').createIndex({
  content: "text",
  bookId: 1
});
```

## 📈 Live Dashboard & SaaS Metrics

PageWhisper integrates real-time monitoring to track the health of the RAG pipeline and the growth of the subscription ecosystem.
The application's performance is monitored across four primary dimensions:
| Metric | Monitoring Provider | Target Benchmark |
|---|---|---|
| Voice Latency | Vapi Dashboard | < 500ms (Handshake to TTS) |
| Auth Success | Clerk Dashboard | 99.9% Uptime |
| Search Accuracy | MongoDB Profiler | < 100ms Query Execution |
| Storage I/O | Vercel Storage | Zero-fail PDF Buffer Stream |
We utilize Clerk Billing to manage the revenue engine. The dashboard allows for real-time tracking of:
- MRR (Monthly Recurring Revenue): Real-time tracking of Standard ($9.99) and Pro ($19.99) conversions.
- Subscription Churn: Monitored via Clerk's webhook listeners to automatically revoke access when a plan is cancelled.
- Feature Gating: Dynamic enforcement of:
  * Book Quotas: 1 (Free) vs. 10 (Standard) vs. 100+ (Pro).
  * Time Caps: Automated session termination via backend countdown hooks when plan minutes are exhausted.
Every interaction is logged to refine the AI's "Reading Comprehension":
- Segment Hit Rate: Tracking which book chunks are most frequently retrieved.
- Tool Call Success: Monitoring the `search_book` API endpoint to ensure the AI always has the context it needs to answer accurately.
- Database Scaling: MongoDB Atlas auto-scaling is enabled to handle spikes in book segment indexing.
- Cold Starts: Optimized via Next.js Edge Runtime to ensure the voice handshake is instant, even after periods of inactivity.
## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
Built with ❤️ by Salony Ranjan