Echo Scribe - Voice Note Transcription 🎤→📝

Version 1.03 – July 7, 2025

📝 Changelog

v1.03 (2025-07-07)

iOS Shortcut API Integration: Added permanent API keys for every user enabling direct iOS Shortcut integration
Programmatic API Endpoint: New /api/transcribe endpoint with Bearer token authentication for developers
API Key Management: Secure 32-character hex API keys with automatic generation and dashboard management
Enhanced Dashboard: Added API key card with copy/reveal functionality and iOS shortcut download button
Database Migration: Added api_key column to users table with automatic backfill for existing users
Rate Limiting Integration: API endpoints use existing webhook rate limiting (5 requests/minute per user)
Privacy-First API: Same zero-logging policy as email processing with JSON-only responses
Developer Experience: Complete API documentation with curl examples and error handling

v1.0.2 (2025-07-05)

Documentation Enhancement: Comprehensive review and alignment of README and Architecture documentation
Project Structure Verification: Confirmed all components, features, and integrations are properly documented
Security Policy Updates: Verified all security measures and privacy guarantees are accurately documented
Deployment Guide Consistency: Ensured all deployment instructions are complete and accurate
Development Workflow Optimization: Confirmed all development modes and testing procedures are documented

v1.01 (2025-07-05)

Beautiful HTML Email Templates for Enhancements: Enhanced emails (cleaned, summary, quick summary) now use modern, styled HTML templates for a much better reading experience.
Markdown Rendering for Summaries: Summaries and quick summaries are now delivered as formatted HTML, not plain text. Markdown is converted to HTML using the Showdown library.
Showdown Integration: Added Showdown for robust markdown-to-HTML conversion in all enhancement emails.
New Markdown Utility: Added src/lib/markdown.ts for reusable, secure markdown-to-HTML conversion across the codebase.
Consistent Branding: All enhancement emails now match the look and feel of other system emails.
Improved User Experience: Enhanced emails are easier to read, with clear sections, bullet points, and action items.

Turn your voice notes into text in seconds, not minutes.

Tired of listening to long voice messages? This service converts your WhatsApp voice notes (or any audio) into readable text almost instantly. Just email the audio file, and get the words back in under a minute—no apps, no uploads, no fuss.

🚀 What It Does

It's ridiculously simple: Send voice note ➜ Get multiple versions back.

🎤 Record a voice note in WhatsApp (or any app)
📧 Email the audio file to your personal transcription address
⚡ Get raw transcript first (15-30 seconds) - never wait for enhancements
✨ Receive AI enhancements (optional) - cleaned formatting, summaries, action items

Works with everything: M4A, MP3, WAV, OGG files up to 25 minutes long
🔒 Completely private: Audio processed securely and deleted immediately
📱 No tech skills needed: Just email the file—configure preferences once
💰 Almost free to run: Built on free tiers (Vercel Hobby + Cloudflare D1 + Mailgun free)—only OpenAI costs money

Perfect for...

Skimming long voice notes with instant AI summaries and key points
Getting clean, formatted text for documents and professional use
Extracting action items from meeting recordings and voice memos
Feeding enhanced transcripts to AI tools for further processing
Converting voice notes to structured, searchable text records
Accessibility when you can't listen to audio

💰 Cost Breakdown

Almost entirely free to run! Built specifically to leverage free tiers:

Vercel Hobby: Free (100GB bandwidth, 1000 function invocations/month)
Cloudflare D1: Free (5GB storage, 25 million reads/month)
Mailgun: Free (10,000 emails/month)
OpenAI Whisper: ~$0.006/minute of audio ⚡ (Only paid service)

Real cost example: Processing 100 voice notes (avg 2 minutes each) = ~$1.20/month total

🏗️ Technical Overview

Built as a production-ready Next.js 14 application with Google authentication, Cloudflare D1 database, and OpenAI Whisper transcription. Features an innovative "Always Raw + Optional Enhancements" system, user preference management, comprehensive admin dashboard, and real-time voice processing optimized for Vercel deployment.

New in v1.0.2:

The src/lib/markdown.ts utility provides secure, reusable markdown-to-HTML conversion for all enhancement emails, leveraging the Showdown library. This ensures summaries and quick summaries are delivered as beautiful, formatted HTML.
Comprehensive documentation review ensuring all features, security measures, and deployment procedures are properly documented and aligned between README and Architecture documentation.

Privacy-First Architecture: Zero transcript content logging, in-memory-only audio processing, and no persistent storage of voice data. All transcript content is delivered directly via email without being stored on servers, ensuring maximum privacy protection for sensitive voice communications.

📋 Complete Architecture Documentation - Detailed system architecture, component interactions, data flows, and operational considerations.

🎯 Project Status: PRODUCTION READY ✅

All 4 phases successfully implemented with background processing:

✅ Phase 1 - Foundation (Complete)

Next.js 14 with App Router and TypeScript
Google OAuth authentication with NextAuth
Cloudflare D1 database integration
User approval workflow system
Route protection middleware

✅ Phase 2 - Voice Processing (Complete)

Mailgun webhook integration
OpenAI Whisper streaming transcription
Email processing with 60-second timeout optimization
Comprehensive error handling and user feedback
Production-ready voice note processing pipeline
Optimized FormData processing (Simplified from formidable to Next.js native - 100+ lines reduced)

✅ Phase 3 - Admin Interface & Production (Complete)

Complete admin dashboard with DataTable
User dashboard with voice history and instructions
Production security features and rate limiting
Mobile-responsive design with shadcn/ui
Comprehensive error pages and monitoring
Full Vercel deployment optimization

✅ Phase 4 - User Preferences & Background Processing (Complete)

"Always Raw + Optional Enhancements" processing system
User preference management with boolean enhancement flags
Background processing with GPT-4.1 nano (cleanup & summary)
Secure token-based background API with authentication
Multi-email delivery system (raw + enhanced versions)
RESTful preferences API with authentication
Interactive preferences UI with real-time preview
Privacy audit and fixes applied - zero transcript logging guaranteed
Database status tracking for enhancement progress monitoring

🔄 Pending Security Enhancement

Webhook Signature Verification: Complete Mailgun HMAC signature validation for production security

🏗️ Architecture Overview

Hybrid Deployment Strategy

This application uses a hybrid architecture combining the best of both platforms:

Vercel: Hosts the Next.js application and API routes (optimal for React/Next.js)
Cloudflare D1: Provides the database backend (fast, serverless SQLite)
Communication: D1 database accessed via Cloudflare's REST API from Vercel

This setup provides excellent performance, cost efficiency, and leverages each platform's strengths while maintaining privacy-first design principles with zero transcript logging and in-memory-only processing.

Complete Tech Stack

Framework: Next.js 14 (App Router)
Language: TypeScript
Authentication: NextAuth with Google OAuth
Database: Cloudflare D1 (SQLite) - accessed via REST API
UI: shadcn/ui components with Tailwind CSS
Transcription: OpenAI Whisper API
Email: Mailgun for inbound processing
Deployment: Vercel (Hobby tier optimized) + Cloudflare D1
Security: Rate limiting, CSRF protection, security headers
Privacy: Zero transcript logging, in-memory processing, no data persistence

Key Features

🔐 Complete Authentication: Google OAuth with session management
👥 User Management: Admin approval workflow with bulk operations
🎤 Smart Voice Processing: Always-raw + optional enhancements system
⚙️ User Preferences: Interactive preference management for enhancements
🤖 AI Enhancement: GPT-4.1 nano cleanup and summary processing
🔄 Background Processing: Secure token-based background enhancement API
📧 Multi-Email System: Raw transcript + enhanced versions delivered separately
📊 Admin Dashboard: Real-time user analytics and management
👤 User Dashboard: Personal voice history and usage instructions
🛡️ Production Security: Rate limiting, CSRF, security headers, background API tokens
🔒 Privacy-First: Zero transcript logging, in-memory processing, no data storage
📱 Mobile Responsive: Optimized for all device sizes
🚀 Vercel Optimized: Function timeouts and performance tuning
📈 Status Tracking: Database-based enhancement progress monitoring

🔒 Security & Privacy

Privacy-First Architecture

WhatsApp Echo is built with a privacy-first design that prioritizes user data protection:

🚫 Zero Transcript Logging: Voice transcript content is NEVER logged to console, files, or monitoring systems
🧠 In-Memory Processing: Audio files are processed entirely in memory without disk writes
📧 Immediate Delivery: Transcripts delivered via email, not stored on servers
🔍 Metadata-Only Logging: System logs contain only technical data (file size, processing time, success/failure)
🛡️ Privacy-Safe Error Handling: Error objects contain only technical metadata, never transcript content

Security Measures

🔐 Google OAuth Integration: Secure authentication using Google's OAuth 2.0 flow
🎫 JWT Session Management: NextAuth.js handles secure session tokens
🔑 Token-Based Background API: SHA256 authentication for background processing
🛡️ CSRF Protection: Token-based request validation for all forms
⚡ Rate Limiting: Per-user and per-endpoint rate limiting with abuse prevention
📋 Input Validation: Comprehensive file type, size, and format validation
🔒 Security Headers: CSP, XSS protection, clickjacking prevention, HTTPS enforcement
🤖 reCAPTCHA v2: Contact form protection against automated abuse and spam bots

What We DON'T Store

❌ Voice transcript content - Never stored in database or logs
❌ Audio files - Processed in memory only, never written to disk
❌ Sensitive user data - Only metadata and account information stored
❌ Processing content - AI enhancement results not logged or stored

What We DO Store

✅ User account information - Google email, approval status, created date
✅ Processing metadata - File size, duration, processing time, success/failure
✅ User preferences - Enhancement settings (cleanup, summary options)
✅ Technical logs - Performance metrics, error counts (no content)

Security Documentation

📋 Complete Security Policy - Detailed security measures, privacy guarantees, vulnerability reporting, and security best practices.

📂 Complete Project Structure

src/
├── app/
│   ├── admin/
│   │   └── page.tsx                    # Admin dashboard
│   ├── api/
│   │   ├── admin/users/route.ts        # Admin user management API
│   │   ├── auth/[...nextauth]/route.ts # NextAuth configuration
│   │   ├── background/enhance-transcript/route.ts # Background enhancement API
│   │   ├── config/privacy-email/route.ts # Privacy email configuration API
│   │   ├── contact/route.ts            # Contact form API with reCAPTCHA verification
│   │   ├── inbound/route.ts            # Smart webhook handler (always raw + enhancements)
│   │   └── user/preferences/route.ts   # User preferences API
│   ├── contact/
│   │   └── page.tsx                    # Contact form page with reCAPTCHA v2 protection
│   ├── dashboard/
│   │   ├── page.tsx                    # User dashboard
│   │   └── preferences/page.tsx        # User preferences management
│   ├── privacy/
│   │   └── page.tsx                    # Privacy Policy page with CCPA/CPRA compliance
│   ├── terms/
│   │   └── page.tsx                    # Terms of Service page with California law compliance
│   ├── error.tsx                       # Global error page
│   ├── not-found.tsx                   # 404 page
│   ├── globals.css                     # Global styles
│   └── layout.tsx                      # Root layout
├── components/
│   ├── admin/
│   │   ├── admin-stats.tsx             # Admin statistics cards
│   │   └── users-table.tsx             # User management table
│   └── ui/
│       ├── badge.tsx                   # Status indicators
│       ├── button.tsx                  # Interactive buttons
│       ├── card.tsx                    # Content containers
│       ├── input.tsx                   # Form inputs
│       ├── table.tsx                   # Data tables
│       └── version-badge.tsx           # Version display badge component
├── lib/
│   ├── audio.ts                        # Audio processing utilities
│   ├── auth.ts                         # NextAuth configuration
│   ├── database.ts                     # Database operations with preferences
│   ├── enhanced-errors.ts              # Advanced error handling with Sentry integration
│   ├── errors.ts                       # Core error handling system
│   ├── mailgun.ts                      # Email processing
│   ├── markdown.ts                     # Markdown-to-HTML conversion utility for enhancement emails
│   ├── rate-limit.ts                   # In-memory rate limiting
│   ├── security.ts                     # Security middleware
│   ├── utils.ts                        # Utility functions
│   ├── version.ts                      # Version utilities and centralized version management
│   ├── voice-processor.ts              # Background enhancement processing
│   └── whisper.ts                      # OpenAI Whisper integration
├── types/
│   └── index.ts                        # TypeScript definitions
├── utils/
│   ├── env.ts                          # Environment validation
│   └── id.ts                           # ID generation
├── middleware.ts                       # Route protection
└── globals.css                         # Global styles

sql/
└── schema.sql                          # Database schema

public/
├── legal/
│   ├── TERMS_OF_SERVICE.md             # Terms of Service legal document (California law)
│   └── PRIVACY_POLICY.md               # Privacy Policy legal document (CCPA/CPRA compliant)
└── images/
    ├── flickerventures.png             # Company logo
    ├── github.png                      # GitHub icon
    └── logo.png                        # Application logo

Config Files:
├── package.json                        # Dependencies
├── tsconfig.json                       # TypeScript config
├── tailwind.config.ts                  # Tailwind CSS config
├── next.config.mjs                     # Next.js config
├── postcss.config.js                   # PostCSS config
├── vercel.json                         # Vercel deployment config
├── wrangler.toml                       # Cloudflare D1 database management
├── env.example                         # Environment template
└── SECURITY.md                         # Security policy and privacy guarantees

🗄️ Database Schema

Users Table

CREATE TABLE users (
  id           TEXT PRIMARY KEY,          -- cuid() identifier
  google_email TEXT UNIQUE NOT NULL,     -- Google OAuth email
  slug         TEXT UNIQUE NOT NULL,      -- 6-char nanoid for email aliases
  approved     INTEGER NOT NULL DEFAULT 0, -- 0=pending, 1=approved
  created_at   DATETIME DEFAULT CURRENT_TIMESTAMP
);

User Preferences Table

CREATE TABLE user_preferences (
  user_id                   TEXT PRIMARY KEY,          -- Reference to users.id
  transcript_processing     TEXT DEFAULT 'raw',        -- Legacy: 'raw', 'cleanup', 'summary'
  send_cleaned_transcript   INTEGER DEFAULT 0,         -- 0=disabled, 1=enabled
  send_summary             INTEGER DEFAULT 0,         -- 0=disabled, 1=enabled
  created_at               DATETIME DEFAULT CURRENT_TIMESTAMP,
  updated_at               DATETIME DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY(user_id) REFERENCES users(id)
);

Voice Events Table (Enhanced with Status Tracking)

CREATE TABLE voice_events (
  id                    TEXT PRIMARY KEY,          -- cuid() identifier
  user_id              TEXT NOT NULL,             -- Reference to users.id
  received_at          DATETIME DEFAULT CURRENT_TIMESTAMP,
  duration_sec         INTEGER,                   -- Duration in seconds
  bytes                INTEGER,                   -- File size in bytes
  status               TEXT DEFAULT 'pending',    -- 'pending', 'processing', 'completed', 'failed'
  processing_type      TEXT DEFAULT 'raw',        -- 'raw', 'cleanup', 'summary'
  completed_at         DATETIME,                  -- When processing completed
  error_message        TEXT,                      -- Error details if failed
  enhancements_requested TEXT,                    -- JSON array of requested enhancements
  FOREIGN KEY(user_id) REFERENCES users(id)
);

🚀 Complete Setup Guide

1. Repository Setup

git clone <repository-url>
cd whatsapp-echo
npm install

2. Environment Configuration

cp env.example .env.local
# Edit .env.local with all credentials

3. Database Setup (Cloudflare D1)

# Install and setup Wrangler
npm install -g wrangler
wrangler login

# Create database
wrangler d1 create voice-transcription-prod

# Initialize schema
wrangler d1 execute voice-transcription-prod --file=./sql/schema.sql --remote

4. Service Integration Setup

Google OAuth

Google Cloud Console
Create OAuth 2.0 credentials
Configure redirect URIs
Add client ID/secret to environment

OpenAI API

OpenAI Platform
Generate API key
Verify Whisper API access
Add to environment

Mailgun Email Service

Mailgun Console
Add and verify domain
Configure MX records
Set up inbound webhook routing
Add API credentials to environment

5. Complete Environment Variables

# NextAuth
NEXTAUTH_URL=https://your-domain.vercel.app
NEXTAUTH_SECRET=your-secret-key

# Google OAuth
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret

# Cloudflare D1
D1_URL=https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/d1/database/DATABASE_ID/query
D1_DATABASE_ID=your-database-id
D1_API_KEY=your-api-key

# OpenAI (Whisper + GPT-4.1 nano)
OPENAI_API_KEY=sk-your-openai-key

# Mailgun
MAILGUN_DOMAIN=your-domain.com
MAILGUN_API_KEY=key-your-mailgun-key
MAILGUN_WEBHOOK_KEY=your-webhook-signing-key

# Admin Users
ADMIN_EMAILS=admin@yourdomain.com,admin2@yourdomain.com

# reCAPTCHA v2 (Contact Form Protection)
RECAPTCHA_SITE_KEY=your-recaptcha-site-key
RECAPTCHA_SECRET_KEY=your-recaptcha-secret-key

# Company Information (Legal Pages)
PRIVACY_EMAIL=privacy@yourdomain.com
EMAIL_SITE_CONTACT=hello@yourdomain.com
COMPANY_NAME=Your Company
COMPANY_ADDRESS=123 Main Street
COMPANY_CITY=Los Angeles
COMPANY_STATE=CA
COMPANY_ZIP=90027
COMPANY_FULL_ADDRESS=123 Main Street, Los Angeles, CA 90027

6. Vercel Deployment

# Install Vercel CLI
npm install -g vercel

# Deploy to production
vercel --prod

# Configure environment variables in Vercel dashboard
# Set up custom domain if needed

🎯 User Experience Flow

New User Journey

Sign Up: Google OAuth authentication
Approval: Admin reviews and approves account
Email Alias: Receive personal email address (slug@yourdomain.com)
Set Preferences: Configure enhancement options (cleanup, summary, or both)
Voice Notes: Send audio files via email
Multiple Transcripts: Receive raw transcript immediately + enhanced versions (if enabled)
Dashboard: View history, usage statistics, and manage preferences

Admin Experience

Dashboard: View all users and system statistics
User Management: Approve/revoke users with one click
Analytics: Monitor processing success rates and performance
Bulk Actions: Manage multiple users efficiently
System Health: Track timeouts and error rates

🔒 Privacy & Security

Privacy-First Architecture

This system is designed with zero transcript logging and privacy-first principles:

🚫 No Content Logging: Transcript content is NEVER logged to console, files, or monitoring systems
📊 Metadata Only: System logs contain only technical data (file size, processing time, success/failure)
💾 No Storage: Voice transcripts are not stored in database - only delivered via email
🧠 In-Memory Processing: Audio files are processed entirely in memory without disk writes
🔍 Error Safety: Error objects contain only technical metadata (length, processing type), never transcript content
📡 Monitoring Exclusion: Sentry error reporting excludes all transcript content and sensitive user data

Privacy Audit Results

✅ Console Logs: Reviewed all logging statements - no transcript content exposed
✅ Error Handling: Error objects contain only technical metadata, not sensitive data
✅ Database: Voice events table stores only metadata (duration, file size, timestamps)
✅ Monitoring: Sentry error reporting excludes transcript content
✅ Processing: All audio processing happens in memory with automatic cleanup

Your Privacy Guarantees

Immediate Delivery: Voice notes are transcribed and delivered via email immediately
No Server Storage: Transcript content is never stored on our servers
Technical Logs Only: System logs contain only performance data, never your voice content
Safe Error Handling: Even in error cases, your transcript content is never exposed
Memory-Only Processing: Audio files are processed in memory and automatically discarded

🔧 Production Features

Admin Dashboard (`/admin`)

User Management Table: Search, filter, and manage all users
Statistics Cards: Total users, approvals, voice events processed
Bulk Operations: Mass approve/revoke user access
Real-time Updates: Optimistic UI with instant feedback
Processing Analytics: Success rates and performance metrics

User Dashboard (`/dashboard`)

Personal Email Alias: Unique address for voice notes
Usage Instructions: Step-by-step guide with best practices
Voice History: Last 20 voice notes with metadata
File Guidelines: Size limits, format recommendations
Account Status: Approval status and notifications
Preferences Access: One-click navigation to enhancement settings

User Preferences (`/dashboard/preferences`)

Always Raw Processing: Guaranteed immediate transcript delivery
Optional Cleanup: Grammar and formatting improvements with GPT-4.1 nano
Optional Summary: Concise key points and action items
Real-time Preview: See exactly how many emails you'll receive
Interactive Controls: Toggle enhancements on/off with visual feedback

Security Features

Rate Limiting: In-memory system with per-endpoint limits
CSRF Protection: Token-based request validation
Security Headers: Comprehensive HTTP security headers
Input Validation: Sanitization and format validation
Error Handling: Graceful degradation with user feedback
Privacy Protection: Zero transcript logging, in-memory processing only
Background API Security: SHA256 token-based authentication for background processing
Webhook Security: Mailgun signature validation (implementation pending)
reCAPTCHA v2: Contact form protection against automated abuse and spam

Performance Optimizations

60-Second Timeout: Optimized for Vercel Hobby tier
Memory Streaming: No disk writes for audio processing
Aggressive Caching: Static asset optimization
Mobile-First: Responsive design for all devices
Error Recovery: Comprehensive error pages and fallbacks

🎤 Voice Processing Pipeline

Complete Processing Flow (Always Raw + AI Enhancements)

Email Received → Webhook Validation → User Lookup → Get User Preferences →
File Validation → Audio Download → OpenAI Whisper Transcription (Raw) → 
Database Logging (metadata only) → Raw Email Delivery (15-30s) → 
Queue Background AI Processing (if enabled) → Background Enhancement API → 
Token Validation → GPT-4.1 nano Cleanup/Summary → Enhanced Email Delivery

Privacy-Protected: All processing happens in memory with zero transcript content logging. Background Processing: Secure token-based API prevents serverless function timeouts. AI-Powered: GPT-4.1 nano provides grammar cleanup and intelligent summarization. User-Controlled: Enhancement preferences configured once in dashboard, applied to all voice notes.

🤖 AI Enhancement Options

Raw Transcript (Always Included)

Delivery: 15-30 seconds, never delayed
Content: Exactly as transcribed by OpenAI Whisper
Format: Basic punctuation, natural speech patterns
Use Case: Immediate access, direct quotes, feeding to other AI tools

Cleaned Transcript (Optional)

AI Model: GPT-4.1 nano with specialized cleanup prompts
Improvements: Fixed grammar, proper punctuation, removed filler words ("um", "uh", "like")
Formatting: Natural paragraph breaks, proper capitalization
Preservation: Original wording and tone maintained - no paraphrasing
Use Case: Professional documents, clean copy-paste text, presentation materials

Smart Summary (Optional)

AI Model: GPT-4.1 nano with structured summarization prompts
Format: Markdown with organized sections
Sections: Main Topic (always), Key Points, Action Items, Important Details
Length: ≤150 words for comprehensive summaries
Content: Bullet-pointed key ideas, quoted important specifics (names/dates/numbers)
Use Case: Quick overview, meeting notes, task extraction, executive summaries

Enhancement Quality Control

Hallucination Prevention: AI prompts designed to only use content from original transcript
Accuracy Focus: Quality checks to verify every statement comes from source material
Content Preservation: Enhancement improves format/structure without adding new information
Error Safety: Failed enhancements don't affect raw transcript delivery

Timeout Management (< 60 seconds)

Total Safety Margin: 55 seconds (5s buffer)
Parsing & Validation: < 2 seconds
User Lookup: < 1 second
Audio Download: < 10 seconds (with timeout)
Whisper Transcription: < 40 seconds (with timeout)
Email Response: < 2 seconds

File Processing Guidelines

Optimal: < 5MB, < 3 minutes (15-25 seconds total)
Good: 5-10MB, 3-5 minutes (25-40 seconds total)
Maximum: 15MB, 5-8 minutes (40-55 seconds total)
Rejected: > 15MB (immediate error response)

Supported Formats

M4A (best) - Optimal compression and speed
MP3 (good) - Wide compatibility
WAV (acceptable) - Large files, slower processing
OGG (acceptable) - Good compression

📊 Monitoring & Analytics

Admin Dashboard Metrics

User Statistics: Total, approved, pending counts
Processing Stats: Success rates, error rates
Performance: Average processing times
System Health: Timeout occurrences, error patterns

Production Monitoring

# Check Vercel function logs
vercel logs

# Monitor processing metrics
grep "Processing Metrics" logs

# Check rate limiting stats
curl https://your-domain.vercel.app/api/admin/stats

Error Tracking

Comprehensive Error Categories: 8 specific error types
User-Friendly Messages: Clear guidance for each error
Admin Notifications: Critical error alerts
Performance Metrics: Processing time breakdowns

🛡️ Security Implementation

Rate Limiting (In-Memory)

Webhook: 5 requests/minute per user
Admin API: 30 requests/minute per admin
General API: 100 requests/minute per IP
Authentication: 10 attempts/15 minutes per IP

Security Headers

X-Frame-Options: Prevent clickjacking
X-Content-Type-Options: Prevent MIME sniffing
X-XSS-Protection: XSS protection
Strict-Transport-Security: HTTPS enforcement
Content-Security-Policy: Script execution control

Input Validation

Email Validation: Format and length checks
File Validation: Size, type, and content verification
SQL Injection Prevention: Parameterized queries
XSS Prevention: Input sanitization

🐛 Troubleshooting Guide

Security Issues

If you discover a security vulnerability, please report it responsibly:

DO NOT create public GitHub issues for security vulnerabilities
Email the repository maintainer with detailed information
See SECURITY.md for complete vulnerability reporting guidelines

Common Issues & Solutions

1. Webhook Timeout Errors

Symptoms: 504 Gateway Timeout, incomplete processing Solutions:

Reduce file size (< 10MB recommended)
Check OpenAI API status and quota
Verify Mailgun webhook URL configuration
Test with smaller files first

2. Authentication Problems

Symptoms: Login failures, session errors Solutions:

Verify Google OAuth configuration
Check NEXTAUTH_SECRET and NEXTAUTH_URL
Confirm redirect URIs match exactly
Test with different browsers/incognito

3. Admin Dashboard Issues

Symptoms: Permission denied, loading errors Solutions:

Verify admin email in ADMIN_EMAILS
Check database connectivity
Clear browser cache and cookies
Test API endpoints directly

4. Voice Processing Failures

Symptoms: No transcription emails, error responses Solutions:

Verify OpenAI API key and quota
Check Mailgun domain configuration
Test with different audio formats
Monitor Vercel function logs

Debug Commands

# Test webhook endpoint
curl -X GET "https://your-domain.vercel.app/api/inbound"

# Check OpenAI connectivity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://api.openai.com/v1/models"

# Test Mailgun API
curl -u "api:$MAILGUN_API_KEY" \
  "https://api.mailgun.net/v3/domains"

# Verify database
wrangler d1 execute voice-transcription-prod \
  --command="SELECT COUNT(*) FROM users;" --remote

🚀 Deployment Checklist

Pre-Deployment

All environment variables configured
Database schema applied
Google OAuth redirect URIs updated
Mailgun webhook URL configured
OpenAI API key verified
Admin emails configured

Vercel Configuration

vercel.json properly configured
Function timeouts set (60s for webhook)
Security headers applied
Environment variables added to Vercel
Domain configured (if using custom)

Post-Deployment Testing

🔄 Usage Instructions

For End Users

📖 Complete Guide: See the User Manual for detailed instructions

Get Access: Sign up and wait for admin approval
Receive Email: Get your personal alias (abc123@yourdomain.com)
Set Preferences: Configure enhancement options (/dashboard/preferences)
Send Voice Notes: Attach audio files to emails
Receive Multiple Transcripts: Get raw transcript immediately + enhanced versions
View History: Check dashboard for past voice notes

"Always Raw + AI-Powered Enhancements" System

How It Works:

Raw Transcript (Always): Delivered in 15-30 seconds, exactly as transcribed by OpenAI Whisper
Cleaned Transcript (Optional): Grammar fixes, proper punctuation, removed filler words, paragraph breaks
Smart Summary (Optional): Structured summary with main topic, key points, action items, and important details (≤150 words)
Multiple Emails: Each version arrives as a separate, clearly labeled email
Background Processing: Secure token-based API prevents serverless function timeouts

AI Enhancement Details:

Cleanup Enhancement: Corrects transcription mistakes, fixes punctuation/capitalization, removes "um/uh/like", adds natural paragraph breaks - preserves original wording
Summary Enhancement: Extracts main topic, bullet-pointed key ideas, actionable tasks with context, and quoted important details (names/dates/numbers)
User Control: Configure preferences once in dashboard - applies to all future voice notes
Quality Focused: AI prompts designed to prevent hallucination and maintain accuracy

User Benefits:

Immediate Access: Never wait for enhancements - raw transcript arrives first
Flexible Options: Enable cleanup, summary, both, or neither via dashboard preferences
Clear Labeling: Email subjects clearly indicate which version you're reading ([Raw], [Cleaned], [Summary])
No Delays: Enhanced processing happens in background without affecting speed
Reliable Processing: Background API ensures enhancements complete even for large files
Professional Quality: Enhanced versions ready for documents, AI tools, and business use

For Administrators

Access Dashboard: Navigate to /admin with admin account
Manage Users: Approve/revoke access, view statistics
Monitor System: Check processing rates and errors
Bulk Operations: Manage multiple users efficiently
System Health: Monitor performance and timeouts

📈 Performance Benchmarks

Processing Times (Production)

Small Files (< 1MB): 8-15 seconds
Medium Files (1-5MB): 15-35 seconds
Large Files (5-15MB): 35-55 seconds
Timeout Rate: < 2% (well within 60s limit)

System Capacity (Vercel Hobby)

Concurrent Processing: 1 webhook at a time
Daily Quota: Based on Vercel function invocations
Success Rate: 95%+ for properly formatted files
Error Recovery: Comprehensive error handling

📚 Additional Documentation

For detailed information about specific aspects of the system:

Security Policy - Comprehensive security measures, privacy guarantees, and vulnerability reporting
User Manual - Complete end-user guide for signup, workflow, and usage
Deployment Guide - Complete deployment and configuration instructions improvements

📝 License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

🎉 Project Complete

This voice note transcription service is now production-ready with:

✅ Complete user authentication and management
✅ Real-time voice processing pipeline with background enhancements
✅ Professional admin and user dashboards
✅ Production security and performance optimization
✅ Privacy-first architecture with zero transcript logging
✅ Comprehensive error handling and monitoring
✅ Mobile-responsive design
✅ Full Vercel deployment optimization
✅ Secure background processing API with token authentication
✅ Database status tracking for enhancement progress
✅ Comprehensive security policy and privacy guarantees

Perfect for organizations needing reliable voice-to-text processing with user management, admin oversight, and enterprise-grade privacy protection.

🔄 Next Steps

Webhook Signature Verification: Complete Mailgun HMAC signature validation for production security

🛠️ Development Guide

Development Modes

🚀 Full Development Mode (Recommended)

npm run dev:api

URL: http://localhost:3000
Features: Complete frontend + API routes + AI analysis
Best for: Testing full functionality, API development
Environment: Uses .env.local for local development
Database: Connects to your configured D1 database or development fallback

🧪 Production Environment Testing

# Pull production environment variables
vercel env pull .env.local

# Run with production configuration
vercel dev

URL: http://localhost:3000
Features: Production environment variables + local development
Best for: Debugging production issues, testing with real data
Environment: Uses production environment variables

🔍 Database Testing

# Test database connection
wrangler d1 execute voice-transcription-db --command "SELECT 1 as test" --remote

# Run schema updates
wrangler d1 execute voice-transcription-db --file=./sql/schema.sql --remote

Development Environment Setup

Install dependencies:
```
npm install
```

Set up environment variables:

cp env.example .env.local
# Edit .env.local with your development credentials

Start development server:
```
npm run dev:api
```
Access the application:
- Frontend: http://localhost:3000
- API Routes: http://localhost:3000/api/*
- Admin Dashboard: http://localhost:3000/admin
- User Dashboard: http://localhost:3000/dashboard

Development Features

Hot Reload: Automatic reload on file changes
API Route Testing: All API endpoints available locally
Database Integration: Connect to D1 database or use development fallback
Authentication: Google OAuth with development callbacks
Real-time Logging: Console logs for debugging (privacy-compliant)
Error Handling: Comprehensive error pages and debugging info
Privacy-Safe Debugging: Logs contain only metadata, never transcript content

Development Workflow

Start development server: npm run dev:api
Make changes: Edit code with hot reload
Test functionality: Use local environment for testing
Debug issues: Check console logs and error pages
Deploy: Push changes to Vercel for production testing

Development URLs

Local Development: http://localhost:3000
Google OAuth Redirect: http://localhost:3000/api/auth/callback/google
Webhook Testing: http://localhost:3000/api/inbound
Admin Panel: http://localhost:3000/admin
User Dashboard: http://localhost:3000/dashboard

📱 Programmatic API

iOS Shortcut Integration

Every user gets a permanent API key for seamless iOS Shortcut integration. Record voice notes directly from your iPhone and get instant transcriptions without opening any app.

Quick Setup

Get Your API Key: Visit your dashboard to copy your personal API key
Download iOS Shortcut: One-click download from your dashboard
Configure API Key: Use the visual setup instructions to paste your API key in the shortcut
Start Recording: Use the shortcut from anywhere on your iPhone (see demo for workflow)
Instant Results: Get transcriptions in seconds, no app switching needed

Visual Guides

📖 Setup Instructions: Step-by-step visual guide showing exactly where to paste your API key in the iOS Shortcut configuration
🎬 Usage Demo: Animated demonstration of the complete voice note forwarding workflow from recording to transcription

Credit & Inspiration

Special thanks to Giacomo Melzi for the original iOS Shortcut concept! His innovative implementation inspired this feature.

How Echo Scribe's Implementation Differs:

Managed Service: Echo Scribe acts as a secure proxy service between your iOS Shortcut and OpenAI
No Personal OpenAI Key Required: Users don't need their own OpenAI API accounts or billing
Centralized Management: Echo Scribe handles OpenAI API keys, billing, and rate limiting centrally
User-Specific Authentication: Each user gets their own Echo Scribe API key for secure access
Integrated Features: Leverages Echo Scribe's existing user management, preferences, and privacy protections

This approach provides the same powerful iOS Shortcut functionality while eliminating the need for users to manage their own OpenAI accounts, making voice transcription accessible to everyone.

API Endpoint

POST /api/transcribe
Authorization: Bearer your-api-key
Content-Type: multipart/form-data

# Example usage
curl -X POST \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@voice-note.m4a" \
  https://your-domain.vercel.app/api/transcribe

Response Format

{
  "text": "Your transcribed voice note content here..."
}

Features

🔐 Secure Authentication: Personal API key per user
⚡ Fast Processing: Same 15-30 second transcription speed
📊 Usage Tracking: API calls logged in your dashboard
🔄 Rate Limited: 5 requests per minute (same as email processing)
📱 iOS Optimized: Works seamlessly with iOS Shortcuts app
🎤 Multiple Formats: Supports .m4a, .mp3, .wav, .ogg files
💾 No Email Required: Direct JSON response, no email processing

Use Cases

Voice Memos: Quick transcription of personal notes
Meeting Notes: Instant transcripts during calls
Content Creation: Voice-to-text for writing workflows
Accessibility: Audio content made searchable and readable
Automation: Integration with other iOS shortcuts and workflows

Error Handling

// Invalid API key
{ "error": "Invalid or missing API key" }

// File too large  
{ "error": "File size exceeds 15MB limit" }

// Unsupported format
{ "error": "Unsupported file format" }

// Rate limit exceeded
{ "error": "Rate limit exceeded - 5 requests per minute" }

API Security

🔑 Bearer Token Authentication: Standard OAuth-style authentication
🛡️ Rate Limiting: Per-user limits prevent abuse
📝 Privacy-First: Same zero-logging policy as email processing
🔒 Secure Key Generation: Cryptographically secure API keys
👤 User Isolation: Each user's data completely isolated

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
docs		docs
public		public
sql		sql
src		src
.cursorrules		.cursorrules
.eslintrc.json		.eslintrc.json
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
README.md		README.md
SECURITY.md		SECURITY.md
env.example		env.example
next.config.mjs		next.config.mjs
package-lock.json		package-lock.json
package.json		package.json
postcss.config.mjs		postcss.config.mjs
sentry.edge.config.ts		sentry.edge.config.ts
sentry.server.config.ts		sentry.server.config.ts
tailwind.config.ts		tailwind.config.ts
tsconfig.json		tsconfig.json
vercel.json		vercel.json
wrangler.toml.example		wrangler.toml.example

Folders and files

Latest commit

History

Repository files navigation

Echo Scribe - Voice Note Transcription 🎤→📝

📝 Changelog

v1.03 (2025-07-07)

v1.0.2 (2025-07-05)

v1.01 (2025-07-05)

🚀 What It Does

Perfect for...

💰 Cost Breakdown

🏗️ Technical Overview

🎯 Project Status: PRODUCTION READY ✅

🔄 Pending Security Enhancement

🏗️ Architecture Overview

Hybrid Deployment Strategy

Complete Tech Stack

Key Features

🔒 Security & Privacy

Privacy-First Architecture

Security Measures

What We DON'T Store

What We DO Store

Security Documentation

📂 Complete Project Structure

🗄️ Database Schema

Users Table

User Preferences Table

Voice Events Table (Enhanced with Status Tracking)

🚀 Complete Setup Guide

1. Repository Setup

2. Environment Configuration

3. Database Setup (Cloudflare D1)

4. Service Integration Setup

Google OAuth

OpenAI API

Mailgun Email Service

5. Complete Environment Variables

6. Vercel Deployment

🎯 User Experience Flow

New User Journey

Admin Experience

🔒 Privacy & Security

Privacy-First Architecture

Privacy Audit Results

Your Privacy Guarantees

🔧 Production Features

Admin Dashboard (/admin)

User Dashboard (/dashboard)

User Preferences (/dashboard/preferences)

Security Features

Performance Optimizations

🎤 Voice Processing Pipeline

Complete Processing Flow (Always Raw + AI Enhancements)

🤖 AI Enhancement Options

Raw Transcript (Always Included)

Cleaned Transcript (Optional)

Smart Summary (Optional)

Enhancement Quality Control

Timeout Management (< 60 seconds)

File Processing Guidelines

Supported Formats

📊 Monitoring & Analytics

Admin Dashboard Metrics

Production Monitoring

Error Tracking

🛡️ Security Implementation

Rate Limiting (In-Memory)

Security Headers

Input Validation

🐛 Troubleshooting Guide

Security Issues

Common Issues & Solutions

1. Webhook Timeout Errors

2. Authentication Problems

3. Admin Dashboard Issues

4. Voice Processing Failures

Debug Commands

🚀 Deployment Checklist

Admin Dashboard (`/admin`)

User Dashboard (`/dashboard`)

User Preferences (`/dashboard/preferences`)

Packages