Version 1.03 - July 7, 2025
- iOS Shortcut API Integration: Added permanent API keys for every user enabling direct iOS Shortcut integration
- Programmatic API Endpoint: New `/api/transcribe` endpoint with Bearer token authentication for developers
- API Key Management: Secure 32-character hex API keys with automatic generation and dashboard management
- Enhanced Dashboard: Added API key card with copy/reveal functionality and iOS shortcut download button
- Database Migration: Added `api_key` column to users table with automatic backfill for existing users
- Rate Limiting Integration: API endpoints use existing webhook rate limiting (5 requests/minute per user)
- Privacy-First API: Same zero-logging policy as email processing with JSON-only responses
- Developer Experience: Complete API documentation with curl examples and error handling
- Documentation Enhancement: Comprehensive review and alignment of README and Architecture documentation
- Project Structure Verification: Confirmed all components, features, and integrations are properly documented
- Security Policy Updates: Verified all security measures and privacy guarantees are accurately documented
- Deployment Guide Consistency: Ensured all deployment instructions are complete and accurate
- Development Workflow Optimization: Confirmed all development modes and testing procedures are documented
- Beautiful HTML Email Templates for Enhancements: Enhanced emails (cleaned, summary, quick summary) now use modern, styled HTML templates for a much better reading experience.
- Markdown Rendering for Summaries: Summaries and quick summaries are now delivered as formatted HTML, not plain text. Markdown is converted to HTML using the Showdown library.
- Showdown Integration: Added Showdown for robust markdown-to-HTML conversion in all enhancement emails.
- New Markdown Utility: Added `src/lib/markdown.ts` for reusable, secure markdown-to-HTML conversion across the codebase.
- Consistent Branding: All enhancement emails now match the look and feel of other system emails.
- Improved User Experience: Enhanced emails are easier to read, with clear sections, bullet points, and action items.
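The API key format and Bearer authentication mentioned in the notes above can be sketched roughly as follows. This is an illustration, not the project's actual code: the helper names (`generateApiKey`, `isValidApiKey`, `buildTranscribeRequest`) are hypothetical, and deriving the key from 16 random bytes is an assumption (it is simply one way to obtain 32 hex characters). The request body format for `/api/transcribe` is not described here, so it is left to the caller.

```typescript
import { randomBytes } from "node:crypto";

// Assumption: a 32-character hex key is 16 random bytes, hex-encoded.
function generateApiKey(): string {
  return randomBytes(16).toString("hex");
}

// Checks the documented shape: exactly 32 lowercase hex characters.
function isValidApiKey(key: string): boolean {
  return /^[0-9a-f]{32}$/.test(key);
}

// Builds the URL, method, and headers for a call to /api/transcribe.
// Hypothetical helper: only the Bearer header and the path come from the notes above.
function buildTranscribeRequest(baseUrl: string, apiKey: string) {
  if (!isValidApiKey(apiKey)) {
    throw new Error("API key must be 32 hex characters");
  }
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/transcribe`,
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}
```

Validating the key shape on the client avoids spending one of the 5 requests/minute on a call that would be rejected anyway.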
Turn your voice notes into text in seconds, not minutes.
Tired of listening to long voice messages? This service converts your WhatsApp voice notes (or any audio) into readable text almost instantly. Just email the audio file, and get the words back in under a minute: no apps, no uploads, no fuss.
It's ridiculously simple: Send voice note → Get multiple versions back.
Record a voice note in WhatsApp (or any app)
Email the audio file to your personal transcription address
Get raw transcript first (15-30 seconds) - never wait for enhancements
Receive AI enhancements (optional) - cleaned formatting, summaries, action items
Works with everything: M4A, MP3, WAV, OGG files up to 25 minutes long
Completely private: Audio processed securely and deleted immediately
No tech skills needed: Just email the file; configure preferences once
Almost free to run: Built on free tiers (Vercel Hobby + Cloudflare D1 + Mailgun free); only OpenAI costs money
- Skimming long voice notes with instant AI summaries and key points
- Getting clean, formatted text for documents and professional use
- Extracting action items from meeting recordings and voice memos
- Feeding enhanced transcripts to AI tools for further processing
- Converting voice notes to structured, searchable text records
- Accessibility when you can't listen to audio
Almost entirely free to run! Built specifically to leverage free tiers:
- Vercel Hobby: Free (100GB bandwidth, 1000 function invocations/month)
- Cloudflare D1: Free (5GB storage, 25 million reads/month)
- Mailgun: Free (10,000 emails/month)
- OpenAI Whisper: ~$0.006/minute of audio (Only paid service)
Real cost example: Processing 100 voice notes (avg 2 minutes each) = ~$1.20/month total
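The arithmetic behind that estimate can be reproduced with a small helper. This is a sketch only: the function name is hypothetical, and it covers Whisper transcription at the rate quoted above, nothing else.

```typescript
// Whisper price quoted above, in USD per minute of audio.
const WHISPER_USD_PER_MINUTE = 0.006;

// Estimated monthly transcription cost for a number of notes of a given average length.
function estimateMonthlyCost(noteCount: number, avgMinutesPerNote: number): number {
  const totalMinutes = noteCount * avgMinutesPerNote;
  return totalMinutes * WHISPER_USD_PER_MINUTE;
}

// 100 notes x 2 minutes = 200 minutes of audio
console.log(estimateMonthlyCost(100, 2).toFixed(2)); // prints "1.20"
```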
Built as a production-ready Next.js 14 application with Google authentication, Cloudflare D1 database, and OpenAI Whisper transcription. Features an innovative "Always Raw + Optional Enhancements" system, user preference management, comprehensive admin dashboard, and real-time voice processing optimized for Vercel deployment.
New in v1.0.2:
- The `src/lib/markdown.ts` utility provides secure, reusable markdown-to-HTML conversion for all enhancement emails, leveraging the Showdown library. This ensures summaries and quick summaries are delivered as formatted HTML.
- Comprehensive documentation review ensuring all features, security measures, and deployment procedures are properly documented and aligned between README and Architecture documentation.
Privacy-First Architecture: Zero transcript content logging, in-memory-only audio processing, and no persistent storage of voice data. All transcript content is delivered directly via email without being stored on servers, ensuring maximum privacy protection for sensitive voice communications.
Complete Architecture Documentation - Detailed system architecture, component interactions, data flows, and operational considerations.
All 4 phases successfully implemented with background processing:
Phase 1 - Foundation (Complete)
- Next.js 14 with App Router and TypeScript
- Google OAuth authentication with NextAuth
- Cloudflare D1 database integration
- User approval workflow system
- Route protection middleware
Phase 2 - Voice Processing (Complete)
- Mailgun webhook integration
- OpenAI Whisper streaming transcription
- Email processing with 60-second timeout optimization
- Comprehensive error handling and user feedback
- Production-ready voice note processing pipeline
- Optimized FormData processing (Simplified from formidable to Next.js native - 100+ lines reduced)
Phase 3 - Admin Interface & Production (Complete)
- Complete admin dashboard with DataTable
- User dashboard with voice history and instructions
- Production security features and rate limiting
- Mobile-responsive design with shadcn/ui
- Comprehensive error pages and monitoring
- Full Vercel deployment optimization
Phase 4 - User Preferences & Background Processing (Complete)
- "Always Raw + Optional Enhancements" processing system
- User preference management with boolean enhancement flags
- Background processing with GPT-4.1 nano (cleanup & summary)
- Secure token-based background API with authentication
- Multi-email delivery system (raw + enhanced versions)
- RESTful preferences API with authentication
- Interactive preferences UI with real-time preview
- Privacy audit and fixes applied - zero transcript logging guaranteed
- Database status tracking for enhancement progress monitoring
- Webhook Signature Verification: Complete Mailgun HMAC signature validation for production security
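For readers unfamiliar with the Mailgun HMAC validation noted above, here is a minimal sketch of the general scheme: Mailgun signs each webhook with an HMAC-SHA256 of the timestamp concatenated with the token, keyed by the webhook signing key (`MAILGUN_WEBHOOK_KEY` in this project's environment). The function names and the 5-minute freshness window are assumptions for illustration, not the project's actual implementation.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex HMAC-SHA256 of timestamp + token, keyed with the webhook signing key.
function signMailgun(signingKey: string, timestamp: string, token: string): string {
  return createHmac("sha256", signingKey).update(timestamp + token).digest("hex");
}

// Returns true only if the signature matches and the timestamp is recent.
// maxAgeSec is an assumed replay window, not a documented project setting.
function verifyMailgunSignature(
  signingKey: string,
  timestamp: string,
  token: string,
  signature: string,
  maxAgeSec: number = 300,
  nowSec: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > maxAgeSec) {
    return false; // stale or malformed timestamp
  }
  const expected = Buffer.from(signMailgun(signingKey, timestamp, token), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length) {
    return false;
  }
  return timingSafeEqual(expected, received);
}
```

The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.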
This application uses a hybrid architecture combining the best of both platforms:
- Vercel: Hosts the Next.js application and API routes (optimal for React/Next.js)
- Cloudflare D1: Provides the database backend (fast, serverless SQLite)
- Communication: D1 database accessed via Cloudflare's REST API from Vercel
This setup provides excellent performance, cost efficiency, and leverages each platform's strengths while maintaining privacy-first design principles with zero transcript logging and in-memory-only processing.
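As a rough illustration of how a Vercel-hosted route might reach D1 over Cloudflare's REST API, the sketch below posts a parameterized SQL statement to the database's `/query` endpoint with a Bearer token. The `D1Config` shape, the `Send` type, and the function names are hypothetical; the values correspond to `D1_URL` and `D1_API_KEY` in the environment configuration, and the project's real `src/lib/database.ts` may differ.

```typescript
// Values taken from the environment (D1_URL and D1_API_KEY).
interface D1Config {
  url: string;
  apiKey: string;
}

// Minimal sender type so the sketch does not depend on a specific fetch implementation.
type Send = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ json(): Promise<unknown> }>;

// Builds the HTTP request for one parameterized statement.
function buildD1Request(config: D1Config, sql: string, params: unknown[] = []) {
  return {
    url: config.url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${config.apiKey}`,
        "Content-Type": "application/json",
      },
      // Parameters travel separately from the SQL text (no string interpolation).
      body: JSON.stringify({ sql, params }),
    },
  };
}

// Sends the statement and returns the raw JSON response.
async function queryD1(send: Send, config: D1Config, sql: string, params: unknown[] = []) {
  const { url, init } = buildD1Request(config, sql, params);
  const response = await send(url, init);
  return response.json();
}

// Usage (assuming a runtime with global fetch):
// await queryD1(fetch, config, "SELECT id FROM users WHERE slug = ?", ["abc123"]);
```

Passing the sender in as an argument keeps the request-building logic testable without network access.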
- Framework: Next.js 14 (App Router)
- Language: TypeScript
- Authentication: NextAuth with Google OAuth
- Database: Cloudflare D1 (SQLite) - accessed via REST API
- UI: shadcn/ui components with Tailwind CSS
- Transcription: OpenAI Whisper API
- Email: Mailgun for inbound processing
- Deployment: Vercel (Hobby tier optimized) + Cloudflare D1
- Security: Rate limiting, CSRF protection, security headers
- Privacy: Zero transcript logging, in-memory processing, no data persistence
- Complete Authentication: Google OAuth with session management
- User Management: Admin approval workflow with bulk operations
- Smart Voice Processing: Always-raw + optional enhancements system
- User Preferences: Interactive preference management for enhancements
- AI Enhancement: GPT-4.1 nano cleanup and summary processing
- Background Processing: Secure token-based background enhancement API
- Multi-Email System: Raw transcript + enhanced versions delivered separately
- Admin Dashboard: Real-time user analytics and management
- User Dashboard: Personal voice history and usage instructions
- Production Security: Rate limiting, CSRF, security headers, background API tokens
- Privacy-First: Zero transcript logging, in-memory processing, no data storage
- Mobile Responsive: Optimized for all device sizes
- Vercel Optimized: Function timeouts and performance tuning
- Status Tracking: Database-based enhancement progress monitoring
WhatsApp Echo is built with a privacy-first design that prioritizes user data protection:
- Zero Transcript Logging: Voice transcript content is NEVER logged to console, files, or monitoring systems
- In-Memory Processing: Audio files are processed entirely in memory without disk writes
- Immediate Delivery: Transcripts delivered via email, not stored on servers
- Metadata-Only Logging: System logs contain only technical data (file size, processing time, success/failure)
- Privacy-Safe Error Handling: Error objects contain only technical metadata, never transcript content
- Google OAuth Integration: Secure authentication using Google's OAuth 2.0 flow
- JWT Session Management: NextAuth.js handles secure session tokens
- Token-Based Background API: SHA256 authentication for background processing
- CSRF Protection: Token-based request validation for all forms
- Rate Limiting: Per-user and per-endpoint rate limiting with abuse prevention
- Input Validation: Comprehensive file type, size, and format validation
- Security Headers: CSP, XSS protection, clickjacking prevention, HTTPS enforcement
- reCAPTCHA v2: Contact form protection against automated abuse and spam bots
- ❌ Voice transcript content - Never stored in database or logs
- ❌ Audio files - Processed in memory only, never written to disk
- ❌ Sensitive user data - Only metadata and account information stored
- ❌ Processing content - AI enhancement results not logged or stored
- ✅ User account information - Google email, approval status, created date
- ✅ Processing metadata - File size, duration, processing time, success/failure
- ✅ User preferences - Enhancement settings (cleanup, summary options)
- ✅ Technical logs - Performance metrics, error counts (no content)
Complete Security Policy - Detailed security measures, privacy guarantees, vulnerability reporting, and security best practices.
src/
├── app/
│   ├── admin/
│   │   └── page.tsx  # Admin dashboard
│   ├── api/
│   │   ├── admin/users/route.ts  # Admin user management API
│   │   ├── auth/[...nextauth]/route.ts  # NextAuth configuration
│   │   ├── background/enhance-transcript/route.ts  # Background enhancement API
│   │   ├── config/privacy-email/route.ts  # Privacy email configuration API
│   │   ├── contact/route.ts  # Contact form API with reCAPTCHA verification
│   │   ├── inbound/route.ts  # Smart webhook handler (always raw + enhancements)
│   │   └── user/preferences/route.ts  # User preferences API
│   ├── contact/
│   │   └── page.tsx  # Contact form page with reCAPTCHA v2 protection
│   ├── dashboard/
│   │   ├── page.tsx  # User dashboard
│   │   └── preferences/page.tsx  # User preferences management
│   ├── privacy/
│   │   └── page.tsx  # Privacy Policy page with CCPA/CPRA compliance
│   ├── terms/
│   │   └── page.tsx  # Terms of Service page with California law compliance
│   ├── error.tsx  # Global error page
│   ├── not-found.tsx  # 404 page
│   ├── globals.css  # Global styles
│   └── layout.tsx  # Root layout
├── components/
│   ├── admin/
│   │   ├── admin-stats.tsx  # Admin statistics cards
│   │   └── users-table.tsx  # User management table
│   └── ui/
│       ├── badge.tsx  # Status indicators
│       ├── button.tsx  # Interactive buttons
│       ├── card.tsx  # Content containers
│       ├── input.tsx  # Form inputs
│       ├── table.tsx  # Data tables
│       └── version-badge.tsx  # Version display badge component
├── lib/
│   ├── audio.ts  # Audio processing utilities
│   ├── auth.ts  # NextAuth configuration
│   ├── database.ts  # Database operations with preferences
│   ├── enhanced-errors.ts  # Advanced error handling with Sentry integration
│   ├── errors.ts  # Core error handling system
│   ├── mailgun.ts  # Email processing
│   ├── markdown.ts  # Markdown-to-HTML conversion utility for enhancement emails
│   ├── rate-limit.ts  # In-memory rate limiting
│   ├── security.ts  # Security middleware
│   ├── utils.ts  # Utility functions
│   ├── version.ts  # Version utilities and centralized version management
│   ├── voice-processor.ts  # Background enhancement processing
│   └── whisper.ts  # OpenAI Whisper integration
├── types/
│   └── index.ts  # TypeScript definitions
├── utils/
│   ├── env.ts  # Environment validation
│   └── id.ts  # ID generation
├── middleware.ts  # Route protection
└── globals.css  # Global styles
sql/
└── schema.sql  # Database schema
public/
├── legal/
│   ├── TERMS_OF_SERVICE.md  # Terms of Service legal document (California law)
│   └── PRIVACY_POLICY.md  # Privacy Policy legal document (CCPA/CPRA compliant)
└── images/
    ├── flickerventures.png  # Company logo
    ├── github.png  # GitHub icon
    └── logo.png  # Application logo
Config Files:
├── package.json  # Dependencies
├── tsconfig.json  # TypeScript config
├── tailwind.config.ts  # Tailwind CSS config
├── next.config.mjs  # Next.js config
├── postcss.config.js  # PostCSS config
├── vercel.json  # Vercel deployment config
├── wrangler.toml  # Cloudflare D1 database management
├── env.example  # Environment template
└── SECURITY.md  # Security policy and privacy guarantees
CREATE TABLE users (
id TEXT PRIMARY KEY, -- cuid() identifier
google_email TEXT UNIQUE NOT NULL, -- Google OAuth email
slug TEXT UNIQUE NOT NULL, -- 6-char nanoid for email aliases
approved INTEGER NOT NULL DEFAULT 0, -- 0=pending, 1=approved
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE user_preferences (
user_id TEXT PRIMARY KEY, -- Reference to users.id
transcript_processing TEXT DEFAULT 'raw', -- Legacy: 'raw', 'cleanup', 'summary'
send_cleaned_transcript INTEGER DEFAULT 0, -- 0=disabled, 1=enabled
send_summary INTEGER DEFAULT 0, -- 0=disabled, 1=enabled
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY(user_id) REFERENCES users(id)
);

CREATE TABLE voice_events (
id TEXT PRIMARY KEY, -- cuid() identifier
user_id TEXT NOT NULL, -- Reference to users.id
received_at DATETIME DEFAULT CURRENT_TIMESTAMP,
duration_sec INTEGER, -- Duration in seconds
bytes INTEGER, -- File size in bytes
status TEXT DEFAULT 'pending', -- 'pending', 'processing', 'completed', 'failed'
processing_type TEXT DEFAULT 'raw', -- 'raw', 'cleanup', 'summary'
completed_at DATETIME, -- When processing completed
error_message TEXT, -- Error details if failed
enhancements_requested TEXT, -- JSON array of requested enhancements
FOREIGN KEY(user_id) REFERENCES users(id)
);

git clone <repository-url>
cd whatsapp-echo
npm install

cp env.example .env.local
# Edit .env.local with all credentials

# Install and set up Wrangler
npm install -g wrangler
wrangler login
# Create database
wrangler d1 create voice-transcription-prod
# Initialize schema
wrangler d1 execute voice-transcription-prod --file=./sql/schema.sql --remote

- Google Cloud Console
  - Create OAuth 2.0 credentials
  - Configure redirect URIs
  - Add client ID/secret to environment
- OpenAI Platform
  - Generate API key
  - Verify Whisper API access
  - Add to environment
- Mailgun Console
  - Add and verify domain
  - Configure MX records
  - Set up inbound webhook routing
  - Add API credentials to environment
# NextAuth
NEXTAUTH_URL=https://your-domain.vercel.app
NEXTAUTH_SECRET=your-secret-key
# Google OAuth
GOOGLE_CLIENT_ID=your-client-id
GOOGLE_CLIENT_SECRET=your-client-secret
# Cloudflare D1
D1_URL=https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/d1/database/DATABASE_ID/query
D1_DATABASE_ID=your-database-id
D1_API_KEY=your-api-key
# OpenAI (Whisper + GPT-4.1 nano)
OPENAI_API_KEY=sk-your-openai-key
# Mailgun
MAILGUN_DOMAIN=your-domain.com
MAILGUN_API_KEY=key-your-mailgun-key
MAILGUN_WEBHOOK_KEY=your-webhook-signing-key
# Admin Users
ADMIN_EMAILS=admin@yourdomain.com,admin2@yourdomain.com
# reCAPTCHA v2 (Contact Form Protection)
RECAPTCHA_SITE_KEY=your-recaptcha-site-key
RECAPTCHA_SECRET_KEY=your-recaptcha-secret-key
# Company Information (Legal Pages)
PRIVACY_EMAIL=privacy@yourdomain.com
EMAIL_SITE_CONTACT=hello@yourdomain.com
COMPANY_NAME=Your Company
COMPANY_ADDRESS=123 Main Street
COMPANY_CITY=Los Angeles
COMPANY_STATE=CA
COMPANY_ZIP=90027
COMPANY_FULL_ADDRESS=123 Main Street, Los Angeles, CA 90027

# Install Vercel CLI
npm install -g vercel
# Deploy to production
vercel --prod
# Configure environment variables in Vercel dashboard
# Set up custom domain if needed

- Sign Up: Google OAuth authentication
- Approval: Admin reviews and approves account
- Email Alias: Receive personal email address (slug@yourdomain.com)
- Set Preferences: Configure enhancement options (cleanup, summary, or both)
- Voice Notes: Send audio files via email
- Multiple Transcripts: Receive raw transcript immediately + enhanced versions (if enabled)
- Dashboard: View history, usage statistics, and manage preferences
- Dashboard: View all users and system statistics
- User Management: Approve/revoke users with one click
- Analytics: Monitor processing success rates and performance
- Bulk Actions: Manage multiple users efficiently
- System Health: Track timeouts and error rates
This system is designed with zero transcript logging and privacy-first principles:
- No Content Logging: Transcript content is NEVER logged to console, files, or monitoring systems
- Metadata Only: System logs contain only technical data (file size, processing time, success/failure)
- No Storage: Voice transcripts are not stored in database - only delivered via email
- In-Memory Processing: Audio files are processed entirely in memory without disk writes
- Error Safety: Error objects contain only technical metadata (length, processing type), never transcript content
- Monitoring Exclusion: Sentry error reporting excludes all transcript content and sensitive user data
- Console Logs: Reviewed all logging statements - no transcript content exposed
- Error Handling: Error objects contain only technical metadata, not sensitive data
- Database: Voice events table stores only metadata (duration, file size, timestamps)
- Monitoring: Sentry error reporting excludes transcript content
- Processing: All audio processing happens in memory with automatic cleanup
- Immediate Delivery: Voice notes are transcribed and delivered via email immediately
- No Server Storage: Transcript content is never stored on our servers
- Technical Logs Only: System logs contain only performance data, never your voice content
- Safe Error Handling: Even in error cases, your transcript content is never exposed
- Memory-Only Processing: Audio files are processed in memory and automatically discarded
- User Management Table: Search, filter, and manage all users
- Statistics Cards: Total users, approvals, voice events processed
- Bulk Operations: Mass approve/revoke user access
- Real-time Updates: Optimistic UI with instant feedback
- Processing Analytics: Success rates and performance metrics
- Personal Email Alias: Unique address for voice notes
- Usage Instructions: Step-by-step guide with best practices
- Voice History: Last 20 voice notes with metadata
- File Guidelines: Size limits, format recommendations
- Account Status: Approval status and notifications
- Preferences Access: One-click navigation to enhancement settings
- Always Raw Processing: Guaranteed immediate transcript delivery
- Optional Cleanup: Grammar and formatting improvements with GPT-4.1 nano
- Optional Summary: Concise key points and action items
- Real-time Preview: See exactly how many emails you'll receive
- Interactive Controls: Toggle enhancements on/off with visual feedback
- Rate Limiting: In-memory system with per-endpoint limits
- CSRF Protection: Token-based request validation
- Security Headers: Comprehensive HTTP security headers
- Input Validation: Sanitization and format validation
- Error Handling: Graceful degradation with user feedback
- Privacy Protection: Zero transcript logging, in-memory processing only
- Background API Security: SHA256 token-based authentication for background processing
- Webhook Security: Mailgun HMAC signature validation
- reCAPTCHA v2: Contact form protection against automated abuse and spam
- 60-Second Timeout: Optimized for Vercel Hobby tier
- Memory Streaming: No disk writes for audio processing
- Aggressive Caching: Static asset optimization
- Mobile-First: Responsive design for all devices
- Error Recovery: Comprehensive error pages and fallbacks
Email Received → Webhook Validation → User Lookup → Get User Preferences →
File Validation → Audio Download → OpenAI Whisper Transcription (Raw) →
Database Logging (metadata only) → Raw Email Delivery (15-30s) →
Queue Background AI Processing (if enabled) → Background Enhancement API →
Token Validation → GPT-4.1 nano Cleanup/Summary → Enhanced Email Delivery
- Privacy-Protected: All processing happens in memory with zero transcript content logging.
- Background Processing: Secure token-based API prevents serverless function timeouts.
- AI-Powered: GPT-4.1 nano provides grammar cleanup and intelligent summarization.
- User-Controlled: Enhancement preferences configured once in dashboard, applied to all voice notes.
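The "always raw, optional enhancements" rule in the flow above boils down to a small decision: the raw transcript is always sent first, and each enabled preference adds one more email. The sketch below shows that decision only. The type and function names are hypothetical; the two flags mirror the `send_cleaned_transcript` and `send_summary` columns, and the subject labels follow the [Raw], [Cleaned], [Summary] convention used by the service.

```typescript
// Mirrors the boolean enhancement flags in user_preferences.
interface EnhancementPreferences {
  sendCleanedTranscript: boolean; // send_cleaned_transcript
  sendSummary: boolean;           // send_summary
}

type EmailKind = "raw" | "cleaned" | "summary";

// Subject prefixes used to label each version.
const SUBJECT_LABEL: Record<EmailKind, string> = {
  raw: "[Raw]",
  cleaned: "[Cleaned]",
  summary: "[Summary]",
};

// Returns the emails a user will receive, in delivery order.
function planEmails(prefs: EnhancementPreferences): EmailKind[] {
  const emails: EmailKind[] = ["raw"]; // raw is unconditional and always first
  if (prefs.sendCleanedTranscript) emails.push("cleaned");
  if (prefs.sendSummary) emails.push("summary");
  return emails;
}

const plan = planEmails({ sendCleanedTranscript: true, sendSummary: false });
console.log(plan.map((kind) => SUBJECT_LABEL[kind]).join(", ")); // prints "[Raw], [Cleaned]"
```

The length of the returned list is also the number a preferences preview could show as "emails you'll receive".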
- Delivery: 15-30 seconds, never delayed
- Content: Exactly as transcribed by OpenAI Whisper
- Format: Basic punctuation, natural speech patterns
- Use Case: Immediate access, direct quotes, feeding to other AI tools
- AI Model: GPT-4.1 nano with specialized cleanup prompts
- Improvements: Fixed grammar, proper punctuation, removed filler words ("um", "uh", "like")
- Formatting: Natural paragraph breaks, proper capitalization
- Preservation: Original wording and tone maintained - no paraphrasing
- Use Case: Professional documents, clean copy-paste text, presentation materials
- AI Model: GPT-4.1 nano with structured summarization prompts
- Format: Markdown with organized sections
- Sections: Main Topic (always), Key Points, Action Items, Important Details
- Length: ≤150 words for comprehensive summaries
- Content: Bullet-pointed key ideas, quoted important specifics (names/dates/numbers)
- Use Case: Quick overview, meeting notes, task extraction, executive summaries
- Hallucination Prevention: AI prompts designed to only use content from original transcript
- Accuracy Focus: Quality checks to verify every statement comes from source material
- Content Preservation: Enhancement improves format/structure without adding new information
- Error Safety: Failed enhancements don't affect raw transcript delivery
- Total Safety Margin: 55 seconds (5s buffer)
- Parsing & Validation: < 2 seconds
- User Lookup: < 1 second
- Audio Download: < 10 seconds (with timeout)
- Whisper Transcription: < 40 seconds (with timeout)
- Email Response: < 2 seconds
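The stage budgets above add up to the 55-second total (2 + 1 + 10 + 40 + 2), which leaves a 5-second buffer under the 60-second function limit. One way to enforce such per-stage limits is a `Promise.race` against a timer, sketched below. The constant and function names are hypothetical, and the project may implement its timeouts differently.

```typescript
// Per-stage budgets from the list above, in milliseconds.
const STAGE_BUDGET_MS: Record<string, number> = {
  parseAndValidate: 2000,
  userLookup: 1000,
  audioDownload: 10000,
  whisperTranscription: 40000,
  emailResponse: 2000,
};

// Sum of all stage budgets; should stay below the 60-second function limit.
const totalBudgetMs = Object.values(STAGE_BUDGET_MS).reduce((sum, ms) => sum + ms, 0);

// Rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms} ms`)), ms);
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage (downloadAudio is a placeholder for the real download step):
// const audio = await withTimeout(downloadAudio(url), STAGE_BUDGET_MS.audioDownload, "Audio download");
```

Note that `Promise.race` only stops waiting; it does not cancel the underlying work, so a real download would also need its own abort mechanism.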
- Optimal: < 5MB, < 3 minutes (15-25 seconds total)
- Good: 5-10MB, 3-5 minutes (25-40 seconds total)
- Maximum: 15MB, 5-8 minutes (40-55 seconds total)
- Rejected: > 15MB (immediate error response)
- M4A (best) - Optimal compression and speed
- MP3 (good) - Wide compatibility
- WAV (acceptable) - Large files, slower processing
- OGG (acceptable) - Good compression
- User Statistics: Total, approved, pending counts
- Processing Stats: Success rates, error rates
- Performance: Average processing times
- System Health: Timeout occurrences, error patterns
# Check Vercel function logs
vercel logs
# Monitor processing metrics
grep "Processing Metrics" logs
# Check rate limiting stats
curl https://your-domain.vercel.app/api/admin/stats

- Comprehensive Error Categories: 8 specific error types
- User-Friendly Messages: Clear guidance for each error
- Admin Notifications: Critical error alerts
- Performance Metrics: Processing time breakdowns
- Webhook: 5 requests/minute per user
- Admin API: 30 requests/minute per admin
- General API: 100 requests/minute per IP
- Authentication: 10 attempts/15 minutes per IP
- X-Frame-Options: Prevent clickjacking
- X-Content-Type-Options: Prevent MIME sniffing
- X-XSS-Protection: XSS protection
- Strict-Transport-Security: HTTPS enforcement
- Content-Security-Policy: Script execution control
- Email Validation: Format and length checks
- File Validation: Size, type, and content verification
- SQL Injection Prevention: Parameterized queries
- XSS Prevention: Input sanitization
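To make the per-user limits listed above concrete (for example, 5 webhook requests per minute per user), here is a minimal in-memory limiter. The fixed-window algorithm and the class name are assumptions for illustration; the project's `src/lib/rate-limit.ts` may use a different strategy. Because state lives in process memory, each serverless instance counts independently and counters reset on cold starts.

```typescript
// Fixed-window counter kept in process memory.
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request identified by `key` is allowed at time `now` (ms).
  allow(key: string, now: number = Date.now()): boolean {
    const current = this.windows.get(key);
    // Start a fresh window on first sight or once the old window has expired.
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}

// Webhook limit from the list above: 5 requests per minute per user.
const webhookLimiter = new FixedWindowRateLimiter(5, 60000);
```

A caller would typically respond with HTTP 429 when `allow` returns false.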
If you discover a security vulnerability, please report it responsibly:
- DO NOT create public GitHub issues for security vulnerabilities
- Email the repository maintainer with detailed information
- See SECURITY.md for complete vulnerability reporting guidelines
Symptoms: 504 Gateway Timeout, incomplete processing
Solutions:
- Reduce file size (< 10MB recommended)
- Check OpenAI API status and quota
- Verify Mailgun webhook URL configuration
- Test with smaller files first
Symptoms: Login failures, session errors
Solutions:
- Verify Google OAuth configuration
- Check NEXTAUTH_SECRET and NEXTAUTH_URL
- Confirm redirect URIs match exactly
- Test with different browsers/incognito
Symptoms: Permission denied, loading errors
Solutions:
- Verify admin email in ADMIN_EMAILS
- Check database connectivity
- Clear browser cache and cookies
- Test API endpoints directly
Symptoms: No transcription emails, error responses
Solutions:
- Verify OpenAI API key and quota
- Check Mailgun domain configuration
- Test with different audio formats
- Monitor Vercel function logs
# Test webhook endpoint
curl -X GET "https://your-domain.vercel.app/api/inbound"
# Check OpenAI connectivity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
"https://api.openai.com/v1/models"
# Test Mailgun API
curl -u "api:$MAILGUN_API_KEY" \
"https://api.mailgun.net/v3/domains"
# Verify database
wrangler d1 execute voice-transcription-prod \
--command="SELECT COUNT(*) FROM users;" --remote

- All environment variables configured
- Database schema applied
- Google OAuth redirect URIs updated
- Mailgun webhook URL configured
- OpenAI API key verified
- Admin emails configured
- `vercel.json` properly configured
- Function timeouts set (60s for webhook)
- Security headers applied
- Environment variables added to Vercel
- Domain configured (if using custom)
- Authentication flow works
- Admin dashboard accessible
- User dashboard functional
- Webhook processing operational
- Email transcription working
- Error pages displaying correctly
Complete Guide: See the User Manual for detailed instructions
- Get Access: Sign up and wait for admin approval
- Receive Email: Get your personal alias (abc123@yourdomain.com)
- Set Preferences: Configure enhancement options (/dashboard/preferences)
- Send Voice Notes: Attach audio files to emails
- Receive Multiple Transcripts: Get raw transcript immediately + enhanced versions
- View History: Check dashboard for past voice notes
How It Works:
- Raw Transcript (Always): Delivered in 15-30 seconds, exactly as transcribed by OpenAI Whisper
- Cleaned Transcript (Optional): Grammar fixes, proper punctuation, removed filler words, paragraph breaks
- Smart Summary (Optional): Structured summary with main topic, key points, action items, and important details (≤150 words)
- Multiple Emails: Each version arrives as a separate, clearly labeled email
- Background Processing: Secure token-based API prevents serverless function timeouts
AI Enhancement Details:
- Cleanup Enhancement: Corrects transcription mistakes, fixes punctuation/capitalization, removes "um/uh/like", adds natural paragraph breaks - preserves original wording
- Summary Enhancement: Extracts main topic, bullet-pointed key ideas, actionable tasks with context, and quoted important details (names/dates/numbers)
- User Control: Configure preferences once in dashboard - applies to all future voice notes
- Quality Focused: AI prompts designed to prevent hallucination and maintain accuracy
User Benefits:
- Immediate Access: Never wait for enhancements - raw transcript arrives first
- Flexible Options: Enable cleanup, summary, both, or neither via dashboard preferences
- Clear Labeling: Email subjects clearly indicate which version you're reading ([Raw], [Cleaned], [Summary])
- No Delays: Enhanced processing happens in background without affecting speed
- Reliable Processing: Background API ensures enhancements complete even for large files
- Professional Quality: Enhanced versions ready for documents, AI tools, and business use
- Access Dashboard: Navigate to `/admin` with admin account
- Manage Users: Approve/revoke access, view statistics
- Monitor System: Check processing rates and errors
- Bulk Operations: Manage multiple users efficiently
- System Health: Monitor performance and timeouts
- Small Files (< 1MB): 8-15 seconds
- Medium Files (1-5MB): 15-35 seconds
- Large Files (5-15MB): 35-55 seconds
- Timeout Rate: < 2% (well within 60s limit)
- Concurrent Processing: 1 webhook at a time
- Daily Quota: Based on Vercel function invocations
- Success Rate: 95%+ for properly formatted files
- Error Recovery: Comprehensive error handling
For detailed information about specific aspects of the system:
- Security Policy - Comprehensive security measures, privacy guarantees, and vulnerability reporting
- User Manual - Complete end-user guide for signup, workflow, and usage
- Deployment Guide - Complete deployment and configuration instructions
MIT License
Copyright (c) 2025 Flicker Ventures, LLC
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This voice note transcription service is now production-ready with:
- Complete user authentication and management
- Real-time voice processing pipeline with background enhancements
- Professional admin and user dashboards
- Production security and performance optimization
- Privacy-first architecture with zero transcript logging
- Comprehensive error handling and monitoring
- Mobile-responsive design
- Full Vercel deployment optimization
- Secure background processing API with token authentication
- Database status tracking for enhancement progress
- Comprehensive security policy and privacy guarantees
Perfect for organizations needing reliable voice-to-text processing with user management, admin oversight, and enterprise-grade privacy protection.
npm run dev:api

- URL: http://localhost:3000
- Features: Complete frontend + API routes + AI analysis
- Best for: Testing full functionality, API development
- Environment: Uses .env.local for local development
- Database: Connects to your configured D1 database or development fallback
# Pull production environment variables
vercel env pull .env.local
# Run with production configuration
vercel dev
- URL: http://localhost:3000
- Features: Production environment variables + local development
- Best for: Debugging production issues, testing with real data
- Environment: Uses production environment variables
# Test database connection
wrangler d1 execute voice-transcription-db --command "SELECT 1 as test" --remote
# Run schema updates
wrangler d1 execute voice-transcription-db --file=./sql/schema.sql --remote
- Install dependencies:
  npm install
- Set up environment variables:
  cp env.example .env.local
  # Edit .env.local with your development credentials
- Start development server:
  npm run dev:api
- Access the application:
  - Frontend: http://localhost:3000
  - API Routes: http://localhost:3000/api/*
  - Admin Dashboard: http://localhost:3000/admin
  - User Dashboard: http://localhost:3000/dashboard
- Hot Reload: Automatic reload on file changes
- API Route Testing: All API endpoints available locally
- Database Integration: Connect to D1 database or use development fallback
- Authentication: Google OAuth with development callbacks
- Real-time Logging: Console logs for debugging (privacy-compliant)
- Error Handling: Comprehensive error pages and debugging info
- Privacy-Safe Debugging: Logs contain only metadata, never transcript content
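One way to keep debug logs metadata-only, as the list above describes, is to convert every result into a dedicated log shape before it reaches `console.log`. The sketch below is an illustration under assumptions: the `TranscriptionResult` fields, the `toLogEntry` helper, and the event name are hypothetical and not taken from this codebase.

```typescript
// Hypothetical shape of a finished transcription; field names are assumptions.
interface TranscriptionResult {
  userId: string;
  fileName: string;
  durationMs: number;
  text: string; // transcript content: must never reach the logs
}

// The only fields that are allowed into a log line.
interface LogEntry {
  event: string;
  userId: string;
  fileExtension: string;
  durationMs: number;
  textLength: number;
}

function toLogEntry(event: string, result: TranscriptionResult): LogEntry {
  const dot = result.fileName.lastIndexOf(".");
  return {
    event,
    userId: result.userId,
    // Log only the extension: file names can carry personal details.
    fileExtension: dot >= 0 ? result.fileName.slice(dot + 1).toLowerCase() : "",
    durationMs: result.durationMs,
    textLength: result.text.length, // size only, never the content
  };
}

const entry = toLogEntry("transcription.completed", {
  userId: "user-123",
  fileName: "Call with Dr Smith.M4A",
  durationMs: 18250,
  text: "confidential transcript content",
});
console.log(JSON.stringify(entry));
```

Because `LogEntry` has no field for the transcript, a later code change cannot log it by accident without also changing the type.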
- Start development server: npm run dev:api
- Make changes: Edit code with hot reload
- Test functionality: Use local environment for testing
- Debug issues: Check console logs and error pages
- Deploy: Push changes to Vercel for production testing
- Local Development: http://localhost:3000
- Google OAuth Redirect: http://localhost:3000/api/auth/callback/google
- Webhook Testing: http://localhost:3000/api/inbound
- Admin Panel: http://localhost:3000/admin
- User Dashboard: http://localhost:3000/dashboard
Every user gets a permanent API key for seamless iOS Shortcut integration. Record voice notes directly from your iPhone and get instant transcriptions without opening any app.
- Get Your API Key: Visit your dashboard to copy your personal API key
- Download iOS Shortcut: One-click download from your dashboard
- Configure API Key: Use the visual setup instructions to paste your API key in the shortcut
- Start Recording: Use the shortcut from anywhere on your iPhone (see demo for workflow)
- Instant Results: Get transcriptions in seconds, no app switching needed
- Setup Instructions: Step-by-step visual guide showing exactly where to paste your API key in the iOS Shortcut configuration

- Usage Demo: Animated demonstration of the complete voice note forwarding workflow from recording to transcription

Special thanks to Giacomo Melzi for the original iOS Shortcut concept! His innovative implementation inspired this feature.
How Echo Scribe's Implementation Differs:
- Managed Service: Echo Scribe acts as a secure proxy service between your iOS Shortcut and OpenAI
- No Personal OpenAI Key Required: Users don't need their own OpenAI API accounts or billing
- Centralized Management: Echo Scribe handles OpenAI API keys, billing, and rate limiting centrally
- User-Specific Authentication: Each user gets their own Echo Scribe API key for secure access
- Integrated Features: Leverages Echo Scribe's existing user management, preferences, and privacy protections
This approach provides the same powerful iOS Shortcut functionality while eliminating the need for users to manage their own OpenAI accounts, making voice transcription accessible to everyone.
POST /api/transcribe
Authorization: Bearer your-api-key
Content-Type: multipart/form-data
# Example usage
curl -X POST \
  -H "Authorization: Bearer your-api-key" \
  -F "file=@voice-note.m4a" \
  https://your-domain.vercel.app/api/transcribe
{
  "text": "Your transcribed voice note content here..."
}
- Secure Authentication: Personal API key per user
- Fast Processing: Same 15-30 second transcription speed
- Usage Tracking: API calls logged in your dashboard
- Rate Limited: 5 requests per minute (same as email processing)
- iOS Optimized: Works seamlessly with iOS Shortcuts app
- Multiple Formats: Supports .m4a, .mp3, .wav, .ogg files
- No Email Required: Direct JSON response, no email processing
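The limit of 5 requests per minute per user can be pictured as a sliding-window counter. The class below is a minimal in-memory sketch, assuming a single process. `SlidingWindowRateLimiter` is a hypothetical name, and a serverless deployment would presumably need shared storage for the counters instead of a `Map`.

```typescript
// Sliding-window limiter: at most `limit` requests per `windowMs` for each user.
class SlidingWindowRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  // Defaults mirror the documented limit: 5 requests per 60 seconds.
  constructor(limit: number = 5, windowMs: number = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request when the user is under the limit.
  // Rejected requests are not recorded, so they do not extend the lockout.
  allow(userId: string, now: number = Date.now()): boolean {
    const recent = (this.hits.get(userId) ?? []).filter(
      (timestamp) => now - timestamp < this.windowMs,
    );
    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(now);
    }
    this.hits.set(userId, recent);
    return allowed;
  }
}
```

A handler would call `limiter.allow(userId)` before doing any work and return the documented rate-limit error when it yields `false`.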
- Voice Memos: Quick transcription of personal notes
- Meeting Notes: Instant transcripts during calls
- Content Creation: Voice-to-text for writing workflows
- Accessibility: Audio content made searchable and readable
- Automation: Integration with other iOS shortcuts and workflows
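For automation outside the Shortcuts app, the same request can be made from a script. The following is a sketch of a small client, not an official SDK. `checkUpload` and `transcribe` are hypothetical names, `baseUrl` stands in for your deployment URL, and reading the 15MB limit as 15 × 1024 × 1024 bytes is an assumption.

```typescript
// Limits taken from the documented format list and error responses.
// Treating "15MB" as binary megabytes is an assumption.
const MAX_FILE_BYTES = 15 * 1024 * 1024;
const SUPPORTED_EXTENSIONS = [".m4a", ".mp3", ".wav", ".ogg"];

// Client-side pre-check: returns a message mirroring the documented errors,
// or null when the file looks acceptable.
function checkUpload(fileName: string, sizeBytes: number): string | null {
  const lower = fileName.toLowerCase();
  if (!SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext))) {
    return "Unsupported file format";
  }
  if (sizeBytes > MAX_FILE_BYTES) {
    return "File size exceeds 15MB limit";
  }
  return null;
}

// Uploads one audio file and resolves with the transcribed text.
async function transcribe(
  baseUrl: string,
  apiKey: string,
  audio: Blob,
  fileName: string,
): Promise<string> {
  const problem = checkUpload(fileName, audio.size);
  if (problem !== null) {
    throw new Error(problem);
  }

  const form = new FormData();
  form.append("file", audio, fileName); // field name "file", as in the curl example

  // Content-Type is left unset so fetch can add the multipart boundary itself.
  const response = await fetch(`${baseUrl}/api/transcribe`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });

  const body = (await response.json()) as { text?: string; error?: string };
  if (!response.ok || body.text === undefined) {
    throw new Error(body.error ?? `Request failed with status ${response.status}`);
  }
  return body.text;
}
```

Checking size and extension locally saves an upload that the server would reject anyway; the server remains the authority on what it accepts.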
// Invalid API key
{ "error": "Invalid or missing API key" }
// File too large
{ "error": "File size exceeds 15MB limit" }
// Unsupported format
{ "error": "Unsupported file format" }
// Rate limit exceeded
{ "error": "Rate limit exceeded - 5 requests per minute" }
- Bearer Token Authentication: Standard OAuth-style authentication
- Rate Limiting: Per-user limits prevent abuse
- Privacy-First: Same zero-logging policy as email processing
- Secure Key Generation: Cryptographically secure API keys
- User Isolation: Each user's data completely isolated