Transform your 3-minute spoken presentation into a professional, AI-powered slide deck with speaker notes in seconds.
- Audio Input Options: Upload audio files (MP3, WAV, M4A) or record directly in-browser
- 3-minute Recording Limit: Built-in timer and auto-stop functionality
- Real-time Processing: Visual progress indicators and status updates
- 5-Slide Generation: Professional presentation structure with speaker notes
- Modern Interface: Beautiful purple-to-blue gradient design
- Responsive Layout: Works on desktop and mobile devices
- Professional UI: Clean, intuitive user experience
- Visual Feedback: Progress bars, loading states, and status messages
- Speech Recognition: OpenAI Whisper API for accurate transcription
- AI Content Generation: GPT-4 powered slide structuring
- Export Options: HTML download (PDF coming soon)
- Speaker Notes: Detailed notes for each slide
- File Size Limits: 50MB max for uploaded files
- Node.js (v14 or higher)
- OpenAI API key
- Modern web browser with microphone access
- Clone the repository: `git clone <your-repo-url>`, then `cd voice-to-slide-generator`
- Install dependencies: `npm install`
- Set up environment variables: `cp env.example .env`, then edit `.env` and add your OpenAI API key: `OPENAI_API_KEY=sk-your-actual-api-key-here`
- Start the application: `npm start`
- Open your browser and navigate to `http://localhost:3000`
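
A startup check can surface a missing API key before the first request fails. The following is a minimal sketch, assuming the values from `.env` have already been loaded into `process.env` (for example by a loader such as dotenv). `requireEnv` is a hypothetical helper name, not something this project is known to define.

```javascript
// Hypothetical helper: read a required environment variable or fail fast.
// The second parameter exists only so the lookup source can be swapped in tests.
function requireEnv(name, env = process.env) {
  const value = String(env[name] || "").trim();
  if (!value) {
    throw new Error(
      `Missing ${name}: copy env.example to .env and set it before running npm start`
    );
  }
  return value;
}

// Possible usage at server startup:
// const apiKey = requireEnv("OPENAI_API_KEY");
```

Failing at startup keeps the error message close to its cause, rather than letting it appear later as a failed transcription.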
- Click "Start Recording" or press Spacebar
- Speak clearly into your microphone
- Recording automatically stops after 3 minutes
- Click "Stop Recording" to finish early
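
The recording flow above (start, countdown, automatic stop at 3 minutes, optional early stop) could be sketched roughly as follows. This is an illustration only, not the project's actual code: the names `RECORDING_LIMIT_MS`, `remainingMs`, `formatClock`, and `startRecording` are assumptions. `startRecording` relies on the browser's `getUserMedia` and `MediaRecorder` and only runs in a browser.

```javascript
const RECORDING_LIMIT_MS = 3 * 60 * 1000; // the 3-minute limit

// Milliseconds left before auto-stop; never negative.
function remainingMs(startedAt, now, limitMs = RECORDING_LIMIT_MS) {
  return Math.max(0, limitMs - (now - startedAt));
}

// Format milliseconds as M:SS for a countdown display (rounds up to whole seconds).
function formatClock(ms) {
  const totalSeconds = Math.ceil(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return `${minutes}:${seconds}`;
}

// Browser-only: record from the microphone and stop automatically at the limit.
// onDone receives a Blob with the recorded audio.
async function startRecording(onDone) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  const timer = setTimeout(() => {
    if (recorder.state === "recording") recorder.stop();
  }, RECORDING_LIMIT_MS);
  recorder.onstop = () => {
    clearTimeout(timer);
    stream.getTracks().forEach((track) => track.stop()); // release the microphone
    onDone(new Blob(chunks, { type: recorder.mimeType }));
  };
  recorder.start();
  return recorder; // call recorder.stop() to finish early
}
```

Keeping the time arithmetic in small pure functions makes the countdown easy to check without a microphone.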
- Drag & drop audio files onto the upload area
- Or click to browse and select files
- Supported formats: MP3, WAV, M4A
- Maximum file size: 50MB
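
The format and size rules above can be checked before an upload is sent. Below is a minimal sketch under stated assumptions: `validateAudioFile` is a hypothetical helper, 50MB is interpreted as 50 × 1024 × 1024 bytes, and only the file extension is inspected. A server would normally repeat these checks, since client-side validation can be bypassed.

```javascript
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // assumption: 50MB counted in binary megabytes
const ALLOWED_EXTENSIONS = ["mp3", "wav", "m4a"];

// Accepts any object with `name` and `size`, such as a browser File.
function validateAudioFile(file) {
  const name = String(file.name || "");
  const dot = name.lastIndexOf(".");
  const extension = dot >= 0 ? name.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_EXTENSIONS.includes(extension)) {
    return { ok: false, reason: "Unsupported format: use MP3, WAV, or M4A" };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File is larger than 50MB" };
  }
  return { ok: true };
}
```

For example, `validateAudioFile({ name: "talk.MP3", size: 1024 })` passes because the extension comparison is case-insensitive.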
- Audio is automatically transcribed using OpenAI Whisper
- AI generates 5 professional slides with speaker notes
- Review your presentation in the results section
- Export as HTML for immediate use
- File Upload: Multer for handling audio file uploads
- Audio Processing: OpenAI Whisper API integration
- Content Generation: OpenAI GPT-4 for slide creation
- Export System: HTML generation with professional styling
- Audio Recording: Web Audio API with MediaRecorder
- File Handling: Drag & drop with validation
- Real-time Updates: Progress indicators and status messages
- Responsive Design: Mobile-first approach
- Speech-to-Text: OpenAI Whisper for accurate transcription
- Content Generation: GPT-4 for intelligent slide structuring
- Professional Formatting: Structured 5-slide presentations
Each generated presentation follows this professional structure:
- Introduction & Overview - Sets context and roadmap
- Key Challenge & Context - Identifies problems and urgency
- Core Strategy & Approach - Main solution framework
- Implementation & Results - Execution plan and outcomes
- Conclusion & Next Steps - Summary and call-to-action
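
Because the slides come from a language model, it can help to verify that a generated deck actually has this shape before rendering it. The sketch below assumes a deck object of the form `{ slides: [{ title, bullets, speakerNotes }] }`. Those field names and the `validateDeck` helper are illustrative assumptions, not the project's documented schema.

```javascript
// The five sections listed above, in order.
const SLIDE_SECTIONS = [
  "Introduction & Overview",
  "Key Challenge & Context",
  "Core Strategy & Approach",
  "Implementation & Results",
  "Conclusion & Next Steps",
];

// Returns a list of problems; an empty list means the deck matches the assumed shape.
function validateDeck(deck) {
  const problems = [];
  const slides = deck && Array.isArray(deck.slides) ? deck.slides : [];
  if (slides.length !== SLIDE_SECTIONS.length) {
    problems.push(`expected ${SLIDE_SECTIONS.length} slides, got ${slides.length}`);
  }
  slides.forEach((slide, index) => {
    const current = slide || {};
    const label = `slide ${index + 1}`;
    if (typeof current.title !== "string" || current.title.trim() === "") {
      problems.push(`${label}: missing title`);
    }
    if (!Array.isArray(current.bullets) || current.bullets.length === 0) {
      problems.push(`${label}: missing bullets`);
    }
    if (typeof current.speakerNotes !== "string" || current.speakerNotes.trim() === "") {
      problems.push(`${label}: missing speaker notes`);
    }
  });
  return problems;
}
```

A non-empty result could be used to retry generation or to show an error instead of exporting an incomplete presentation.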
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check |
| `/api/upload-audio` | POST | Upload audio file |
| `/api/transcribe` | POST | Transcribe audio with Whisper |
| `/api/generate-slides` | POST | Generate slides with GPT-4 |
| `/api/export-html` | POST | Export presentation as HTML |
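
As a rough illustration of how a client might address these endpoints, the sketch below keeps the table in one lookup object and builds request targets from it. The base URL comes from the setup instructions; `ENDPOINTS`, `buildRequest`, and `checkHealth` are hypothetical names. Request and response bodies are deliberately left out because this README does not document them.

```javascript
// Endpoint table as listed above; keys are illustrative names.
const ENDPOINTS = {
  health: { method: "GET", path: "/api/health" },
  uploadAudio: { method: "POST", path: "/api/upload-audio" },
  transcribe: { method: "POST", path: "/api/transcribe" },
  generateSlides: { method: "POST", path: "/api/generate-slides" },
  exportHtml: { method: "POST", path: "/api/export-html" },
};

// Resolve an endpoint name to a full URL and HTTP method.
function buildRequest(name, baseUrl = "http://localhost:3000") {
  if (!Object.prototype.hasOwnProperty.call(ENDPOINTS, name)) {
    throw new Error(`Unknown endpoint: ${name}`);
  }
  const endpoint = ENDPOINTS[name];
  return { url: `${baseUrl}${endpoint.path}`, method: endpoint.method };
}

// Returns true when the health endpoint answers with a 2xx status.
// fetchImpl defaults to the global fetch (browsers, Node.js 18+).
async function checkHealth(fetchImpl = fetch) {
  const { url, method } = buildRequest("health");
  const response = await fetchImpl(url, { method });
  return response.ok;
}
```

Calling `checkHealth()` after `npm start` is a quick way to confirm the server is reachable before uploading audio.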
Available npm scripts: `npm run dev`, `npm run build`, `npm test`
- Environment Variables: API keys stored securely
- File Validation: Type and size restrictions
- Error Handling: Comprehensive error management
- Input Sanitization: XSS protection
- CORS Configuration: Proper cross-origin setup
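
Input sanitization matters most where transcribed text is written into the exported HTML. One common approach, shown here as a sketch and not necessarily how this project does it, is to escape HTML-significant characters before interpolating any text. `escapeHtml` and `renderSlide` are hypothetical helpers, and the slide fields `title` and `bullets` are assumed names.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render one slide as an HTML fragment with every text field escaped.
function renderSlide(slide) {
  const bullets = slide.bullets
    .map((bullet) => `<li>${escapeHtml(bullet)}</li>`)
    .join("");
  return `<section><h2>${escapeHtml(slide.title)}</h2><ul>${bullets}</ul></section>`;
}
```

With this in place, a spoken or pasted `<script>` tag ends up displayed as literal text in the export, not executed.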
- PDF Export: Server-side PDF generation
- User Authentication: Login and user management
- Presentation Templates: Multiple design themes
- Collaboration: Share and edit presentations
- Analytics: Usage statistics and insights
- WebSocket Integration: Real-time progress updates
- Caching System: Improve performance
- Database Integration: Store presentation history
- Cloud Storage: Audio file management
- API Rate Limiting: Prevent abuse
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing GPT-4 and Whisper APIs
- Font Awesome for beautiful icons
- Express.js for the robust backend framework
- Modern CSS for stunning visual design
- Documentation: Check this README for setup and usage
- Issues: Report bugs via GitHub Issues
- Questions: Open a discussion for general questions
- Help: Click the help icon (?) in the bottom-right corner
Made with ❤️ for better presentations