Voice-to-Slide Generator 🎤➡️📊

Transform your 3-minute spoken presentation into a professional, AI-powered slide deck with speaker notes in seconds.

✨ Features

🎯 Core Functionality

Audio Input Options: Upload audio files (MP3, WAV, M4A) or record directly in-browser
3-minute Recording Limit: Built-in timer and auto-stop functionality
Real-time Processing: Visual progress indicators and status updates
5-Slide Generation: Professional presentation structure with speaker notes

🎨 Design Features

Modern Interface: Beautiful purple-to-blue gradient design
Responsive Layout: Works on desktop and mobile devices
Professional UI: Clean, intuitive user experience
Visual Feedback: Progress bars, loading states, and status messages

🧠 Technical Capabilities

Speech Recognition: OpenAI Whisper API for accurate transcription
AI Content Generation: GPT-4 powered slide structuring
Export Options: HTML download (PDF coming soon)
Speaker Notes: Detailed notes for each slide
File Size Limits: 50MB max for uploaded files

🚀 Quick Start

Prerequisites

Node.js (v14 or higher)
OpenAI API key
Modern web browser with microphone access

Installation

Clone the repository

```
git clone <your-repo-url>
cd voice-to-slide-generator
```

Install dependencies
```
npm install
```
Set up environment variables
```
cp env.example .env
```
Edit .env and add your OpenAI API key:
```
OPENAI_API_KEY=sk-your-actual-api-key-here
```
Start the application
```
npm start
```
Open your browser and navigate to http://localhost:3000
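
How the server reads the key is not shown in this README. As a minimal sketch, assuming the values from .env are already available in `process.env` (which loader the project uses is not stated here) and using a hypothetical helper named `requireApiKey`, a fail-fast startup check might look like this:

```javascript
// Hypothetical helper (not taken from server.js): fail fast when the key is missing.
function requireApiKey(env = process.env) {
  const key = (env.OPENAI_API_KEY || "").trim();
  // Reject an empty value and the placeholder shown in the setup step above.
  if (!key || key === "sk-your-actual-api-key-here") {
    throw new Error(
      "OPENAI_API_KEY is not set. Copy env.example to .env and add your key."
    );
  }
  return key;
}
```

Calling `requireApiKey()` before the server starts listening would surface a missing key immediately instead of on the first transcription request.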

📋 Usage Guide

Recording Audio

Click "Start Recording" or press Spacebar
Speak clearly into your microphone
Recording automatically stops after 3 minutes
Click "Stop Recording" to finish early
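
The 3-minute auto-stop can be sketched as a timer wrapped around a recorder. This is an illustration, not the project's actual frontend code: `startWithAutoStop` and `formatRemaining` are hypothetical names, and the recorder is any object with `start()`, `stop()`, and a `state` property, which a browser `MediaRecorder` provides.

```javascript
const MAX_RECORDING_MS = 3 * 60 * 1000; // 3-minute limit from this README

// Format the time left as m:ss for a countdown display.
function formatRemaining(elapsedMs, limitMs = MAX_RECORDING_MS) {
  const remaining = Math.max(0, limitMs - elapsedMs);
  const totalSeconds = Math.ceil(remaining / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = String(totalSeconds % 60).padStart(2, "0");
  return `${minutes}:${seconds}`;
}

// Start the recorder and schedule an automatic stop at the limit.
// setTimer/clearTimer are injectable so the logic can be tested without waiting.
function startWithAutoStop(
  recorder,
  limitMs = MAX_RECORDING_MS,
  setTimer = setTimeout,
  clearTimer = clearTimeout
) {
  recorder.start();
  const timerId = setTimer(() => {
    if (recorder.state === "recording") recorder.stop();
  }, limitMs);
  // Call the returned function to finish early ("Stop Recording").
  return function stopEarly() {
    clearTimer(timerId);
    if (recorder.state === "recording") recorder.stop();
  };
}
```

In a browser, the recorder would typically come from `new MediaRecorder(stream)`, where `stream` is obtained with `navigator.mediaDevices.getUserMedia({ audio: true })`.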

Uploading Audio Files

Drag & drop audio files onto the upload area
Or click to browse and select files
Supported formats: MP3, WAV, M4A
Maximum file size: 50MB
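
The format and size rules above can be checked before an upload starts. The sketch below is an assumption about how such a check could look, not the project's own validation: `validateAudioFile` is a hypothetical name, 50MB is interpreted as 50 × 1024 × 1024 bytes, and only the file extension is inspected.

```javascript
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // 50MB limit from this README (binary MB assumed)
const ALLOWED_EXTENSIONS = [".mp3", ".wav", ".m4a"];

// Accepts any file-like object with `name` and `size` (for example a browser File).
// Returns { ok: true } or { ok: false, error }.
function validateAudioFile(file) {
  const name = String(file.name || "").toLowerCase();
  const dot = name.lastIndexOf(".");
  const ext = dot >= 0 ? name.slice(dot) : "";
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return { ok: false, error: "Unsupported format. Use MP3, WAV, or M4A." };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, error: "File exceeds the 50MB limit." };
  }
  return { ok: true };
}
```

An extension check on the client is only a convenience for the user; the server should still enforce its own type and size limits.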

Processing & Results

Audio is automatically transcribed using OpenAI Whisper
AI generates 5 professional slides with speaker notes
Review your presentation in the results section
Export as HTML for immediate use

🏗️ Architecture

Backend (Node.js + Express)

File Upload: Multer for handling audio file uploads
Audio Processing: OpenAI Whisper API integration
Content Generation: OpenAI GPT-4 for slide creation
Export System: HTML generation with professional styling
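
To illustrate the export step, here is a minimal sketch of turning slide data into a standalone HTML document while escaping user-derived text. The slide shape (`title`, `bullets`, `notes`) and both function names are assumptions made for this example; the real export code and its styling may differ.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Render slides to a standalone HTML document.
// Assumed (hypothetical) slide shape: { title, bullets: string[], notes }.
function renderPresentationHtml(slides) {
  const sections = slides.map((slide, i) => {
    const bullets = slide.bullets
      .map((b) => `      <li>${escapeHtml(b)}</li>`)
      .join("\n");
    return [
      `  <section class="slide">`,
      `    <h2>${i + 1}. ${escapeHtml(slide.title)}</h2>`,
      `    <ul>`,
      bullets,
      `    </ul>`,
      `    <aside class="notes">${escapeHtml(slide.notes)}</aside>`,
      `  </section>`,
    ].join("\n");
  });
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    '<head><meta charset="utf-8"><title>Presentation</title></head>',
    "<body>",
    sections.join("\n"),
    "</body>",
    "</html>",
  ].join("\n");
}
```

Because the slide text originates from a transcript, escaping it before insertion keeps stray markup in the spoken content from being interpreted as HTML.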

Frontend (Vanilla JavaScript)

Audio Recording: Web Audio API with MediaRecorder
File Handling: Drag & drop with validation
Real-time Updates: Progress indicators and status messages
Responsive Design: Mobile-first approach

AI Integration

Speech-to-Text: OpenAI Whisper for accurate transcription
Content Generation: GPT-4 for intelligent slide structuring
Professional Formatting: Structured 5-slide presentations

📊 Slide Structure

Each generated presentation follows this professional structure:

Introduction & Overview - Sets context and roadmap
Key Challenge & Context - Identifies problems and urgency
Core Strategy & Approach - Main solution framework
Implementation & Results - Execution plan and outcomes
Conclusion & Next Steps - Summary and call-to-action
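
One way to keep generated decks on this structure is to put the five sections in the prompt and validate the model's reply. The sketch below is illustrative only: the prompt wording, the JSON shape, and the names `buildSlidePrompt` and `parseSlides` are assumptions, not the project's actual prompt or schema.

```javascript
// The five sections listed above, in order.
const SLIDE_SECTIONS = [
  "Introduction & Overview",
  "Key Challenge & Context",
  "Core Strategy & Approach",
  "Implementation & Results",
  "Conclusion & Next Steps",
];

// Build a prompt that asks for JSON output (wording is an assumption).
function buildSlidePrompt(transcript) {
  const outline = SLIDE_SECTIONS.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return [
    "Turn the transcript below into exactly 5 slides, one per section:",
    outline,
    'Respond with a JSON array of objects: {"title": string, "bullets": string[], "notes": string}.',
    "Transcript:",
    transcript.trim(),
  ].join("\n\n");
}

// Parse and validate the model's reply; throws if it does not match the assumed shape.
function parseSlides(replyText) {
  const slides = JSON.parse(replyText);
  if (!Array.isArray(slides) || slides.length !== SLIDE_SECTIONS.length) {
    throw new Error(`Expected ${SLIDE_SECTIONS.length} slides`);
  }
  for (const slide of slides) {
    const valid =
      slide !== null &&
      typeof slide === "object" &&
      typeof slide.title === "string" &&
      Array.isArray(slide.bullets) &&
      typeof slide.notes === "string";
    if (!valid) throw new Error("Slide is missing title, bullets, or notes");
  }
  return slides;
}
```

Validating the reply before rendering lets the server return a clear error when the model's output does not contain exactly five well-formed slides.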

🔧 API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/health` | GET | Health check |
| `/api/upload-audio` | POST | Upload audio file |
| `/api/transcribe` | POST | Transcribe audio with Whisper |
| `/api/generate-slides` | POST | Generate slides with GPT-4 |
| `/api/export-html` | POST | Export presentation as HTML |
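
As a small client-side sketch, the endpoints from the table above can be kept in one lookup and used for a health check. The helper names `apiUrl` and `isServerHealthy` are hypothetical, and the base URL assumes the local setup from Quick Start. Request and response bodies for the POST endpoints are not documented here, so the sketch does not guess at them.

```javascript
// Endpoints and methods copied from the table above.
const ENDPOINTS = {
  health: { method: "GET", path: "/api/health" },
  uploadAudio: { method: "POST", path: "/api/upload-audio" },
  transcribe: { method: "POST", path: "/api/transcribe" },
  generateSlides: { method: "POST", path: "/api/generate-slides" },
  exportHtml: { method: "POST", path: "/api/export-html" },
};

// Resolve an endpoint name to a full URL against the server's base URL.
function apiUrl(name, baseUrl = "http://localhost:3000") {
  const endpoint = ENDPOINTS[name];
  if (!endpoint) throw new Error(`Unknown endpoint: ${name}`);
  return new URL(endpoint.path, baseUrl).toString();
}

// Returns true when the health check responds with a 2xx status.
// fetchImpl is injectable; a global fetch exists in browsers and in Node 18+.
async function isServerHealthy(baseUrl = "http://localhost:3000", fetchImpl = fetch) {
  try {
    const res = await fetchImpl(apiUrl("health", baseUrl), {
      method: ENDPOINTS.health.method,
    });
    return res.ok;
  } catch {
    return false; // network error: server not reachable
  }
}
```

On Node versions without a global `fetch`, pass a compatible implementation as the second argument.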

🛠️ Development

Running in Development Mode
```
npm run dev
```
Building for Production
```
npm run build
```
Running Tests
```
npm test
```

🔒 Security & Best Practices

✅ Environment Variables: API keys are kept in .env rather than in source code
✅ File Validation: Type and size restrictions
✅ Error Handling: Comprehensive error management
✅ Input Sanitization: XSS protection
✅ CORS Configuration: Proper cross-origin setup

🚧 Future Enhancements

Planned Features

PDF Export: Server-side PDF generation
User Authentication: Login and user management
Presentation Templates: Multiple design themes
Collaboration: Share and edit presentations
Analytics: Usage statistics and insights

Technical Improvements

WebSocket Integration: Real-time progress updates
Caching System: Improve performance
Database Integration: Store presentation history
Cloud Storage: Audio file management
API Rate Limiting: Prevent abuse

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for providing GPT-4 and Whisper APIs
Font Awesome for beautiful icons
Express.js for the robust backend framework
Modern CSS for stunning visual design

📞 Support

Documentation: Check this README for setup and usage
Issues: Report bugs via GitHub Issues
Questions: Open a discussion for general questions
Help: Click the help icon (?) in the bottom-right corner

Made with ❤️ for better presentations
