Professional AI-powered music services with state-of-the-art neural networks
This is a production-ready, full-stack AI music platform that I built to explore modern serverless GPU computing, payment systems, and advanced ML model deployment. The platform offers professional-grade audio processing services using state-of-the-art AI models, all running on serverless infrastructure with a credit-based payment system.
Why I built this:
- Learn serverless GPU computing with Modal Labs
- Implement a complete payment system with Stripe
- Work with cutting-edge AI models (BS-RoFormer, DeepFilterNet, Whisper AI)
- Build a production-ready SPA without frameworks (vanilla JS)
- Practice database design with Row-Level Security
- Deploy scalable ML services on NVIDIA A10G GPUs
| Service | AI Model | Quality Metric | Use Case |
|---|---|---|---|
| 🎵 Music Source Separation | BS-RoFormer (ZFTurbo) | 9.65 dB SDR | Isolate vocals, drums, bass, instruments |
| 🎚️ AI Audio Mastering | Professional Audio Chain | Broadcast Standard | Professional music mastering |
| 🎼 Music Transcription | basic-pitch + madmom + essentia | Producer-Ready MIDI | Convert audio → MIDI |
| 🎙️ Speech Enhancement | DeepFilterNet + Whisper | Studio Quality | Podcast & voice processing |
| 🎧 Audio Generation | AudioLDM-L-Full (975M params) | High Fidelity | Text-to-audio generation |
- Serverless GPU Computing: Auto-scaling NVIDIA A10G GPUs via Modal Labs
- Credit System: Secure payment processing with Stripe + Supabase
- Row-Level Security: PostgreSQL RLS for tamper-proof credit management
- Edge Functions: Vercel serverless functions for payment webhooks
- PWA: Progressive Web App with service worker
- Real-time Processing: WebSocket updates for long-running jobs
- 99%+ Profit Margin: Optimized GPU usage (€0.014–€0.037 per track)
```mermaid
graph TB
    A[Frontend SPA<br/>Vanilla JS + Bootstrap] --> B[Vercel Edge Functions<br/>Payment & Auth]
    B --> C[Stripe Payment Gateway<br/>Credit Purchases]
    B --> D[Supabase PostgreSQL<br/>Credit Management + RLS]
    B --> E[Modal Serverless Platform<br/>GPU Orchestration]
    E --> F[NVIDIA A10G GPU Cluster<br/>24GB VRAM]
    F --> G[BS-RoFormer<br/>Music Separation]
    F --> H[DeepFilterNet + Whisper<br/>Speech Enhancement]
    F --> I[AudioLDM<br/>Audio Generation]
    F --> J[Matchering<br/>AI Mastering]
    F --> K[basic-pitch + madmom<br/>Transcription]
```
Frontend:
- Vanilla JavaScript (ES6+) - No framework overhead
- Bootstrap 5.3 - Responsive UI
- WaveSurfer.js - Audio visualization
- Web Audio API - Real-time processing
- Service Worker - PWA capabilities
Backend:
- Compute: Modal Labs (Python 3.11 + FastAPI)
- GPU: NVIDIA A10G (24GB VRAM)
- Payment: Stripe with webhooks
- Database: Supabase (PostgreSQL + RLS)
- Hosting: Vercel (Edge Functions + CDN)
AI/ML Stack:
```text
BS-RoFormer (ZFTurbo)    # State-of-the-art music separation
DeepFilterNet            # AI noise reduction
Whisper (OpenAI)         # Speech-to-text + enhancement
AudioLDM (975M params)   # Text-to-audio generation
basic-pitch (Spotify)    # Melody transcription
madmom + essentia        # Music analysis
```

Prerequisites:
- Node.js 18+ and npm
- Python 3.11+
- Modal account (for GPU services)
- Stripe account (for payments)
- Supabase project (for database)
- Vercel account (for deployment)
```bash
# Clone the repository
git clone https://github.com/KleinDigitalSolutions/XO-master.git
cd XO-master

# Install dependencies
npm install

# Start development server
npm run dev
# OR
python3 -m http.server 8080

# Open browser
open http://localhost:8080
```

```bash
# Install Modal CLI
pip install modal

# Authenticate
modal token new

# Deploy all AI services
modal deploy modal_app_zfturbo_complete.py    # Music Separation
modal deploy modal_app_enhancement.py         # Speech Enhancement
modal deploy modal_app_matchering.py          # AI Mastering
modal deploy modal_app_transcription.py       # Music Transcription
modal deploy modal_app_audio_generation.py    # Audio Generation

# Verify deployments
modal app list
```

Create a .env file with:
```bash
# Stripe
STRIPE_SECRET_KEY=sk_test_xxx
STRIPE_PUBLISHABLE_KEY=pk_test_xxx
STRIPE_WEBHOOK_SECRET=whsec_...

# Supabase
NEXT_PUBLIC_SUPABASE_URL=https://xxx.supabase.co
SUPABASE_SERVICE_ROLE_KEY=eyJ...

# Security
CREDIT_WEBHOOK_SECRET=your_secure_webhook_secret
```

Run the Supabase schema:

```bash
psql -h db.xxx.supabase.co -U postgres -d postgres -f supabase_credit_schema.sql
```

Note: the frontend expects `window.STRIPE_PUBLISHABLE_KEY` at runtime for direct Stripe redirects.
See SETUP.md for detailed setup instructions.
The platform uses Modal Labs for serverless GPU computing, which provides:
- Auto-scaling: 0 to 100+ GPUs based on demand
- Pay-per-second billing: Only pay for actual compute time
- Cold start: <10 seconds to spin up new GPU instances
- A10G GPU: 24GB VRAM for large model inference
Example Modal Function:
```python
@app.function(
    gpu="A10G",
    timeout=600,
    image=zfturbo_image,  # Custom image with ML dependencies
)
def separate_audio(audio_file: bytes, model: str):
    # Process audio on the GPU and return the separated stems
    stems = bs_roformer.separate(audio_file, model)
    return stems
```

The credit system uses Supabase Row-Level Security (RLS) to prevent tampering:
```sql
-- Users can only view their own credits
CREATE POLICY "Users can view own profile" ON credit_users
    FOR SELECT USING (auth.uid()::text = id);

-- Only the service role can add credits (via webhook)
CREATE POLICY "Service role can manage users" ON credit_users
    FOR ALL USING (auth.role() = 'service_role');
```

Flow:
- User purchases credits via Stripe
- The Stripe webhook triggers `/api/webhook`
- The webhook verifies the signature and adds credits atomically
- The user uses a service → credits are deducted via a stored procedure
- All operations are logged in the `credit_usage` table
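The signature check in this flow can be sketched in Python. The real handler lives in `api/webhook.js`; this hand-rolled version only makes visible what `stripe.Webhook.construct_event` does internally (in production, use the official SDK):

```python
import hashlib
import hmac
import json
import time

def verify_stripe_signature(payload: bytes, sig_header: str,
                            secret: str, tolerance: int = 300) -> dict:
    """Validate a Stripe-Signature header ("t=...,v1=...") and parse the event.

    Stripe signs "{timestamp}.{raw_body}" with HMAC-SHA256; rejecting stale
    timestamps keeps a replayed webhook from re-adding credits.
    """
    parts = dict(item.split("=", 1) for item in sig_header.split(","))
    timestamp, signature = int(parts["t"]), parts["v1"]
    if abs(time.time() - timestamp) > tolerance:
        raise ValueError("timestamp outside tolerance (possible replay)")
    signed_payload = str(timestamp).encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload,
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch")
    return json.loads(payload)
```

A verified `checkout.session.completed` event is what then triggers the atomic credit-adding stored procedure in Supabase.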
GPU Processing Times:
- BS-RoFormer (Music Separation): 45-120s
- Speech Enhancement: 15-30s
- AI Mastering: 30-60s
- Music Transcription: 20-45s
- Audio Generation: 10-20s
Cost Efficiency:
- Modal A10G: ~$1.10/hour ($0.0003/second)
- Average track: €0.014–€0.037 compute cost
- Credit price: €1.60–€3.49 per use
- Profit margin: 99.1-99.8%
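As a quick sanity check on these figures (treating the $ compute rate and € prices as roughly comparable for the estimate):

```python
# Reproduce the per-track cost and margin figures quoted above.
RATE_PER_SECOND = 0.0003  # Modal A10G: ~$1.10/hour ~= $0.0003/s

def track_cost(gpu_seconds: float) -> float:
    """Compute cost for one job at the per-second GPU rate."""
    return gpu_seconds * RATE_PER_SECOND

def margin(price: float, cost: float) -> float:
    """Profit margin as a fraction of the sale price."""
    return (price - cost) / price

print(f"120 s separation job: ${track_cost(120):.3f}")        # $0.036
print(f"margin at 1.60 price, 0.014 cost: {margin(1.60, 0.014):.1%}")  # 99.1%
```

The longest job in the table (120 s of separation) still costs under four cents of GPU time, which is where the 99%+ margin comes from.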
| Metric | Value |
|---|---|
| Lines of Code | ~15,000 |
| AI Models Deployed | 5 production models |
| API Endpoints | 4 serverless functions |
| Database Tables | 3 + RLS policies |
| Uptime | 99.9% (Vercel + Modal) |
| Processing Speed | 10x faster than local |
| Cold Start Time | <10 seconds |
- Serverless Architecture: Designing systems that scale from 0 → ∞
- GPU Computing: Optimizing ML inference on NVIDIA A10G
- Payment Systems: Implementing secure Stripe webhooks and idempotent transactions
- Database Security: PostgreSQL RLS policies for multi-tenant data
- ML Deployment: Containerizing and deploying large AI models
- Edge Computing: Vercel Edge Functions for low-latency API responses
- Pricing Strategy: Credit-based model vs subscriptions
- Cost Analysis: GPU compute costs vs revenue (99%+ margin)
- User Experience: Free trial → paywall → conversion funnel
- Payment UX: Supporting multiple payment methods (Klarna, PayPal, etc.)
- Infrastructure as Code: Modal apps as Python code
- Database Migrations: Version-controlled SQL schemas
- API Design: RESTful serverless functions
- Error Handling: Comprehensive retry logic and user feedback
- Security: Webhook signature verification, RLS policies, service role auth
```text
XO-master/
├── index.html                        # Main SPA (360KB comprehensive app)
├── api/
│   ├── credits.js                    # Credit management API
│   ├── webhook.js                    # Stripe webhook handler
│   ├── create-checkout-session.js    # Payment session creation
│   └── paywall.js                    # Credit verification
├── modal_app_zfturbo_complete.py     # BS-RoFormer music separation
├── modal_app_enhancement.py          # Speech enhancement service
├── modal_app_matchering.py           # AI mastering service
├── modal_app_transcription.py        # Music transcription service
├── modal_app_audio_generation.py     # Audio generation service
├── supabase_credit_schema.sql        # Database schema with RLS
├── vercel.json                       # Vercel deployment config
├── package.json                      # Node.js dependencies
└── docs/
    ├── ARCHITECTURE.md               # Technical architecture
    ├── SETUP.md                      # Development setup guide
    └── CLAUDE.md                     # AI assistant context
```
The project uses a multi-platform deployment strategy:

Vercel (frontend + API):

```bash
vercel --prod
```

- Static SPA served via CDN
- Serverless API functions for payments
- Automatic HTTPS and domain management

Modal (GPU services):

```bash
modal deploy modal_app_*.py
```

- GPU services auto-scale based on traffic
- Pay-per-second billing
- Global deployment

Supabase (database):

- Managed PostgreSQL with automatic backups
- Row-Level Security policies
- Real-time subscriptions
- Real-time Processing: WebRTC for live audio processing
- Mobile Apps: React Native iOS/Android apps
- API Access: Developer API for integrations
- Batch Processing: Process multiple files simultaneously
- Advanced Analytics: User behavior tracking and insights
- Team Accounts: Multi-user workspaces with shared credits
- White-label Solution: Embeddable widgets for other platforms
This is primarily a portfolio project, but contributions are welcome! See CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see LICENSE file for details.
- ZFTurbo - BS-RoFormer (9.65dB SDR)
- Rikorose - DeepFilterNet noise reduction
- OpenAI - Whisper speech recognition
- Spotify - basic-pitch MIDI transcription
- Haohe Liu et al. (CVSSP, University of Surrey) - AudioLDM text-to-audio model
- Modal Labs - Serverless GPU computing
- Vercel - Frontend hosting and edge functions
- Supabase - PostgreSQL database with RLS
- Stripe - Payment processing
Built with ❤️ using cutting-edge AI and serverless technologies
This project demonstrates full-stack development, ML deployment, payment systems, and serverless architecture
