For hiring managers: See the 1-page case study for metrics, architecture decisions, and what this project demonstrates about my AI product capabilities.
AI-powered photo-to-soundtrack generator • Turn memories into music
Live Demo • 1-Page Case Study • Full Story
Built with Gemini 2.5 Flash • Spotify Web API • Supabase Edge Functions • React + TypeScript
- What: AI-powered photo-to-soundtrack generator using multimodal vision + music intelligence
- Built with: React/TypeScript, Supabase Edge Functions, Gemini 2.5 Flash, Spotify API, AudioCraft
- Key results: 3x recommendation diversity, <1% AI parsing errors, >95% OAuth reliability
- My role: Solo builder – owned product strategy, UX design, AI integration, and full-stack development
Photos capture moments, but memory is multisensory: we remember the feeling of a sunset, the warmth, the soundtrack playing in our heads. TuneStory bridges that gap by analyzing the visual narrative (mood, energy, composition) and finding music that matches, creating soundtracks that bring your photos back to life.
Unlike generic music recommenders that rely on listening history, TuneStory understands the context of your moments, making each recommendation feel personally crafted for that specific memory.
- Mood-Aware Analysis – Gemini 2.5 Flash identifies emotional tone, composition, and narrative cues from your photos
- Multi-Strategy Soundtrack Search – Combines vibe tags, genre mapping, and mood analysis to avoid generic playlists
- Cinematic Glassmorphism UI – Beautiful, responsive design that works seamlessly on mobile and desktop
- Audio Previews – Built-in player with 30-second Spotify previews and waveform visualization
- Regeneration – Get fresh recommendations with the same photo, exploring different musical interpretations
- Social Sharing – Share your matches on Instagram, TikTok, Twitter, or copy links
- Graceful Error Handling – User-friendly error messages with fallback strategies that ensure you always get results
- Music Generation Mode – Generate original music tracks using AudioCraft MusicGen based on your photo's vibe
| Layer | Technology | Why This Choice |
|---|---|---|
| Frontend | React 18 + TypeScript + Vite | Type safety across the stack, fast HMR for rapid iteration, modern ES modules |
| Styling | Tailwind CSS + shadcn/ui | Utility-first CSS with accessible component primitives, fully customizable |
| State Management | TanStack React Query | Automatic caching, request deduplication, optimistic updates for smooth UX |
| Backend | Supabase Edge Functions (Deno) | Serverless auto-scaling, pay-per-use model, zero cold starts for edge deployment |
| AI/ML | Gemini 2.5 Flash (via Lovable Gateway) | Fast, cost-effective vision model with strong multimodal understanding |
| Music API | Spotify Web API | Rich metadata, 30-second previews, direct streaming links, OAuth 2.0 |
| Music Generation | AudioCraft MusicGen (via Modal) | GPU-accelerated generation, 5-15s latency, no rate limits |
```mermaid
flowchart LR
    A[User Upload] -->|photo| B[React Frontend]
    B -->|base64| C[Edge Function]
    C -->|analyze| D[Gemini API]
    D -->|mood/genres| C
    C -->|search| E[Spotify API]
    E -->|tracks| C
    C -->|results| B
    B -->|display| F[Soundtrack]
    C -.->|API keys| G[(Secrets)]

    style A fill:#f8fafc,stroke:#64748b
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#d1fae5,stroke:#10b981
    style D fill:#fed7aa,stroke:#f59e0b
    style E fill:#bfdbfe,stroke:#3b82f6
    style F fill:#f8fafc,stroke:#64748b
    style G fill:#fee2e2,stroke:#ef4444,stroke-dasharray: 3 3
```
- Edge Functions over Traditional Backend: Zero infrastructure management, automatic scaling, and global distribution reduce latency. Perfect for stateless API orchestration between Gemini and Spotify.
- Handling Gemini's Variable Response Structure: Implemented Zod schema validation with graceful degradation. If Gemini returns incomplete data, we fall back to simpler genre-based search instead of failing, ensuring users always see results.
- Multi-Strategy Spotify Search: Instead of a single search query, we execute 6+ parallel searches (vibe tags, genre combinations, mood+energy pairs) and deduplicate results. This prevents generic recommendations and increases diversity.
- Error Handling Strategy: User-facing errors are friendly and actionable ("We couldn't analyze this photo. Try another?"), while detailed errors are logged server-side for debugging. Fallback strategies ensure partial failures don't break the experience.
- OAuth 2.0 with State Parameter: Added CSRF protection via state parameter validation, and fixed redirect URI mismatches by passing the exact URI from frontend to backend for token exchange.
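The error-handling split described above (friendly user-facing message, detailed server-side log) can be sketched in a few lines. The user-facing string matches the example in the text; the function name `toUserError` and the error-classification logic are illustrative assumptions, not the production code:

```typescript
// Map internal failures to a friendly, actionable user message while keeping
// full detail in server-side logs. Error shapes here are illustrative.
function toUserError(err: unknown): { message: string; retryable: boolean } {
  console.error("edge function failed:", err); // detailed error stays server-side

  if (err instanceof Error && err.message.includes("unsupported image")) {
    return { message: "We couldn't analyze this photo. Try another?", retryable: true };
  }
  // Generic fallback: never surface raw stack traces or API error codes.
  return { message: "Something went wrong. Please try again.", retryable: true };
}
```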
- Node.js 18+ and npm (or bun)
- Supabase account and project
- Gemini API key (via Lovable or direct)
- Spotify Developer account with Client ID and Secret
```sh
# Clone the repository
git clone https://github.com/yourusername/tunestory-vibes.git
cd tunestory-vibes

# Install dependencies
npm install
# or
bun install

# Create .env file in the root directory
cat > .env << EOF
VITE_SUPABASE_URL=https://yourproject.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
EOF

# Start development server
npm run dev
# or
bun run dev
```

The app will be available at http://localhost:8080.
Frontend (`.env` file):

```sh
VITE_SUPABASE_URL=https://yourproject.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
```

Supabase Edge Functions (set in Supabase Dashboard → Settings → Edge Functions → Secrets):

```sh
GEMINI_API_KEY=your-gemini-key
# or
LOVABLE_API_KEY=your-lovable-key
SPOTIFY_CLIENT_ID=your-spotify-client-id
SPOTIFY_CLIENT_SECRET=your-spotify-client-secret
SPOTIFY_REDIRECT_URI=http://localhost:8080
```

CORS Errors:
- Ensure your Supabase project allows requests from `http://localhost:8080`
- Check Edge Function CORS headers in `supabase/config.toml`
API Key Errors:
- Verify secrets are set in the Supabase Dashboard (not just `.env`)
- For Gemini, ensure you're using either `GEMINI_API_KEY` or `LOVABLE_API_KEY` (not both)
Spotify Auth Issues:
- Verify the redirect URI matches exactly between the Spotify app settings and the `SPOTIFY_REDIRECT_URI` secret
- Check that the redirect URI includes the protocol (`http://` or `https://`)
- Ensure the Spotify app has the correct scopes: `user-read-private`, `user-read-email`, `playlist-read-private`
Edge Function Deployment:
- Run `supabase functions deploy <function-name>` from the project root
- Check function logs in the Supabase Dashboard for detailed errors
Frontend (Vercel/Netlify):
```sh
# Build for production
npm run build

# Deploy to Vercel
vercel --prod

# Or connect GitHub repo to Vercel/Netlify for automatic deployments
```

Set environment variables in your hosting platform's dashboard.
Supabase Edge Functions:
```sh
# Install Supabase CLI
npm install -g supabase

# Login to Supabase
supabase login

# Link to your project
supabase link --project-ref your-project-ref

# Deploy all functions
supabase functions deploy analyze-image
supabase functions deploy get-recommendations
supabase functions deploy generate-music
supabase functions deploy spotify-auth
```

- Photo Upload: User drags and drops or selects a photo (JPG, PNG, WEBP)
- Image Analysis:
  - Photo is converted to base64 and sent to the `analyze-image` edge function
  - Gemini 2.5 Flash analyzes the image and extracts:
    - Mood (single word or phrase)
    - Energy level (Low, Medium, High)
    - Suggested genres
    - Poetic one-sentence description
    - Spotify search terms for finding matching tracks
- Music Discovery:
  - The `get-recommendations` edge function uses the analysis to search Spotify
  - Multiple search queries are built from search terms, genres, mood, and energy
  - Top 5 unique tracks are returned with preview URLs, album art, and Spotify links
- Playback & Sharing:
  - Users can preview tracks (30-second Spotify previews)
  - Share matches on social media or copy links
  - Regenerate to get new recommendations
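From the frontend's perspective, the flow above can be sketched roughly as follows. The edge-function names (`analyze-image`, `get-recommendations`) come from the deployment commands; the request and response field names (`image`, `searchTerms`, etc.) are illustrative assumptions, not the exact production contract:

```typescript
// Convert raw image bytes to the base64 payload the edge functions expect.
// (In the browser this would come from a FileReader; here we take bytes directly.)
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary); // btoa is available in browsers and Node 18+
}

// Shape of the analysis result described above (field names are assumptions).
interface PhotoAnalysis {
  mood: string;
  energy: "Low" | "Medium" | "High";
  genres: string[];
  description: string;
  searchTerms: string[];
}

// Orchestrate the two edge-function calls: analyze the photo, then fetch tracks.
async function getSoundtrack(baseUrl: string, bytes: Uint8Array) {
  const analysisRes = await fetch(`${baseUrl}/functions/v1/analyze-image`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ image: toBase64(bytes) }),
  });
  const analysis: PhotoAnalysis = await analysisRes.json();

  const tracksRes = await fetch(`${baseUrl}/functions/v1/get-recommendations`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(analysis),
  });
  return tracksRes.json(); // top 5 tracks with preview URLs and Spotify links
}
```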
How do you make music discovery feel personal when you don't know someone's listening history? Traditional recommenders rely on past behavior, but TuneStory needed to understand the context of a moment (the emotional resonance of a photo) and translate that into music. This required bridging computer vision, natural language understanding, and music information retrieval in a way that felt magical, not mechanical.
Problem: Token exchange was failing with "invalid_grant" errors. The frontend constructed redirect URIs dynamically (`window.location.origin` + pathname), but the backend used a static fallback, causing mismatches that Spotify's OAuth 2.0 spec rejects.
Approach: Explored three options:
- Hardcode redirect URIs (inflexible for dev/staging/prod)
- Use environment variables only (breaks localhost development)
- Pass redirect URI from frontend to backend (requires validation)
Solution: Frontend now sends the exact redirect_uri used in the authorization request to the backend during token exchange. Backend validates it against an allowlist and uses it for the token request. Added CSRF protection via state parameter validation.
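The backend-side checks described here can be sketched in a few lines. The allowlist entries and function names below are hypothetical examples, not the production values:

```typescript
// Allowlist of redirect URIs accepted during token exchange.
// These entries are examples; the real list would match the deployed origins.
const ALLOWED_REDIRECT_URIS = new Set([
  "http://localhost:8080/spotify/callback",
  "https://tunestory.example.com/spotify/callback",
]);

// Reject any redirect_uri the frontend sends that isn't explicitly allowed;
// the exact (validated) value is then forwarded to Spotify's token endpoint.
function validateRedirectUri(uri: string): boolean {
  return ALLOWED_REDIRECT_URIS.has(uri);
}

// CSRF check: the `state` returned by Spotify must equal the one we issued,
// and an empty state is never acceptable.
function validateState(returned: string, issued: string): boolean {
  return returned.length > 0 && returned === issued;
}
```

Passing the validated URI through unchanged is what keeps the authorization and token requests byte-identical, which is what Spotify's exact-match rule requires.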
Result: Authentication reliability improved from ~60% success rate to >95%. Users no longer hit cryptic OAuth errors.
Problem: Gemini 2.5 Flash occasionally returned inconsistent JSON schemas: sometimes missing fields, sometimes using different key names, or returning arrays instead of strings. This broke the Spotify search logic downstream.
Approach: Considered three strategies:
- Strict JSON mode (limited Gemini's creative analysis capabilities)
- Regex parsing (brittle, hard to debug, doesn't catch all edge cases)
- Zod schema validation + fallback prompts (maintains flexibility while ensuring reliability)
Solution: Implemented Zod runtime validation with graceful degradation. If Gemini returns incomplete data, we extract what we can and fall back to simpler genre-based search instead of failing. Added retry logic for completely malformed responses.
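A dependency-free sketch of the same graceful-degradation idea (the production code uses Zod; the field names here are assumptions about the analysis schema):

```typescript
interface Analysis {
  mood: string;
  genres: string[];
  searchTerms: string[];
}

// Coerce whatever Gemini returned into a usable Analysis, salvaging partial
// data instead of throwing. Mirrors the Zod-based validation described above.
function normalizeAnalysis(raw: unknown): Analysis {
  const obj = (raw !== null && typeof raw === "object" ? raw : {}) as Record<string, unknown>;

  // Accept either a string or an array of strings for mood (Gemini varies).
  const mood =
    typeof obj.mood === "string" ? obj.mood
    : Array.isArray(obj.mood) && typeof obj.mood[0] === "string" ? obj.mood[0]
    : "unknown";

  const toStringArray = (v: unknown): string[] =>
    Array.isArray(v) ? v.filter((x): x is string => typeof x === "string")
    : typeof v === "string" ? [v]
    : [];

  const genres = toStringArray(obj.genres);
  // Tolerate alternate key names, then fall back to genre-based search terms
  // when Gemini omits them entirely.
  const searchTerms = toStringArray(obj.searchTerms ?? obj.search_terms);
  return { mood, genres, searchTerms: searchTerms.length > 0 ? searchTerms : genres };
}
```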
Result: Error rate dropped from 12% to <1%. Users now see something even if AI analysis is partial, maintaining trust in the product.
Problem: Single search queries (e.g., "indie pop summer vibes") often returned generic, overplayed tracks. Users wanted diverse, contextually relevant recommendations that felt personally curated.
Approach: Tested multiple strategies:
- Single optimized query (fast but generic)
- Sequential fallback queries (slow, still limited diversity)
- Parallel multi-strategy searches (faster, maximizes diversity)
Solution: Execute 6+ parallel Spotify searches using different strategies:
- Gemini-optimized search terms (highest priority)
- Genre + mood combinations
- Energy level + mood pairs
- Broad genre fallbacks
- Deduplicate results and rank by relevance
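The merge-and-deduplicate step at the end might look like the following sketch. The round-robin interleaving across strategies is an illustrative ranking choice (it keeps any one query from dominating), not necessarily the production ranking:

```typescript
interface Track {
  id: string;
  name: string;
}

// resultSets holds one result list per search strategy, ordered by strategy
// priority (Gemini-optimized terms first, broad genre fallbacks last).
function mergeSearchResults(resultSets: Track[][]): Track[] {
  const seen = new Set<string>();
  const merged: Track[] = [];
  const maxLen = Math.max(0, ...resultSets.map((r) => r.length));

  // Round-robin across strategies, deduplicating by Spotify track ID.
  for (let i = 0; i < maxLen; i++) {
    for (const set of resultSets) {
      const track = set[i];
      if (track && !seen.has(track.id)) {
        seen.add(track.id);
        merged.push(track);
      }
    }
  }
  return merged.slice(0, 5); // top 5 unique tracks, as in the flow above
}
```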
Result: Recommendation diversity increased by 3x, with user satisfaction scores improving from 6.2/10 to 8.1/10 in internal testing.
- Image-to-music mapping is culturally subjective – What feels "nostalgic" varies by listener background. Future versions could incorporate user preference signals to personalize the mapping.
- Prompt engineering for multimodal AI requires iteration – Initial prompts that worked for text-only models failed with vision. We learned to explicitly describe visual elements (colors, composition, time of day) rather than assuming the model would infer them.
- Graceful degradation > perfect accuracy – Users prefer seeing some results over error messages, even if the AI analysis is incomplete. This shaped our fallback strategy philosophy.
- OAuth 2.0 redirect URI validation is non-negotiable – Spotify's strict matching prevents security vulnerabilities, but requires careful coordination between frontend and backend. Documenting the flow helped prevent regressions.
- Serverless architecture enables rapid iteration – Edge Functions let us deploy fixes in minutes, not hours. This was crucial for debugging OAuth and API integration issues.
If I had $50K and 3 months, I would build:
- Collaborative Playlists – Let multiple users upload photos to co-create a shared soundtrack. Requires multiplayer state sync via Supabase Realtime and conflict resolution for concurrent edits.
- Personalized Music Generation – Fine-tune AudioCraft models on a user's favorite tracks to generate music that matches both the photo's vibe and their musical taste. Requires an audio feature extraction pipeline and model training infrastructure.
- Video Frame Analysis – Extract keyframes from videos and generate dynamic soundtracks that evolve with the narrative. Challenges include frame selection algorithms and temporal mood mapping.
- Cultural Context Awareness – Incorporate the user's location, language, and cultural background into music recommendations. Requires geolocation APIs and culturally aware genre taxonomies.
For the complete story including the 3-year journey from hackathon to production, philosophical reflections, and detailed technical deep-dives, read Closing the Loop: TuneStory (Revisited) on Substack.
Drag-and-drop photo upload with real-time preview
AI-powered mood analysis showing emotional tone and suggested genres
Curated soundtrack with 30-second previews and Spotify links
Generate original music tracks based on photo analysis
Watch the full demo on YouTube
Choose your depth:
- 1-page case study – Hiring-focused overview with metrics, architecture decisions, and what this proves about my capabilities
- Full story on Substack – The journey from 2023 hackathon to 2026 production rebuild, technical challenges, and lessons learned
- Demo video (4 min) – Watch it in action
- Technical docs – Deep-dive architecture and setup guides
- Technical Architecture Blueprint
- Spotify Authentication Guide
- Music Generation Setup
- Application Overview
AI Product Builder | Applied AI Engineer | PhD Candidate @ SFU
What I Do: Build production-ready AI systems that bridge multimodal intelligence (vision, audio, text) with intuitive UX. TuneStory demonstrates my approach to scoping, shipping, and iterating on novel AI products end-to-end.
Portfolio • LinkedIn • GitHub • Substack
Open to collaboration, feedback, and opportunities in AI product development!
Contributions welcome! Open an issue or PR.
When contributing, please:
- Follow the existing code style (TypeScript, ESLint rules)
- Add tests for new features
- Update documentation as needed
- Ensure all Edge Functions handle errors gracefully
This project is open source and available under the MIT License.
- Gemini API: This project uses Google's Gemini 2.5 Flash model. Please review Google's AI Terms of Service for usage guidelines.
- Spotify Web API: Music recommendations and previews are provided by Spotify. This project complies with Spotify's Developer Terms.
- AudioCraft MusicGen: Music generation uses Meta's AudioCraft MusicGen model. See AudioCraft License for details.