Tylarcam/tunestory


📄 For hiring managers: See the 1-page case study for metrics, architecture decisions, and what this project demonstrates about my AI product capabilities.


🎵 TuneStory

AI-powered photo-to-soundtrack generator • Turn memories into music

🎬 Live Demo • 📊 1-Page Case Study • 📖 Full Story

Built with Gemini 2.5 Flash • Spotify Web API • Supabase Edge Functions • React + TypeScript


🎯 Quick Overview

  • What: AI-powered photo-to-soundtrack generator using multimodal vision + music intelligence
  • Built with: React/TypeScript, Supabase Edge Functions, Gemini 2.5 Flash, Spotify API, AudioCraft
  • Key results: 3x recommendation diversity, <1% AI parsing errors, >95% OAuth reliability
  • My role: Solo builder – owned product strategy, UX design, AI integration, and full-stack development

Why TuneStory?

Photos capture moments, but memory is multisensory: we remember the feeling of a sunset, the warmth, the soundtrack playing in our heads. TuneStory bridges that gap by analyzing the visual narrative (mood, energy, composition) and finding music that matches, creating soundtracks that bring your photos back to life.

Unlike generic music recommenders that rely on listening history, TuneStory understands the context of your moments, making each recommendation feel personally crafted for that specific memory.


✨ Key Features

  • 🎯 Mood-Aware Analysis – Gemini 2.5 Flash identifies emotional tone, composition, and narrative cues from your photos
  • 🎼 Multi-Strategy Soundtrack Search – Combines vibe tags, genre mapping, and mood analysis to avoid generic playlists
  • 🎨 Cinematic Glassmorphism UI – Beautiful, responsive design that works seamlessly on mobile and desktop
  • 🔊 Audio Previews – Built-in player with 30-second Spotify previews and waveform visualization
  • 🔄 Regeneration – Get fresh recommendations with the same photo, exploring different musical interpretations
  • 📤 Social Sharing – Share your matches on Instagram, TikTok, Twitter, or copy links
  • ⚡ Graceful Error Handling – User-friendly error messages with fallback strategies that ensure you always get results
  • 🎵 Music Generation Mode – Generate original music tracks using AudioCraft MusicGen based on your photo's vibe

πŸ—οΈ Architecture & Technical Design

Tech Stack

| Layer | Technology | Why This Choice |
| --- | --- | --- |
| Frontend | React 18 + TypeScript + Vite | Type safety across the stack, fast HMR for rapid iteration, modern ES modules |
| Styling | Tailwind CSS + shadcn/ui | Utility-first CSS with accessible component primitives, fully customizable |
| State Management | TanStack React Query | Automatic caching, request deduplication, optimistic updates for smooth UX |
| Backend | Supabase Edge Functions (Deno) | Serverless auto-scaling, pay-per-use model, zero cold starts for edge deployment |
| AI/ML | Gemini 2.5 Flash (via Lovable Gateway) | Fast, cost-effective vision model with strong multimodal understanding |
| Music API | Spotify Web API | Rich metadata, 30-second previews, direct streaming links, OAuth 2.0 |
| Music Generation | AudioCraft MusicGen (via Modal) | GPU-accelerated generation, 5-15s latency, no rate limits |

System Architecture

flowchart LR
    A[📸 User Upload] -->|photo| B[⚛️ React Frontend]
    B -->|base64| C[⚡ Edge Function]
    C -->|analyze| D[🤖 Gemini API]
    D -->|mood/genres| C
    C -->|search| E[🎵 Spotify API]
    E -->|tracks| C
    C -->|results| B
    B -->|display| F[🎧 Soundtrack]

    C -.->|API keys| G[(🔐 Secrets)]
    
    style A fill:#f8fafc,stroke:#64748b
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#d1fae5,stroke:#10b981
    style D fill:#fed7aa,stroke:#f59e0b
    style E fill:#bfdbfe,stroke:#3b82f6
    style F fill:#f8fafc,stroke:#64748b
    style G fill:#fee2e2,stroke:#ef4444,stroke-dasharray: 3 3

Key Design Decisions

  • Edge Functions over Traditional Backend: Zero infrastructure management, automatic scaling, and global distribution reduce latency. Perfect for stateless API orchestration between Gemini and Spotify.
  • Handling Gemini's Variable Response Structure: Implemented Zod schema validation with graceful degradation. If Gemini returns incomplete data, we fall back to simpler genre-based search instead of failing, ensuring users always see results.
  • Multi-Strategy Spotify Search: Instead of a single search query, we execute 6+ parallel searches (vibe tags, genre combinations, mood+energy pairs) and deduplicate results. This prevents generic recommendations and increases diversity.
  • Error Handling Strategy: User-facing errors are friendly and actionable ("We couldn't analyze this photo. Try another?"), while detailed errors are logged server-side for debugging. Fallback strategies ensure partial failures don't break the experience.
  • OAuth 2.0 with State Parameter: Added CSRF protection via state parameter validation, and fixed redirect URI mismatches by passing the exact URI from frontend to backend for token exchange.
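
The split between friendly user-facing messages and detailed server-side logs could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `FailureCode`, `FailureReport`, and `toUserMessage` are hypothetical names, and only the photo-analysis message is taken from the text above.

```typescript
// Hypothetical failure categories; the real project may slice these differently.
type FailureCode = "ANALYSIS_FAILED" | "SEARCH_FAILED" | "AUTH_FAILED";

interface FailureReport {
  code: FailureCode;
  detail: string; // technical detail, meant for server-side logs only
}

// Friendly, actionable copy keyed by failure code (wording is illustrative).
const USER_MESSAGES: Record<FailureCode, string> = {
  ANALYSIS_FAILED: "We couldn't analyze this photo. Try another?",
  SEARCH_FAILED: "We couldn't find matching tracks. Try regenerating?",
  AUTH_FAILED: "We couldn't connect to Spotify. Please sign in again.",
};

// Writes the detailed line to the supplied logger and returns only the
// safe, user-facing message for the client response.
function toUserMessage(report: FailureReport, log: (line: string) => void): string {
  log(`[${report.code}] ${report.detail}`);
  return USER_MESSAGES[report.code];
}
```

In an Edge Function the `log` argument would typically be `console.error`, so the detail shows up in the function logs while the client only ever sees the friendly string.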

🚀 Getting Started

Prerequisites

  • Node.js 18+ and npm (or bun)
  • Supabase account and project
  • Gemini API key (via Lovable or direct)
  • Spotify Developer account with Client ID and Secret

Local Development Setup

# Clone the repository
git clone https://github.com/yourusername/tunestory-vibes.git
cd tunestory-vibes

# Install dependencies
npm install
# or
bun install

# Create .env file in the root directory
cat > .env << EOF
VITE_SUPABASE_URL=https://yourproject.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key
EOF

# Start development server
npm run dev
# or
bun run dev

The app will be available at http://localhost:8080

Environment Variables

Frontend (.env file):

VITE_SUPABASE_URL=https://yourproject.supabase.co
VITE_SUPABASE_ANON_KEY=your-anon-key

Supabase Edge Functions (set in Supabase Dashboard → Settings → Edge Functions → Secrets):

GEMINI_API_KEY=your-gemini-key
# or
LOVABLE_API_KEY=your-lovable-key

SPOTIFY_CLIENT_ID=your-spotify-client-id
SPOTIFY_CLIENT_SECRET=your-spotify-client-secret
SPOTIFY_REDIRECT_URI=http://localhost:8080

Troubleshooting

CORS Errors:

  • Ensure your Supabase project allows requests from http://localhost:8080
  • Check Edge Function CORS headers in supabase/config.toml

API Key Errors:

  • Verify secrets are set in Supabase Dashboard (not just .env)
  • For Gemini, ensure you're using either GEMINI_API_KEY or LOVABLE_API_KEY (not both)
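
One way to catch a misconfigured key early is to enforce the either/or rule when the function starts. The sketch below is an assumption about how that check might look, not the project's implementation: `resolveAiCredential` and its types are hypothetical, and the environment is passed in as a plain object so the logic stays testable.

```typescript
// Plain-object view of the secrets; in a Deno Edge Function the values
// would come from Deno.env.get(...).
type SecretEnv = Record<string, string | undefined>;

interface AiCredential {
  provider: "gemini" | "lovable";
  apiKey: string;
}

// Returns the single configured AI key, or throws a descriptive error
// when none or both are set.
function resolveAiCredential(env: SecretEnv): AiCredential {
  const gemini = env["GEMINI_API_KEY"];
  const lovable = env["LOVABLE_API_KEY"];
  if (gemini && lovable) {
    throw new Error("Set either GEMINI_API_KEY or LOVABLE_API_KEY, not both");
  }
  if (gemini) return { provider: "gemini", apiKey: gemini };
  if (lovable) return { provider: "lovable", apiKey: lovable };
  throw new Error("Missing AI key: set GEMINI_API_KEY or LOVABLE_API_KEY");
}
```

Failing fast with a clear message here tends to be easier to debug than a downstream 401 from the AI gateway.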

Spotify Auth Issues:

  • Verify redirect URI matches exactly between Spotify app settings and SPOTIFY_REDIRECT_URI secret
  • Check that redirect URI includes protocol (http:// or https://)
  • Ensure Spotify app has correct scopes: user-read-private, user-read-email, playlist-read-private
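
When debugging these issues it can help to see exactly what the authorization request contains. The following sketch builds a Spotify authorization URL with the scopes listed above; `buildAuthorizeUrl` and `AuthorizeParams` are hypothetical names, and the client ID and state values in the usage note are placeholders.

```typescript
interface AuthorizeParams {
  clientId: string;
  redirectUri: string; // must match the Spotify app settings exactly, protocol included
  state: string;       // random value checked again on the callback (CSRF protection)
}

// Scopes taken from the troubleshooting list above.
const SPOTIFY_SCOPES = ["user-read-private", "user-read-email", "playlist-read-private"];

// Builds the URL the browser is redirected to for the authorization code flow.
function buildAuthorizeUrl(p: AuthorizeParams): string {
  if (!/^https?:\/\//.test(p.redirectUri)) {
    throw new Error("redirect URI must include http:// or https://");
  }
  const pairs: Array<[string, string]> = [
    ["client_id", p.clientId],
    ["response_type", "code"],
    ["redirect_uri", p.redirectUri],
    ["scope", SPOTIFY_SCOPES.join(" ")],
    ["state", p.state],
  ];
  const query = pairs.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
  return `https://accounts.spotify.com/authorize?${query}`;
}
```

Calling `buildAuthorizeUrl({ clientId: "abc", redirectUri: "http://localhost:8080", state: "xyz" })` yields a URL whose `redirect_uri` parameter is the percent-encoded form of the value registered in the Spotify dashboard, which is the value to compare when a mismatch is suspected.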

Edge Function Deployment:

  • Run supabase functions deploy <function-name> from project root
  • Check function logs in Supabase Dashboard for detailed errors

Deployment

Frontend (Vercel/Netlify):

# Build for production
npm run build

# Deploy to Vercel
vercel --prod

# Or connect GitHub repo to Vercel/Netlify for automatic deployments

Set environment variables in your hosting platform's dashboard.

Supabase Edge Functions:

# Install Supabase CLI
npm install -g supabase

# Login to Supabase
supabase login

# Link to your project
supabase link --project-ref your-project-ref

# Deploy all functions
supabase functions deploy analyze-image
supabase functions deploy get-recommendations
supabase functions deploy generate-music
supabase functions deploy spotify-auth

🎯 How It Works

  1. Photo Upload: User drags and drops or selects a photo (JPG, PNG, WEBP)
  2. Image Analysis:
    • Photo is converted to base64 and sent to analyze-image edge function
    • Gemini 2.5 Flash analyzes the image and extracts:
      • Mood (single word or phrase)
      • Energy level (Low, Medium, High)
      • Suggested genres
      • Poetic one-sentence description
      • Spotify search terms for finding matching tracks
  3. Music Discovery:
    • get-recommendations edge function uses the analysis to search Spotify
    • Multiple search queries are built from search terms, genres, mood, and energy
    • Top 5 unique tracks are returned with preview URLs, album art, and Spotify links
  4. Playback & Sharing:
    • Users can preview tracks (30-second Spotify previews)
    • Share matches on social media or copy links
    • Regenerate to get new recommendations
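
The hand-off between steps 2 and 3 can be pictured as a typed analysis object turned into a list of search queries. This is a sketch under assumptions: the `PhotoAnalysis` field names and the `buildSearchQueries` helper are hypothetical, and the exact query templates the project uses are not documented here.

```typescript
type EnergyLevel = "Low" | "Medium" | "High";

// Assumed shape of the analyze-image result, based on the fields listed above.
interface PhotoAnalysis {
  mood: string;
  energy: EnergyLevel;
  genres: string[];
  description: string;
  searchTerms: string[];
}

// Builds search queries from search terms, genres, mood, and energy,
// dropping case-insensitive duplicates while keeping priority order.
function buildSearchQueries(a: PhotoAnalysis): string[] {
  const candidates = [
    ...a.searchTerms,                              // model-suggested terms first
    ...a.genres.map((g) => `${g} ${a.mood}`),      // genre + mood
    `${a.energy.toLowerCase()} energy ${a.mood}`,  // energy + mood
    ...a.genres,                                   // broad genre fallbacks
  ];
  const seen = new Set<string>();
  const queries: string[] = [];
  for (const candidate of candidates) {
    const trimmed = candidate.trim();
    const key = trimmed.toLowerCase();
    if (key !== "" && !seen.has(key)) {
      seen.add(key);
      queries.push(trimmed);
    }
  }
  return queries;
}
```

Keeping query construction in a pure function like this makes it easy to unit test separately from the Spotify calls.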

🎓 Case Study: Building TuneStory

The Challenge

How do you make music discovery feel personal when you don't know someone's listening history? Traditional recommenders rely on past behavior, but TuneStory needed to understand the context of a moment (the emotional resonance of a photo) and translate that into music. This required bridging computer vision, natural language understanding, and music information retrieval in a way that felt magical, not mechanical.

Technical Deep Dive

Challenge 1: Spotify OAuth Redirect URI Mismatch

Problem: Token exchange was failing with "invalid_grant" errors. The frontend constructed redirect URIs dynamically (window.location.origin + pathname), but the backend used a static fallback. OAuth 2.0 requires the redirect URI in the token request to match the one used in the authorization request exactly, so Spotify rejected the mismatched requests.

Approach: Explored three options:

  1. Hardcode redirect URIs (inflexible for dev/staging/prod)
  2. Use environment variables only (breaks localhost development)
  3. Pass redirect URI from frontend to backend (requires validation)

Solution: Frontend now sends the exact redirect_uri used in the authorization request to the backend during token exchange. Backend validates it against an allowlist and uses it for the token request. Added CSRF protection via state parameter validation.
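
A rough sketch of the backend side of this fix is shown below. It is illustrative only: `CallbackInput`, `REDIRECT_ALLOWLIST`, and `buildTokenRequestBody` are hypothetical names, and the allowlist entries are placeholder values rather than the project's real configuration.

```typescript
// Hypothetical request shape; field names are assumptions for illustration.
interface CallbackInput {
  code: string;          // authorization code returned by Spotify
  redirectUri: string;   // exact URI the frontend used in the authorization request
  returnedState: string; // state echoed back on the callback
  expectedState: string; // state the app stored before redirecting
}

// Placeholder allowlist; real entries would come from deployment config.
const REDIRECT_ALLOWLIST: readonly string[] = [
  "http://localhost:8080",
  "https://app.example.com",
];

// Validates state and redirect URI, then builds the form-encoded body
// for the authorization-code token request.
function buildTokenRequestBody(
  input: CallbackInput,
  allowlist: readonly string[] = REDIRECT_ALLOWLIST
): string {
  if (input.returnedState !== input.expectedState) {
    throw new Error("state mismatch: possible CSRF");
  }
  if (allowlist.indexOf(input.redirectUri) === -1) {
    throw new Error("redirect URI is not on the allowlist");
  }
  const fields: Array<[string, string]> = [
    ["grant_type", "authorization_code"],
    ["code", input.code],
    ["redirect_uri", input.redirectUri],
  ];
  return fields.map(([k, v]) => `${k}=${encodeURIComponent(v)}`).join("&");
}
```

The resulting string would be sent as an `application/x-www-form-urlencoded` POST to `https://accounts.spotify.com/api/token` together with the client credentials. Validating against an allowlist matters because the URI now arrives from the client and should not be trusted as-is.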

Result: Authentication reliability improved from ~60% success rate to >95%. Users no longer hit cryptic OAuth errors.

Challenge 2: Gemini's Variable Response Structure

Problem: Gemini 2.5 Flash occasionally returned inconsistent JSON schemas: sometimes missing fields, sometimes using different key names, sometimes returning arrays instead of strings. This broke the Spotify search logic downstream.

Approach: Considered three strategies:

  1. Strict JSON mode (limited Gemini's creative analysis capabilities)
  2. Regex parsing (brittle, hard to debug, doesn't catch all edge cases)
  3. Zod schema validation + fallback prompts (maintains flexibility while ensuring reliability)

Solution: Implemented Zod runtime validation with graceful degradation. If Gemini returns incomplete data, we extract what we can and fall back to simpler genre-based search instead of failing. Added retry logic for completely malformed responses.
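
To make the "extract what we can" idea concrete, here is a dependency-free sketch that uses hand-written checks in place of the Zod schema, so it runs without installing anything. The alternate key names (`emotion`, `energy_level`, `suggested_genres`) and all function names are assumptions for illustration; the project's actual schema may differ.

```typescript
type Energy = "Low" | "Medium" | "High";

interface LooseAnalysis {
  mood: string | null;
  energy: Energy | null;
  genres: string[];
  complete: boolean; // false when any field could not be recovered
}

// Accepts a string, or an array whose first element is a string.
function asText(v: unknown): string | null {
  if (typeof v === "string" && v.trim() !== "") return v.trim();
  if (Array.isArray(v)) return asText(v[0]);
  return null;
}

// Accepts an array of strings or a comma-separated string.
function asList(v: unknown): string[] {
  if (typeof v === "string") return v.split(",").map((s) => s.trim()).filter((s) => s !== "");
  if (Array.isArray(v)) return v.filter((x): x is string => typeof x === "string" && x.trim() !== "");
  return [];
}

function asEnergy(v: unknown): Energy | null {
  const text = asText(v);
  if (text === null) return null;
  const key = text.toLowerCase();
  if (key === "low") return "Low";
  if (key === "medium") return "Medium";
  if (key === "high") return "High";
  return null;
}

// Returns null for completely malformed input (the caller may retry);
// otherwise returns whatever fields could be recovered.
function normalizeAnalysis(raw: unknown): LooseAnalysis | null {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const r = raw as Record<string, unknown>;
  const mood = asText(r["mood"] ?? r["emotion"]);
  const energy = asEnergy(r["energy"] ?? r["energy_level"]);
  const genres = asList(r["genres"] ?? r["suggested_genres"]);
  return { mood, energy, genres, complete: mood !== null && energy !== null && genres.length > 0 };
}
```

A caller could then run the full multi-strategy search when `complete` is true, fall back to a genre-only search when it is false, and retry the model call when the result is `null`.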

Result: Error rate dropped from 12% to <1%. Users now see something even if AI analysis is partial, maintaining trust in the product.

Challenge 3: Multi-Strategy Music Search

Problem: Single search queries (e.g., "indie pop summer vibes") often returned generic, overplayed tracks. Users wanted diverse, contextually relevant recommendations that felt personally curated.

Approach: Tested multiple strategies:

  1. Single optimized query (fast but generic)
  2. Sequential fallback queries (slow, still limited diversity)
  3. Parallel multi-strategy searches (faster, maximizes diversity)

Solution: Execute 6+ parallel Spotify searches using different strategies:

  • Gemini-optimized search terms (highest priority)
  • Genre + mood combinations
  • Energy level + mood pairs
  • Broad genre fallbacks
  • Finally, deduplicate the combined results and rank by relevance
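
The parallel-search-then-deduplicate pattern might look something like the sketch below. The search function is injected so the example runs without network access; `TrackHit`, `SearchFn`, and `searchAll` are hypothetical names, and using query priority order as the ranking is a simplifying assumption, since the actual relevance scoring is not described here.

```typescript
interface TrackHit {
  id: string;
  name: string;
}

// Stand-in for a Spotify search call; injected so the sketch is testable.
type SearchFn = (query: string) => Promise<TrackHit[]>;

// Runs all queries in parallel, tolerates individual failures, and
// returns the first `limit` unique tracks in query-priority order.
async function searchAll(queries: string[], search: SearchFn, limit = 5): Promise<TrackHit[]> {
  const batches = await Promise.all(
    // A failed search resolves to an empty list so one bad query
    // does not sink the whole batch.
    queries.map((q) => search(q).catch(() => [] as TrackHit[]))
  );
  const seen = new Set<string>();
  const unique: TrackHit[] = [];
  for (const batch of batches) {
    for (const track of batch) {
      if (!seen.has(track.id)) {
        seen.add(track.id);
        unique.push(track);
      }
    }
  }
  return unique.slice(0, limit);
}
```

Because `Promise.all` preserves input order, results from higher-priority queries come first even when lower-priority searches finish sooner.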

Result: Recommendation diversity increased by 3x, with user satisfaction scores improving from 6.2/10 to 8.1/10 in internal testing.

What I Learned

  • Image-to-music mapping is culturally subjective – What feels "nostalgic" varies by listener background. Future versions could incorporate user preference signals to personalize the mapping.
  • Prompt engineering for multimodal AI requires iteration – Initial prompts that worked for text-only models failed with vision. We learned to explicitly describe visual elements (colors, composition, time of day) rather than assuming the model would infer them.
  • Graceful degradation > perfect accuracy – Users prefer seeing some results over error messages, even if the AI analysis is incomplete. This shaped our fallback strategy philosophy.
  • OAuth 2.0 redirect URI validation is non-negotiable – Spotify's strict matching prevents security vulnerabilities, but requires careful coordination between frontend and backend. Documenting the flow helped prevent regressions.
  • Serverless architecture enables rapid iteration – Edge Functions let us deploy fixes in minutes, not hours. This was crucial for debugging OAuth and API integration issues.

Future Roadmap

If I had $50K and 3 months, I would build:

  • Collaborative Playlists – Let multiple users upload photos to co-create a shared soundtrack. Requires multiplayer state sync via Supabase Realtime and conflict resolution for concurrent edits.
  • Personalized Music Generation – Fine-tune AudioCraft models on user's favorite tracks to generate music that matches both the photo's vibe and their musical taste. Requires audio feature extraction pipeline and model training infrastructure.
  • Video Frame Analysis – Extract keyframes from videos and generate dynamic soundtracks that evolve with the narrative. Challenges include frame selection algorithms and temporal mood mapping.
  • Cultural Context Awareness – Incorporate user's location, language, and cultural background into music recommendations. Requires geolocation APIs and culturally-aware genre taxonomies.

For the complete story including the 3-year journey from hackathon to production, philosophical reflections, and detailed technical deep-dives, read Closing the Loop: TuneStory (Revisited) on Substack.


📸 Screenshots & Demo

[Screenshot: TuneStory upload interface with drag-and-drop zone] Drag-and-drop photo upload with real-time preview

[Screenshot: Mood analysis display with extracted emotions and genres] AI-powered mood analysis showing emotional tone and suggested genres

[Screenshot: Music recommendations with audio player] Curated soundtrack with 30-second previews and Spotify links

[Screenshot: Music generation interface] Generate original music tracks based on photo analysis

Demo Video

Watch the full demo on YouTube


📚 Learn More

Choose your depth:

  • 📄 1-page case study – Hiring-focused overview with metrics, architecture decisions, and what this proves about my capabilities
  • 📖 Full story on Substack – The journey from 2023 hackathon to 2026 production rebuild, technical challenges, and lessons learned
  • 🎬 Demo video (4 min) – Watch it in action
  • πŸ’» Technical docs – Deep-dive architecture and setup guides

📚 Additional Resources

Built by Tylar Campbell

AI Product Builder | Applied AI Engineer | PhD Candidate @ SFU

What I Do: Build production-ready AI systems that bridge multimodal intelligence (vision, audio, text) with intuitive UX. TuneStory demonstrates my approach to scoping, shipping, and iterating on novel AI products end-to-end.

Portfolio • LinkedIn • GitHub • Substack

Open to collaboration, feedback, and opportunities in AI product development!


🤝 Contributing

Contributions welcome! Open an issue or PR.

When contributing, please:

  • Follow the existing code style (TypeScript, ESLint rules)
  • Add tests for new features
  • Update documentation as needed
  • Ensure all Edge Functions handle errors gracefully

📄 License

This project is open source and available under the MIT License.

Attribution Requirements

  • Gemini API: This project uses Google's Gemini 2.5 Flash model. Please review Google's AI Terms of Service for usage guidelines.
  • Spotify Web API: Music recommendations and previews are provided by Spotify. This project complies with Spotify's Developer Terms.
  • AudioCraft MusicGen: Music generation uses Meta's AudioCraft MusicGen model. See AudioCraft License for details.

About

Production AI music platform with controllable generation. Deployed MusicGen via Modal cloud infrastructure for music education. TypeScript, Modal, Supabase, Gemini - shipped to real users in 2025
