
PixelPlay: Visual-to-Audio Translation Engine

PixelPlay is a multimodal AI application designed to translate visual stimuli into auditory experiences. By leveraging Computer Vision and Cross-Modal Vector Embeddings, the system bridges the gap between sight and sound, enabling users to discover music that semantically matches the "vibe" of an input image.

Live Demo: https://pixelplay-demo.streamlit.app/

Test Credentials:

  • Username: lord
  • Password: 123456

Overview

Unlike traditional tag-based search engines, PixelPlay utilizes Cross-Modal Vector Embeddings to perform semantic matching. The core architecture relies on OpenAI's CLIP (Contrastive Language-Image Pre-training) model to project both images and text descriptions of music into a shared 512-dimensional vector space.

When a user uploads an image, the system:

  1. Encodes the image into a high-dimensional vector using the CLIP Visual Encoder.
  2. Computes the Cosine Similarity between the input image vector and a pre-indexed vector database of 10,000 songs.
  3. Retrieves the tracks with the highest semantic correlation to the visual input.
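The retrieval step above can be sketched in a few lines of NumPy. The 512-dimensional query vector is assumed to come from the CLIP visual encoder, and `song_matrix` stands in for the pre-indexed catalog; both names are illustrative, not the project's actual API:

```python
import numpy as np

def top_k_songs(image_vec: np.ndarray, song_matrix: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most semantically similar songs.

    Both inputs are assumed L2-normalized, so cosine similarity
    reduces to a plain dot product: scores[i] = song_matrix[i] . image_vec.
    """
    scores = song_matrix @ image_vec       # shape: (n_songs,)
    return np.argsort(scores)[::-1][:k]    # highest similarity first
```

Because the vectors are normalized once at index time, each query is a single matrix-vector product over the whole 10,000-song catalog, which keeps lookup fast at runtime.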

Features

Core Functionality

  • Image-to-Audio Search: Upload any image file (JPG, PNG) to receive a curated list of song recommendations that match the visual content and mood.
  • Hybrid Text Refinement: Refine visual search results by adding text context (e.g., "Energetic," "Warm," "Fast-paced"). The system mathematically blends the image vector with the text vector to adjust the search trajectory.
  • Audio Pivot Search: The "Find Similar" button allows users to pivot from visual search to audio-based search. Selecting this option uses the specific vector of a recommended song to find other tracks with similar audio profiles.
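The hybrid refinement described above amounts to a weighted average of the two embeddings. A minimal sketch, assuming both vectors are L2-normalized CLIP embeddings and a hypothetical `alpha` weight (the project's actual weighting may differ):

```python
import numpy as np

def blend_query(image_vec: np.ndarray, text_vec: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Linearly blend the image and text embeddings, then re-normalize
    so downstream cosine-similarity search still behaves as expected.

    alpha=1.0 keeps the pure visual query; alpha=0.0 is pure text.
    """
    mixed = alpha * image_vec + (1.0 - alpha) * text_vec
    return mixed / np.linalg.norm(mixed)
```

The Audio Pivot Search needs no blending at all: it simply reuses a recommended song's stored embedding as the new query vector.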

User Interface & Experience

  • Integrated Audio Player: Preview recommended tracks directly within the application using the native HTML5 audio player (supports MP3/M4A).
  • Spotify Integration: Each song card includes a direct "Open on Spotify" link, allowing users to instantly transition from discovery to full playback on their preferred streaming platform.
  • Real Time Metadata: Displays accurate song titles, artist names, genres, and release years. The system prioritizes real time metadata fetched via the iTunes Search API during data enrichment, falling back to dataset values only when necessary.
  • Visual Data Analytics: Every recommendation includes a dynamic Radar Chart visualizing key audio metrics:
    • Energy: Intensity and activity level.
    • Valence: Musical positiveness.
    • Danceability: Suitability for dancing.
    • Acousticness: Confidence the track is acoustic.
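The radar chart for these four metrics can be drawn in a few lines; this is a minimal matplotlib sketch (the app may use a different charting library), plotting each 0–1 metric on its own polar axis and closing the polygon:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt

METRICS = ["Energy", "Valence", "Danceability", "Acousticness"]

def radar_chart(values):
    """Draw a closed radar polygon for the four audio metrics (0-1 scale)."""
    angles = np.linspace(0, 2 * np.pi, len(METRICS), endpoint=False).tolist()
    angles += angles[:1]            # repeat the first angle to close the shape
    vals = list(values) + [values[0]]
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles, vals, linewidth=2)
    ax.fill(angles, vals, alpha=0.25)
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(METRICS)
    ax.set_ylim(0, 1)
    return fig

fig = radar_chart([0.8, 0.6, 0.7, 0.2])
```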

System & Security

  • Secure Authentication: Complete user management system featuring secure login, account registration, and password recovery.
  • Password Hashing: User credentials are secured using bcrypt hashing standards.
  • Session Management: Persistent session states ensure users remain logged in and retain their search context during navigation.
  • Data Enrichment Pipeline: A custom-built ETL pipeline fetches and updates metadata (album art, preview URLs) to ensure high quality display data without slowing down runtime performance.

Technical Architecture

The codebase follows a modular architecture, separating the production application logic from the data engineering pipelines.

Application Layer (/)

  • app.py: The central controller that orchestrates the Streamlit interface and application flow.
  • logic.py: Contains the core business logic, including CLIP model inference, vector calculations, and data loading routines.
  • ui_components.py: Manages the frontend design system, including custom CSS injection, card rendering, and chart generation.
  • auth_manager.py: Encapsulates all authentication logic, config handling, and security protocols.

Data Engineering (/data processing)

  • clean_data.py: Handles initial cleaning and normalization of the raw CSV dataset.
  • embed_songs.py: Runs the inference batch job to generate 512-dimensional vector embeddings for the entire music catalog.
  • enrich_data.py: A post-processing script that queries external APIs (iTunes) to hydrate the dataset with high-quality metadata, album artwork, and audio preview URLs.
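The enrichment step uses Apple's public iTunes Search API endpoint; the sketch below shows the general shape (the endpoint and result field names are real, but the helper functions are illustrative, not the actual contents of enrich_data.py):

```python
import urllib.parse
from typing import Optional

ITUNES_SEARCH = "https://itunes.apple.com/search"

def build_lookup_url(title: str, artist: str) -> str:
    """Build an iTunes Search API query URL for a single track."""
    params = {"term": f"{artist} {title}", "media": "music", "limit": 1}
    return f"{ITUNES_SEARCH}?{urllib.parse.urlencode(params)}"

def extract_metadata(response: dict) -> Optional[dict]:
    """Pull the display fields the app needs from one API response."""
    results = response.get("results", [])
    if not results:
        return None
    r = results[0]
    return {
        "title": r.get("trackName"),
        "artist": r.get("artistName"),
        "artwork": r.get("artworkUrl100"),
        "preview_url": r.get("previewUrl"),
        "year": (r.get("releaseDate") or "")[:4],
    }
```

Running this once per track offline, rather than at request time, is what keeps the runtime app fast: the production app only ever reads the pre-hydrated pickle.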

Data Storage (/data)

  • data/: Stores the raw source datasets and intermediate files used during the build process. The production app relies on an optimized pickle file (songs_enriched.pkl) generated by the pipeline.

Installation and Local Deployment

  1. Clone the repository

    git clone https://github.com/Prateek-845/PixelPlay.git
    cd PixelPlay
  2. Install Dependencies
     Ensure you have Python installed, then run:

    pip install -r requirements.txt
  3. Run the Application

    streamlit run app.py

Note: This project is a portfolio demonstration. User accounts created in the live demo environment are ephemeral and may be reset during system updates or redeployments.
