Audio CNN Visualizer is a full-stack application for audio classification and deep learning visualization. Users can upload WAV audio files, which are sent to a backend (Python, e.g., FastAPI) for inference using a Convolutional Neural Network (CNN). The frontend, built with Next.js and React, visualizes the model's predictions, feature maps, and audio waveforms in an interactive UI.
- Upload and analyze WAV audio files
- Real-time display of top predictions with confidence scores and emojis
- Visualization of feature maps from CNN layers
- Interactive waveform and spectrogram display
- Responsive, modern UI (Next.js, Tailwind CSS)
- Easy backend integration via environment variable
audio-cnn-main/
│
├── audio-cnn-visualisation/    # Next.js frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── layout.tsx
│   │   │   └── page.tsx        # Main UI logic, file upload, API call, visualization
│   │   ├── components/
│   │   │   ├── FeatureMap.tsx  # Feature map visualization
│   │   │   ├── Waveform.tsx    # Waveform visualization
│   │   │   └── ...             # UI components
│   │   └── lib/
│   │       └── utils.ts        # Utility functions
│   └── public/
│       └── favicon.ico
│
├── main.py                     # Python backend entry (FastAPI, etc.)
├── model.py                    # CNN model definition and loading
├── requirements.txt            # Python dependencies
├── chirpingbirds.wav           # Example audio file
└── README.md
- The backend receives a WAV file, decodes it, and normalizes the waveform.
- The waveform is converted to a spectrogram (a time-frequency representation), either a plain STFT magnitude spectrogram or a Mel-spectrogram.
- The spectrogram is treated as an image and fed into the CNN.
- Convolutional layers scan the spectrogram with learnable filters, extracting local time-frequency patterns.
- Activation functions (e.g., ReLU) introduce non-linearity.
- Pooling layers reduce dimensionality and focus on salient features.
- Deeper layers learn higher-level audio features (e.g., timbre, rhythm).
- The final layers flatten the features and use fully connected layers for classification.
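The steps above can be sketched without any dependencies. This is an illustration only, not the code in `model.py`: the function names, the tiny frame size, and the hand-picked `band_filter` are assumptions made for the example, and a real backend would normally use a library such as PyTorch with learned filters.

```python
import cmath
import math


def stft_magnitude(signal, frame_size=16, hop=8):
    """Magnitude spectrogram via a naive DFT; result[k][t] is bin k at frame t."""
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    bins = frame_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(bins)
        ])
    return [list(column) for column in zip(*frames)]  # transpose to [bin][frame]


def conv2d(image, kernel):
    """Valid cross-correlation, which deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Zero out negative activations."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows and columns are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[max(feature_map[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]


# A pure tone that sits exactly on frequency bin 4 of a 16-sample frame.
tone = [math.sin(2 * math.pi * 4 * n / 16) for n in range(64)]
spectrogram = stft_magnitude(tone)                     # 9 bins x 7 frames
band_filter = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]  # hypothetical, not learned
feature_map = max_pool(relu(conv2d(spectrogram, band_filter)))  # 3 x 2
```

The filter responds where one frequency band is stronger than its neighbours, so the only non-zero activations in `feature_map` come from the rows around bin 4. A trained CNN learns many such filters instead of using a fixed one.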
- Dataset: Commonly, datasets like ESC-50, UrbanSound8K, or custom audio datasets are used.
- Augmentation: Techniques like noise addition, time-shifting, and pitch-shifting improve generalization.
- Loss Function: Cross-entropy for classification.
- Optimization: Adam or SGD optimizers are used to minimize loss.
- Validation: Model performance is tracked on a held-out validation set.
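Two of the training ingredients listed above, augmentation and the cross-entropy loss, are small enough to show directly. The following is a minimal sketch with hypothetical helper names (`add_noise`, `time_shift`, `cross_entropy`); it is not taken from the project's training code, and a real pipeline would typically rely on a framework's built-in loss and augmentation utilities.

```python
import math
import random


def add_noise(waveform, noise_level=0.005, rng=random):
    """Noise addition: perturb each sample with zero-mean Gaussian noise."""
    return [s + rng.gauss(0.0, noise_level) for s in waveform]


def time_shift(waveform, shift):
    """Time-shifting: rotate the samples circularly by `shift` positions."""
    if not waveform:
        return []
    shift %= len(waveform)
    return waveform[-shift:] + waveform[:-shift] if shift else list(waveform)


def softmax(logits):
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


# Augment one clip, then compare the loss of a good and a bad prediction.
clip = [0.0, 0.5, 1.0, 0.5]
augmented = add_noise(time_shift(clip, 1), rng=random.Random(0))
loss_confident_right = cross_entropy([4.0, 0.0, 0.0], target_index=0)
loss_confident_wrong = cross_entropy([4.0, 0.0, 0.0], target_index=1)
```

The loss is small when the model puts most of its probability on the correct class and grows quickly when it is confidently wrong, which is the signal Adam or SGD uses to update the weights.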
- The backend returns:
  - Predictions: Top classes and confidence scores
  - Feature maps: Outputs from intermediate CNN layers
  - Input spectrogram and waveform
- The frontend visualizes:
  - Predictions with emojis and confidence bars
  - Feature maps as heatmaps
  - Waveform and spectrogram for user insight
- Feature maps help users see what the CNN is "looking at" in the audio.
- Saliency maps or Grad-CAM (not implemented here, but possible) can highlight which parts of the spectrogram most influence the prediction.
- Interactive visualization helps demystify deep learning for non-experts.
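To render a feature map as a heatmap, its activations first need to be mapped onto a fixed range. The sketch below min-max scales a 2D feature map to [0, 1] and wraps it in a `shape`/`values` dictionary. The helper name `to_heatmap_payload` is hypothetical, and whether this project scales values on the backend or in the frontend is not stated here, so treat the placement as an assumption.

```python
def to_heatmap_payload(feature_map):
    """Min-max scale a 2D feature map to [0, 1] and describe its shape."""
    flat = [v for row in feature_map for v in row]
    low, high = min(flat), max(flat)
    span = high - low
    # A constant map has no contrast, so it is rendered as all zeros.
    values = [[(v - low) / span if span else 0.0 for v in row]
              for row in feature_map]
    return {"shape": [len(feature_map), len(feature_map[0])], "values": values}


payload = to_heatmap_payload([[1.0, 3.0], [5.0, 9.0]])
# payload["values"] is [[0.0, 0.25], [0.5, 1.0]]
```

Scaling each map independently makes weak layers visible, at the cost of hiding how activation magnitudes compare across layers.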
graph TD;
User-->|Uploads WAV|Frontend(Next.js)
Frontend-->|POST audio|Backend(Python API)
Backend-->|CNN inference|Model
Model-->|Predictions, features|Backend
Backend-->|JSON|Frontend
Frontend-->|Visualization|User
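On the backend side of this flow, the first step is turning the posted audio into samples. The sketch below uses only the standard library and assumes the audio arrives base64-encoded as 16-bit PCM WAV. The function name `decode_wav`, the mono mixdown, and the peak normalization are illustrative choices, not necessarily what `main.py` does.

```python
import base64
import io
import struct
import wave


def decode_wav(audio_b64):
    """Decode base64 WAV data into a peak-normalized mono waveform."""
    raw = base64.b64decode(audio_b64)
    with wave.open(io.BytesIO(raw), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
        pcm = wav.readframes(n_frames)

    # Little-endian signed 16-bit samples, interleaved by channel.
    samples = struct.unpack("<%dh" % (len(pcm) // 2), pcm)
    # Average the channels of each frame to get a mono signal.
    mono = [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples), channels)]
    # Assumption: normalize so the loudest sample has magnitude 1.0.
    peak = max((abs(s) for s in mono), default=0)
    values = [s / peak for s in mono] if peak else [0.0] * len(mono)
    return {
        "values": values,
        "sample_rate": sample_rate,
        "duration": n_frames / sample_rate,
    }
```

The returned dictionary uses the same `values`, `sample_rate`, and `duration` keys as the `waveform` field of the API response, so it could be returned to the frontend as-is.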
Request:
POST / (or /predict)
{
  "audio_data": "<base64-encoded-audio>"
}

Response:
{
  "predictions": [
    {"class": "dog", "confidence": 0.92},
    {"class": "cat", "confidence": 0.05}
  ],
  "visualization": {"conv1": {"shape": [8, 8], "values": [[...], ...]}},
  "input_spectrogram": {"shape": [128, 128], "values": [[...], ...]},
  "waveform": {"values": [...], "sample_rate": 44100, "duration": 2.5}
}

- Node.js (for frontend)
- Python 3.8+ (for backend)
cd audio-cnn-visualisation
npm install
# Add your backend URL to .env.local
# Example:
# NEXT_PUBLIC_API_URL=https://your-backend-url
npm run dev

pip install -r requirements.txt
uvicorn main:app --reload

- Backend: Deploy on Modal, Render, Heroku, or any cloud provider. Expose a POST endpoint for audio inference.
- Frontend: Deploy on Vercel or Netlify. Set the NEXT_PUBLIC_API_URL environment variable in your frontend deployment to point to your backend.
- Open the frontend in your browser.
- Upload a WAV file.
- View predictions, feature maps, and waveform visualizations.
- Explore the interactive UI for more details.
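The backend can also be exercised without the frontend, which is handy when debugging. The script below is a sketch that follows the request and response format from the API section. `API_URL` is an assumption (uvicorn's default port 8000 and the `/predict` route), and the helper names are hypothetical, so adjust them to match your deployment.

```python
import base64
import json
import urllib.request

# Assumption: backend started locally with uvicorn and serving /predict.
API_URL = "http://localhost:8000/predict"


def build_payload(wav_bytes):
    """Wrap raw WAV bytes in the JSON body the API expects."""
    return {"audio_data": base64.b64encode(wav_bytes).decode("ascii")}


def classify(wav_path, url=API_URL):
    """POST a WAV file to the backend and return the parsed JSON response."""
    with open(wav_path, "rb") as handle:
        body = json.dumps(build_payload(handle.read())).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def top_prediction(result):
    """Pick the highest-confidence entry from the `predictions` list."""
    return max(result["predictions"], key=lambda p: p["confidence"])
```

With the backend running, `top_prediction(classify("chirpingbirds.wav"))` would return the most likely class for the bundled example file.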
- Build or Lint Errors: Ensure your code matches the latest pushed version and all lint errors are fixed before deploying.
- API Errors: Check that your backend is running and accessible from the frontend. Update NEXT_PUBLIC_API_URL as needed.
- Large Files: Do not commit venv/ or large model files to git. Exclude them with .gitignore and record Python dependencies in requirements.txt.

