MIWA (Music Indexing with AI) is an intelligent music search system that helps users find songs using natural language queries. Instead of requiring exact song titles or artist names, users can describe songs using partial lyrics, mood, genre, artist characteristics, location, or any combination of these attributes.
- Natural Language Query Processing: Extract structured information from conversational queries using Claude AI
- Multi-Modal Search: Search by:
  - Song title keywords
  - Lyrics (exact keywords or semantic similarity)
  - Artist name and characteristics
  - Genre
  - Release year (ranges)
  - Geographic location (country or region)
  - Album information
  - Featured artists
- GraphRAG Integration: Powered by Neo4j graph database for efficient relationship-based retrieval
- Hybrid Retrieval: Combines full-text search (Lucene) and vector embeddings (Jina embeddings) for optimal results
- Interactive Streamlit Demo: User-friendly web interface for testing queries
- Benchmarking & Evaluation: Tools for testing extraction accuracy and retrieval performance
- LLM Extraction Layer (app.py)
  - Uses Anthropic Claude API with COSTAR-formatted prompts
  - Extracts structured JSON from natural language queries
  - Outputs XML for reliable parsing
- GraphRAG Database (Neo4j)
  - Stores tracks, artists, albums, genres, locations, lyrics, and descriptions
  - Full-text indexes for keyword search
  - Vector indexes for semantic similarity search
  - Relationship-based queries for complex filtering
- Retrieval Engine (app.py, calculate_accuracy.py)
  - Multi-stage filtering and scoring system
  - Combines multiple score types (title, lyrics, artist, album)
  - Softmax normalization for balanced scoring
  - Returns top-K results ranked by relevance
- Evaluation Tools
  - benchmark_extraction.py: Tests field extraction accuracy
  - calculate_accuracy.py: Evaluates retrieval performance (Top-1, Top-3, Top-10, Top-15)
  - generate_prompts.py: Generates diverse test prompts using OLLAMA
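The LLM Extraction Layer above is described as producing structured JSON while emitting XML for reliable parsing. One plausible reading is that the model wraps its JSON answer in an XML tag that the application then extracts. The sketch below illustrates that pattern only; the tag name `extraction` and the function `parse_extraction` are hypothetical and are not taken from app.py.

```python
import json
import re
from typing import Any, Dict

# Hypothetical tag name: the README does not say which XML tag wraps the JSON.
TAG = "extraction"


def parse_extraction(response_text: str, tag: str = TAG) -> Dict[str, Any]:
    """Pull the JSON payload out of <tag>...</tag> and decode it."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, response_text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no <{tag}> block found in model response")
    return json.loads(match.group(1))


# A made-up model response, used only to exercise the parser.
sample = """Here is the result.
<extraction>
{"track": {"title_keywords": ["love"], "year_from": 2000, "year_to": 2009},
 "artist": {"country": null},
 "features": []}
</extraction>"""

fields = parse_extraction(sample)
print(fields["track"]["title_keywords"])
```

Wrapping the payload in a tag lets the parser ignore any conversational text the model adds before or after the JSON.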
- Python 3.8+
- Neo4j database (version 5.x or later)
- Anthropic API key (for Claude)
- (Optional) OLLAMA (for prompt generation)
- Clone the repository
    git clone <repository-url>
    cd MusicIndexingWithAI
- Install dependencies
    pip install -r requirements.txt
- Set up environment variables
  Create a .env file in the project root:
    # Anthropic API (required)
    ANTHROPIC_API_KEY=your_anthropic_api_key_here
    # Neo4j Connection (required)
    NEO4J_URI=bolt://localhost:7687
    NEO4J_USERNAME=neo4j
    NEO4J_PASSWORD=your_password_here
- Set up Neo4j Database
  - Install and start Neo4j
  - Create the graph structure using src/create_graph.py
  - Load your music data into Neo4j
  - The system will automatically create necessary indexes
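Before starting the app, it can help to confirm that the .env file from the environment step defines every required variable. The following is a minimal standalone sketch using only the standard library; `parse_env` and `missing_keys` are hypothetical helper names, and the project itself may load the file differently (for example with a dotenv package).

```python
from typing import Dict, List

# The four variables the setup instructions mark as required.
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY",
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str]) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


# Example content with NEO4J_PASSWORD deliberately left out.
example = """
# Anthropic API (required)
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# Neo4j Connection (required)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
"""

print(missing_keys(parse_env(example)))  # ['NEO4J_PASSWORD']
```

To check a real file, pass its contents instead of `example`, e.g. `parse_env(open(".env").read())`.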
streamlit run app.py
The demo will open in your browser. You can:
- Enter natural language queries about songs
- View extracted structured information
- See retrieval results and AI-generated suggestions
- Enable debug mode to inspect extraction details
- "Find songs with 'love' in the title from the 2000s"
- "Rock songs by Norwegian artists with lyrics about mountains"
- "Songs featuring Drake released after 2015"
- "Pop songs from Europe with lyrics mentioning 'dancing'"
- "Songs by artists founded in the 1960s from the United Kingdom"
Test the extraction system on generated prompts:
python3 benchmark_extraction.py
This will:
- Load prompts from data_parsing/data/generated_prompts.json
- Extract fields for each prompt
- Generate a benchmark report
- Update the prompts file with extracted JSON
Evaluate retrieval performance:
python3 calculate_accuracy.py
This will:
- Load prompts with expected track information
- Run retrieval for each prompt
- Calculate Top-1, Top-3, Top-10, Top-15 accuracy
- Generate accuracy results JSON files
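The Top-K metrics listed above can be read as the fraction of prompts whose expected track appears among the first K retrieved results. The sketch below shows that computation under this assumption; the function `top_k_accuracy` and the track IDs are illustrative and do not come from calculate_accuracy.py.

```python
from typing import Dict, List, Sequence


def top_k_accuracy(
    expected: Sequence[str],
    ranked: Sequence[List[str]],
    ks: Sequence[int] = (1, 3, 10, 15),
) -> Dict[int, float]:
    """Fraction of prompts whose expected track is within the first k results."""
    if len(expected) != len(ranked):
        raise ValueError("expected and ranked must have the same length")
    total = len(expected)
    accuracy: Dict[int, float] = {}
    for k in ks:
        hits = sum(exp in results[:k] for exp, results in zip(expected, ranked))
        accuracy[k] = hits / total if total else 0.0
    return accuracy


# Four made-up prompts: expected track ID and the ranked IDs retrieved for it.
expected_ids = ["t1", "t2", "t3", "t4"]
retrieved = [
    ["t1", "x"],                 # hit at rank 1
    ["a", "t2", "b"],            # hit at rank 2
    ["a", "b", "c", "d", "t3"],  # hit at rank 5
    ["a", "b"],                  # miss
]

print(top_k_accuracy(expected_ids, retrieved))
# {1: 0.25, 3: 0.5, 10: 0.75, 15: 0.75}
```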
Generate diverse prompts for testing:
python3 generate_prompts.py
Note: Requires OLLAMA running locally with the qwen3:14b model.
MusicIndexingWithAI/
├── app.py # Main Streamlit application
├── benchmark_extraction.py # Extraction benchmarking script
├── calculate_accuracy.py # Retrieval accuracy evaluation
├── generate_prompts.py # Test prompt generation (OLLAMA)
├── requirements.txt # Python dependencies
├── README.md # This file
│
├── data/ # Sample data files
│ ├── artists_locations_countries_only.json
│ ├── genres_sample.json
│ └── locations_sample.json
│
├── data_parsing/ # Data processing scripts
│ ├── data/
│ │ ├── generated_prompts.json # Generated test prompts
│ │ ├── sample_100.json # Sample tracks for prompt generation
│ │ └── datascripts/ # Data processing utilities
│ ├── eda/ # Exploratory data analysis
│ └── utils/ # Parsing utilities
│
├── results/ # Evaluation results
│ ├── benchmark_report_*.json
│ └── accuracy_results_*.json
│
└── src/ # Source code
    └── create_graph.py # Neo4j graph creation script
The extraction system uses a COSTAR-formatted prompt to extract structured information:
{
"track": {
"title_keywords": ["love", "heart"],
"year_from": 2000,
"year_to": 2010,
"genres": ["rock", "pop"],
"lyrics_keywords": ["tears"],
"lyrics_text": "songs about crying and sadness"
},
"artist": {
"name_keywords": ["beatles"],
"country": "United Kingdom",
"region": null,
"description_text": "legendary rock band"
},
"features": ["Drake"],
"album": {
"name_keywords": ["thriller"]
}
}
The retrieval system uses a multi-stage approach:
- Filtering: Apply numeric filters (year ranges, views) first
- Scoring: Calculate relevance scores for:
  - Title match (full-text search)
  - Lyrics match (full-text + vector similarity)
  - Artist name match (full-text search)
  - Artist description (vector similarity)
  - Album match (full-text search)
- Ranking: Combine normalized scores and return top-K results
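The three stages above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the real system runs against Neo4j, and the README does not say how the normalized scores are weighted. Here each score type is softmax-normalized over the candidates that survive filtering and the results are summed without weights; the names `softmax` and `rank_tracks` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a mapping of track ID to raw score."""
    if not scores:
        return {}
    peak = max(scores.values())
    exps = {tid: math.exp(s - peak) for tid, s in scores.items()}
    total = sum(exps.values())
    return {tid: e / total for tid, e in exps.items()}


def rank_tracks(
    years: Dict[str, int],
    raw_scores: Dict[str, Dict[str, float]],
    year_from: Optional[int],
    year_to: Optional[int],
    top_k: int,
) -> List[Tuple[str, float]]:
    # Stage 1 (filtering): keep only tracks inside the requested year range.
    allowed = {
        tid
        for tid, year in years.items()
        if (year_from is None or year >= year_from)
        and (year_to is None or year <= year_to)
    }
    # Stage 2 (scoring): normalize each score type over the surviving tracks.
    combined = {tid: 0.0 for tid in allowed}
    for scores in raw_scores.values():
        kept = {tid: s for tid, s in scores.items() if tid in allowed}
        for tid, value in softmax(kept).items():
            combined[tid] += value  # assumption: unweighted sum
    # Stage 3 (ranking): highest combined score first, truncated to top_k.
    ordered = sorted(combined.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


# Made-up candidates and raw scores, keyed by score type.
years = {"a": 2003, "b": 2008, "c": 1995}
raw = {
    "title": {"a": 2.0, "b": 1.0, "c": 5.0},
    "lyrics": {"a": 0.5, "b": 0.5},
}

results = rank_tracks(years, raw, year_from=2000, year_to=2010, top_k=2)
print([tid for tid, _ in results])  # ['a', 'b']
```

Normalizing each score type separately keeps a type with a large raw range, such as full-text scores, from dominating one with a small range, such as vector similarity.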
- Serhii Dmytryshyn - GraphRAG and retrieval implementation, Neo4j integration
- Zakhar Kohut - Evaluation tools, benchmarking scripts, LLM integration and prompt generation
- Andrii Kravchuk - Data parsing and preprocessing
- Anthropic for Claude API
- Neo4j for graph database
- Jina AI for embeddings model