An interactive data visualization tool that clusters drugs based on their pathway activation patterns, helps identify relationships between drugs with similar mechanisms of action (MoA), and visualizes these relationships in an explorable 3D space.
This tool performs several key operations:
- Loads drug pathway data from CSV files
- Analyzes the data using Principal Component Analysis (PCA)
- Groups similar drugs using K-means clustering
- Creates an interactive 3D visualization for exploring the results
- Integrates mechanism of action (MoA) data to provide context
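The analysis steps listed above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code in main.py: the random matrix, the standardization step, the choice of k=4, and clustering on the PCA coordinates rather than on the full pathway matrix are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data: 30 drugs x 50 pathways of NES values.
rng = np.random.default_rng(0)
nes_matrix = rng.normal(size=(30, 50))

# Standardize each pathway column so no single pathway dominates the PCA
# (an assumption; the project may scale differently or not at all).
scaled = StandardScaler().fit_transform(nes_matrix)

# Project every drug onto the first three principal components.
coords = PCA(n_components=3).fit_transform(scaled)

# Group drugs in the reduced space; k=4 is an arbitrary illustrative choice.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(coords)

print(coords.shape)  # (30, 3)
print(len(labels))   # 30
```

Each row of `coords` is one drug's position in the 3D plot, and `labels` gives the cluster used to color it.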
The visualization allows researchers to identify drugs with similar pathway profiles, discover potential new applications for existing drugs, and understand the relationship between pathway activation patterns and clinical effects.
- PCA-Based Dimensionality Reduction: Condenses complex pathway data into a 3D visualization
- Automatic Optimal Clustering: Finds the best number of clusters using the elbow method
- Interactive 3D Visualization: Explore the drug landscape with zoom, rotation, and pan controls
- MoA Integration: Hover over drugs to see their mechanism of action
- Cluster Analysis: See detailed statistics about each cluster
- Filtering Capabilities: Highlight drugs by MoA to observe patterns
- Connection Visualization: See how drugs with the same MoA relate to each other spatially
# Clone the repository
git clone https://github.com/TheSaezAtienzarLab/clustering-project.git
cd clustering-project
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
- Place your drug pathway data CSV files in the drugs_data/ directory
  - Each file should be named after the drug (e.g., aspirin.csv)
  - Files should contain columns for Term (pathway name) and NES (normalized enrichment score)
- (Optional) Place your MoA data in drugs_association/all_matched_drugs.csv with columns for drug names and their mechanisms of action
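To make the file layout described above concrete, here is one way per-drug CSV files with Term and NES columns could be combined into a single drugs-by-pathways table. The helper name `load_pathway_matrix`, the sample pathway names and scores, and the choice to fill missing pathways with 0.0 are assumptions for illustration, not taken from main.py.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_pathway_matrix(data_dir):
    """Return a drugs x pathways DataFrame of NES values (hypothetical helper)."""
    columns = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        table = pd.read_csv(csv_path)
        # If a pathway appears twice in one file, keep the last NES value.
        table = table.drop_duplicates("Term", keep="last")
        # The file name without ".csv" is taken as the drug name.
        columns[csv_path.stem] = table.set_index("Term")["NES"]
    # Pathways absent from a drug's file become 0.0 (an assumption).
    return pd.DataFrame(columns).T.fillna(0.0)


# Demo with two tiny made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "aspirin.csv").write_text("Term,NES\nApoptosis,1.5\nGlycolysis,-0.7\n")
    Path(tmp, "ibuprofen.csv").write_text("Term,NES\nApoptosis,0.9\n")
    matrix = load_pathway_matrix(tmp)

print(matrix)
```

Each row of the resulting table is one drug and each column one pathway, which is the shape PCA and K-means in scikit-learn expect.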
# Run the main analysis script
python main.py
- Each point represents a drug
- Colors represent different clusters
- Hover over points to see drug name, MoA, and cluster assignment
- Use the dropdown to select and highlight drugs by MoA
- Toggle connections to see relationships between drugs with the same MoA
- Use cluster statistics to understand the distribution of drugs
- main.py: Main script that performs analysis and generates visualization
- requirements.txt: List of required Python packages
- results/: Directory where analysis results and visualization are saved
- Python 3.7+
- pandas
- numpy
- scikit-learn
- plotly
- kneed (for finding optimal cluster number)
If you encounter issues with missing data:
- Check that your drug pathway files follow the required format
- Ensure the MoA data file contains the correct column names
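A quick format check along the lines of the first point can be written with the standard library alone. The helper below is a hypothetical sketch, not part of the project: it only verifies that a pathway file has Term and NES columns and that every NES value parses as a number.

```python
import csv
import tempfile
from pathlib import Path

REQUIRED_COLUMNS = {"Term", "NES"}


def find_format_problems(csv_path):
    """Return a list of human-readable problems found in one pathway file."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing column(s): {', '.join(sorted(missing))}"]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            try:
                float(row["NES"])
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: NES is not numeric: {row['NES']!r}")
    return problems


# Demo: one well-formed file and one with a wrongly named column.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "aspirin.csv")
    good.write_text("Term,NES\nApoptosis,1.5\n")
    bad = Path(tmp, "broken.csv")
    bad.write_text("Pathway,NES\nApoptosis,1.5\n")
    good_problems = find_format_problems(good)
    bad_problems = find_format_problems(bad)

print(good_problems)  # []
print(bad_problems)   # ['missing column(s): Term']
```

Running a check like this over every file in drugs_data/ before the main script can point to the exact file and line that needs fixing.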
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- This tool uses Plotly for interactive visualizations
- Clustering algorithms are powered by scikit-learn
- MoA analysis was done using MoAble
Note: This tool is for research purposes only and should not be used for clinical decision making.
