Hi, I'm Vinayak Vemula 👋

MS Data Science @ Montclair State University '26 | AI/ML · NLP · Python · SQL · Power BI

Building AI-powered pipelines, machine learning models, and analytics dashboards that turn complex data into real decisions.

🧠 About Me

🎓 MS Data Science, Montclair State University (GPA 3.8/4.0, May 2026)
🤖 Passionate about AI/ML, NLP, and applied data science
💼 Former Data Analytics Intern @ Main Flow Services & Technologies
📍 New Jersey — open to NYC metro & remote roles
🛂 Available on OPT from June 2026
📫 Reach me: vinayakvemula09@gmail.com
🔗 LinkedIn

🛠 Tech Stack

Languages & ML

AI & NLP

Data & BI

Tools

🚀 Featured Projects

🌱 Recycling Awareness Data Dashboard — Master's Capstone

Python · Flask · Pandas · SciPy · Chart.js · EPA Data

Built a full-stack analytics dashboard analyzing U.S. recycling rates (1960–2022) using real EPA government data.

Engineered a complete ETL pipeline: raw Excel → structured CSVs → 9 Flask REST API endpoints
Applied linear regression & Pearson correlation (Paper R²=0.97, Metal R²=0.91)
Key insight: deposit-law states recycle 2.4× more glass than non-deposit states
Interactive dashboard with choropleth maps, KPI cards, and trend visualizations

🔐 Credit Card Fraud Detection Pipeline

Python · Scikit-Learn · SMOTE · PCA · XGBoost · Matplotlib

End-to-end ML pipeline on 284,000+ financial transactions for anomaly and fraud detection.

Handled severe class imbalance with SMOTE oversampling
Applied PCA for dimensionality reduction + StandardScaler normalization
Benchmarked Logistic Regression, Random Forest, and XGBoost optimizing for fraud recall
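
To show the idea behind SMOTE oversampling, here is a simplified NumPy sketch. It is not the project's code, which presumably relied on a library implementation; the function name `smote_like_oversample` and the random data are assumptions for illustration only. Each synthetic point is an interpolation between a minority sample and one of its k nearest minority neighbors.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other points

    # Pairwise Euclidean distances; exclude each point from its own neighbors.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base point and one of its k neighbors for each new sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point somewhere on the segment base -> neighbor.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Hypothetical minority class: 20 "fraud" rows with 3 features.
X_minority = np.random.default_rng(42).normal(size=(20, 3))
synthetic = smote_like_oversample(X_minority, n_new=80)
print(synthetic.shape)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic rows never leak into the data used to measure recall.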

🚗 U.S. Fatal Accidents Analytics Dashboard

Python · Pandas · SQL · Plotly · Seaborn · Mapbox

Analyzed 39,000+ FARS crash records to identify temporal, geographic, and environmental risk patterns.

Optimized SQL queries, reducing report generation time by 40%
Interactive Plotly dashboard with bubble maps, choropleth maps, and drill-down filters
Key finding: evening hours (5–8 PM) + adverse lighting = top contributing factors
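
The kind of aggregation behind the evening-hours finding can be sketched with Python's built-in `sqlite3`. The table name, column names, and rows below are hypothetical and do not reflect the actual FARS schema or data; the index on `hour` is one common optimization, shown as an assumption rather than the specific tuning used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crashes ("
    "crash_id INTEGER PRIMARY KEY, hour INTEGER, "
    "light_condition TEXT, fatalities INTEGER)"
)

# Hypothetical rows; NOT real FARS records.
rows = [
    (1, 18, "Dark - Not Lighted", 1),
    (2, 19, "Dark - Not Lighted", 2),
    (3, 18, "Dusk", 1),
    (4, 9, "Daylight", 1),
    (5, 17, "Daylight", 1),
    (6, 23, "Dark - Lighted", 1),
]
conn.executemany("INSERT INTO crashes VALUES (?, ?, ?, ?)", rows)

# An index on the filter column lets the engine avoid a full table scan.
conn.execute("CREATE INDEX idx_crashes_hour ON crashes (hour)")

# Crashes and fatalities by lighting condition for the 5-8 PM window.
query = """
SELECT light_condition, COUNT(*) AS crashes, SUM(fatalities) AS deaths
FROM crashes
WHERE hour >= 17 AND hour < 20
GROUP BY light_condition
ORDER BY deaths DESC, light_condition
"""
evening = conn.execute(query).fetchall()
for light, crashes, deaths in evening:
    print(f"{light:<20} crashes={crashes} deaths={deaths}")
```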

🧬 Graph Clustering with Graph Neural Networks (GNNs)

PyTorch · GCN · Cora Dataset · NMI · Modularity Metrics

Unsupervised graph clustering pipeline using Graph Convolutional Networks for community detection.

Preprocessed Cora citation dataset to generate graph embeddings
Evaluated with Normalized Mutual Information (NMI) and modularity metrics
Applied deep learning for representation learning on graph-structured data
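
As a small illustration of the modularity metric named above, the sketch below computes Newman modularity for an undirected, unweighted graph in plain Python. The toy graph and cluster labels are hypothetical; in the project, the labels would come from clustering the GCN embeddings of Cora rather than being hard-coded.

```python
from collections import defaultdict

def modularity(edges, labels):
    """Newman modularity: sum over clusters of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    intra_edges = defaultdict(int)  # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)   # d_c: total degree of nodes in cluster c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            intra_edges[labels[u]] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Toy graph: two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]  # one cluster per triangle

q = modularity(edges, labels)
print(f"modularity = {q:.4f}")
```

Higher values indicate that more edges fall inside clusters than a degree-preserving random graph would produce. Putting every node in one cluster gives a modularity of 0.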

📊 Classification Model Benchmarking Study

Python · Scikit-Learn · XGBoost · AdaBoost · SVM · NumPy

Comprehensive benchmarking of 8 classification algorithms on structured datasets.

Algorithms: Decision Tree, Naive Bayes (Gaussian & Multinomial), SVM (Linear & RBF), k-NN, Random Forest, AdaBoost, XGBoost
XGBoost: 97.8% accuracy · SVM (RBF): 97.6% accuracy
Full performance report with trade-off analysis across accuracy, interpretability, and compute cost
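
A benchmarking loop of this kind might look like the following sketch. It covers only three of the listed algorithms, uses a synthetic dataset from `make_classification` rather than the study's data, and leaves out XGBoost so that scikit-learn is the only dependency. The printed accuracies are not the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in dataset; NOT the data used in the study.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>14}: {acc:.3f}")
```

Scoring every model on the same folds keeps the comparison fair. Swapping `scoring="accuracy"` for another metric, or timing each fit, would extend the loop toward the trade-off analysis described above.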

🎵 Spotify Song Popularity Prediction

Python · Scikit-Learn · Random Forest · K-Means · Linear Regression

Applied multiple ML algorithms to Spotify audio features to predict song popularity.

Compared classification, regression, and clustering approaches with cross-validation
Feature importance: energy, danceability, and loudness are top predictors

🎓 Certifications

Certificate	Issuer
Data Analytics	Accenture (Forage)
Databases and SQL for Data Science	IBM — Coursera
Supervised Machine Learning	DeepLearning.AI & Stanford — Coursera
Python for Everybody	Google — Coursera

📫 Let's Connect

Open to AI/ML Analyst, Junior Data Scientist, and NLP Analyst roles. Available on OPT June 2026.
