MS Data Science @ Montclair State University '26 | AI/ML · NLP · Python · SQL · Power BI
Building AI-powered pipelines, machine learning models, and analytics dashboards that turn complex data into real decisions.
- 🎓 MS Data Science, Montclair State University (GPA 3.8/4.0, May 2026)
- 🤖 Passionate about AI/ML, NLP, and applied data science
- 💼 Former Data Analytics Intern @ Main Flow Services & Technologies
- 📍 New Jersey — open to NYC metro & remote roles
- 🛂 Available on OPT from June 2026
- 📫 Reach me: vinayakvemula09@gmail.com
Languages & ML
AI & NLP
Data & BI
Tools
🌱 Recycling Awareness Data Dashboard — Master's Capstone
Python · Flask · Pandas · SciPy · Chart.js · EPA Data
Built a full-stack analytics dashboard analyzing U.S. recycling rates (1960–2022) using real EPA government data.
- Engineered complete ETL pipeline from raw Excel → structured CSVs → 9 Flask REST API endpoints
- Applied linear regression and Pearson correlation (R² = 0.97 for paper, R² = 0.91 for metal)
- Key insight: deposit-law states recycle 2.4× more glass than non-deposit states
- Interactive dashboard with choropleth maps, KPI cards, and trend visualizations
Python · Scikit-Learn · SMOTE · PCA · XGBoost · Matplotlib
End-to-end ML pipeline on 284,000+ financial transactions for anomaly and fraud detection.
- Handled severe class imbalance with SMOTE oversampling
- Standardized features with StandardScaler, then applied PCA for dimensionality reduction
- Benchmarked Logistic Regression, Random Forest, and XGBoost, optimizing for fraud recall
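SMOTE creates new minority-class (fraud) points by interpolating between a minority sample and one of its nearest minority neighbours. The pure-Python sketch below shows only that core step plus a recall helper. In practice a library implementation such as imbalanced-learn's `SMOTE` class (`fit_resample`) is the usual choice, and oversampling should be applied to training data only so synthetic points never leak into evaluation. The function names here are hypothetical.

```python
import random
from math import dist

def smote(minority, n_synthetic, k=5, seed=0):
    """Core SMOTE step: pick a minority point, pick one of its k nearest
    minority neighbours, and place a new point a random fraction of the
    way between them. Requires at least two minority points."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: dist(p, x),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def recall(y_true, y_pred, positive=1):
    """Share of true fraud cases the model caught: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0
```

Optimizing for recall rather than accuracy matters here because a model that labels every transaction as legitimate can still score very high accuracy on heavily imbalanced data.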
Python · Pandas · SQL · Plotly · Seaborn · Mapbox
Analyzed 39,000+ FARS crash records to identify temporal, geographic, and environmental risk patterns.
- Optimized SQL queries, reducing report generation time by 40%
- Interactive Plotly dashboard with bubble maps, choropleth maps, and drill-down filters
- Key finding: evening hours (5–8 PM) and adverse lighting conditions were the top contributing factors
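One common way to speed up this kind of reporting is to push filtering and aggregation into SQL and back it with an index on the filtered and grouped columns. The sketch below uses an in-memory SQLite database with a simplified, hypothetical `crashes` table (not the real FARS schema) to count crashes per hour and lighting condition in the 5 to 8 PM window. It is not the project's actual query and says nothing about where the reported 40% gain came from.

```python
import sqlite3

def top_risk_windows(rows, limit=3):
    """Count crashes per (hour, lighting) pair inside SQLite and return the most frequent."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, simplified schema for illustration only.
    con.execute(
        "CREATE TABLE crashes (case_id INTEGER PRIMARY KEY, state TEXT, "
        "hour INTEGER, light_condition TEXT)"
    )
    con.executemany("INSERT INTO crashes VALUES (?, ?, ?, ?)", rows)
    # Composite index covering the filter and GROUP BY columns.
    con.execute("CREATE INDEX idx_hour_light ON crashes (hour, light_condition)")
    query = """
        SELECT hour, light_condition, COUNT(*) AS n
        FROM crashes
        WHERE hour BETWEEN 17 AND 19      -- 5:00 PM to 7:59 PM
        GROUP BY hour, light_condition
        ORDER BY n DESC, hour, light_condition
        LIMIT ?
    """
    result = con.execute(query, (limit,)).fetchall()
    con.close()
    return result

sample = [
    (1, "NJ", 18, "dark_unlit"),
    (2, "NJ", 18, "dark_unlit"),
    (3, "NY", 19, "daylight"),
    (4, "NY", 9, "daylight"),  # outside the evening window, filtered out
]
print(top_risk_windows(sample, limit=2))
```

Aggregating in the database returns only a handful of summary rows to Python instead of every crash record, which is usually where report-generation time is saved.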
PyTorch · GCN · Cora Dataset · NMI · Modularity Metrics
Unsupervised graph clustering pipeline using Graph Convolutional Networks for community detection.
- Preprocessed the Cora citation dataset into graph inputs for learning node embeddings
- Evaluated with Normalized Mutual Information (NMI) and modularity metrics
- Applied deep learning for representation learning on graph-structured data
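Modularity, one of the two evaluation metrics listed above, scores a partition by comparing the edges inside each community with what a random graph with the same degrees would give. A minimal pure-Python version for an undirected, unweighted graph without self-loops is sketched below. In the project, the community labels would come from clustering the GCN embeddings, a step not shown here, and the helper name is hypothetical.

```python
from collections import defaultdict

def modularity(edges, communities):
    """Newman modularity Q for an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ],
    where m is the number of edges, L_c the edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if communities[u] == communities[v]:
            internal[communities[u]] += 1
    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[communities[node]] += d
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree
    )

# Two triangles joined by a single edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))
```

Splitting the two triangles into their natural communities gives Q = 5/14 ≈ 0.357, while putting every node in one community gives Q = 0, so higher values indicate denser-than-random communities.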
Python · Scikit-Learn · XGBoost · AdaBoost · SVM · NumPy
Comprehensive benchmarking of 8 classification algorithms on structured datasets.
- Algorithms: Decision Tree, Naive Bayes (Gaussian & Multinomial), SVM (Linear & RBF), k-NN, Random Forest, AdaBoost, XGBoost
- XGBoost: 97.8% accuracy · SVM (RBF): 97.6% accuracy
- Full performance report with trade-off analysis across accuracy, interpretability, and compute cost
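As a reference point for one of the benchmarked algorithms, here is a from-scratch k-NN classifier with an accuracy helper. It is an illustrative sketch, not the code behind the reported accuracies, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tie, most_common keeps first-seen order, so the label of the
    # closest tied neighbour wins.
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
preds = [knn_predict(train_X, train_y, x) for x in [(0.2, 0.2), (5.5, 5.5)]]
print(preds, accuracy(["a", "b"], preds))
```

Because k-NN and SVM are both distance-based, they are sensitive to feature scale, so standardizing features before benchmarking them against tree-based models usually matters.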
Python · Scikit-Learn · Random Forest · K-Means · Linear Regression
Multi-algorithm ML on Spotify audio features to predict song popularity.
- Compared classification, regression, and clustering approaches with cross-validation
- Feature importance: energy, danceability, and loudness are the top predictors
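The cross-validation mentioned above can be sketched without any library: shuffle the row indices, split them into k folds, and train and score once per held-out fold. The helpers and the mean-popularity baseline below are hypothetical illustrations, not the project's models. scikit-learn's `KFold` and `cross_val_score` provide the same pattern for real estimators such as Random Forest.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, fit, predict, score, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and return the k scores."""
    scores = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in test_idx]
        scores.append(score([y[i] for i in test_idx], preds))
    return scores

# Hypothetical baseline: always predict the training-set mean popularity.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model  # ignores the audio features entirely

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted popularity."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

A mean baseline like this gives the error any real model trained on energy, danceability, and loudness should beat; if a model's cross-validated MAE is not clearly lower, its features are adding little.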
| Certificate | Issuer |
|---|---|
| Data Analytics | Accenture (Forage) |
| Databases and SQL for Data Science | IBM — Coursera |
| Supervised Machine Learning | DeepLearning.AI & Stanford — Coursera |
| Python for Everybody | University of Michigan — Coursera |
Open to AI/ML Analyst, Junior Data Scientist, and NLP Analyst roles. Available on OPT June 2026.