Vemula-Vinayak/README.md

Hi, I'm Vinayak Vemula 👋

MS Data Science @ Montclair State University '26 | AI/ML · NLP · Python · SQL · Power BI

Building AI-powered pipelines, machine learning models, and analytics dashboards that turn complex data into real decisions.


🧠 About Me

  • 🎓 MS Data Science, Montclair State University (GPA 3.8/4.0, May 2026)
  • 🤖 Passionate about AI/ML, NLP, and applied data science
  • 💼 Former Data Analytics Intern @ Main Flow Services & Technologies
  • 📍 New Jersey — open to NYC metro & remote roles
  • 🛂 Available on OPT from June 2026
  • 📫 Reach me: vinayakvemula09@gmail.com
  • 🔗 LinkedIn

🛠 Tech Stack

Languages & ML

Python · SQL · PyTorch · Scikit-Learn · XGBoost

AI & NLP

HuggingFace · LangChain · OpenAI

Data & BI

Pandas · NumPy · Power BI · Tableau · Plotly

Tools

Flask · Git · MySQL · Jupyter


🚀 Featured Projects

🌱 Recycling Awareness Data Dashboard — Master's Capstone

Python · Flask · Pandas · SciPy · Chart.js · EPA Data

Built a full-stack analytics dashboard analyzing U.S. recycling rates (1960–2022) using real EPA government data.

  • Engineered complete ETL pipeline from raw Excel → structured CSVs → 9 Flask REST API endpoints
  • Applied linear regression & Pearson correlation (Paper R²=0.97, Metal R²=0.91)
  • Key insight: deposit-law states recycle 2.4× more glass than non-deposit states
  • Interactive dashboard with choropleth maps, KPI cards, and trend visualizations
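
The regression and correlation step can be illustrated with a minimal, standard-library sketch. It is not the capstone's code (the project lists SciPy), the function names are hypothetical, and the numbers are made-up placeholders rather than EPA data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Placeholder values for illustration only (NOT EPA figures).
years = [0, 1, 2, 3, 4]
rates = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(years, rates)
print(slope, intercept, r_squared(years, rates), pearson_r(years, rates))
```

For a single-predictor linear fit, R² equals the square of Pearson's r, so the two numbers summarize the same trend strength.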

🔐 Credit Card Fraud Detection Pipeline

Python · Scikit-Learn · SMOTE · PCA · XGBoost · Matplotlib

End-to-end ML pipeline on 284,000+ financial transactions for anomaly and fraud detection.

  • Handled severe class imbalance with SMOTE oversampling
  • Applied PCA for dimensionality reduction + StandardScaler normalization
  • Benchmarked Logistic Regression, Random Forest, and XGBoost optimizing for fraud recall
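
As a rough illustration of two ideas named above, the sketch below shows a simplified SMOTE-style interpolation and a recall metric in plain Python. It is not the pipeline's code and not the imbalanced-learn implementation; `smote_like_oversample` and `recall` are hypothetical helper names, and the points are toy data.

```python
import random
from math import dist


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (e.g. fraud cases) that were predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Toy minority-class points (illustrative only).
fraud_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(smote_like_oversample(fraud_points, n_new=3))
print(recall([1, 1, 0, 1], [1, 0, 0, 1]))
```

Recall is the natural target when a missed fraud case costs more than a false alarm, which is why oversampling is applied to the training data only and evaluation stays on the original class balance.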

U.S. Fatal Accidents Analysis

Python · Pandas · SQL · Plotly · Seaborn · Mapbox

Analyzed 39,000+ FARS crash records to identify temporal, geographic, and environmental risk patterns.

  • Optimized SQL queries, reducing report generation time by 40%
  • Interactive Plotly dashboard with bubble maps, choropleth maps, and drill-down filters
  • Key finding: evening hours (5–8 PM) and adverse lighting are the top contributing factors

🧬 Graph Clustering with Graph Neural Networks (GNNs)

PyTorch · GCN · Cora Dataset · NMI · Modularity Metrics

Unsupervised graph clustering pipeline using Graph Convolutional Networks for community detection.

  • Preprocessed Cora citation dataset to generate graph embeddings
  • Evaluated with Normalized Mutual Information (NMI) and modularity metrics
  • Applied deep learning for representation learning on graph-structured data
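
For readers unfamiliar with NMI, here is a small standard-library sketch of the metric. It assumes normalization by the arithmetic mean of the two entropies (the README does not say which variant the project uses), and the function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 and h_b == 0:
        return 1.0  # both assignments put everything in a single cluster
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Cluster ids are arbitrary, so a relabelled but identical partition scores 1.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because NMI ignores the actual cluster ids, it suits unsupervised clustering, where predicted community numbers have no fixed correspondence to ground-truth classes.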

📊 Classification Model Benchmarking Study

Python · Scikit-Learn · XGBoost · AdaBoost · SVM · NumPy

Comprehensive benchmarking of 8 classification algorithms on structured datasets.

  • Algorithms: Decision Tree, Naive Bayes (Gaussian & Multinomial), SVM (Linear & RBF), k-NN, Random Forest, AdaBoost, XGBoost
  • XGBoost: 97.8% accuracy · SVM (RBF): 97.6% accuracy
  • Full performance report with trade-off analysis across accuracy, interpretability, and compute cost

🎵 Spotify Song Popularity Prediction

Python · Scikit-Learn · Random Forest · K-Means · Linear Regression

Multi-algorithm ML on Spotify audio features to predict song popularity.

  • Compared classification, regression, and clustering approaches with cross-validation
  • Feature importance: energy, danceability, and loudness are top predictors
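
The cross-validation mentioned above rests on splitting the data into folds. A minimal sketch of that split in plain Python follows; `k_fold_indices` is a hypothetical helper, not the project's code, and unlike typical library defaults it does no shuffling.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(n_samples=10, k=3):
    print(len(train_idx), test_idx)
```

Each sample lands in exactly one test fold, so averaging a model's score over the folds uses every row for evaluation once.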

🎓 Certifications

| Certificate | Issuer |
| --- | --- |
| Data Analytics | Accenture (Forage) |
| Databases and SQL for Data Science | IBM — Coursera |
| Supervised Machine Learning | DeepLearning.AI & Stanford — Coursera |
| Python for Everybody | Google — Coursera |

📫 Let's Connect

LinkedIn Email


Open to AI/ML Analyst, Junior Data Scientist, and NLP Analyst roles. Available on OPT from June 2026.

Popular repositories

  1. Mainflow-Internship (Public)

    Internship analytics portfolio: healthcare datasets (heart disease & heart failure) with EDA, modeling, and reporting deliverables.

    Jupyter Notebook

  2. Vemula-Vinayak (Public)

  3. Student-Sleep-Happiness-Analysis (Public)

    Exploratory data analysis of student lifestyle data examining sleep patterns and happiness correlations using Python.

    Jupyter Notebook

  4. Deep-Learning-MNIST-Model-Comparison (Public)

    Comparison of MLP and CNN architectures on MNIST using PyTorch with performance benchmarking against XGBoost.

    Jupyter Notebook

  5. Classification-Model-Comparison (Public)

    Benchmarking classical ML classification algorithms including SVM, Random Forest, and XGBoost with performance comparison.

    Jupyter Notebook

  6. US-Fatal-Accidents-Analysis (Public)

    Exploratory data analysis of 2022 U.S. fatal traffic accidents using interactive maps and statistical visualization.

    Jupyter Notebook