Rushikesh-S-Ware/README.md

Rushikesh S. Ware

Data / ML / AI Engineer — based in Northern VA. MS Data Analytics Engineering, George Mason University (Dec 2025, GPA 3.90). 2 years as Programmer Analyst at Cognizant.

I build production-grade data and ML systems: MCP servers for LLM tool use, transformer-based NLP-to-SQL, cost-sensitive risk models on AWS, and end-to-end batch pipelines on Airflow + PySpark + dbt.

📍 Fairfax, VA · Open to Data Engineer, ML Engineer, AI Engineer, Data Scientist roles (US-based)
📧 rushikeshsware@gmail.com · 💼 LinkedIn · 🤗 Hugging Face · 🌐 Portfolio


Flagship Projects

| Project | Stack | Result |
| --- | --- | --- |
| MCP-Forest | Python, Model Context Protocol, SQLite, Docker, httpx · Live demo → | MCP server · ask in plain English · 17 tools · 165 countries + 2,779 regions · 24 yrs of Global Forest Watch data |
| NYC Taxi Data Pipeline | Airflow, PySpark, dbt, Postgres, Docker | ~100M trips/year · incremental dbt marts · daily DAG |
| NLP-to-SQL Transformer (BART) | PyTorch, BART, ONNX, FastAPI · Live demo → | 45.6% exact-match on Spider · 60% inference speedup |
| Cybersecurity Risk Triage (KEV) | XGBoost, MLflow, R, AWS · 250K+ CVE/CVSS records | AUC 0.9957 · 100% KEV recall |
| Public Health Analytics on AWS | Glue, S3, Lambda, EMR, Spark, Redshift, Tableau | 20 years · 1999–2018 HHS data |
| MNIST CNN Classifier | PyTorch, CNN, Gradio · Live demo → | >97% test accuracy |

Tech Stack

Languages: Python · SQL · R · Java
Data & ML: PyTorch · scikit-learn · XGBoost · Transformers (BART) · spaCy · Pandas · NumPy · MLflow
LLM Systems: Claude API · Model Context Protocol (MCP) · RAG · FAISS · ChromaDB · LangChain
Data Engineering: Apache Airflow · Apache Spark / PySpark · dbt · Hadoop · PostgreSQL · MySQL · MongoDB · BigQuery
Cloud & Infra: AWS (Glue, S3, Lambda, EMR, Redshift) · GCP (BigQuery) · Docker · Kubernetes · GitHub Actions
BI & Viz: Tableau · Power BI · Streamlit · Matplotlib · Plotly


Experience

Programmer Analyst, Cognizant (2 years)
Production data pipelines and ETL in Python, SQL, GCP/BigQuery. CI/CD with GitHub Actions. Reproducible, scalable systems for enterprise clients.

MS Data Analytics Engineering, George Mason University (Dec 2025, GPA 3.90)
Capstone: production MCP server delivering 24 years of Global Forest Watch data through natural conversation with Claude.


📬 Best way to reach me: rushikeshsware@gmail.com

Pinned

  1. nyc-taxi-data-pipeline Public

    End-to-end data engineering pipeline for NYC Yellow Taxi trips. Airflow + PySpark + dbt + Postgres + Docker. ~100M rows/year.

    Python

  2. nlp-sql-transformer Public

    BART transformer fine-tuned for natural-language-to-SQL on the Spider benchmark. 45.6% exact-match, ONNX-optimized, deployed on Hugging Face Spaces.

    Jupyter Notebook 1

  3. cve-kev-risk-ml Public

    Predicts which CVEs will become CISA Known Exploited Vulnerabilities. Cost-sensitive XGBoost on 250K+ records: AUC 0.9957, full KEV recall.

    R

  4. public-health-aws-pipeline Public

    20-year analysis of US drug-overdose mortality (1999-2018) using AWS Glue, S3, Spark, Redshift, and Tableau. Demographic disparities and synthetic-opioid trend modeling.

    Jupyter Notebook