- 👋 Hi, I’m @thyphan2025
- 👀 I’m interested in AI & Machine Learning.
- 🌱 I’m currently pursuing a Master of Science in Data Analytics Engineering at George Mason University.
- 😄 Pronouns: she/her
- ⚡ Fun fact: I love exploring different cultures, especially their amazing foods.
- ⭐ Motivational quote: "I have no special talents. I am only passionately curious." - Albert Einstein
- Building small passion projects to explore data workflows and new tools
- Reading *Forecasting: Principles and Practice* (3rd ed.) by Rob J Hyndman and George Athanasopoulos
R, Shiny, ggplot2, plotly, leaflet, GitHub
- Contributed to a broader capstone project analyzing global terrorism trends, with a focus on Mexican cartel-related incidents.
- Collaborated with one team member to develop an interactive R Shiny dashboard for visualizing and analyzing 2025 terrorism and cartel-related data for Mexico.
- Focused dashboard analysis on March–June 2025 due to incomplete reporting coverage for the remaining months of the year.
- Designed dynamic filtering and geospatial visualizations to support exploratory analysis and trend exploration.
Python, PySpark, Databricks
- Cleaned and reshaped a multi-state bridge dataset to examine material and design patterns, then applied association rule mining to identify recurring relationships.
→ Bridge-Material-and-Design-Analysis
Power BI
- Explored multi-season influenza data to monitor trends, subtype distribution, and outbreak severity through an interactive dashboard.
→ Influenza Surveillance Dashboard Chicago
R, Time-Series Analysis, Interactive Plot, Forecasting
- Cleaned and analyzed multi-year air quality data to examine environmental risk patterns and forecast ozone trends using an ARIMA model.
- Published an interactive HTML report with code, Plotly visualizations, and a few static plots.
→ New York Air Quality Analysis
Python, Data Analysis, Machine Learning
- Analyzed electric vehicle adoption data to examine growth trends, geographic distribution, and vehicle characteristics across regions.
- Trained a decision tree model to classify vehicles as Battery Electric Vehicles (BEVs) or Plug-in Hybrid Electric Vehicles (PHEVs).
- Applied the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance.
Python, SQL, R, NLP
- Cleaned and analyzed global incident data to identify geographic hotspots, severity patterns, and recurring risk signals affecting education infrastructure.
- Applied natural language processing (NLP) to extract sentiment and patterns from incident descriptions.
→ Education-in-Danger-Incidents
Python, PySpark, Spark MLlib, Databricks
- Contributed code to the PySpark modeling workflow in Databricks, including feature engineering and evaluation using Python, PySpark, and Spark MLlib.