Data science project exploring the relationship between NBA market size, spending, and team wins using regression and decision tree models in R
Exploring the relationship between big markets, spending, and NBA success
This project analyzes whether NBA teams in larger markets (wealthier owners, bigger cities, higher player salaries) win more games than small-market teams.
Using multiple datasets (salaries, salary cap, owner net worth, city population, and team wins), we applied statistical modeling and machine learning to test whether market size really matters or whether other factors are more predictive of success.
- Language: R
- Libraries: tidyverse, caret, ggplot2, rpart
- Techniques: Data Wrangling, Linear Regression, Regression Trees, Visualization, Model Evaluation
- Data Collection: Gathered datasets on NBA salaries, owner net worth, salary caps, city population, and win totals.
- Data Cleaning: Merged and standardized datasets, handled missing values.
- Exploratory Analysis: Visualized correlations between market-related variables and wins.
- Modeling:
  - Linear Regression: Tested the predictive power of market size, spending, and other factors.
  - Regression Tree: Built an interpretable model to inspect decision splits (e.g., how spending relative to the salary cap separates teams by wins).
- Evaluation: Compared R² and interpretability of regression vs. tree models.
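
The steps above can be sketched roughly as follows. This is a minimal illustration on simulated data, not the project's actual code or results: the column names (`payroll_to_cap`, `city_pop_m`, `owner_net_worth_b`), the 80/20 split, and the way `wins` is generated are all assumptions made for the example.

```r
library(rpart)

set.seed(42)
n <- 120

# Simulated stand-in tables; column names are hypothetical, not the project's.
teams <- data.frame(
  team_id        = seq_len(n),
  payroll_to_cap = runif(n, 0.8, 1.4),   # payroll divided by salary cap
  city_pop_m     = runif(n, 0.5, 20)     # city population, millions
)
owners <- data.frame(
  team_id           = seq_len(n),
  owner_net_worth_b = runif(n, 0.5, 50)  # owner net worth, billions
)

# Merge the sources on a shared key and drop rows with missing values.
nba <- na.omit(merge(teams, owners, by = "team_id"))

# Simulated outcome: wins depend on spending only, plus noise (an assumption).
raw_wins <- 41 + 40 * (nba$payroll_to_cap - 1.1) + rnorm(nrow(nba), sd = 6)
nba$wins <- round(pmin(pmax(raw_wins, 0), 82))

# Hold out 20% of rows for evaluation.
train_idx <- sample(nrow(nba), size = round(0.8 * nrow(nba)))
train <- nba[train_idx, ]
test  <- nba[-train_idx, ]

# R-squared on held-out data: 1 - SSE / SST.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

model_formula <- wins ~ payroll_to_cap + city_pop_m + owner_net_worth_b
lm_fit   <- lm(model_formula, data = train)
tree_fit <- rpart(model_formula, data = train, method = "anova")

lm_r2   <- r_squared(test$wins, predict(lm_fit, newdata = test))
tree_r2 <- r_squared(test$wins, predict(tree_fit, newdata = test))
print(c(linear = lm_r2, tree = tree_r2))
```

Computing R² on a held-out set for both models keeps the comparison on equal footing; whether the original analysis used a hold-out split or in-sample R² is not stated here.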
- Market size (city population, owner net worth) had little effect on wins.
- Spending relative to the salary cap was the strongest predictor of team success.
- The linear regression model (R² ≈ 0.72) outperformed the regression tree (R² ≈ 0.55), but the tree highlighted clear spending thresholds for competitive teams.
- Takeaway: It’s not how big your market is; it’s how much your organization invests.
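
As a rough illustration of the two ideas in these findings, the sketch below derives a spending-to-cap ratio and reads a spending threshold off a regression tree. All figures are made up for the example, and the names `payroll_m`, `cap_m`, and `payroll_to_cap` are hypothetical; the simulated data has a step at a ratio of 1.1 built in on purpose so the tree has a threshold to find.

```r
library(rpart)

# Made-up payroll and cap figures (millions of dollars), for illustration only.
payroll <- data.frame(
  season    = c(1, 1, 2, 2),
  team      = c("A", "B", "A", "B"),
  payroll_m = c(120, 100, 132, 110)
)
cap <- data.frame(season = c(1, 2), cap_m = c(100, 110))

# Spending relative to the cap: join each payroll to its season's cap, then divide.
spend <- merge(payroll, cap, by = "season")
spend$payroll_to_cap <- spend$payroll_m / spend$cap_m

# Simulated teams with a built-in jump in wins above a ratio of 1.1 (an assumption).
set.seed(1)
sim <- data.frame(payroll_to_cap = runif(200, 0.8, 1.4))
sim$wins <- ifelse(sim$payroll_to_cap > 1.1, 50, 35) + rnorm(200, sd = 3)

# Fit a regression tree and read the cut point of the root split.
fit <- rpart(wins ~ payroll_to_cap, data = sim, method = "anova")
root_split <- unname(fit$splits[1, "index"])
print(root_split)
```

With a single predictor, the first row of `fit$splits` is the root node's split, so its `index` value is the spending ratio at which the tree first separates stronger from weaker teams.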
- Re-implement models in Python with Scikit-learn for portfolio variety.
- Build an interactive dashboard (Shiny in R or Streamlit in Python).
John Hankwitz – Senior at UMass Amherst studying Informatics (Data Science concentration) with a Computer Science minor.
- [LinkedIn](#)