An end-to-end data science project predicting the success of SpaceX Falcon 9 first-stage landings. Built as the IBM Data Science Professional Certificate capstone, covering the full pipeline from data collection to machine learning classification.
SpaceX advertises Falcon 9 launches at $62M, versus $165M or more for competitors, largely because the first stage is reusable. Predicting whether the first stage will land successfully therefore lets a competing rocket company estimate the cost of a launch and informs launch planning decisions.
SpaceX API + Web Scraping → Data Wrangling → EDA (SQL + Visualization) → Interactive Maps → ML Classification
| Notebook | Description |
|---|---|
| Data Collection API.ipynb | Pulls Falcon 9 launch data from the SpaceX REST API, extracts rocket, launchpad, payload, and core information |
| Data Collection with Web Scraping.ipynb | Scrapes historical Falcon 9 launch records from Wikipedia using BeautifulSoup |
| Data Wrangling.ipynb | Cleans and merges API and scraped data, handles missing values, engineers the landing outcome label |
| EDA with SQL.ipynb | Explores the dataset using SQL queries on IBM Db2: launch sites, payload ranges, booster versions, and success rates |
| EDA with Data Visualization.ipynb | Visual EDA with Matplotlib and Seaborn: launch success trends, payload vs orbit, launch site analysis |
| Interactive Visual Analytics with Folium.ipynb | Interactive map of launch sites with Folium: success/fail markers, proximity analysis, and distance calculations |
| Machine Learning Prediction.ipynb | Trains and tunes Logistic Regression, SVM, Decision Tree, and KNN classifiers; selects best model using GridSearchCV and confusion matrix analysis |
| spacex_dash_app.py | Interactive Plotly Dash dashboard: pie chart of launch success by site, payload scatter plot with outcome coloring |

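
The landing outcome label engineered in the wrangling notebook is the target for every later step, so a small illustration may help. The sketch below is not the notebook's code: it assumes outcome strings of the form `"<landed> <landing type>"` (for example `"True ASDS"` or `"False Ocean"`), and the function name `landing_class` and the sample values are hypothetical.

```python
def landing_class(outcome):
    """Map a raw outcome string to a binary label: 1 = landed, 0 = did not land.

    Assumption: the first whitespace-separated token is "True", "False",
    or "None"; anything other than "True" is treated as a failed landing.
    """
    parts = outcome.split() if outcome else []
    return 1 if parts and parts[0] == "True" else 0


# Made-up sample values for illustration only, not project data.
outcomes = ["True ASDS", "False Ocean", "None None", "True RTLS", None]
labels = [landing_class(o) for o in outcomes]
print(labels)  # [1, 0, 0, 1, 0]
```

Treating missing or `"None"` outcomes as class 0 is a design choice in this sketch; dropping those rows instead would be an equally reasonable alternative.
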
| Area | Tools |
|---|---|
| Data Collection | SpaceX REST API, BeautifulSoup, requests |
| Data Processing | pandas, numpy |
| SQL Analysis | IBM Db2, sqlite3 |
| Visualization | Matplotlib, Seaborn, Plotly |
| Interactive Maps | Folium |
| Dashboard | Plotly Dash |
| Machine Learning | scikit-learn (Logistic Regression, SVM, Decision Tree, KNN) |
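
For the Folium proximity analysis, distances between a launch site and nearby features can be computed from latitude and longitude alone. The haversine formula is a common choice for this; whether the notebook uses exactly this formula or this Earth radius is an assumption, and `haversine_km` is a hypothetical helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# One degree of longitude along the equator, as a sanity check.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 2))  # 111.19
```

The result can be attached to a map as a marker label or used to annotate a line drawn between the two points.
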
- Launch site KSC LC-39A had the highest success rate among all sites
- Heavier payloads (5,000–10,000 kg) showed higher success rates in certain orbits
- Booster version is a strong predictor — later versions (v1.1, FT) significantly outperform earlier ones
- Decision Tree and SVM achieved the best classification accuracy on the test set
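
The test-set comparison behind the last finding can be sketched as a loop over the four classifiers with `GridSearchCV`. This is a minimal sketch, not the notebook's code: the data is synthetic (`make_classification`), and the parameter grids, the 80/20 split, the `StandardScaler` step, and the 5-fold setting are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the launch feature matrix; NOT the project's data.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumed grids; make_pipeline names each step after its lowercased class.
candidates = {
    "logreg": (LogisticRegression(max_iter=1000),
               {"logisticregression__C": [0.01, 0.1, 1.0]}),
    "svm": (SVC(),
            {"svc__kernel": ["linear", "rbf"], "svc__C": [0.1, 1.0, 10.0]}),
    "tree": (DecisionTreeClassifier(random_state=0),
             {"decisiontreeclassifier__max_depth": [2, 4, 6],
              "decisiontreeclassifier__criterion": ["gini", "entropy"]}),
    "knn": (KNeighborsClassifier(),
            {"kneighborsclassifier__n_neighbors": [3, 5, 7]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    pipe = make_pipeline(StandardScaler(), estimator)
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)  # refits the best configuration on all training data
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": search.score(X_test, y_test),
        "confusion": confusion_matrix(y_test, search.predict(X_test), labels=[0, 1]),
    }

for name, r in results.items():
    print(f"{name}: cv={r['cv_accuracy']:.3f} test={r['test_accuracy']:.3f}")
```

With a test set this small, differences of a few percentage points between models may not be meaningful, which is one reason to read the confusion matrices alongside the accuracy scores.
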
Gkeri Pepelasi