SpaceX Falcon 9 Launch Success Prediction — IBM Data Science Capstone

An end-to-end data science project predicting the success of SpaceX Falcon 9 first-stage landings. Built as the IBM Data Science Professional Certificate capstone, covering the full pipeline from data collection to machine learning classification.

Project Goal

SpaceX advertises Falcon 9 launches at $62 million, while other providers charge upwards of $165 million; much of the difference comes from reusing the first stage. Predicting whether the first stage will land successfully therefore gives competing rocket companies a basis for estimating launch cost and informs launch planning decisions.

Pipeline Overview

SpaceX API + Web Scraping → Data Wrangling → EDA (SQL + Visualization) → Interactive Maps → ML Classification

Notebooks

| Notebook | Description |
|---|---|
| Data Collection API.ipynb | Pulls Falcon 9 launch data from the SpaceX REST API; extracts rocket, launchpad, payload, and core information |
| Data Collection with Web Scraping.ipynb | Scrapes historical Falcon 9 launch records from Wikipedia using BeautifulSoup |
| Data Wrangling.ipynb | Cleans and merges API and scraped data, handles missing values, engineers the landing outcome label |
| EDA with SQL.ipynb | Explores the dataset using SQL queries on IBM Db2: launch sites, payload ranges, booster versions, and success rates |
| EDA with Data Visualization.ipynb | Visual EDA with Matplotlib and Seaborn: launch success trends, payload vs orbit, launch site analysis |
| Interactive Visual Analytics with Folium.ipynb | Interactive map of launch sites with Folium: success/fail markers, proximity analysis, and distance calculations |
| Machine Learning Prediction.ipynb | Trains and tunes Logistic Regression, SVM, Decision Tree, and KNN classifiers; selects the best model using GridSearchCV and confusion matrix analysis |
| spacex_dash_app.py | Interactive Plotly Dash dashboard: pie chart of launch success by site, payload scatter plot with outcome coloring |
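
The label engineering listed for Data Wrangling.ipynb could look roughly like the sketch below. It assumes each record carries a raw outcome string of the form "<landed> <landing type>" (for example "True ASDS"). That string format, the function name `landing_class`, and the sample values are assumptions made for illustration and are not taken from the notebook.

```python
def landing_class(outcome: str) -> int:
    """Return 1 for a successful first-stage landing, 0 otherwise.

    Assumes the outcome string starts with "True" when the booster landed;
    anything else (a failed attempt, no attempt, or an empty string) maps to 0.
    """
    landed = outcome.split()[0] if outcome.strip() else ""
    return 1 if landed == "True" else 0


# Hypothetical raw outcomes, for illustration only.
raw_outcomes = ["True ASDS", "False Ocean", "None None", "True RTLS"]
labels = [landing_class(o) for o in raw_outcomes]
print(labels)
```

A binary label of this kind is what the classifiers in the final notebook would be trained to predict.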

Tech Stack

| Area | Tools |
|---|---|
| Data Collection | SpaceX REST API, BeautifulSoup, requests |
| Data Processing | pandas, numpy |
| SQL Analysis | IBM Db2, sqlite3 |
| Visualization | Matplotlib, Seaborn, Plotly |
| Interactive Maps | Folium |
| Dashboard | Plotly Dash |
| Machine Learning | scikit-learn (Logistic Regression, SVM, Decision Tree, KNN) |
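
As a rough illustration of the SQL analysis, the sketch below uses Python's built-in `sqlite3` module to compute a success rate per launch site. The table name `launches`, its columns, and every row are hypothetical and made up for the example; the actual schema used in the notebook may differ.

```python
import sqlite3

# Hypothetical rows (launch_site, landed), for illustration only.
rows = [
    ("KSC LC-39A", 1),
    ("KSC LC-39A", 1),
    ("KSC LC-39A", 0),
    ("CCAFS SLC-40", 1),
    ("CCAFS SLC-40", 0),
    ("VAFB SLC-4E", 0),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (launch_site TEXT, landed INTEGER)")
conn.executemany("INSERT INTO launches VALUES (?, ?)", rows)

# AVG over a 0/1 column gives the fraction of successful landings.
query = """
    SELECT launch_site,
           COUNT(*)    AS launches,
           AVG(landed) AS success_rate
    FROM launches
    GROUP BY launch_site
    ORDER BY success_rate DESC
"""
site_stats = conn.execute(query).fetchall()
conn.close()

for site, n, rate in site_stats:
    print(f"{site}: {n} launches, success rate {rate:.2f}")
```

Storing the outcome as 0/1 keeps the aggregation to a single `AVG`, and the same query shape should carry over to other SQL engines with little change.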

Key Findings

  • Launch site KSC LC-39A had the highest success rate among all sites
  • Heavier payloads (5,000–10,000 kg) showed higher success rates in certain orbits
  • Booster version is a strong predictor — later versions (v1.1, FT) significantly outperform earlier ones
  • Decision Tree and SVM achieved the best classification accuracy on the test set
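
For readers unfamiliar with how test-set accuracy relates to a confusion matrix, here is a minimal plain-Python sketch. The project itself relies on scikit-learn for this; the helper names `confusion_counts` and `accuracy` and the label lists below are hypothetical and do not reproduce the project's results.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) pairs for binary labels 0/1."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tn": pairs[(0, 0)],  # did not land, predicted not landed
        "fp": pairs[(0, 1)],  # did not land, predicted landed
        "fn": pairs[(1, 0)],  # landed, predicted not landed
        "tp": pairs[(1, 1)],  # landed, predicted landed
    }


def accuracy(counts):
    """Fraction of predictions that match the actual label."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


# Hypothetical test-set labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(counts))
```

Looking at the four counts separately, rather than accuracy alone, shows whether a model's errors are mostly false positives (predicting a landing that fails) or false negatives.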

Author

Gkeri Pepelasi
