Course project for Math 8670. The data comes from Kaggle.
Folder that stores the datasets used in the project.
Consumes the bureau_balance and bureau CSVs and produces applicant_default_count.csv, which contains the total default count for each applicant with a history in the bureau data.
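The aggregation described above might look roughly like the following sketch. It uses small in-memory stand-ins instead of the real CSVs. The column names (`SK_ID_CURR`, `SK_ID_BUREAU`, `STATUS`) and the rule that a monthly record counts as a default when `STATUS` is "1" through "5" are assumptions, not taken from the project code.

```python
import pandas as pd

# Toy stand-ins for the bureau and bureau_balance CSVs (column names are assumptions).
bureau = pd.DataFrame({
    "SK_ID_CURR": [100, 100, 200],   # applicant id
    "SK_ID_BUREAU": [1, 2, 3],       # id of a previous credit
})
bureau_balance = pd.DataFrame({
    "SK_ID_BUREAU": [1, 1, 2, 3, 3],
    "STATUS": ["0", "2", "C", "1", "5"],
})

# Assumption: a monthly record is a default when STATUS is "1".."5".
DEFAULT_STATUSES = {"1", "2", "3", "4", "5"}
bureau_balance["is_default"] = bureau_balance["STATUS"].isin(DEFAULT_STATUSES)

# Attach each monthly record to its applicant, then total the defaults per applicant.
merged = bureau_balance.merge(bureau, on="SK_ID_BUREAU", how="inner")
applicant_default_count = (
    merged.groupby("SK_ID_CURR")["is_default"]
    .sum()
    .astype(int)
    .rename("default_count")
    .reset_index()
)

# With real data, the result could then be saved with
# applicant_default_count.to_csv("applicant_default_count.csv", index=False)
print(applicant_default_count)
```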
Consumes application_train.csv and applicant_default_count.csv and produces final_dataset.csv. Removes unnecessary columns, creates new features, and imputes missing data.
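A minimal sketch of this kind of pipeline is shown below, again on toy data. The columns `income`, `credit`, and `unused_flag`, the derived `credit_to_income` feature, the median imputation, and the choice to fill missing default counts with 0 are all hypothetical illustrations; the actual columns and rules used in the project are not stated here.

```python
import pandas as pd

# Toy stand-in for application_train.csv (all columns except the key are hypothetical).
applications = pd.DataFrame({
    "SK_ID_CURR": [100, 200, 300],
    "income": [50000.0, None, 70000.0],
    "credit": [100000.0, 120000.0, 210000.0],
    "unused_flag": [0, 1, 0],
})
# Toy stand-in for applicant_default_count.csv.
default_counts = pd.DataFrame({"SK_ID_CURR": [100, 200], "default_count": [1, 2]})

# Left join keeps applicants with no bureau history.
final = applications.merge(default_counts, on="SK_ID_CURR", how="left")

# Assumption: applicants without bureau history get a default count of 0.
final["default_count"] = final["default_count"].fillna(0).astype(int)

# Remove a column judged unnecessary (hypothetical choice).
final = final.drop(columns=["unused_flag"])

# Impute missing income with the median (one simple imputation strategy).
final["income"] = final["income"].fillna(final["income"].median())

# Create a new feature from existing columns (hypothetical example).
final["credit_to_income"] = final["credit"] / final["income"]

# With real data: final.to_csv("final_dataset.csv", index=False)
print(final)
```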
Initial look at the data. Produces a Sweetviz report (report.html).
Sweetviz report for clean data. Created in data_pipeline.ipynb.
Uses the Python mito library to generate a variety of visualizations.
Contains various visualizations describing the relationship between continuous variables and the default rate.
Implementation of weighted random forest model.
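One common way to build a weighted random forest is scikit-learn's `RandomForestClassifier` with `class_weight="balanced"`, which weights each class inversely to its frequency. The sketch below uses synthetic imbalanced data; whether the project uses this exact approach, and the hyperparameters shown, are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for final_dataset.csv (about 10% positives).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency,
# so errors on the rare (default) class cost more during training.
model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# Predicted probability of the positive (default) class for each test row.
default_probability = model.predict_proba(X_test)[:, 1]
print(default_probability[:5])
```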
Implementation of balanced random forest model. Also contains model explanations.
Implementation of explainable boosting machine. Also contains model explanations.
Notebook that was converted to the slides (project_presentation.slides.html) used in class presentation.
Notebook that was converted into the final report (project_report.html).