Bankruptcy-Prediction

Since the collapse of Lehman Brothers during the 2008 global financial crisis, estimating the risk of corporate bankruptcy in advance has been of great importance to creditors and investors. Although it is a relatively new research topic, artificial intelligence and machine learning methods have achieved promising results in corporate bankruptcy prediction in recent years. In this summer research project, I built a machine learning model for predicting upcoming bankruptcies using a US corporate bankruptcy dataset covering around 46 years. The raw data contain 93837 observations with 15 features; after thorough cleaning, missing-value imputation and feature engineering, the final dataset contains 23320 observations with 210 features derived from financial and management statements.

I compared nine machine learning techniques on this dataset: Logistic Regression, KNN, SVM, Naïve Bayes, Decision Tree, Random Forest, AdaBoost, XGBoost and CatBoost. For evaluation, I used accuracy, ROC-AUC curves, confusion matrices, precision, recall, F-score and cumulative gain charts. The best models were Random Forest, XGBoost, AdaBoost, CatBoost and Decision Tree. An ensemble voting method applied to these top five algorithms selected Random Forest and the boosting algorithms as the two best predictors of bankruptcy on both the training and the test data, which reduces the overfitting problem. To cross-check for overfitting, I also computed the mean cross-validation score of each algorithm; again, Random Forest and Gradient Boosting topped the list. Both algorithms yield an overall accuracy of ∼93%, a training accuracy of ∼99% and a class-independent test accuracy of ∼92% on the balanced, imputed and feature-engineered dataset. To improve accuracy, I oversampled the data with SMOTE and reduced its dimensionality with PCA before fitting the models.
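The modelling steps described above (dimension reduction with PCA, a voting ensemble over tree-based models, and a cross-validation check for overfitting) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the data are synthetic, `GradientBoostingClassifier` stands in for XGBoost and CatBoost, soft voting and all hyperparameters are assumptions, and the SMOTE step is left out because it lives in the separate imbalanced-learn package.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the bankruptcy data (NOT the real dataset).
X, y = make_classification(n_samples=600, n_features=30, n_informative=10,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Voting ensemble over tree-based models; soft voting averages the
# predicted class probabilities (an assumption, the source only says "voting").
voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    voting="soft",
)

# Scale, reduce to 10 principal components (hypothetical choice), then vote.
model = make_pipeline(StandardScaler(), PCA(n_components=10), voter)

# A mean CV score far below the training accuracy would hint at overfitting.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print(f"CV mean accuracy: {cv_scores.mean():.3f}")
print(f"Train accuracy:   {train_acc:.3f}")
print(f"Test accuracy:    {test_acc:.3f}")
```

Wrapping the scaler and PCA in the same pipeline as the classifier means they are refit inside each cross-validation fold, so the CV scores are not influenced by the held-out rows.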
My final model correctly predicts all 8033 bankrupt firms and 17208 of the 17217 non-bankrupt firms. Only 9 corporations are misclassified as bankrupt when they are not. KNN, SVM, Naïve Bayes and Logistic Regression take a long time to train on a dataset this large and, despite a few good accuracy scores, do not perform well on either the training or the test data. I therefore do not recommend these models for corporate bankruptcy prediction or similar financial prediction tasks. Furthermore, I found that the model assigns importance to a few individual components of the financial ratios, in particular components related to assets, liquidity, profitability and productivity.
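As a quick worked check, the confusion-matrix counts quoted above can be turned into the usual metrics. The sketch below takes those counts at face value and treats "bankrupt" as the positive class, which is an assumption; the text does not say which data split the counts refer to.

```python
# Counts taken from the text; "bankrupt" is assumed to be the positive class.
tp = 8033            # bankrupt firms predicted as bankrupt
fn = 0               # bankrupt firms missed (none, per the text)
fp = 9               # non-bankrupt firms flagged as bankrupt
tn = 17217 - fp      # non-bankrupt firms predicted correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```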
