
Shuvamjoy34/Bankruptcy-Prediction

Bankruptcy-Prediction

Since the collapse of Lehman Brothers during the 2008 global financial crisis, estimating the risk of corporate bankruptcy in advance has been of great importance to creditors and investors. Although it is a relatively new research topic, artificial intelligence and machine learning methods have achieved promising results in corporate bankruptcy prediction in recent years. In this summer research project, I built a machine learning model for predicting upcoming bankruptcies using a US corporate bankruptcy dataset spanning around 46 years.

Starting from 93,837 observations with 15 features, and after thorough cleaning, missing value imputation, and feature engineering, the final dataset contains 23,320 observations with 210 features related to financial and management statements. I ran the analysis with nine machine learning techniques: Logistic Regression, KNN, SVM, Naïve Bayes, Decision Tree, Random Forest, AdaBoost, XGBoost, and CatBoost. For evaluation, I used accuracy, the ROC curve and AUC, the confusion matrix, precision, recall, F-score, and the cumulative gain chart.

The best models were Random Forest, XGBoost, AdaBoost, CatBoost, and Decision Tree. Applying an ensemble voting method to these top five algorithms favoured Random Forest and the boosting algorithms as the two best predictors of bankruptcy on both training and test data, which reduces the overfitting problem. To cross-check for overfitting, I used cross-validation to compute the mean CV score of each algorithm; again, Random Forest and Gradient Boosting topped the list. Both algorithms yield an overall accuracy of ∼93%, a training accuracy of ∼99%, and a class-independent test accuracy of ∼92% on the balanced, imputed, and feature-engineered dataset. To improve accuracy, I oversampled this dataset with SMOTE and reduced its dimensionality with PCA before fitting the models.
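As a rough illustration of the voting and cross-validation steps described above, the following is a minimal sketch using scikit-learn only. It is not the project's actual code: the synthetic data from `make_classification`, the choice of soft voting, the number of PCA components, and all hyperparameters are assumptions made for the example. XGBoost, CatBoost, and SMOTE are left out to keep the sketch dependent on a single library, with `GradientBoostingClassifier` standing in for the boosting models.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced dataset (assumption: the real data
# is not reproduced here; class balance is the make_classification default).
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=10, random_state=0
)

# Voting ensemble over tree-based models. Soft voting (averaging predicted
# probabilities) is an assumption; the text does not say which rule was used.
voter = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    voting="soft",
)

# Scale, reduce dimensionality with PCA, then vote. Wrapping the steps in a
# pipeline means scaling and PCA are refit on each training fold only.
model = make_pipeline(StandardScaler(), PCA(n_components=10), voter)

# Stratified 5-fold cross-validation to obtain a mean CV accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

Comparing the mean CV score with the training accuracy is one way to check how much a model overfits: a large gap between the two suggests the model does not generalise as well as its training score implies.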
My final model correctly predicts all 8,033 bankrupt firms and 17,208 of the 17,217 non-bankrupt firms. Only 9 corporations were misclassified as bankrupt when they were not. KNN, SVM, Naïve Bayes, and Logistic Regression take a long time to train when the dataset is this large and, despite a few good accuracy scores, do not perform well on either the training or the test data. Hence, these models are not recommended for corporate bankruptcy prediction or other financial prediction tasks. Furthermore, I found that the model assigns importance to a few individual components of the financial ratios, in particular components related to assets, liquidity, profitability, and productivity.
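The counts reported above can be turned into the standard confusion-matrix metrics with a few lines of arithmetic. The sketch below uses only the numbers stated in the text and treats "bankrupt" as the positive class; the variable names are illustrative. The text does not say which data split these counts come from, so the resulting figures should be read with that in mind.

```python
# Counts as reported in the text, with "bankrupt" as the positive class.
tp = 8033            # bankrupt firms predicted as bankrupt
fn = 0               # bankrupt firms missed (all 8033 were identified)
tn = 17208           # non-bankrupt firms predicted as non-bankrupt
fp = 17217 - 17208   # non-bankrupt firms flagged as bankrupt

total = tp + fn + tn + fp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted bankruptcies that are real
recall = tp / (tp + fn)      # share of real bankruptcies that were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"firms={total} false_positives={fp}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```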
