Default Rate Prediction Models for Self-employment in Korea using Ridge, Random Forest and Deep Neural Network

Author(s): Dongsuk Hong, Hanjong Baeck

Abstract

This study introduces machine learning (ML) and deep learning (DL) models for predicting self-employment default rates using credit information. Most previous studies of corporate credit risk focus on bankruptcy prediction models for listed companies, which use financial information as the main variables and macro-economic information as auxiliary variables. However, bankruptcy prediction models are difficult to apply where financial information is insufficient, as with small and medium-sized enterprises (SMEs) and self-employed businesses. In addition, studies on predicting corporate default rates by industry are very limited. We therefore used micro-level variables derived from credit information, such as the loan and overdue histories of individual businesses in the Korean manufacturing sector from April 2014 through June 2019, together with typical macro-economic variables, to improve performance in predicting default rates. We then evaluated how the choice of algorithm, namely Ridge, Random Forest (RF), or Deep Neural Network (DNN), affects the performance of the proposed default-rate prediction model for self-employment. In this study, the DNN serves two purposes: as a submodel that selects credit information variables and, in cascade, as the final model that predicts default rates from the selected input variables. The submodel and the final model have 2 and 3 hidden layers, respectively, and each layer has 5 nodes. The activation function, solver, and learning rate were determined through hyper-parameter tuning.
As a result, when the credit information variables were used together with the macro-economic variables, prediction performance increased by 3.48 percentage points (R² = 0.981) compared with the Ridge model using only macro-economic variables, and the performance of the final DNN model increased by 4.74 percentage points (R² = 0.993).
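
The comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and uses purely synthetic data rather than the study's credit records. The feature counts, coefficients, train/test split, and all hyper-parameters other than the three hidden layers of 5 nodes are placeholders chosen for illustration. The variable-selection submodel is omitted because the abstract does not specify how it selects variables.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the study's data): two macro-economic
# features and two credit-information features drive a default rate.
rng = np.random.default_rng(0)
n = 300
macro = rng.normal(size=(n, 2))
credit = rng.normal(size=(n, 2))
y = (0.5 * macro[:, 0] - 0.3 * macro[:, 1]
     + 0.8 * credit[:, 0] + 0.4 * credit[:, 1]
     + rng.normal(scale=0.05, size=n))

feature_sets = {
    "macro only": macro,
    "macro + credit": np.hstack([macro, credit]),
}


def make_models():
    """Return fresh, unfitted estimators; hyper-parameters are placeholders."""
    return {
        "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "RF": RandomForestRegressor(n_estimators=100, random_state=0),
        # 3 hidden layers of 5 nodes each, as described for the final model.
        "DNN": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(5, 5, 5), activation="relu",
                         solver="lbfgs", max_iter=2000, random_state=0),
        ),
    }


# Fit every model on every feature set and record the held-out R^2.
scores = {}
for set_name, X in feature_sets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    for model_name, model in make_models().items():
        model.fit(X_train, y_train)
        scores[(model_name, set_name)] = r2_score(y_test, model.predict(X_test))

for (model_name, set_name), r2 in scores.items():
    print(f"{model_name:5s} | {set_name:14s} | R2 = {r2:.3f}")

# Gain from adding credit features, expressed in percentage points of R^2.
gain_pp = 100 * (scores[("Ridge", "macro + credit")]
                 - scores[("Ridge", "macro only")])
print(f"Ridge gain from credit features: {gain_pp:.2f} percentage points")
```

The numbers printed by this sketch reflect only the synthetic data and bear no relation to the reported results. With real data, the activation function, solver, and learning rate would be chosen by hyper-parameter tuning, as the abstract states.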