Peer reviewed
  • Financial distress predicti...
    Qian, Hongyi; Wang, Baohui; Yuan, Minghe; Gao, Songfeng; Song, You

    Expert systems with applications, 03/2022, Volume: 190
    Journal Article

    Corporate financial distress prediction has been studied for more than half a century. Many models have emerged during that time, among which ensemble learning algorithms are the most accurate. Most state-of-the-art methods of recent years are based on gradient boosted decision trees. However, most of them do not use feature importance for feature selection, and the few that do rely on a biased feature importance measure, which may not reflect the true importance of features. To solve this problem, this paper proposes a heuristic algorithm based on permutation importance (PIMP) to correct the biased feature importance measure. The method ranks and filters the features used by machine learning models, which not only improves accuracy but also makes the results more interpretable. Based on financial data from 4,167 listed companies in China between 2001 and 2019, the experiment shows that, compared with using the random forest (RF) wrapper method alone, combining it with the PIMP method does correct the bias in feature importance. After the redundant features are removed, the performance of most machine learning models improves. The PIMP method is a promising addition to existing financial distress prediction methods. Moreover, compared with traditional statistical learning models and other machine learning models, the proposed PIMP-XGBoost offers higher prediction accuracy and clearer interpretation, making it suitable for commercial use.

    • The model combines a corrected feature selection measure and XGBoost.
    • Permutation importance can correct the bias of feature importance.
    • The model is validated on datasets of Chinese listed companies over five metrics.
    • The model is shown to outperform several benchmark techniques.
    • Feature importance and partial dependence plots enhance model interpretation.
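
    The general idea of ranking features by permutation can be illustrated with a minimal sketch. It is not the paper's PIMP algorithm, which the abstract does not spell out: it only shows the basic permutation step, in which one feature column is shuffled and the resulting drop in accuracy is taken as that feature's importance. The nearest-centroid classifier, the synthetic two-feature data, and all function names below are assumptions made for illustration; the paper itself works with random forest and XGBoost models on real financial data.

```python
import random


def fit_centroids(X, y):
    """Toy nearest-centroid classifier (a stand-in model, not RF or XGBoost)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (hypothetical): feature 0 separates the classes, feature 1 is noise.
data_rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[t * 3.0 + data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for t in y]

model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)

# Rank features from most to least important; low-ranked ones are removal candidates.
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print(ranking, importances)
```

    In this toy setup the informative feature should rank above the noise feature, which mirrors the filtering step the abstract describes: features whose permutation barely changes performance are treated as redundant.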