Multivariable data imputation for the analysis of incomplete credit data

2020
Missing data significantly reduce the accuracy and usability of credit scoring models, especially when multiple variables are missing. Most credit scoring models address this problem by deleting incomplete instances from the dataset or by imputing missing values with the mean, mode, or regression estimates. However, these methods often cause a substantial loss of information or introduce bias. We propose a novel method, called BNII, for imputing missing values, which can support intelligent credit scoring systems. The proposed BNII algorithm consists of two stages: a preparatory stage and an imputation stage. In the first stage, a Bayesian network over all of the attributes in the original dataset is constructed from the complete dataset, so that both the network structure, which encodes the dependencies between variables, and the parameters of each variable's conditional distribution are learned. In the second stage, the multiple variables with missing values are iteratively imputed using the Bayesian network model from the first stage. The algorithm was found to converge monotonically. The main advantages of the method are that it exploits the inherent probabilistic dependencies between variables without assuming any specific probability distribution, and that it is suitable for multivariate missing-data cases. Three datasets were used in the experiments: one was a real dataset from a well-known P2P financial company in China, and the other two were benchmark datasets from the UCI Machine Learning Repository. The experimental results showed that BNII performed significantly better than other well-known imputation techniques. This suggests that the proposed method can improve the performance of credit scoring systems and can be extended to other expert and intelligent systems. (C) 2019 Elsevier Ltd. All rights reserved.
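The two-stage procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' BNII implementation. It assumes discrete attributes and a hand-specified network structure, so the structure-learning step is skipped. Laplace-smoothed conditional probability tables are estimated from the complete rows only, missing cells start from mode imputation, and each pass re-imputes every missing cell with the value that maximizes the factors of the joint distribution that involve it (its own conditional table and those of its children). All function names, variable names, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_cpts(rows, parents):
    """Preparatory stage (sketch): count each variable given its parents, complete rows only."""
    complete = [r for r in rows if all(v is not None for v in r.values())]
    cpts = {}
    for var, pa in parents.items():
        counts = defaultdict(Counter)
        for r in complete:
            counts[tuple(r[p] for p in pa)][r[var]] += 1
        cpts[var] = counts
    return cpts


def cond_prob(cpts, domains, parents, var, row, alpha=1.0):
    """Laplace-smoothed estimate of P(row[var] | values of var's parents in row)."""
    counts = cpts[var].get(tuple(row[p] for p in parents[var]), Counter())
    return (counts[row[var]] + alpha) / (sum(counts.values()) + alpha * len(domains[var]))


def blanket_score(cpts, domains, parents, children, var, value, row):
    """Product of the joint-probability factors that change when var is set to value."""
    trial = dict(row)
    trial[var] = value
    score = cond_prob(cpts, domains, parents, var, trial)
    for child in children[var]:
        score *= cond_prob(cpts, domains, parents, child, trial)
    return score


def bn_impute(rows, parents, domains, max_iter=20):
    """Imputation stage (sketch): iteratively re-impute each missing cell given all others."""
    cpts = learn_cpts(rows, parents)
    children = {v: [c for c, pa in parents.items() if v in pa] for v in parents}
    missing = [(i, v) for i, r in enumerate(rows) for v, x in r.items() if x is None]
    filled = [dict(r) for r in rows]  # the input rows are never mutated
    # Starting point: mode of the observed values of each variable.
    for i, v in missing:
        observed = Counter(r[v] for r in rows if r[v] is not None)
        filled[i][v] = observed.most_common(1)[0][0] if observed else domains[v][0]
    for _ in range(max_iter):
        changed = 0
        for i, v in missing:
            row = filled[i]
            scores = {x: blanket_score(cpts, domains, parents, children, v, x, row)
                      for x in domains[v]}
            best = max(domains[v], key=scores.__getitem__)
            # Accept strict improvements only, so the row's joint probability
            # under the fixed network never decreases and the loop cannot cycle.
            if scores[best] > scores[row[v]]:
                row[v] = best
                changed += 1
        if changed == 0:
            break
    return filled


# Hypothetical toy data: two discrete attributes and a hand-specified DAG.
DOMAINS = {"income": ["low", "high"], "default": ["no", "yes"]}
PARENTS = {"income": (), "default": ("income",)}
ROWS = (
    [{"income": "high", "default": "no"} for _ in range(3)]
    + [{"income": "low", "default": "yes"} for _ in range(3)]
    + [{"income": "low", "default": "no"}]
    + [{"income": None, "default": "yes"}, {"income": "high", "default": None}]
)

IMPUTED = bn_impute(ROWS, PARENTS, DOMAINS)
print(IMPUTED[7], IMPUTED[8])
```

On this toy data the missing income in the row with default = "yes" is imputed as "low", and the missing default in the high-income row as "no". Because every accepted update strictly increases its row's joint probability under the fixed network, this sketch stops after finitely many passes; the convergence result stated in the abstract concerns BNII itself and is not reproduced here.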
EXPERT SYSTEMS WITH APPLICATIONS
Volume: 141
ISSN:0957-4174
Indexed in
SSCI
Publication date
2020
Subject area
Evidence-based management
Country
China
Language
English
DOI
10.1016/j.eswa.2019.112926
Other keywords
MISSING VALUE IMPUTATION; MODEL; PREDICTION; ALGORITHM; CLASSIFICATION; VALUES
EISSN
1873-6793
Funding agencies
National Natural Science Foundation of China (NSFC) [71871090, 71301047]; Science Foundation of the Ministry of Education of China [18YJAZH038]; Hunan Provincial Science & Technology Major Project [2018GK1020]; Xinjiang Uygur Autonomous Region research fund; Deakin University ASL 2019 fund
Funding information
This research was supported by the National Natural Science Foundation of China (Nos. 71871090, 71301047), the Science Foundation of the Ministry of Education of China (18YJAZH038), the Hunan Provincial Science & Technology Major Project (2018GK1020), the Xinjiang Uygur Autonomous Region research fund, and the Deakin University ASL 2019 fund. We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.
Times cited (WoS)
6
Citation count updated
2022-01
Affiliations
Hunan University; Deakin University; Chinese Academy of Sciences; Xinjiang Technical Institute of Physics & Chemistry, CAS
Keywords
Bayesian network; Credit scoring; Data missing; Data mining