Journal of University of Science and Technology of China ›› 2018, Vol. 48 ›› Issue (6): 447-457. DOI: 10.3969/j.issn.0253-2778.2018.06.003

• Original Article •

Simulated annealing based semi-supervised support vector machine for credit prediction

ZHANG Jie, LI Lin, ZHU Ge

  1. School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430070, China
  • Received: 2017-09-09 Revised: 2018-04-10 Accepted: 2018-04-10 Online: 2018-06-30 Published: 2018-04-10
  • Corresponding author: LI Lin
  • About the author: ZHANG Jie, male, born in 1993, master's student. Research interest: machine learning. E-mail: icecream@whut.edu.cn
  • Supported by:
    National Social Science Foundation of China (15BGL048), Wuhan University of Technology Research Fund (2017II39GX), Wuhan University of Technology Excellent Graduate Dissertation Cultivation Project (2016-YS-068), and Hubei Province Science and Technology Support Program (Research, Development and Demonstration) (2015BAA072).

Abstract: In the mid-1990s, financial institutions began to combine consumer and business information to produce credit scores for businesses. Enterprises in China, especially small and micro enterprises, have little credit information on record, so only a small number of enterprises have credit information while most have none. Semi-supervised support vector machines (S3VM) can learn from both labeled and unlabeled data and can help overcome class imbalance and insufficient sample information in credit data. However, the parameters of an S3VM strongly affect its performance, and in practice they are often chosen by experience. This paper therefore proposes SAS3VM, an algorithm that uses simulated annealing (SA) to optimize the parameters of the deterministic annealing based semi-supervised support vector machine (DAS3VM). Starting from a small amount of labeled credit data, the algorithm uses a large amount of unlabeled credit data to assist learning and uses simulated annealing to search for the optimal parameters. Comparative experiments were conducted on two enterprise credit datasets and three personal credit datasets. The results show that the semi-supervised methods (DAS3VM and SAS3VM) outperform supervised learning methods, and that SAS3VM improves accuracy over DAS3VM by up to 13.108%.

Key words: semi-supervised learning, deterministic annealing, simulated annealing, credit prediction
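
The parameter search described in the abstract can be illustrated with a generic simulated annealing loop. The following is a minimal sketch only, not the authors' SAS3VM implementation: the function names, the two parameters (`log_c`, `log_c_star`), the Gaussian neighbor proposal, the geometric cooling schedule, and the toy objective are all assumptions made for illustration. In a real setting the objective would be something like the validation error of a DAS3VM trained with the candidate parameters.

```python
import math
import random


def simulated_annealing(objective, start, bounds, t0=1.0, cooling=0.95,
                        steps=200, seed=0):
    """Minimize `objective` over a box given by `bounds` (generic SA sketch)."""
    rng = random.Random(seed)
    current = list(start)
    current_cost = objective(current)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(steps):
        # Propose a neighbor: Gaussian perturbation, clipped to the bounds.
        candidate = [
            min(hi, max(lo, x + rng.gauss(0.0, 0.1 * (hi - lo))))
            for x, (lo, hi) in zip(current, bounds)
        ]
        cost = objective(candidate)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temperature) (Metropolis rule).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return best, best_cost


def toy_validation_error(params):
    """Hypothetical stand-in for the validation error of a trained model."""
    log_c, log_c_star = params  # hypothetical parameter names
    return (log_c - 1.0) ** 2 + (log_c_star + 2.0) ** 2


search_bounds = [(-3.0, 3.0), (-3.0, 3.0)]
best_params, best_error = simulated_annealing(
    toy_validation_error, start=[0.0, 0.0], bounds=search_bounds
)
print(best_params, best_error)
```

Accepting occasional worse moves at high temperature is what lets the search escape poor local optima before the temperature decays and the search becomes effectively greedy.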

CLC number: