Journal of University of Science and Technology of China ›› 2020, Vol. 50 ›› Issue (7): 968-976. DOI: 10.3969/j.issn.0253-2778.2020.07.014

• Original Article •

An ensemble classification method based on CGAN with focal loss for imbalanced data

CUI Wenquan, YU Houying, HOU Xiaotian

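  1. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, Anhui, China
  • Received: 2020-05-25 Revised: 2020-06-27 Accepted: 2020-06-27 Online: 2020-07-31 Published: 2020-06-27
  • Corresponding author: CUI Wenquan
  • About the author: CUI Wenquan (corresponding author), male, born in 1964, PhD, associate professor. Research field: mathematical statistics. E-mail: wqcui@ustc.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (71873128).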

An ensemble classification method based on CGAN with focal loss for imbalanced data

  1. CUI Wenquan, YU Houying, HOU Xiaotian
  • Received: 2020-05-25 Revised: 2020-06-27 Accepted: 2020-06-27 Online: 2020-07-31 Published: 2020-06-27

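Abstract: For imbalanced data, an ensemble classification method built on a conditional generative adversarial network (CGAN) with focal loss is studied using gradient boosting trees. The method first uses the CGAN to reduce the imbalance ratio. It then combines the weight balancing of the focal loss with the GBDT algorithm to place suitably greater emphasis on minority-class samples, which further improves the classifier's performance. The properties of the method are studied and several theoretical results are obtained. It is proved that, under certain conditions, the empirical conditional distribution generated by the CGAN converges to the corresponding population conditional distribution; that the empirical risk of the focal-loss CGAN method converges to the expected risk; and that the estimator of the method converges to the function that minimizes the expected risk. Experimental results show that the focal-loss CGAN method performs well.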

Key words: imbalanced data, conditional generative adversarial networks, focal loss, ensemble learning

Abstract: For imbalanced data, an ensemble classification method, CGAN-focal-loss, is investigated. It is based on conditional generative adversarial networks (CGAN) and uses gradient boosting trees. The method first reduces the imbalance ratio with a CGAN, and then further improves classification performance by increasing the attention paid to minority-class samples through the weight balancing of the focal loss combined with the GBDT algorithm. The properties of the method are investigated and several theoretical results are obtained. It is proved that, under certain conditions, the empirical conditional distribution generated by the CGAN converges to the conditional distribution of the corresponding population; that the empirical risk of the CGAN method with focal loss converges to the expected risk; and that the estimator of the method converges to the function that minimizes the expected risk. The experimental results show that the CGAN-focal-loss method performs well.

Key words: imbalanced data, conditional generative adversarial networks (CGAN), focal loss, ensemble learning
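
The abstract does not state the loss formula, so the following is only an illustrative sketch of the focal loss it refers to, written in the commonly used binary form -alpha_t (1 - p_t)^gamma log(p_t). The function name `focal_loss`, the default values of `gamma` and `alpha`, and the convention that label 1 marks the minority class are assumptions made for illustration and are not taken from the paper.

```python
import math


def focal_loss(y, p, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample (commonly used form, assumed here).

    y     : true label, 0 or 1 (1 = minority class, an assumption).
    p     : predicted probability that the label is 1.
    gamma : focusing parameter; hypothetical default, not from the paper.
    alpha : weight on label-1 samples; hypothetical default as well.
    """
    # Keep p strictly inside (0, 1) so the logarithm stays finite.
    p = min(max(p, eps), 1.0 - eps)
    # Probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class-balancing weight for the true class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of samples that are already well classified.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy, while a larger `gamma` shrinks the contribution of samples the classifier already handles well. How the paper couples this loss with GBDT is not described in the abstract and is not sketched here.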
