Journal of University of Science and Technology of China ›› 2015, Vol. 45 ›› Issue (1): 61-68. DOI: 10.3969/j.issn.0253-2778.2015.01.010

• Original Articles •

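A hierarchical classification model for class-imbalanced data

SHI Peibei, LIU Guiquan, WANG Zhong, WEI Bing

  1. Department of Public Computer Teaching, Hefei Normal University, Hefei, Anhui 230601; 2. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027; 3. Department of Digital Technology, No. 38 Research Institute of China Electronics Technology Group Corporation, Hefei, Anhui 230088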
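  • Received: 2014-06-09 Revised: 2014-07-29 Accepted: 2014-07-29 Online: 2014-07-29 Published: 2014-07-29
  • Corresponding author: SHI Peibei
  • About the author: SHI Peibei (corresponding author), female, born in 1984, master's degree, lecturer. Research fields: data mining, machine learning. E-mail: pb_shi@163.com
  • Supported by:
    the National Key Technology R&D Program of China (2012BAH17B03), the Natural Science Foundation of Anhui Province (1408085MF131), the Natural Science Research Project of Anhui Higher Education Institutions (KJ2013B212), and the Open Project of the Hunxin DSP Industrialization Research Institute of Hefei Normal University.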

A hierarchical classification model for class-imbalanced data

SHI Peibei, LIU Guiquan, WANG Zhong, WEI Bing   

  1. Department of Public Computer Teaching, Hefei Normal University, Hefei 230601, China; 2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; 3. Department of Digital Technology, No. 38 Research Institute of CETC, Hefei 230088, China
  • Received: 2014-06-09 Revised: 2014-07-29 Accepted: 2014-07-29 Online: 2014-07-29 Published: 2014-07-29

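Abstract: Traditional machine learning methods classify class-imbalanced data poorly, so a hierarchical classification model for class-imbalanced data is proposed. The model uses AdaBoost as its base classifier, builds a mathematical model from the classifiers' false positive rates and features, and proves that the model's parameters can be computed. First, the model is built on the structure of a hierarchical classification tree. Next, the classification cost of this tree structure is calculated, which yields a quantitative mathematical description of the relation between the model's cost and the features of each layer. This cost is then converted into an optimization problem, its solution procedure is given, and the computed results for the hierarchical classification model are presented. In extensive tests on UCI datasets, with AUC and F-Measure as evaluation criteria, the hierarchical classification model achieves better classification performance than existing imbalanced classification methods.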

Key words: machine learning, class imbalance, hierarchical classification, feature, evaluation criteria

Abstract: Traditional machine learning methods perform poorly on class-imbalanced data, so a hierarchical classification model for such data is proposed. With AdaBoost as its base classifier, the model is formulated mathematically in terms of the features and false positive rates of the classifiers, and its parameters are shown to be computable. First, a hierarchical classification tree is adopted as the model structure. The classification cost of this tree is then derived, giving a quantitative mathematical description of how the cost depends on the features of each layer. Finally, the classification cost is converted into an optimization problem, the procedure for solving it is given, and the computed results for the hierarchical classification model are presented. Experiments on UCI datasets show that the proposed method achieves higher AUC and F-measure than many existing class-imbalanced learning methods.
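
The two evaluation criteria named in the abstract, AUC and F-measure, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names `f_measure` and `auc`, the convention that the minority class carries label 1, and the toy data are all assumptions made for illustration only.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure of the positive class (label 1, assumed to be the minority)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero, so the score is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the positive
    sample gets the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy imbalanced data (hypothetical): 2 positives, 8 negatives.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    all_negative = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
    print("accuracy of always predicting the majority class:", accuracy)
    print("F-measure of the same prediction:", f_measure(y_true, all_negative))
```

On this toy data a classifier that always predicts the majority class is correct on 8 of 10 samples yet has an F-measure of zero, which is why metrics such as these are generally preferred over plain accuracy when classes are imbalanced.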

Key words: machine learning, class-imbalanced, hierarchical classification, feature, evaluation criteria

CLC number: