Journal of University of Science and Technology of China ›› 2019, Vol. 49 ›› Issue (12): 965-973. DOI: 10.3969/j.issn.0253-2778.2019.12.003

• Original Paper •

A non-iterative approach to kernel logistic regression for imbalanced data

CUI Wenquan

  1. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui 230026, China
  • Received: 2019-04-14 Revised: 2019-05-22 Online: 2019-12-31 Published: 2019-12-31
  • Corresponding author: CUI Wenquan
  • About the author: CUI Wenquan (corresponding author), male, born in 1964, PhD, associate professor. Research field: mathematical statistics. E-mail: wqcui@ustc.edu.cn
  • Funding:
    Supported by the National Natural Science Foundation of China (71873128) and the Natural Science Foundation of Anhui Province (1308085MA02).

Abstract: A non-iterative learning method for kernel logistic regression on severely imbalanced data is proposed. The method improves on the classical iteratively re-weighted least squares approach to kernel logistic regression. It not only reduces the computational burden caused by iteration, but also uses the baseline class-proportion information during model training, thereby avoiding the problems caused by the usual ways of handling imbalanced data, such as undersampling, oversampling, and cost-sensitive learning. As a result, kernel logistic regression can be fitted conveniently and quickly on large-scale imbalanced data, yielding a robust modified least squares logistic regression classifier. Theoretical analysis shows that the proposed method has certain desirable properties, and simulation studies and empirical analysis show that it classifies well.

Key words: kernel logistic regression, non-iterative approach, imbalanced data, iterative re-weighted least squares, robustness
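
For orientation, the classical baseline named in the abstract, iteratively re-weighted least squares (IRLS) for kernel logistic regression, can be sketched as follows. This is not the proposed non-iterative method, whose details are not given in the abstract. It is a minimal illustration of the iterative procedure the paper sets out to improve on, assuming a ridge-type penalty and a Gaussian kernel. The function names (`rbf_kernel`, `sigmoid`, `fit_klr_irls`), the parameters `gamma` and `lam`, and the synthetic data are hypothetical choices made for this example only.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=0.5):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x - z||^2)."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Z**2, axis=1)[None, :]
        - 2.0 * X @ Z.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def sigmoid(f):
    # Clip the argument so that np.exp never overflows.
    return 1.0 / (1.0 + np.exp(-np.clip(f, -30.0, 30.0)))


def fit_klr_irls(K, y, lam=1.0, max_iter=50, tol=1e-8):
    """Classical IRLS (Newton) fit of ridge-penalized kernel logistic regression.

    Minimizes -sum(y*f - log(1 + exp(f))) + (lam/2) * alpha' K alpha, f = K alpha.
    Returns the coefficient vector alpha and the number of iterations used.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    for it in range(1, max_iter + 1):
        f = K @ alpha
        p = sigmoid(f)
        w = p * (1.0 - p)  # IRLS weights, recomputed at every iteration
        # Weighted least-squares step, written without dividing by w:
        # (W K + lam I) alpha_new = W f + (y - p)
        A = w[:, None] * K + lam * np.eye(n)
        b = w * f + (y - p)
        alpha_new = np.linalg.solve(A, b)
        step = np.max(np.abs(alpha_new - alpha))
        alpha = alpha_new
        if step < tol:
            break
    return alpha, it


# Hypothetical imbalanced toy data: 200 negatives, 20 positives.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(2.0, 1.0, size=(20, 2)),
])
y = np.concatenate([np.zeros(200), np.ones(20)])

K = rbf_kernel(X, X)
alpha, n_iter = fit_klr_irls(K, y, lam=1.0)
p_hat = sigmoid(K @ alpha)
print("iterations:", n_iter)
print("mean fitted probability, minority vs majority:",
      p_hat[y == 1].mean(), p_hat[y == 0].mean())
```

Each pass solves an n-by-n linear system with freshly computed weights, which is the per-iteration cost that a non-iterative approach avoids. The sketch applies no correction for class imbalance.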