Journal of University of Science and Technology of China ›› 2017, Vol. 47 ›› Issue (8): 686-694. DOI: 10.3969/j.issn.0253-2778.2017.08.008

• Original Articles •

A two-stage feature selection method based on Fisher’s ratio and prediction risk for telecom customer churn prediction

XU Ziwei, WANG Peng, CHEN Zonghai   

  1. Department of Automation, University of Science and Technology of China, Hefei 230027, China
  • Received: 2016-03-18  Revised: 2016-11-07  Online: 2017-08-31  Published: 2017-08-31
  • Contact: CHEN Zonghai
  • About the author: XU Ziwei, male, born in 1986, PhD candidate. Research field: predictive control. E-mail: xziwei@mail.ustc.edu.cn
  • Supported by: the National Natural Science Foundation of China (61375079).

Abstract: Telecom customer churn prediction is a key problem in the customer relationship management systems of telecom operators. Its goal is to identify customers who are at high risk of churning. Building a churn prediction model involves data preprocessing, class imbalance handling, feature selection, and classifier training and evaluation. To address the high dimensionality of telecom datasets, a two-stage feature selection method based on Fisher’s ratio and prediction risk is proposed, which combines the advantages of filter and wrapper feature selection methods. Experiments on a real-world dataset show that the method reduces feature dimensionality and improves classifier performance.

Key words: big data, churn prediction, two-stage feature selection, Spark
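
The two-stage idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the code assumes a common binary-class form of Fisher's ratio, (difference of class means)² / (sum of class variances), and one common formulation of prediction risk, namely the increase in classification error when a feature is replaced by its training mean. The nearest-centroid classifier, the synthetic data, and all function names (`fisher_ratio`, `filter_stage`, `risk_stage`, and so on) are hypothetical stand-ins chosen for brevity; the paper's classifiers, thresholds, and Spark pipeline are not reproduced here.

```python
import random
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Fisher's ratio of one feature for a binary label (assumed form)."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / spread if spread > 0 else 0.0


def filter_stage(X, y, k):
    """Stage 1 (filter): indices of the k features with the largest ratio."""
    scores = [fisher_ratio([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, feats):
    """Stand-in classifier: one mean vector per class over the chosen features."""
    return {c: [mean(r[j] for r, t in zip(X, y) if t == c) for j in feats]
            for c in (0, 1)}


def error_rate(cents, X, y, feats, masked=None, fill=0.0):
    """Misclassification rate; feature `masked`, if given, is replaced by `fill`."""
    wrong = 0
    for row, target in zip(X, y):
        vec = [fill if j == masked else row[j] for j in feats]
        pred = min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(vec, cents[c])))
        wrong += pred != target
    return wrong / len(y)


def risk_stage(X, y, feats, n_keep):
    """Stage 2: repeatedly drop the feature with the smallest prediction risk."""
    feats = list(feats)
    while len(feats) > n_keep:
        cents = fit_centroids(X, y, feats)
        base = error_rate(cents, X, y, feats)
        # Assumed risk: error with feature j set to its mean, minus base error.
        risks = {j: error_rate(cents, X, y, feats, masked=j,
                               fill=mean(r[j] for r in X)) - base
                 for j in feats}
        feats.remove(min(feats, key=lambda j: risks[j]))
    return feats


# Synthetic data (an assumption): feature 0 is strongly informative,
# feature 1 is weakly informative, features 2-4 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(1.5 * t, 1.0)]
     + [rng.gauss(0.0, 1.0) for _ in range(3)] for t in y]

stage1 = filter_stage(X, y, k=3)                 # cheap ranking pass
selected = risk_stage(X, y, stage1, n_keep=1)    # classifier-driven pruning
print("after filter stage:", stage1)
print("after risk stage:", selected)
```

The ordering reflects the usual motivation for a two-stage design: the Fisher's ratio pass is cheap and needs no classifier, so it can discard most features first, and the costlier risk-based pass then has to evaluate the classifier only on the survivors.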
