Journal of University of Science and Technology of China, 2016, Vol. 46, Issue (3): 173-179. DOI: 10.3969/j.issn.0253-2778.2016.03.001

• Original Article •

A novel voting method and parallel implementation for soft clustering

ZHANG Jingjing, YANG Yan*, WANG Hongjun, HAN Xiaotao, DENG Qiang

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756
  • Received: 2015-08-27  Revised: 2015-09-29  Accepted: 2015-09-29  Online: 2015-09-29  Published: 2015-09-29
  • Corresponding author: YANG Yan
  • About the author: ZHANG Jingjing, female, born 1989, master's degree. Research interests: data mining, cloud computing. E-mail: youyouzhangjing@yeah.net.
  • Funding:
    Supported by the National Natural Science Foundation of China (Nos. 61134002, 61170111, 61572407).

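Abstract: Clustering ensemble, an important tool in data mining, has been widely recognized and studied. Building on the voting method, this paper proposes a novel voting method for soft clustering (VMSC). The algorithm first computes the average membership-degree matrix and then optimizes it iteratively. It eliminates the influence of noise points and shows good stability. Because the Spark cloud computing platform processes big data efficiently, a parallel version of VMSC is implemented on Spark so that the proposed algorithm can handle big data. VMSC is evaluated on 12 UCI datasets and compared with the soft clustering ensemble algorithms sCSPA, sMCLA, sHGBF and SVCE. The results show that VMSC achieves a good ensemble effect for soft clusterings, and the parallel experiments show that its Spark implementation achieves satisfactory parallel performance and handles big data effectively.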
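The abstract describes the first ensemble step only at a high level: averaging the members' membership matrices. Soft partitions produced by different runs generally use arbitrary cluster labels, so a voting-style average needs the labels aligned first. The Python sketch below illustrates that step under this assumption and is not the authors' VMSC code: it aligns each member to the first one by a brute-force search over column permutations, then averages element-wise. The names `align_to_reference` and `average_membership` are hypothetical, and the subsequent iterative optimization is not shown because the abstract does not specify it.

```python
from itertools import permutations
from typing import List, Optional, Tuple

# A soft partition: n objects x k clusters; each row holds membership degrees summing to 1.
Matrix = List[List[float]]


def align_to_reference(member: Matrix, reference: Matrix) -> Matrix:
    """Reorder member's columns to best agree with reference (assumed alignment rule).

    Agreement for a permutation perm is sum_i sum_c member[i][perm[c]] * reference[i][c].
    All k! permutations are tried, so this is only practical for small k.
    """
    k = len(reference[0])
    best_perm: Optional[Tuple[int, ...]] = None
    best_score = float("-inf")
    for perm in permutations(range(k)):
        score = sum(
            row_m[perm[c]] * row_r[c]
            for row_m, row_r in zip(member, reference)
            for c in range(k)
        )
        if score > best_score:
            best_perm, best_score = perm, score
    # Column c of the result is column best_perm[c] of the member.
    return [[row[best_perm[c]] for c in range(k)] for row in member]


def average_membership(members: List[Matrix]) -> Matrix:
    """Align every member to the first one, then take the element-wise mean."""
    reference = members[0]
    aligned = [reference] + [align_to_reference(m, reference) for m in members[1:]]
    n, k, h = len(reference), len(reference[0]), len(aligned)
    return [
        [sum(a[i][c] for a in aligned) / h for c in range(k)]
        for i in range(n)
    ]


if __name__ == "__main__":
    u1 = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    # Same grouping as u1, but with the two cluster labels swapped.
    u2 = [[0.2, 0.8], [0.7, 0.3], [0.3, 0.7]]
    avg = average_membership([u1, u2])
    print(avg)
    # Hard labels from the averaged memberships (argmax per object).
    print([max(range(len(row)), key=row.__getitem__) for row in avg])
```

Brute-force alignment costs k! evaluations, so for more than a handful of clusters a Hungarian-algorithm solver (for example `scipy.optimize.linear_sum_assignment` applied to the agreement matrix) would be the usual substitute. Because each member is aligned independently and the final sum is associative, this step maps naturally onto a Spark job, e.g. an RDD `map` for alignment followed by a `reduce` for the element-wise sum; how the authors actually distribute the work is not described in the abstract.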

Key words: soft clustering ensemble, voting, cloud computing, big data
