Journal of University of Science and Technology of China ›› 2016, Vol. 46 ›› Issue (3): 173-179. DOI: 10.3969/j.issn.0253-2778.2016.03.001

• Original Paper •    

A novel voting method and parallel implementation for soft clustering

ZHANG Jingjing, YANG Yan*, WANG Hongjun, HAN Xiaotao, DENG Qiang

  1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756
  • Received: 2015-08-27 Revised: 2015-09-29 Accepted: 2015-09-29 Online: 2015-09-29 Published: 2015-09-29

Abstract: Clustering ensembles are an important data mining tool and have been widely studied. This paper proposes a novel voting method for soft clustering (VMSC). The ensemble process consists of two steps: first, the average membership-degree matrix is computed; second, this matrix is used as the input to an iterative optimization. The method is robust to noise and has good stability. Because the Spark cloud computing platform handles big data efficiently, the VMSC algorithm was parallelized on Spark to make it suitable for big data. VMSC was tested on 12 UCI datasets, and its results were compared with those of four other soft clustering ensemble algorithms: sCSPA, sMCLA, sHGBF and SVCE. The experiments indicate that VMSC achieves better ensemble performance, and the parallel experiments show that its parallel implementation handles big data efficiently.

Key words: soft clustering ensemble, voting, cloud computing, big data
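
As an illustration of the first step named in the abstract, the following minimal Python sketch averages several soft membership matrices element-wise. It is not the authors' implementation. It assumes that every base clustering uses the same number of clusters and that cluster labels are already aligned across base clusterings, which the abstract does not specify. The function names `average_membership` and `harden` are hypothetical, and the iterative optimization step is omitted because the abstract gives no details of it.

```python
from typing import List

Matrix = List[List[float]]  # n objects (rows) by k clusters (columns)


def average_membership(memberships: List[Matrix]) -> Matrix:
    """Element-wise mean of several n-by-k soft membership matrices.

    Assumption (not stated in the abstract): column j refers to the
    same cluster in every base clustering.
    """
    if not memberships:
        raise ValueError("need at least one membership matrix")
    n, k = len(memberships[0]), len(memberships[0][0])
    for m in memberships:
        if len(m) != n or any(len(row) != k for row in m):
            raise ValueError("all membership matrices must share one shape")
    count = len(memberships)
    return [
        [sum(m[i][j] for m in memberships) / count for j in range(k)]
        for i in range(n)
    ]


def harden(membership: Matrix) -> List[int]:
    """Assign each object to its highest-membership cluster (illustrative)."""
    return [max(range(len(row)), key=row.__getitem__) for row in membership]


# Two hypothetical base soft clusterings of three objects into two clusters.
u1 = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
u2 = [[0.7, 0.3], [0.4, 0.6], [0.3, 0.7]]

avg = average_membership([u1, u2])
labels = harden(avg)
print(avg)
print(labels)
```

In this toy input the third object is assigned to different clusters by the two base clusterings; averaging the memberships resolves the disagreement in favor of the cluster with the larger combined membership.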
