Journal of University of Science and Technology of China ›› 2017, Vol. 47 ›› Issue (4): 350-357.DOI: 10.3969/j.issn.0253-2778.2017.04.010

• Original Paper • Previous Articles     Next Articles

Ensembles of classifier chains for multi-label classification based on Spark

WANG Jin, WANG Hong, XIA Cuiping, OUYANG Weihua, CHEN Qiaosong, DENG Xin   

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received:2016-08-28 Revised:2016-12-08 Online:2017-04-30 Published:2017-04-30

Abstract: With the wide application of data mining technology, multi-label learning has become a hot topic in the data mining domain. Although ensembles of classifier chains (ECC) algorithm is a multi-label learning method which is effective and accurate, its complexity of time and space is so high that it cannot adapt to the large-scale multi-label classification tasks. A new algorithm named Spark ensembles of classifier chains(S-ECC) was proposed based on Spark platform on which a parallel implementation was conducted of each step of the sequential ECC algorithm. The test results in stand-alone and cluster environments show that S-ECC has a good adaptability to large-scale data with a high speedup, and that it is no less capable than the traditional sequential program.

Key words: multi-label learning, ECC, Apache Spark, MapReduce

CLC Number: