Journal of University of Science and Technology of China ›› 2017, Vol. 47 ›› Issue (4): 358-368. DOI: 10.3969/j.issn.0253-2778.2017.04.011

• Original Paper •

BDAP: A data mining platform based on Spark

BU Yao, WU Bin, CHEN Yufeng, BAI Demeng   

    1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    3. State Grid Shandong Electric Power Research Institute, Jinan 250000, China
  • Received:2016-08-28 Revised:2017-12-08 Online:2017-04-30 Published:2017-04-30

Abstract: Big data processing systems have become a hot research topic in the field of big data. First, the architecture and functions of the data analysis platform are analyzed, and the platform is divided into a data source layer, data absorption layer, data storage layer, data platform layer, security and monitoring layer, equipment layer, and application layer. The platform includes multiple data preprocessing and algorithm modules, and its architecture provides a foundation for big data analysis. Its features are comprehensive and can be freely combined. The coupling between modules is low, which makes maintenance and further development convenient. From the user's point of view, parameter adjustment, process construction, monitoring, and the data mining process are all visual, and workflow and scheduling-stream technologies are available. In terms of performance, the BDAP algorithms work better than Hive and MLlib. Finally, an example illustrates the application scenarios of this data mining platform: after circuit fault and meteorological data are analyzed, faults can be predicted and classified. Video mining can also be used to extract useful information.

Key words: big data analysis framework, Hadoop, Storm, Spark, batch processing, data mining

CLC Number: