Journal of University of Science and Technology of China ›› 2016, Vol. 46 ›› Issue (1): 66-75. DOI: 10.3969/j.issn.0253-2778.2016.01.009

• Original Paper •

Spark/Shark-based OLAP system for smart grid applications

WANG Yaling, LIU Yue, HONG Jianguang, CUI Wei, LI Yanhu, SU Yipeng, HUANG Gaopan, ZHANG Mingming, LIU Wantao

  1. State Grid Information & Telecommunication Group Co. Ltd., Beijing 100761, China; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; 3. State Grid Zhejiang Electric Power Company, Hangzhou 310007, China; 4. State Grid Jiangsu Electric Power Company Information & Telecommunication Branch, Nanjing 210029, China
  • Received: 2015-08-27 Revised: 2015-09-29 Accepted: 2015-09-29 Online: 2015-09-29 Published: 2015-09-29

Abstract: OLAP queries on electricity consumption information in the smart grid have several prominent features: huge data volumes, joins across multiple tables, and complex SQL structure. For this kind of application, a traditional RDBMS suffers from poor scalability, low write throughput, and unacceptable query performance. A Spark/Shark-based OLAP system for electricity consumption information in the smart grid was therefore designed. The system uses the distributed file system HDFS for data storage, Shark to parse SQL queries, and Spark to execute them. However, Shark does not support fine-grained indexes, which hinders further improvement of query performance. To overcome this limitation, a Trie-tree-based fine-grained index technique, TrieIndex, and a data re-organization scheme were proposed. Experimental results with real electricity consumption data and queries show that the write throughput of the system is 12 times that of the RDBMS, and that its query efficiency is 10 times that of the original Shark.
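To illustrate the general idea of a Trie-based fine-grained index, the following is a minimal Python sketch, not the authors' TrieIndex. It assumes that index keys are strings (for example, a hypothetical user ID) and that each key maps to the locations of matching records, represented here as hypothetical (file name, row offset) pairs. The class name `TrieIndexSketch` and all method names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical record location: (data file name, row offset within that file).
Location = Tuple[str, int]


@dataclass
class TrieNode:
    # One child per next character of the key.
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Locations of records whose key ends exactly at this node.
    locations: List[Location] = field(default_factory=list)


class TrieIndexSketch:
    """Illustrative trie mapping string keys to record locations."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, location: Location) -> None:
        """Walk or create one node per character, then store the location."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.locations.append(location)

    def _find(self, prefix: str) -> Optional[TrieNode]:
        """Return the node reached by following prefix, or None if absent."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, key: str) -> List[Location]:
        """Exact-match lookup: locations stored under this full key."""
        node = self._find(key)
        return list(node.locations) if node is not None else []

    def lookup_prefix(self, prefix: str) -> List[Location]:
        """Prefix lookup: locations of every key that starts with prefix."""
        start = self._find(prefix)
        if start is None:
            return []
        found: List[Location] = []
        stack = [start]
        while stack:  # iterative depth-first walk of the subtree
            node = stack.pop()
            found.extend(node.locations)
            stack.extend(node.children.values())
        return sorted(found)
```

With such a structure, a query that filters on a key or key prefix only needs to read the files and offsets returned by the index instead of scanning every record, which is the kind of saving a fine-grained index aims for.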

Key words: Spark, OLAP, power big data, index, Trie tree
