Journal of University of Science and Technology of China ›› 2016, Vol. 46 ›› Issue (3): 188-199. DOI: 10.3969/j.issn.0253-2778.2016.03.003

• Original Paper •

BDCode: An erasure code algorithm for big data storage systems

YIN Chao, WANG Jianzong, LV Haitao, CUI Zongmin, CHENG Lianglun, LI Tongfang, LIU Yan   

  1. School of Information Science and Technology, Jiujiang University, Jiujiang 332005, China; 2. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China; 3. Ping An Technology (Shenzhen) Co., Ltd, Shenzhen 518029, China
  • Received: 2015-08-27 Revised: 2015-12-01 Accepted: 2015-12-01 Online: 2015-12-01 Published: 2015-12-01

Abstract: An optimized erasure coding algorithm for big data storage systems, named BDCode (big data code), is proposed. Developed from a study of existing coding technologies and big data systems, BDCode not only protects system reliability but also improves security and the utilization of storage space. Because erasure coding offers high reliability and a high space-saving rate, coding mechanisms are introduced into big data systems. The storage nodes are divided into many virtual nodes to achieve load balancing. Assigning different storage groups of virtual nodes to different codec servers ensures system availability. Parallel decoding across nodes and data blocks improves the recovery efficiency of the system when data is corrupted. Additionally, allowing different users to set different coding parameters improves the robustness of big data storage systems. In the quantitative experiments, various numbers of data blocks m and check (parity) blocks k were configured to improve the storage utilization rate. The results show that parallel decoding can be nearly twice as fast as the previous serial decoding. On average, the encoding efficiency of BDCode is 36.1% higher than that of Cauchy Reed-Solomon (CRS) coding and 58.2% higher than that of Reed-Solomon (RS) coding. The decoding rate of BDCode is, on average, 19.3% higher than that of CRS and 33.1% higher than that of RS.
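
As a rough illustration of the roles of the parameters m and k, the Python sketch below computes the storage utilization m / (m + k) of a stripe and demonstrates the simplest possible erasure code: a single XOR redundant block (k = 1). This is not the BDCode, CRS, or RS construction described in the paper; the function names and the XOR scheme are assumptions chosen only to show how a lost block can be rebuilt from the surviving ones.

```python
from functools import reduce
from typing import List, Optional


def storage_utilization(m: int, k: int) -> float:
    """Fraction of raw capacity that holds user data for m data and k redundant blocks."""
    return m / (m + k)


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: List[bytes]) -> List[bytes]:
    """Append one XOR redundant block (the k = 1 case) to the m data blocks."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) from the surviving blocks."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("a single XOR block tolerates only one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks equals the one block that was lost.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = encode([b"abcd", b"efgh", b"ijkl"])  # m = 3, k = 1
    damaged = list(stripe)
    damaged[1] = None                             # simulate a corrupted block
    print(recover(damaged)[1])
    print(storage_utilization(3, 1))
```

With m = 3 and k = 1 the stripe stores three blocks of user data in four blocks of raw capacity, whereas keeping three full replicas would use only one third of the raw capacity for user data. Codes such as RS and CRS generalize the idea to k > 1 so that several simultaneous block losses can be tolerated.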

Key words: distributed storage system, erasure coding, big data, robustness, availability, cloud storage

CLC Number: