Journal of University of Science and Technology of China, 2019, Vol. 49, Issue (10): 842-850. DOI: 10.3969/j.issn.0253-2778.2019.10.010

Original Paper

Parallel ISOMAP algorithm based on Spark

SHI Lukui, GUO Linlin, FANG Zizhe, ZHANG Jun   

1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
Received: 2019-05-15; Revised: 2019-08-26; Accepted: 2019-08-26; Online: 2019-10-31; Published: 2019-08-26

Abstract: To reduce the dimensionality of nonlinear high-dimensional data in big data environments, a Spark-based parallel ISOMAP algorithm is proposed, in which a Spark-based parallel block Davidson method is designed and implemented to quickly solve for the eigenvalues and eigenvectors of large-scale matrices. In addition, to address the difficulty of computing and transmitting large-scale matrices, a row-block matrix multiplication strategy based on RDD partitions is proposed, which converts the matrix rows in each partition into a block matrix. With row-block matrices, computation is no longer limited to the map operator processing RDD elements one row at a time; instead, operations can be performed at the matrix level using the linear algebra library in Spark. The experimental results show that the row-block matrix multiplication strategy effectively improves the efficiency of matrix operations; that the parallel block Davidson method can quickly solve for the eigenvalues and eigenvectors of large-scale matrices and effectively improves the performance of the parallel ISOMAP algorithm; and that the parallel ISOMAP algorithm is suitable for dimensionality reduction in big data environments.
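The row-block idea described in the abstract (gather the rows of one RDD partition into a single block, then do one matrix-level product instead of per-row work) can be illustrated with a minimal pure-Python sketch. It does not use Spark: partitions are simulated as plain lists, `block_matmul` stands in for whatever local linear algebra routine the actual implementation calls, and all function names here are hypothetical rather than taken from the paper.

```python
from typing import Iterable, Iterator, List

Matrix = List[List[float]]


def rows_to_block(rows: Iterable[List[float]]) -> Matrix:
    """Gather every row of one partition into a single local block matrix."""
    return [list(r) for r in rows]


def block_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Local dense product a @ b (stand-in for a linear algebra library call)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    b_cols = [list(col) for col in zip(*b)]  # transpose b once
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def multiply_partition(rows: Iterable[List[float]], b: Matrix) -> Iterator[List[float]]:
    """Per-partition step: rows -> one block -> one block product -> rows again."""
    block = rows_to_block(rows)
    if not block:  # empty partitions produce no output rows
        return iter([])
    return iter(block_matmul(block, b))


def row_block_multiply(partitions: List[List[List[float]]], b: Matrix) -> Matrix:
    """Simulate rdd.mapPartitions(...) over partitions held as plain Python lists."""
    result: Matrix = []
    for part in partitions:
        result.extend(multiply_partition(part, b))
    return result


if __name__ == "__main__":
    parts = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
    print(row_block_multiply(parts, [[2.0, 0.0], [1.0, 1.0]]))
    # [[4.0, 2.0], [10.0, 4.0], [16.0, 6.0]]
```

In PySpark, the same per-partition function could be passed to `RDD.mapPartitions`, for example `rdd.mapPartitions(lambda rows: multiply_partition(rows, b_bc.value))` with `b_bc = sc.broadcast(b)`. Whether the paper broadcasts the right-hand matrix in this way is not stated in the abstract; the point of the sketch is only that each partition performs one block product rather than one small operation per row.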

Key words: ISOMAP, row-block matrix, block Davidson method, Spark
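The abstract does not give the details of the parallel block Davidson method, so the following is only a generic, serial sketch of a block Davidson iteration for the largest eigenpairs of a symmetric matrix (in ISOMAP, the double-centered squared-distance matrix). Assumptions, all labeled hypothetical: the function names, a diagonal (Davidson) preconditioner, no subspace restarts, and a cyclic Jacobi solver for the small projected eigenproblem. The `matvec` calls are the step a Spark version would be expected to distribute.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def matvec(a: Matrix, x: Vector) -> Vector:
    # In a distributed setting, this product is the expensive step to parallelize.
    return [dot(row, x) for row in a]


def add_orthonormal(basis: List[Vector], v: Vector, rel_eps: float = 1e-10) -> bool:
    """Gram-Schmidt v against basis (twice, for stability); append it if independent."""
    norm0 = math.sqrt(dot(v, v))
    if norm0 == 0.0:
        return False
    w = [x / norm0 for x in v]
    for _ in range(2):
        for b in basis:
            c = dot(b, w)
            w = [wi - c * bi for wi, bi in zip(w, b)]
    norm = math.sqrt(dot(w, w))
    if norm < rel_eps:
        return False
    basis.append([wi / norm for wi in w])
    return True


def jacobi_eigh(h: Matrix, sweeps: int = 100) -> Tuple[Vector, Matrix]:
    """Small dense symmetric eigensolver (cyclic Jacobi); eigenvectors are columns of v."""
    n = len(h)
    a = [row[:] for row in h]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = max(1.0, sum(x * x for row in a for x in row))
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # right-multiply by the rotation (columns p, q)
                    for k in range(n):
                        mp, mq = m[k][p], m[k][q]
                        m[k][p], m[k][q] = c * mp - s * mq, s * mp + c * mq
                for k in range(n):  # left-multiply a by the transposed rotation (rows p, q)
                    ap, aq = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(n)], v


def block_davidson(a: Matrix, k: int, block: int = 2, tol: float = 1e-8,
                   max_iter: int = 100) -> Tuple[Vector, List[Vector]]:
    """Approximate the k largest eigenpairs of symmetric a (non-restarted sketch)."""
    n = len(a)
    diag = [a[i][i] for i in range(n)]
    basis: List[Vector] = []
    # Start from unit vectors at the largest diagonal entries.
    for i in sorted(range(n), key=lambda j: -diag[j])[:max(k, block)]:
        add_orthonormal(basis, [float(j == i) for j in range(n)])
    vals: Vector = []
    vecs: List[Vector] = []
    for _ in range(max_iter):
        av = [matvec(a, b) for b in basis]
        m = len(basis)
        # Rayleigh-Ritz: project a onto the current subspace and solve the small problem.
        h = [[dot(basis[i], av[j]) for j in range(m)] for i in range(m)]
        theta, y = jacobi_eigh(h)
        top = sorted(range(m), key=lambda i: -theta[i])[:k]
        vals = [theta[i] for i in top]
        vecs = [[sum(y[c][i] * basis[c][r] for c in range(m)) for r in range(n)] for i in top]
        corrections = []
        for val, i, x in zip(vals, top, vecs):
            ax = [sum(y[c][i] * av[c][r] for c in range(m)) for r in range(n)]
            res = [axr - val * xr for axr, xr in zip(ax, x)]
            if math.sqrt(dot(res, res)) > tol:
                # Diagonal preconditioner: t_r = res_r / (theta - a_rr).
                corrections.append([rr / (val - d) if abs(val - d) > 1e-12 else rr
                                    for rr, d in zip(res, diag)])
        if not corrections:
            break  # every requested Ritz pair has converged
        if sum(add_orthonormal(basis, t) for t in corrections[:block]) == 0:
            break  # no new search directions; return current approximations
    return vals, vecs
```

For the 6x6 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal, `block_davidson(a, k=2)` should return approximately 2 + 2cos(pi/7) and 2 + 2cos(2pi/7), the two largest eigenvalues. A practical large-scale version would also cap the subspace size and restart from the current Ritz vectors, which this sketch omits for brevity.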
