Journal of University of Science and Technology of China, 2019, Vol. 49, Issue (12): 974-984. DOI: 10.3969/j.issn.0253-2778.2019.12.004

A new random projection-based ensemble classifier for high-dimensional data

CUI Wenquan   

  1. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
  • Received: 2019-04-14 Revised: 2019-05-23 Online: 2019-12-31 Published: 2019-12-31

Abstract: A decision tree ensemble method based on random projection, the projection forest (PJForest), was proposed for the classification of high-dimensional data. The method used decision trees as base classifiers: a series of random projections reduced the dimensionality of the data, a decision tree was built on each dimension-reduced data set, and the trees were combined into an ensemble classifier through ensemble learning. A suitably chosen random projection preserves the information contained in the geometric structure of the data, and perturbing the raw data through random projection increases the diversity of the decision trees. With appropriate ensemble learning, the method can effectively overcome the influence of noise and improve the generalization ability of PJForest. The limiting behavior of the generalization error of PJForest was proved, and its convergence rate was obtained under certain conditions. Extensive simulation studies and empirical analyses of real-life data were conducted. The simulation results showed that PJForest can effectively classify high-dimensional data containing a large amount of noise and performs better than current classification methods such as random forest and XGBoost.

Key words: decision tree, diversity, high-dimensional classification, ensemble learning, random projection
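
The pipeline described in the abstract (random projection, one decision tree per projected data set, then aggregation) can be illustrated with a minimal Python sketch. This is not the authors' PJForest implementation: the class name `ProjectionForestSketch`, the Gaussian projection matrix, the fully grown scikit-learn trees, the majority vote, and the synthetic data are all assumptions chosen for illustration, since the abstract does not specify the projection distribution or the aggregation rule.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ProjectionForestSketch:
    """Illustrative random-projection tree ensemble (hypothetical, not the paper's PJForest)."""

    def __init__(self, n_trees=50, n_components=10, seed=0):
        self.n_trees = n_trees            # number of (projection, tree) pairs
        self.n_components = n_components  # dimension after projection
        self.seed = seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        self.members_ = []
        for _ in range(self.n_trees):
            # Assumption: Gaussian projection, scaled so that squared norms
            # are preserved in expectation.
            R = rng.normal(size=(X.shape[1], self.n_components))
            R /= np.sqrt(self.n_components)
            tree = DecisionTreeClassifier(random_state=int(rng.integers(2**31 - 1)))
            tree.fit(X @ R, y)  # each tree sees a different low-dimensional view
            self.members_.append((R, tree))
        return self

    def predict(self, X):
        # Assumption: aggregate the trees by simple majority vote.
        rows = np.arange(X.shape[0])
        votes = np.zeros((X.shape[0], len(self.classes_)), dtype=int)
        for R, tree in self.members_:
            pred = tree.predict(X @ R)
            votes[rows, np.searchsorted(self.classes_, pred)] += 1
        return self.classes_[votes.argmax(axis=1)]


# Synthetic high-dimensional data: 500 features, only the first 20 carry signal.
rng = np.random.default_rng(1)
n, p = 200, 500
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :20] += 2.0 * y[:, None]

model = ProjectionForestSketch(n_trees=25, n_components=10).fit(X, y)
train_acc = (model.predict(X) == y).mean()
```

Because every tree is trained on a different random view of the data, the ensemble members differ from one another even though they all see the same samples, which is the diversity effect the abstract refers to. Training accuracy alone says nothing about generalization; a held-out split would be needed to compare against random forest or XGBoost.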