Journal of University of Science and Technology of China ›› 2018, Vol. 48 ›› Issue (4): 331-340.DOI: 10.3969/j.issn.0253-2778.2018.04.009

• Original Paper • Previous Articles     Next Articles

Mechanism analysis of the accelerator for k-nearest neighbor algorithm based on data partition

SONG Yunsheng,WANG Jie, LIANG Jiye,   

  1. 1. School of Computer & Information Technology, Shanxi University, Taiyuan 030006;
  • Received:2017-09-19 Revised:2018-04-11 Online:2018-04-30 Published:2018-04-30

Abstract: Due to its absence of hypotheses for the underlying distributions of data, simple execution and strong generation ability, k-nearest neighbor classification algorithm (kNN) is widely used in face recognition, text classification, emotional analysis and other fields. kNN does not need the training process, but it only stores the training instances until the unlabeled instance appears, and executes the predicted process. However, kNN needs to compute the similarity between the unlabeled instance and all the training instances, hence it is difficult to deal with large-scale data. To overcome this difficulty, #br##br# the process of computing the nearest neighbors is converted to a constrained optimization problem, and an estimation is given of difference on the value of the objective function under the optimal solution with and without data partition. The theoretical analysis of this estimation indicates that data partition using clustering can reduce this difference, and the k-nearest neighbor algorithm based on clustering can have a strong generation ability. Experiment results on public datasets show that the k-nearest neighbor algorithm based on clustering can largely obtain the same nearest neighbors of raw kNN, thus obtaining higher classification accuracy.

Key words: k-nearest neighbor, data partition, local information, instance subset, clustering

CLC Number: