Journal of University of Science and Technology of China ›› 2020, Vol. 50 ›› Issue (8): 1048-1057. DOI: 10.3969/j.issn.0253-2778.2020.08.002

• Original Paper •

Research and practice of plagiarism detection in program code assignments by college students

YU Jun, LI Yajie, CHENG Lilei, LIAN Shun, TAN Chang, DING Decheng, LIU Qi

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; 2. USTC iFLYTEK Co., Ltd., Hefei 230088, China; 3. Nanjing Qiancui Intelligent Technology Service Co., Ltd., Nanjing 210019, China
  • Received:2020-06-05 Revised:2020-06-24 Accepted:2020-06-24 Online:2020-08-31 Published:2020-06-24

Abstract: Students’ programming ability directly reflects how well they have learned in technical courses, and program code assignments account for a growing share of teaching evaluation. Because copying code homework costs so little, plagiarism is widespread in colleges and universities, which seriously undermines both the cultivation of students’ abilities and the effectiveness of teaching. To address this, a homework plagiarism detection method is proposed that combines artificial intelligence algorithms with data processing and analysis techniques to detect similarities among students’ submissions automatically and to analyze the overall extent of plagiarism. First, the complex conditions under which students submit program code assignments are analyzed, and a data pre-processing procedure is designed. Then, a similarity detection algorithm for program code assignments based on KR and Winnowing is proposed. Compared with traditional detection methods, measures such as code formatting improve the accuracy of similarity detection in students’ homework. In large-scale homework detection practice, the optimized algorithm increases the separation between the similarity results of different students’ homework. To verify the validity and practicability of the core similarity calculation, a simulation experiment, including a comparison with the JPlag detection system, was designed, and similarity results are reported for different plagiarism types on the same experimental data set. Finally, the method has been applied in real scenarios on iFLYTEK’s Bosi intelligent online learning platform. Both the experimental results and the practical application show that the proposed method is effective and of practical value for detecting similarity in college students’ program code assignments.
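The abstract names KR and Winnowing but gives no details, so the following is only a minimal sketch of the general technique, not the authors’ implementation. It assumes that KR refers to a Karp-Rabin rolling hash over k-grams, and that similarity is measured as the Jaccard overlap of the winnowed fingerprint sets. The function names, the whitespace-stripping `normalize` step (a stand-in for the code formatting mentioned above), and the parameters `k`, `window`, `base` and `mod` are all illustrative assumptions.

```python
import re


def normalize(code: str) -> str:
    """Crude stand-in for code formatting: drop all whitespace and lowercase."""
    return re.sub(r"\s+", "", code).lower()


def kgram_hashes(text: str, k: int, base: int = 257,
                 mod: int = (1 << 61) - 1) -> list[int]:
    """Karp-Rabin rolling hash of every k-gram in text (assumed meaning of KR)."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Remove the oldest character, shift, and append the newest one.
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def winnow(hashes: list[int], window: int) -> set[int]:
    """Keep the minimum hash of each sliding window as a fingerprint."""
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window])
            for i in range(len(hashes) - window + 1)}


def similarity(code_a: str, code_b: str, k: int = 5, window: int = 4) -> float:
    """Jaccard overlap of the two fingerprint sets (an assumed measure)."""
    fa = winnow(kgram_hashes(normalize(code_a), k), window)
    fb = winnow(kgram_hashes(normalize(code_b), k), window)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

In this sketch, two submissions that differ only in spacing and line breaks normalize to the same string and therefore receive identical fingerprints. Renamed identifiers or reordered statements would require a stronger normalization step than the one assumed here.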

Key words: plagiarism detection for program code, similarity detection, online wisdom education
