Journal of University of Science and Technology of China ›› 2020, Vol. 50 ›› Issue (8): 1048-1057. DOI: 10.3969/j.issn.0253-2778.2020.08.002

• Article •

Research and practice of plagiarism detection in program code assignments by college students

YU Jun, LI Yajie, CHENG Lilei, LIAN Shun, TAN Chang, DING Decheng, LIU Qi

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China; 2. USTC iFLYTEK Co., Ltd., Hefei 230088, China; 3. Nanjing Qiancui Intelligent Technology Service Co., Ltd., Nanjing 210019, China
  • Received: 2020-06-05 Revised: 2020-06-24 Accepted: 2020-06-24 Online: 2020-08-31 Published: 2020-06-24
  • Corresponding author: LIU Qi
  • About the author: YU Jun, male, born in 1981, PhD candidate/engineer. Research interests: data mining, user profiling, knowledge graphs. E-mail: ustcyujun@163.com
  • Funding:
    Supported by the National Natural Science Foundation of China (61922073) and the Fundamental Research Funds for the Central Universities (WK2150110021).

Abstract: Students' programming ability directly reflects how well they have learned in technical courses, so program code assignments account for a growing share of teaching evaluation. Because copying code assignments costs little, plagiarism exists to varying degrees across colleges and universities. It seriously undermines the cultivation of students' abilities and the effectiveness of teaching, dampens students' motivation to learn, and can even damage the academic atmosphere. To find similarities among student assignments intelligently and automatically, and to analyze the overall plagiarism situation, this paper proposes a plagiarism detection method for student assignments that combines artificial intelligence algorithms with data processing and analysis techniques. First, the complex conditions of the program code assignments submitted by students are analyzed and a data preprocessing pipeline is designed. Then, a similarity detection algorithm for program code assignments based on KR and Winnowing is proposed. Compared with traditional detection methods, improvements such as code formatting raise the accuracy of similarity detection, and optimizations developed while checking large batches of assignments make the similarity results of different students' assignments easier to tell apart. To verify the validity and practicability of the similarity calculation, a simulation experiment (including a comparison with the JPlag detection system) is designed, and similarity results under different plagiarism types are reported on the same experimental data set. Finally, the method is applied in a real-world setting on iFLYTEK's Bosi intelligent online learning platform. Both the experimental results and the practical application show that the proposed method is effective for detecting similarity in college students' program code assignments and has high application value.

Key words: plagiarism detection for program code, similarity detection, smart online education
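
As a rough illustration of the KR and Winnowing idea named in the abstract, the following Python sketch fingerprints two submissions and compares them. It is not the authors' implementation. It assumes that "KR" refers to Karp-Rabin rolling hashing of k-grams, and the function names (`normalize`, `kgram_hashes`, `winnow`, `similarity`), the parameters `k` and `w`, the crude comment and whitespace stripping used in place of the paper's code formatting step, and the Jaccard score are all hypothetical choices made only for this example.

```python
import re


def normalize(source: str) -> str:
    """Crude stand-in (assumption) for the paper's code formatting step."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # drop block comments
    source = re.sub(r"//[^\n]*", "", source)                # drop line comments
    return re.sub(r"\s+", "", source).lower()               # drop whitespace, fold case


def kgram_hashes(text: str, k: int, base: int = 257, mod: int = (1 << 61) - 1) -> list:
    """Karp-Rabin rolling hash of every k-gram (k consecutive characters)."""
    if len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # remove the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.append(h)
    return hashes


def winnow(hashes: list, w: int) -> set:
    """Keep the minimum hash of each window of w consecutive hashes."""
    fingerprints = set()
    if not hashes:
        return fingerprints
    # If there are fewer than w hashes, treat them all as a single window.
    for start in range(max(1, len(hashes) - w + 1)):
        fingerprints.add(min(hashes[start:start + w]))
    return fingerprints


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard overlap of the two fingerprint sets (illustrative score only)."""
    fa = winnow(kgram_hashes(normalize(a), k), w)
    fb = winnow(kgram_hashes(normalize(b), k), w)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "int sum(int a, int b) { return a + b; }"
reformatted = "int  sum(int a,int b){\n  // add\n  return a+b;\n}"
unrelated = "void print_all(char **names, int n) { for (int i = 0; i < n; i++) puts(names[i]); }"

print(similarity(original, reformatted))
print(similarity(original, unrelated))
```

In this sketch a copy that only changes spacing or adds comments normalizes to the same text as the original and so produces the same fingerprints, while an unrelated program shares few or no fingerprints. Resistance to other plagiarism types, such as identifier renaming or statement reordering, would depend on a stronger normalization step than the one assumed here.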

CLC number: