Paper title
Measuring Plagiarism in Introductory Programming Course Assignments
Paper authors
Paper abstract
Measuring plagiarism in programming assignments is essential to the educational process. This paper discusses methods of plagiarism and their detection in introductory programming course assignments written in C++. A small corpus of assignments is made publicly available. A general framework is developed to compute the similarity between a pair of solutions; it uses three token-based similarity methods as features and predicts whether a solution is plagiarized. The importance of each feature is also measured, which in turn ranks the effectiveness of each method. Finally, an artificially generated dataset improves the results compared to the original data. We achieved F1 scores of 0.955 and 0.971 on the original and synthetic datasets, respectively.
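The abstract does not name the three token-based similarity methods, so the following is only a minimal sketch of what one such feature might look like: Jaccard similarity over token trigrams of two C++ solutions. The tokenizer regex, the function names, and the choice of trigrams are all assumptions made for illustration, not the paper's method.

```python
import re

# Hypothetical tokenizer (an assumption, not the paper's): identifiers and
# keywords, integer runs, and any other single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> list[str]:
    """Split C++ source text into a flat list of coarse tokens."""
    return TOKEN_RE.findall(source)


def ngrams(tokens: list[str], n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the token n-gram sets of two source strings."""
    grams_a = ngrams(tokenize(a), n)
    grams_b = ngrams(tokenize(b), n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs too short to form any n-gram: treat as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    first = "int main() { return 0; }"
    second = "int main() { return 1; }"
    print(jaccard_similarity(first, second))
```

In a framework like the one described, a score of this kind would be one of several features computed per solution pair and passed to a classifier; which classifier the authors use is not stated in the abstract.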