Approach based on compiling optimization and disassembling to detect program similarity
Abstract: An approach to detecting program similarity based on compiler optimization and disassembly is proposed. It can detect 12 plagiarism techniques commonly used by students, such as renaming identifiers, adding redundant statements, and replacing control structures with equivalent ones. Based on this approach, a program similarity detection system called BuaaSim was designed and implemented. BuaaSim uses a compiler and a disassembler to translate source programs into sets of assembly instructions, removes or replaces the volatile elements of those instructions that have little bearing on the essential characteristics of the program, and computes program similarity with a decision function that does not depend on the order of the instructions. A simple and effective clustering algorithm is also given to extract subsets of similar programs from a collection of programs. A comparative evaluation against the well-known JPlag system on two typical sets of plagiarized samples shows that the proposed approach has a clear advantage in detection performance.
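The pipeline described in the abstract (normalize the disassembled instructions, compare them independently of order, then group similar programs) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not BuaaSim's actual method: the abstract does not give the normalization rules, the decision function, or the clustering algorithm. The sketch assumes AT&T-syntax assembly, replaces registers and numeric operands with placeholders, uses a multiset Dice coefficient as a stand-in order-independent similarity measure, and groups programs by connected components over a hypothetical threshold of 0.8.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical normalization rules (AT&T syntax assumed); the paper's
# actual rules for volatile elements are not given in the abstract.
_REGISTER = re.compile(r"%[a-z0-9]+")
_NUMBER = re.compile(r"\$?-?(?:0x[0-9a-f]+|\d+)")


def normalize(instruction):
    """Replace volatile operands (registers, constants, offsets) with placeholders."""
    text = instruction.strip().lower()
    text = _REGISTER.sub("REG", text)   # registers first, so digits in names are consumed
    text = _NUMBER.sub("NUM", text)     # immediates, offsets, label numbers
    return re.sub(r"\s+", " ", text)


def similarity(prog_a, prog_b):
    """Order-independent similarity: Dice coefficient over instruction multisets.

    This is a stand-in measure, not the decision function used by BuaaSim.
    """
    count_a = Counter(normalize(i) for i in prog_a)
    count_b = Counter(normalize(i) for i in prog_b)
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        return 1.0
    shared = sum((count_a & count_b).values())  # multiset intersection
    return 2.0 * shared / total


def cluster(programs, threshold=0.8):
    """Group program names whose pairwise similarity reaches the threshold.

    Uses union-find to build connected components; singletons are dropped.
    The threshold value is an illustrative assumption.
    """
    parent = {name: name for name in programs}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for p, q in combinations(programs, 2):
        if similarity(programs[p], programs[q]) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for name in programs:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]
```

Because the comparison works on multisets of normalized instructions, reordering statements or renaming identifiers (which changes only register allocation and offsets after compilation) leaves the score unchanged in this sketch; a real system would also need the compiler optimization step to neutralize redundant statements.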
Key words: plagiarism; program similarity; similarity detection; compiling optimization