Towards Reliable Code Plagiarism Detection: A Survey on Software Clone Detection

Sanjay B. Ankali, Dr. S. G. Gollagi, Dr. Bahubali M. Akiwate

Abstract


Despite three decades of research and the development of more than 250 clone detection technologies, no single framework accurately and reliably identifies all four major clone types. The lack of comprehensive, reliable, and language-neutral clone detection significantly harms online learning platforms such as Coursera, which cannot reliably assess students' proficiency from the coding projects and assignments they submit. This survey aims to support more reliable code plagiarism detection by presenting tools and techniques for same-language and cross-language clone detection, organized by the clone types they detect and the languages they support. The paper highlights three major issues concerning language agnosticism and accuracy: a) most proposed techniques work only on a specific language, such as C, C++, Java, or Python; b) only 8 proposed works accurately classify all four basic clone types; c) 98% of past clone detection work targets regular clones and ignores micro-clones. The paper's summary offers direction for building a more reliable code plagiarism detection tool.


