Code Clone Detection: A Literature Review

Authors
Chen Q.-Y. [1]
Li S.-P. [1]
Yan M. [1]
Xia X. [2]
Affiliations
[1] College of Computer Science and Technology, Zhejiang University, Hangzhou
[2] Faculty of Information Technology, Monash University, Melbourne, 3800, VIC
Source
Ruan Jian Xue Bao/Journal of Software | 2019 / Vol. 30 / No. 04
Keywords
Clone detection; Code clone; Code representation
DOI
10.13328/j.cnki.jos.005711
Abstract
A code clone refers to two or more duplicate or similar code fragments existing in a software system. Code cloning is a common phenomenon in software development; it can facilitate development and have a positive impact on a software system. However, research shows that code clones can also harm the development and maintenance of a software system, including but not limited to reduced stability, redundancy in the source code repository, and the propagation of software defects. Code cloning is therefore one of the most active research areas in software engineering, and various techniques have been proposed to automatically detect code clones in software systems, which helps improve software quality. Much has been achieved in this area, and the techniques can be grouped into text-based, lexical-based, syntax-based, and semantic-based categories. Current techniques achieve effective results in text-based clone detection but still face challenges in detecting other types of code clones. More advanced and unified theoretical and technical guidelines are needed to improve code clone detection techniques. This paper therefore presents a literature review of code clone detection, especially from the perspective of source code representation. In summary, the contributions of this study are: (1) current code clone detection techniques are summarized and classified from the perspective of code representation; (2) the model validation methods and performance measures used in model evaluation are summarized; and (3) the key issues of code clone research are summarized from three aspects: scientific, practical, and technical difficulties. Possible solutions to these problems and the future development of the research are discussed, focusing on data annotation, characterization methods, model construction, and engineering practice. © Copyright 2019, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
Pages: 962-980
Page count: 18