How are functionally similar code clones syntactically different? An empirical study and a benchmark

Cited by: 12
Authors
Wagner, Stefan [1]
Abdulkhaleq, Asim [1]
Bogicevic, Ivan [1]
Ostberg, Jan-Peter [1]
Ramadani, Jasmin [1]
Affiliations
[1] Univ Stuttgart, Inst Software Technol, Stuttgart, Germany
Keywords
Code clone; Functionally similar clone; Empirical study; Benchmark;
DOI
10.7717/peerj-cs.49
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Background. Today, redundancy in source code caused by copy&paste, so-called "clones", can be found reliably using clone detection tools. Redundancy can also arise independently, however, without any copy&paste. At present, it is not clear how clones that are only functionally similar (functionally similar clones, FSCs) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactic differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research.
Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs.
Results. We found no FSCs where complete files were syntactically similar. We could detect syntactic similarity in a part of the files in fewer than 16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs fell into the categories algorithm, data structure, OO design, I/O and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories.
Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.
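To make the notion of a functionally similar clone concrete, the following is a minimal illustrative sketch and is not taken from the study or its benchmark. The study's programs were written in Java and C; Python is used here only for brevity. The function names and the digit-sum task are hypothetical. The two functions return the same results but share almost no syntax, which roughly corresponds to differences in the algorithm and data structure categories named in the abstract.

```python
# Illustrative only: two independently written solutions to the same
# (hypothetical) task, computing the sum of the decimal digits of a
# non-negative integer. Neither is taken from the study's benchmark.


def digit_sum_arithmetic(n: int) -> int:
    """Sum the digits using repeated division by 10."""
    total = 0
    while n > 0:
        total += n % 10  # take the last digit
        n //= 10         # drop the last digit
    return total


def digit_sum_string(n: int) -> int:
    """Sum the digits by converting the number to a string."""
    return sum(int(ch) for ch in str(n))


def behave_alike(f, g, inputs) -> bool:
    """Check that f and g agree on every sample input.

    Agreement on samples only suggests functional similarity;
    it does not prove equivalence for all inputs.
    """
    return all(f(x) == g(x) for x in inputs)


if __name__ == "__main__":
    print(behave_alike(digit_sum_arithmetic, digit_sum_string, range(1000)))
```

A token- or text-based comparison of the two function bodies would find little overlap, while comparing their behaviour on sample inputs shows that they agree.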
Pages: 26