How are functionally similar code clones syntactically different? An empirical study and a benchmark

Cited by: 12
Authors
Wagner, Stefan [1]
Abdulkhaleq, Asim [1]
Bogicevic, Ivan [1]
Ostberg, Jan-Peter [1]
Ramadani, Jasmin [1]
Affiliations
[1] Univ Stuttgart, Inst Software Technol, Stuttgart, Germany
Keywords
Code clone; Functionally similar clone; Empirical study; Benchmark;
DOI
10.7717/peerj-cs.49
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Background. Today, redundancy in source code, so-called "clones" caused by copy&paste, can be found reliably using clone detection tools. Redundancy can also arise independently, however, without any copy&paste. At present, it is not clear how clones that are only functionally similar (FSCs) differ from clones created by copy&paste. Our aim is to understand and categorise the syntactic differences in FSCs that distinguish them from copy&paste clones in a way that helps clone detection research. Methods. We conducted an experiment using known functionally similar programs in Java and C from coding contests. We analysed syntactic similarity with traditional detection tools and explored whether concolic clone detection can go beyond syntax. We ran all tools on 2,800 programs and manually categorised the differences in a random sample of 70 program pairs. Results. We found no FSCs where complete files were syntactically similar. We could detect a syntactic similarity in a part of the files in <16% of the program pairs. Concolic detection found 1 of the FSCs. The differences between program pairs were in the categories algorithm, data structure, OO design, I/O and libraries. We selected 58 pairs for an openly accessible benchmark representing these categories. Discussion. The majority of differences between functionally similar clones are beyond the capabilities of current clone detection approaches. Yet, our benchmark can help to drive further clone detection research.
Pages: 26