Threshold-Free Code Clone Detection for a Large-Scale Heterogeneous Java Repository

Authors
Keivanloo, Iman [1]
Zhang, Feng [2]
Zou, Ying [1]
Affiliations
[1] Queens Univ, Dept Elect & Comp Engn, Kingston, ON, Canada
[2] Queens Univ, Sch Comp, Kingston, ON, Canada
Keywords
clone detection; clone search; clustering; unsupervised learning; large-scale repository; threshold-free
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Code clones are unavoidable entities in software ecosystems. A variety of clone-detection algorithms are available for finding code clones. For Type-3 clone detection at method granularity (i.e., similar methods with changes in statements), the dissimilarity threshold is one of the possible configuration parameters. Existing approaches use a single threshold to detect Type-3 clones across a repository. However, our study shows that detecting Type-3 clones at method granularity on a large-scale heterogeneous repository often requires multiple thresholds. We find that the performance of clone detection improves when different thresholds are selected for different groups of clones in a heterogeneous repository (i.e., various applications). In this paper, we propose a threshold-free approach to detect Type-3 clones at method granularity across a large number of applications. Our approach uses an unsupervised learning algorithm, i.e., k-means, to determine true and false clones. We use a clone benchmark with 330,840 tagged clones from 24,824 open-source Java projects for our study. We observe that our approach improves the performance significantly, by 12% in terms of F-measure. Furthermore, our threshold-free approach eliminates the concern of practitioners about possible misconfiguration of Type-3 clone detection tools.
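The idea of replacing a fixed threshold with clustering can be illustrated with a minimal sketch. The abstract does not state which features are clustered or how k-means is configured, so everything below is an assumption made for illustration: each candidate clone pair is reduced to a single dissimilarity score, k is fixed at 2, and the lower-dissimilarity cluster is treated as the "true clone" group. The function name `two_means_1d` and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def two_means_1d(values: List[float], iterations: int = 100) -> Tuple[float, float, List[int]]:
    """Cluster 1-D values into two groups with Lloyd's k-means (k = 2).

    Returns (low_centroid, high_centroid, labels). Label 0 marks values
    nearer the low centroid, label 1 values nearer the high centroid.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Assumption: deterministic initialisation at the extremes of the data.
    low, high = min(values), max(values)
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low_group = [v for v, lab in zip(values, labels) if lab == 0]
        high_group = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: move each centroid to the mean of its group.
        new_low = sum(low_group) / len(low_group) if low_group else low
        new_high = sum(high_group) / len(high_group) if high_group else high
        if new_low == low and new_high == high:
            break  # centroids stopped moving, so the clustering has converged
        low, high = new_low, new_high
    return low, high, labels


# Hypothetical dissimilarity scores for candidate clone pairs in one application.
scores = [0.05, 0.10, 0.12, 0.55, 0.60, 0.70]
low_c, high_c, labels = two_means_1d(scores)
# Assumption: pairs in the low-dissimilarity cluster are reported as clones.
reported = [s for s, lab in zip(scores, labels) if lab == 0]
print(reported)
```

Because the split point falls out of the data rather than a configured value, running the same procedure on another application's candidate pairs would yield a different boundary, which is the behaviour a single repository-wide threshold cannot provide.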
Pages: 201-210
Number of pages: 10
Related papers
50 in total
  • [1] Jcluster: an efficient Java parallel environment on a large-scale heterogeneous cluster
    Zhang, Bao-Yin
    Yang, Guang-Wen
    Zheng, Wei-Min
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2006, 18 (12): 1541 - 1557
  • [2] Large-scale image deblurring in Java
    Wendykier, Piotr
    Nagy, James G.
    [J]. COMPUTATIONAL SCIENCE - ICCS 2008, PT 1, 2008, 5101 : 721 - 730
  • [3] Large-scale characterization of Java streams
    Rosales, Eduardo
    Basso, Matteo
    Rosa, Andrea
    Binder, Walter
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 2023, 53 (09): 1763 - 1792
  • [4] Java for large-scale scientific computations?
    Krall, A
    Tomsich, P
    [J]. LARGE-SCALE SCIENTIFIC COMPUTING, 2001, 2179 : 228 - 235
  • [5] STUBBER: Compiling Source Code into Bytecode without Dependencies for Java Code Clone Detection
    Schafer, Andre
    Amme, Wolfram
    Heinze, Thomas S.
    [J]. 2021 IEEE 15TH INTERNATIONAL WORKSHOP ON SOFTWARE CLONES, IWSC 2021, 2021: 29 - 35
  • [6] Java communications for large-scale parallel computing
    Getov, V
    Philippsen, M
    [J]. LARGE-SCALE SCIENTIFIC COMPUTING, 2001, 2179 : 33 - 45
  • [7] Method-level Code Clone Detection for Java through Hybrid Approach
    Kodhai, Egambaram
    Kanmani, Selvadurai
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (06) : 914 - 922
  • [8] A large-scale empirical study of code smells in JavaScript projects
    Johannes, David
    Khomh, Foutse
    Antoniol, Giuliano
    [J]. SOFTWARE QUALITY JOURNAL, 2019, 27 (03) : 1271 - 1314
  • [9] SourcererJBF: A Java Build Framework For Large-Scale Compilation
    Misu, Md Rakib Hossain
    Achar, Rohan
    Lopes, Cristina V.
    [J]. ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (03)
  • [10] APINetworks Java. A Java approach to the efficient treatment of large-scale complex networks
    Munoz-Caro, Camelia
    Nino, Alfonso
    Reyes, Sebastian
    Castillo, Miriam
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2016, 207 : 549 - 552