ZC3: Zero-Shot Cross-Language Code Clone Detection

Authors
Li, Jia [1]
Tao, Chongyang [2]
Jin, Zhi [1]
Liu, Fang [3]
Li, Jia [1]
Li, Ge [1]
Affiliations
[1] Peking Univ, MoE, Key Lab High Confidence Software Technol, Beijing, Peoples R China
[2] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[3] Beihang Univ, Sch Comp Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
code clone detection; zero-shot learning; cross-language; deep neural
DOI
10.1109/ASE56229.2023.00210
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Developers introduce code clones to improve programming productivity. Many existing studies have achieved impressive performance in monolingual code clone detection. However, during software development, more and more developers write semantically equivalent programs in different languages to support different platforms and to help translate projects from one language to another. Because collecting cross-language parallel data, especially for low-resource languages, is expensive and time-consuming, how to design an effective cross-language model that does not rely on any parallel data is a significant problem. In this paper, we propose a novel method named ZC3 for Zero-shot Cross-language Code Clone detection. ZC3 designs contrastive snippet prediction to form an isomorphic representation space among different programming languages. Based on this, ZC3 exploits domain-aware learning and cycle consistency learning to further constrain the model to generate representations that are aligned among different languages while remaining discriminative for different types of clones. To evaluate our approach, we conduct extensive experiments on four representative cross-language clone detection datasets. Experimental results show that ZC3 outperforms the state-of-the-art baselines by 67.12%, 51.39%, 14.85%, and 53.01% in MAP score on the four datasets, respectively. We further investigate the representational distribution of different languages and discuss the effectiveness of our method.
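The abstract reports results as MAP (mean average precision) scores. As a reminder of what that metric measures, the following is a minimal sketch of the textbook definition, not the paper's evaluation code. The function names and the boolean relevance lists are illustrative assumptions, and the exact MAP variant used in the paper (for example, a cutoff at a fixed rank) is not stated in this record.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the candidate at rank i + 1 is a true
    clone of the query. Assumption: every true clone appears somewhere in
    the ranking, so we normalize by the number of hits found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant candidate.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


if __name__ == "__main__":
    # Two hypothetical queries: candidates are sorted by model similarity,
    # and each flag says whether that candidate is a real clone.
    rankings = [
        [True, False, True],
        [False, True],
    ]
    print(mean_average_precision(rankings))
```

In a cross-language setting, each query would be a program in one language and the ranked candidates would be programs in another language, ordered by the similarity of their learned representations.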
Pages: 875-887
Number of pages: 13