Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and other resources are just one click away. The temptation to re-use these materials is very high, and even source code is easily found through a simple Web search. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing programs in their compiled form. For cross-language source code re-use, such traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text with its own syntax and structure, so we aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and distinguishes between re-used and related source code with high performance.
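To make the idea of LSA-based cross-language similarity concrete, the following is a minimal sketch and not the authors' configuration. It assumes character trigrams with raw counts as the textual features (the abstract does not specify features or weighting). It obtains the truncated SVD from the small document-by-document Gram matrix using power iteration. All function names and the three toy snippets are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(code, n=3):
    """Count character n-grams after lowercasing and dropping whitespace."""
    s = "".join(code.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def doc_term_matrix(docs, n=3):
    """Rows are documents, columns are n-gram counts over a shared vocabulary."""
    counts = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for t in vocab] for c in counts]


def top_eigenpairs(gram, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    m = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for j in range(k):
        v = [1.0 / (i + j + 1) for i in range(m)]  # deterministic, non-uniform start
        lam = 0.0
        for _ in range(iters):
            w = [sum(g[r][c] * v[c] for c in range(m)) for r in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no remaining variance to extract
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # for a unit vector v, |G v| approaches the eigenvalue
        if lam < 1e-12:
            break
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        g = [[g[r][c] - lam * v[r] * v[c] for c in range(m)] for r in range(m)]
    return pairs


def lsa_vectors(docs, k=2, n=3):
    """Project documents into a k-dimensional latent space (U_k * Sigma_k)."""
    a = doc_term_matrix(docs, n)
    m = len(a)
    # Gram matrix A A^T; its eigenvectors are the left singular vectors of A.
    gram = [[float(sum(x * y for x, y in zip(a[i], a[j]))) for j in range(m)]
            for i in range(m)]
    pairs = top_eigenpairs(gram, k)
    # Singular value = sqrt(eigenvalue of the Gram matrix).
    return [[u[i] * math.sqrt(lam) for lam, u in pairs] for i in range(m)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy inputs: a Java function, a Python port of it, and unrelated code.
java_src = "int sum(int[] a) { int s = 0; for (int x : a) { s += x; } return s; }"
python_src = "def sum(a):\n    s = 0\n    for x in a:\n        s += x\n    return s"
unrelated = "def greet(name):\n    print('hello ' + name)\n    return name"

vectors = lsa_vectors([java_src, python_src, unrelated], k=2)
sim_reuse = cosine(vectors[0], vectors[1])
sim_other = cosine(vectors[0], vectors[2])
print(f"Java vs Python port: {sim_reuse:.3f}")
print(f"Java vs unrelated:   {sim_other:.3f}")
```

With only three short documents the scores are illustrative at best. A realistic setup would fit the latent space on a larger collection and would probably need a weighting scheme and a tuned number of dimensions.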
Pages: 1708-1725
Page count: 18