Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis
DOI
Not available
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search, so there is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled form. For cross-language source code re-use, such traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text, with its own syntax and structure, and we therefore aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source codes with high performance.
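As a rough illustration of the idea in the abstract (treating source code as text and comparing programs in an LSA space), the following is a minimal sketch, not the authors' implementation. The abstract does not state the features, weighting, or dimensionality used, so the character 3-gram features, raw counts, `k=2`, the helper names `char_ngrams` and `lsa_similarity`, and the toy snippets are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def char_ngrams(text, n=3):
    """Lower-case, drop whitespace, and count overlapping character n-grams."""
    s = "".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def lsa_similarity(docs, k=2, n=3):
    """Pairwise cosine similarity of docs in a k-dimensional LSA space."""
    counts = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*counts))
    # Term-document matrix: rows are n-grams, columns are documents.
    A = np.array([[c[t] for c in counts] for t in vocab], dtype=float)
    # Truncated SVD: keep only the k largest singular values.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    D = (np.diag(S[:k]) @ Vt[:k]).T  # one row per document
    # Normalise rows so that the dot product is the cosine similarity.
    D /= np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)
    return D @ D.T


# Hypothetical toy inputs: a C loop, a Java loop, and a Python function.
snippets = [
    "int s = 0; for (int i = 0; i < n; i++) { s += a[i]; }",
    "int sum = 0; for (int i = 0; i < n; i++) { sum += arr[i]; }",
    "def greet(name): return 'hello ' + name",
]
sim = lsa_similarity(snippets, k=2)
print(np.round(sim, 3))
```

Entry `sim[i, j]` is the cosine similarity between snippets `i` and `j` in the reduced space. In a real setting the SVD would be fitted on a much larger collection, and a threshold on the similarity would be tuned on labelled re-use cases.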
Pages: 1708-1725
Page count: 18