Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis
DOI
Not available
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The Internet is now the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away, and the temptation to re-use this material is high. Source code is also easily found with a simple Web search, so there is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing the compiled versions of the source codes. In the cross-language setting, such approaches can handle only the programming languages supported by the compiler. We instead treat source code as a piece of text with its own syntax and structure, and apply models for free-text re-use detection to it. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach yields slightly better results than the other models and distinguishes re-used from merely related source code with high performance.
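As a rough illustration of the idea in the abstract (treating source code as text and comparing it in a reduced latent space), the following is a minimal sketch and not the authors' pipeline. Everything in it is an assumption made for illustration: the identifier-level tokenizer, the raw term counts, the three toy snippets, the choice of two latent dimensions, and the pure-Python power iteration used in place of a library SVD. It relies on the fact that the document coordinates of a truncated SVD can be obtained from the top eigenpairs of the document-by-document matrix A^T A.

```python
import math
import re
from collections import Counter


def tokens(code):
    """Lowercased identifier/keyword counts; punctuation and digits are dropped."""
    return Counter(re.findall(r"[a-z_]+", code.lower()))


def gram_matrix(profiles):
    """Document-by-document matrix A^T A, where A holds term counts per document."""
    def dot(p, q):
        return float(sum(n * q[t] for t, n in p.items() if t in q))
    return [[dot(p, q) for q in profiles] for p in profiles]


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def unit(vec):
    """Return vec scaled to length 1, or None if it is numerically zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 1e-12 else None


def top_eigenpairs(matrix, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration with deflation."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = unit([1.0 + 0.1 * i for i in range(size)])  # fixed start, repeatable runs
        for _step in range(iterations):
            nxt = unit(mat_vec(work, vec))
            if nxt is None:  # nothing left to extract
                break
            vec = nxt
        value = max(sum(v * w for v, w in zip(vec, mat_vec(work, vec))), 0.0)
        pairs.append((value, vec))
        for r in range(size):  # deflate: remove the component just found
            for c in range(size):
                work[r][c] -= value * vec[r] * vec[c]
    return pairs


def lsa_coordinates(profiles, k=2):
    """Coordinates of each document in the k-dimensional latent space (Sigma_k V_k^T)."""
    pairs = top_eigenpairs(gram_matrix(profiles), k)
    return [[math.sqrt(val) * vec[j] for val, vec in pairs] for j in range(len(profiles))]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


# Hypothetical toy snippets: the same routine in Python and Java, plus an unrelated C one.
PYTHON_SNIPPET = """def total(items):
    result = 0
    for item in items:
        result += item.price
    return result"""
JAVA_SNIPPET = """int total(List<Item> items) {
    int result = 0;
    for (Item item : items) {
        result += item.price;
    }
    return result;
}"""
C_SNIPPET = """void swap(char *a, char *b) {
    char tmp = *a;
    *a = *b;
    *b = tmp;
}"""

coords = lsa_coordinates([tokens(s) for s in (PYTHON_SNIPPET, JAVA_SNIPPET, C_SNIPPET)])
sim_reused = cosine(coords[0], coords[1])     # Python vs Java version of the same routine
sim_unrelated = cosine(coords[0], coords[2])  # Python routine vs unrelated C routine
print(f"re-used pair: {sim_reused:.3f}, unrelated pair: {sim_unrelated:.3f}")
```

On these toy inputs the Python and Java versions land on nearly the same latent direction, while the unrelated C snippet shares no vocabulary with them and scores close to zero. The sketch only works because the two translations keep the same identifiers, and it does not attempt the harder distinction between re-used and merely related code that the abstract reports.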
Pages: 1708-1725
Page count: 18