Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis;
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search, so there is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled versions. When dealing with cross-language source code re-use, such traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text with its own syntax and structure, so we aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source codes with high performance.
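To make the idea of treating source code as text more concrete, the following is a minimal sketch of LSA-based similarity between snippets written in different programming languages. It is not the authors' pipeline: the abstract does not state which features, weighting, or training data they used. The tokenizer (plain alphabetic tokens), the raw-count weighting, the toy snippets, the choice of k = 2 latent dimensions, and all function names are assumptions made here for illustration, and NumPy is assumed to be available.

```python
import re

import numpy as np


def tokenize(code):
    # Assumption: alphabetic identifier/keyword tokens only. The features
    # actually used in the paper are not given in the abstract.
    return re.findall(r"[A-Za-z_]+", code)


def term_document_matrix(docs):
    # Rows are vocabulary terms, columns are documents, cells are raw counts.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    row_of = {tok: row for row, tok in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(docs):
        for tok in tokenize(doc):
            matrix[row_of[tok], col] += 1.0
    return matrix


def lsa_similarities(docs, k=2):
    # Truncated SVD: keep the k largest singular values and project each
    # document into the resulting k-dimensional latent space.
    matrix = term_document_matrix(docs)
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    coords = vt[:k].T * s[:k]  # one row of latent coordinates per document
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    unit = coords / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return unit @ unit.T  # pairwise cosine similarities


# Hypothetical toy snippets: two "sum an array" fragments and two
# "hello world" fragments, each pair written in two different languages.
SNIPPETS = [
    "int sum = 0; for (int i = 0; i < n; i++) { sum += a[i]; } return sum;",
    "sum = 0\nfor i in range(n):\n    sum += a[i]\nreturn sum",
    'int main() { printf("hello world"); }',
    'def main():\n    print("hello world")',
]

sims = lsa_similarities(SNIPPETS, k=2)

if __name__ == "__main__":
    print(np.round(sims, 3))
```

In this toy setting the two array-summing fragments end up closer to each other in the latent space than either is to the hello-world fragments, because they share identifiers and keywords. Real cross-language re-use, where identifiers are renamed or the vocabulary barely overlaps, would need richer features or a cross-language training corpus, which this sketch does not attempt.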
Pages: 1708-1725
Number of pages: 18
Related papers
50 in total
  • [31] DeleSmell: Code smell detection based on deep learning and latent semantic analysis
    Zhang, Yang
    Ge, Chuyan
    Hong, Shuai
    Tian, Ruili
    Dong, Chunhao
    Liu, Jingjing
    KNOWLEDGE-BASED SYSTEMS, 2022, 255
  • [32] Language model adaptation in Tamil language using cross-lingual latent semantic analysis with document aligned corpora
    Selvam, M.
    Natarajan, A. M.
    CURRENT SCIENCE, 2010, 98 (07): : 922 - 929
  • [33] Recovering documentation-to-source-code traceability links using latent semantic indexing
    Marcus, A
    Maletic, JI
    25TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, PROCEEDINGS, 2003, : 125 - 135
  • [34] Improve Representation for Cross-Language Clone Detection by Pretrain Using Tree Autoencoder
    Ling, Huading
    Zhang, Aiping
    Yin, Changchun
    Li, Dafang
    Chang, Mengyu
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2022, 33 (03): : 1561 - 1577
  • [35] Parkinson's Disease Detection Method Based on Cross-Language Acoustic Analysis
    Ji W.
    Wang C.
    Wu D.
    Li Y.
    Zheng H.
    Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology, 2024, 46 (02): : 546 - 554
  • [36] Analysis and Re-Use of Videos in Educational Digital Libraries with Automatic Scene Detection
    Baraldi, Lorenzo
    Grana, Costantino
    Cucchiara, Rita
    DIGITAL LIBRARIES ON THE MOVE, IRCDL 2015, 2016, 612 : 155 - 164
  • [37] Experiments on the Indonesian Plagiarism Detection using Latent Semantic Analysis
    Soleman, Sidik
    Purwarianti, Ayu
    2014 2ND INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT), 2014,
  • [38] Signature Based Intrusion Detection using Latent Semantic Analysis
    Lassez, Jean-Louis
    Rossi, Ryan
    Sheel, Stephen
    Mukkamala, Srinivas
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 1068 - 1074
  • [39] Use of semantic analysis latent in an attempt to optimize the acquisition by exposure to a foreign language
    Zampa, Virginie
    ALSIC-APPRENTISSAGE DES LANGUES ET SYSTEMS D INFORMATION ET DE COMMUNICATION, 2005, 8 (02): : 135 - 146
  • [40] Cross-language Speech Attribute Detection and Phone Recognition for Tibetan Using Deep Learning
    Wang, Hui
    Zhao, Yue
    Xu, Yanmin
    Xu, Xiaona
    Suo, Xingmei
    Ji, Qiang
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 474 - +