Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1

Authors:

Flores, Enrique [1]

Barron-Cedeno, Alberto [2]

Moreno, Lidia [1]

Rosso, Paolo [1]

Affiliations:

[1] Univ Politecn Valencia, E-46022 Valencia, Spain

[2] HBKU, Qatar Comp Res Inst, Doha, Qatar

Source:

JOURNAL OF UNIVERSAL COMPUTER SCIENCE | 2015 / Vol. 21 / No. 13

Keywords:

Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Subject classification codes:

081202; 0835

Abstract:

Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can handle only the programming languages supported by the compiler. We assume that a source code is a piece of text, with its own syntax and structure, so we aim at applying models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source codes with high performance.


Pages: 1708-1725

Page count: 18

