Source Code Authorship Identification Using Deep Neural Networks

Cited by: 13
Authors
Kurtukova, Anna [1]
Romanov, Aleksandr [1]
Shelupanov, Alexander [1]
Affiliation
[1] Tomsk State Univ Control Syst & Radioelect, Fac Secur, Tomsk 634050, Russia
Source
SYMMETRY-BASEL | 2020 / Vol. 12 / Issue 12
Keywords
source code; authorship; symmetry; software engineering; machine learning; deanonymization; neural networks
DOI
10.3390/sym12122044
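Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09
Abstract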
Many open-source projects are developed by the community and share a common basis. The more source code is open, the more the project is open to contributors. The possibility of accidental or deliberate use of someone else's source code as closed functionality in another project (even a commercial one) cannot be excluded. This situation can lead to copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering addresses this problem. However, not all code samples needed for comparison can be found in the public domain. In this case, methods for identifying the source code author can be useful. Therefore, identifying the source code author is an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of identifying the source code author and modern methods of solving it. Based on the experience of researchers in the field of natural language processing (NLP), the authors propose a technique based on a hybrid neural network and demonstrate its results both for simple cases of determining code authorship and for cases complicated by obfuscation and the use of coding standards. The results show that the authors' technique successfully solves the essential problems of analogous methods and can be effective even in cases where there are no obvious signs indicating authorship. The average accuracy obtained across all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.
Pages: 1 / 17
Page count: 17
Related papers
50 in total
  • [31] Language Identification Using Deep Convolutional Recurrent Neural Networks
    Bartz, Christian
    Herold, Tom
    Yang, Haojin
    Meinel, Christoph
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT VI, 2017, 10639 : 880 - 889
  • [32] Wireless Technology Identification Using Deep Convolutional Neural Networks
    Bitar, Naim
    Muhammad, Siraj
    Refai, Hazem H.
    2017 IEEE 28TH ANNUAL INTERNATIONAL SYMPOSIUM ON PERSONAL, INDOOR, AND MOBILE RADIO COMMUNICATIONS (PIMRC), 2017,
  • [33] Identification of perceived sentences using deep neural networks in EEG
    Valle, Carlos
    Mendez-Orellana, Carolina
    Herff, Christian
    Rodriguez-Fernandez, Maria
    Journal of Neural Engineering, 2024, 21 (05)
  • [34] Multiple Authors Identification from Source Code Using Deep Learning Model
    Omi, Abdul Mannan
    Hossain, Monir
    Islam, Md Nahidul
    Mittra, Tanni
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND INFORMATION TECHNOLOGY 2021 (ICECIT 2021), 2021,
  • [35] Parametric System Identification Using Deep Convolutional Neural Networks
    Genc, Sahika
    2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 2112 - 2119
  • [36] Overview of the PAN@FIRE 2020 Task on the Authorship Identification of SOurce COde
    Fadel, Ali
    Musleh, Husam
    Tuffaha, Ibraheem
    Al-Ayyoub, Mahmoud
    Jararweh, Yaser
    Benkhelifa, Elhadj
    PROCEEDINGS OF THE 12TH ANNUAL MEETING OF THE FORUM FOR INFORMATION RETRIEVAL EVALUATION (FIRE 2020), 2020, : 4 - 8
  • [37] The significance of user-defined identifiers in Java source code authorship identification
    Department of Information and Communication Systems Engineering, University of the Aegean, Samos, 83200, Greece
    Unknown
    Unknown
    Comput Syst Sci Eng, 2 (123-132):
  • [38] On Improving Authorship Attribution of Source Code
    Tennyson, Matthew F.
    DIGITAL FORENSICS AND CYBER CRIME, ICDF2C 2012, 2013, 114 : 58 - 65
  • [39] A Practical Black-Box Attack on Source Code Authorship Identification Classifiers
    Liu, Qianjun
    Ji, Shouling
    Liu, Changchang
    Wu, Chunming
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2021, 16 : 3620 - 3633
  • [40] Deep-Sea Debris Identification Using Deep Convolutional Neural Networks
    Xue, Bing
    Huang, Baoxiang
    Chen, Ge
    Li, Haitao
    Wei, Weibo
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2021, 14 : 8909 - 8921