Source Code Authorship Identification Using Deep Neural Networks

Cited by: 13
Authors
Kurtukova, Anna [1]
Romanov, Aleksandr [1]
Shelupanov, Alexander [1]
Affiliations
[1] Tomsk State Univ Control Syst & Radioelect, Fac Secur, Tomsk 634050, Russia
Source
SYMMETRY-BASEL | 2020 / Volume 12 / Issue 12
Keywords
source code; authorship; symmetry; software engineering; machine learning; deanonymization; neural networks
DOI
10.3390/sym12122044
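Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract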
Many open-source projects are developed by the community and share a common basis. The more of a project's source code is open, the more accessible the project is to contributors. This does not rule out the accidental or deliberate use of someone else's source code as closed functionality in another project, even a commercial one, which can lead to copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering solves this problem. However, not all code samples needed for comparison can be found in the public domain. In such cases, methods for identifying the source code author can be useful. Identifying the source code author is therefore an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of identifying the source code author and modern methods for solving it. Drawing on the experience of researchers in natural language processing (NLP), the authors propose a technique based on a hybrid neural network and demonstrate its results both for simple cases of determining code authorship and for cases complicated by obfuscation and the use of coding standards. The results show that the authors' technique successfully solves the essential problems of comparable approaches and can be effective even when there are no obvious signs indicating authorship. The average accuracy obtained across all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.
Pages: 1-17
Page count: 17
Related papers
50 results in total
  • [1] Code authorship identification using convolutional neural networks
    Abuhamad, Mohammed
    Rhim, Ji-su
    AbuHmed, Tamer
    Ullah, Sana
    Kang, Sanggil
    Nyang, DaeHun
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 95: 104-115
  • [2] Complex Cases of Source Code Authorship Identification Using a Hybrid Deep Neural Network
    Kurtukova, Anna
    Romanov, Aleksandr
    Shelupanov, Alexander
    Fedotova, Anastasia
    [J]. FUTURE INTERNET, 2022, 14 (10)
  • [3] Trademark Design Code Identification Using Deep Neural Networks
    Showkatramani, Girish J.
    Khatri, Nidhi
    Landicho, Arlene
    Layog, Darwin
    [J]. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018: 61-65
  • [4] Source code defect detection using deep convolutional neural networks
    Wang, Xiaomeng
    Guan, Zhibin
    Xin, Wei
    Wang, Jiajie
    [J]. Qinghua Daxue Xuebao/Journal of Tsinghua University, 2021, 61 (11): 1267-1272
  • [5] Authorship Identification using Recurrent Neural Networks
    Gupta, Shriya T. P.
    Sahoo, Jajati Keshari
    Roul, Rajendra Kumar
    [J]. PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND DATA MINING (ICISDM 2019), 2019: 133-137
  • [6] A probabilistic approach to source code authorship identification
    Kothari, Jay
    Shevertalov, Maxim
    Stehle, Edward
    Mancoridis, Spiros
    [J]. INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, PROCEEDINGS, 2007, : 243 - +
  • [7] Source Code Classification Using Neural Networks
    Gilda, Shlok
    [J]. PROCEEDINGS OF 2017 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2017
  • [8] Implementing Deep Convolutional Neural Networks for QR Code-Based Printed Source Identification
    Tsai, Min-Jen
    Lee, Ya-Chu
    Chen, Te-Ming
    [J]. ALGORITHMS, 2023, 16 (03)
  • [9] Are Deep Neural Networks the Best Choice for Modeling Source Code?
    Hellendoorn, Vincent J.
    Devanbu, Premkumar
    [J]. ESEC/FSE 2017: PROCEEDINGS OF THE 2017 11TH JOINT MEETING ON FOUNDATIONS OF SOFTWARE ENGINEERING, 2017: 763-773
  • [10] Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks
    Romanov, Aleksandr
    Kurtukova, Anna
    Shelupanov, Alexander
    Fedotova, Anastasia
    Goncharov, Valery
    [J]. FUTURE INTERNET, 2021, 13 (01): 1-16