Towards Improving Multiple Authorship Attribution of Source Code

Cited by: 1

Authors:

Hao, Pengnan [1]

Li, Zhen [1]

Liu, Cui [1]

Wen, Yu [1]

Liu, Fanming [1]

Affiliations:

[1] Hebei Univ, Dept Cyber Secur & Comp, Baoding, Hebei, Peoples R China

Source:

2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS | 2022

Keywords:

Multiple authorship attribution; Siamese network; machine learning; IDENTIFICATION;

DOI:

10.1109/QRS57517.2022.00059

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Source code authorship attribution helps resolve copyright infringement disputes and detect plagiarism. However, most software projects are developed collaboratively, so it is necessary to study multiple authorship attribution. Existing methods are not reliable in this setting, for two reasons: i) it is challenging to locate the boundaries between code written by different authors within a sample; ii) the code segments belonging to different authors in a sample are usually small or incomplete. This paper proposes a method to address these challenges. We first divide the code sample into individual lines, then use Siamese networks to integrate code lines with similar author styles into code segments. Finally, we use a path-based code representation and machine learning to identify the authors. Experimental results show that the method achieves an accuracy of 87.35% on a C/C++ dataset and 91.35% on a Java dataset, outperforming existing methods.
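
The line-grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `toy_embed`, `toy_similarity`, and `group_lines` are hypothetical, the hand-written style features stand in for the trained Siamese network, and the threshold and the choice to merge only adjacent lines are assumptions made for illustration.

```python
import math
from typing import Callable, List


def toy_embed(line: str) -> List[float]:
    """Toy style vector for one code line.

    Assumption: a hypothetical stand-in for the embedding that a trained
    Siamese network would produce; the features are chosen for illustration.
    """
    stripped = line.strip()
    return [
        1.0 if "_" in stripped else 0.0,                      # snake_case hint
        1.0 if any(ch.isupper() for ch in stripped) else 0.0,  # camelCase hint
        1.0 if "\t" in line else 0.0,                          # tab usage hint
    ]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; two all-zero vectors are treated as identical."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0 if norm_u == norm_v else 0.0
    return dot / (norm_u * norm_v)


def toy_similarity(line_a: str, line_b: str) -> float:
    """Style similarity of two lines under the toy embedding."""
    return cosine(toy_embed(line_a), toy_embed(line_b))


def group_lines(
    lines: List[str],
    similar: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[List[str]]:
    """Greedily merge each line into the current segment when it is
    stylistically similar to that segment's last line; otherwise start
    a new segment."""
    segments: List[List[str]] = []
    for line in lines:
        if segments and similar(segments[-1][-1], line) >= threshold:
            segments[-1].append(line)
        else:
            segments.append([line])
    return segments


if __name__ == "__main__":
    sample = [
        "int user_count = 0;",
        "int total_sum = 1;",
        "int userCount = 0;",
        "int totalSum = 1;",
    ]
    for index, segment in enumerate(group_lines(sample, toy_similarity)):
        print(index, segment)
```

In a fuller pipeline, each resulting segment would then be converted to a path-based representation and passed to a classifier; that stage is omitted here because the abstract does not specify it in enough detail to sketch.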


Pages: 516-526

Page count: 11

