Misleading Authorship Attribution of Source Code using Adversarial Learning

Cited by: 0
Authors
Quiring, Erwin [1]
Maier, Alwin [1]
Rieck, Konrad [1]
Affiliation
[1] Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel attack against authorship attribution of source code. We exploit the fact that recent attribution methods rest on machine learning and can thus be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution yet appear plausible to a developer. The attack is guided by Monte-Carlo tree search, which enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution are inappropriate for practical application and that there is a need for resilient analysis techniques.
Pages: 479-496
Page count: 18
Related papers
50 in total
  • [1] Machine Learning Approaches for Authorship Attribution using Source Code Stylometry
    Frankel, Sophia F.
    Ghosh, Krishnendu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 3298-3304
  • [2] On Improving Authorship Attribution of Source Code
    Tennyson, Matthew F.
    [J]. DIGITAL FORENSICS AND CYBER CRIME, ICDF2C 2012, 2013, 114: 58-65
  • [3] Source code authorship attribution using n-grams
    Burrows, Steven
    Tahaghoghi, S.M.M.
    [J]. ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium, 2007: 32-39
  • [4] Comparing techniques for authorship attribution of source code
    Burrows, Steven
    Uitdenbogerd, Alexandra L.
    Turpin, Andrew
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 2014, 44(01): 1-32
  • [5] Analysis of Source Code Authorship Attribution Problem
    Bogdanova, Alina
    Farina, Mirko
    Kholmatova, Zamira
    Kruglov, Artem
    Romanov, Vitaly
    Succi, Giancarlo
    [J]. 2022 INTERNATIONAL CONFERENCE ON COMPUTERS AND ARTIFICIAL INTELLIGENCE TECHNOLOGIES, CAIT, 2022: 109-115
  • [6] Adversarial Authorship Attribution in Open-Source Projects
    Matyukhina, Alina
    Stakhanova, Natalia
    Dalla Preda, Mila
    Perley, Celine
    [J]. PROCEEDINGS OF THE NINTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY (CODASPY '19), 2019: 291-302
  • [7] Android Authorship Attribution Using Source Code-Based Features
    Aydogan, Emre
    Sen, Sevil
    [J]. IEEE ACCESS, 2024, 12: 6569-6589
  • [8] Source Code Authorship Attribution Using Hybrid Approach of Program Dependence Graph and Deep Learning Model
    Ullah, Farhan
    Wang, Junfeng
    Jabbar, Sohail
    Al-Turjman, Fadi
    Alazab, Mamoun
    [J]. IEEE ACCESS, 2019, 7: 141987-141999
  • [9] Language and Obfuscation Oblivious Source Code Authorship Attribution
    Zafar, Sarim
    Sarwar, Muhammad Usman
    Salem, Saeed
    Malik, Muhammad Zubair
    [J]. IEEE ACCESS, 2020, 8(08): 197581-197596
  • [10] Towards Improving Multiple Authorship Attribution of Source Code
    Hao, Pengnan
    Li, Zhen
    Liu, Cui
    Wen, Yu
    Liu, Fanming
    [J]. 2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2022: 516-526