Misleading Authorship Attribution of Source Code using Adversarial Learning

Authors
Quiring, Erwin [1]
Maier, Alwin [1]
Rieck, Konrad [1]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present a novel attack against authorship attribution of source code. We exploit that recent attribution methods rest on machine learning and thus can be deceived by adversarial examples of source code. Our attack performs a series of semantics-preserving code transformations that mislead learning-based attribution but appear plausible to a developer. The attack is guided by Monte-Carlo tree search that enables us to operate in the discrete domain of source code. In an empirical evaluation with source code from 204 programmers, we demonstrate that our attack has a substantial effect on two recent attribution methods, whose accuracy drops from over 88% to 1% under attack. Furthermore, we show that our attack can imitate the coding style of developers with high accuracy and thereby induce false attributions. We conclude that current approaches for authorship attribution are inappropriate for practical application and there is a need for resilient analysis techniques.
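The abstract describes a search over semantics-preserving transformations guided by Monte-Carlo tree search. The following is a minimal sketch of that general idea, not the authors' implementation: it uses plain UCT on a toy C-like snippet. The three transformations, the regex-based `author_score` stand-in for a learned attribution model, and all function names are assumptions made for illustration.

```python
import math
import random
import re

# Hypothetical rewrites for a C-like snippet. Each keeps behavior unchanged in
# this toy setting (expand_increment assumes the value of `x++` is unused).
TRANSFORMS = {
    "rename_loop_var": lambda s: re.sub(r"\bi\b", "idx", s),
    "expand_increment": lambda s: re.sub(r"(\w+)\+\+", r"\1 += 1", s),
    "brace_on_new_line": lambda s: s.replace(") {", ")\n{"),
}
# Assumed style markers of the original author; a real attack queries a classifier.
STYLE_MARKERS = [r"\bi\b", r"\+\+", r"\) \{"]

def author_score(code):
    """Stand-in attribution score: fraction of style markers still present."""
    return sum(bool(re.search(m, code)) for m in STYLE_MARKERS) / len(STYLE_MARKERS)

class Node:
    def __init__(self, code, parent=None, action=None):
        self.code, self.parent, self.action = code, parent, action
        self.children, self.visits, self.reward = [], 0, 0.0
        # Only transformations that actually change the code are worth expanding.
        self.untried = [n for n, t in TRANSFORMS.items() if t(code) != code]

def ucb1(child, parent_visits, c=1.4):
    exploit = child.reward / child.visits
    return exploit + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(code, rng, depth):
    """Apply random transformations and reward a low attribution score."""
    for _ in range(depth):
        code = TRANSFORMS[rng.choice(list(TRANSFORMS))](code)
    return 1.0 - author_score(code)

def mcts_attack(code, iterations=50, rollout_depth=2, seed=0):
    rng = random.Random(seed)
    root = best = Node(code)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCB1.
        while not node.untried and node.children:
            parent = node
            node = max(parent.children, key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: apply one untried transformation.
        if node.untried:
            name = node.untried.pop()
            child = Node(TRANSFORMS[name](node.code), parent=node, action=name)
            node.children.append(child)
            node = child
            if author_score(node.code) < author_score(best.code):
                best = node
        # Simulation and backpropagation.
        reward = rollout(node.code, rng, rollout_depth)
        while node is not None:
            node.visits += 1
            node.reward += reward
            node = node.parent
    # Recover the transformation sequence that leads to the best node.
    path, node = [], best
    while node.parent is not None:
        path.append(node.action)
        node = node.parent
    return path[::-1], best.code

original = "for (int i = 0; i < n; i++) { sum += i; }"
path, adversarial = mcts_attack(original)
print(path, author_score(original), author_score(adversarial))
```

Tree search fits this problem because source code is discrete: there is no gradient to follow, so the search evaluates whole transformation sequences against the model's output instead.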
Pages: 479-496
Page count: 18
Related papers
50 in total
  • [11] A Bayesian Ensemble Classifier for Source Code Authorship Attribution
    Tennyson, Matthew F.
    Mitropoulos, Francisco J.
    [J]. SIMILARITY SEARCH AND APPLICATIONS, 2014, 8821 : 265 - 276
  • [12] Deep Metric Learning for Code Authorship Attribution and Verification
    White, Riley
    Sprague, Nathan
    [J]. 20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 1089 - 1093
  • [13] Application of Information Retrieval Techniques for Source Code Authorship Attribution
    Burrows, Steven
    Uitdenbogerd, Alexandra L.
    Turpin, Andrew
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2009, 5463 : 699 - 713
  • [14] The effect of time drift in source code authorship attribution: Time drifting in source code - Stylochronometry
    Petrik, Juraj
    Chuda, Daniela
    [J]. ACM International Conference Proceeding Series, 2021, : 87 - 92
  • [15] Choosing a Profile Length in the SCAP Method of Source Code Authorship Attribution
    Tennyson, Matthew F.
    Mitropoulos, Francisco J.
    [J]. IEEE SOUTHEASTCON 2014, 2014,
  • [16] Source Code Authorship Attribution Using Long Short-Term Memory Based Networks
    Alsulami, Bander
    Dauber, Edwin
    Harang, Richard
    Mancoridis, Spiros
    Greenstadt, Rachel
    [J]. COMPUTER SECURITY - ESORICS 2017, PT I, 2018, 10492 : 65 - 82
  • [17] SHIELD: Thwarting Code Authorship Attribution
    The Department of Computer Science, Loyola University, Chicago, United States
    Unknown
    Unknown
    [J]. arXiv,
  • [18] An Identification of Source and Attribution of Authorship
    McKeown, Simon
    [J]. EUROPEAN JOURNAL OF SCANDINAVIAN STUDIES, 2021, 51 (02) : 319 - 334
  • [19] Code Authorship Attribution: Methods and Challenges
    Kalgutkar, Vaibhavi
    Kaur, Ratinder
    Gonzalez, Hugo
    Stakhanova, Natalia
    Matyukhina, Alina
    [J]. ACM COMPUTING SURVEYS, 2019, 52 (01)
  • [20] Authorship attribution of source code by using back propagation neural network based on particle swarm optimization
    Yang, Xinyu
    Xu, Guoai
    Li, Qi
    Guo, Yanhui
    Zhang, Miao
    [J]. PLOS ONE, 2017, 12 (11):