Android Authorship Attribution Using Source Code-Based Features

Citations: 0
Authors
Aydogan, Emre [1]
Sen, Sevil [1]
Affiliations
[1] Hacettepe Univ, Dept Comp Engn, Wireless Networks & Intelligent Secure Syst WISE L, TR-06800 Ankara, Turkiye
Keywords
Android; authorship attribution; mobile malware; metadata; obfuscation; source code-based; BINARY CODE; ROBUST;
DOI
10.1109/ACCESS.2024.3351945
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the widespread use of mobile devices, Android has become the most popular operating system, and new applications are uploaded to the Android market every day. However, because Android binaries are easy to modify and repackage, Android applications can easily be altered and imitated by other developers and released in third-party Android markets. Determining the original developers of Android applications is therefore a challenging problem, known as authorship attribution. This study explores the distinctive features of Android applications to identify their authors. Software developers generally leave a footprint in their applications that reflects their writing style. This footprint, which can be extracted from either the source code or the binary code, can help identify the authors of software applications. Since obtaining the source code of applications in the wild can be impractical, especially when dealing with malware, researchers prefer to focus on application binaries. Accordingly, this study proposes an approach that identifies Android developers by deriving a wide range of features from different parts of Android applications, such as smali files, libraries, manifest files, and metadata information. In addition, other features, such as configuration, dex code, resource-based, and string-related features, are inherited from other studies on Android authorship attribution and fused with the proposed feature set. The proposed approach was evaluated on benign and malware datasets, and its results were compared with those of other studies. The results show that the proposed features increase accuracy, reaching 82.5% and 95.6% on the market and malware datasets, respectively. These results demonstrate the positive impact of the proposed features on Android authorship attribution.
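The kind of feature derivation the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' feature set or implementation: the function names, the chosen features (permission and component counts from the manifest, opcode frequencies from smali text), and the assumption that the APK has already been decoded into a plain-text `AndroidManifest.xml` and `.smali` files are all hypothetical choices made here for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by android:* attributes in a decoded manifest.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def manifest_features(manifest_xml: str) -> dict:
    """Hypothetical manifest features: package name, permissions, component counts.

    Assumes the manifest is already decoded to plain-text XML
    (inside an APK it is stored in a binary XML format).
    """
    root = ET.fromstring(manifest_xml)
    permissions = sorted(
        el.get(ANDROID_NS + "name", "") for el in root.iter("uses-permission")
    )
    return {
        "package": root.get("package", ""),
        "permissions": permissions,
        "n_permissions": len(permissions),
        "n_activities": sum(1 for _ in root.iter("activity")),
        "n_services": sum(1 for _ in root.iter("service")),
    }


def smali_opcode_counts(smali_text: str) -> Counter:
    """Hypothetical smali feature: frequency of each instruction mnemonic.

    Simplification: directives (.method, .locals), labels (:cond_0) and
    comments (#) are skipped; every other non-empty line is treated as an
    instruction, which is not exact for annotation or switch payload blocks.
    """
    counts = Counter()
    for raw_line in smali_text.splitlines():
        line = raw_line.strip()
        if not line or line[0] in ".:#":
            continue
        counts[line.split()[0]] += 1
    return counts
```

Counts like these could be concatenated into one feature vector per application and passed to any standard classifier; which features and classifier the paper actually uses is not stated in this record.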
Pages: 6569 - 6589
Number of pages: 21
Related papers
50 in total
  • [1] On Improving Authorship Attribution of Source Code
    Tennyson, Matthew F.
    [J]. DIGITAL FORENSICS AND CYBER CRIME, ICDF2C 2012, 2013, 114 : 58 - 65
  • [2] Misleading Authorship Attribution of Source Code using Adversarial Learning
    Quiring, Erwin
    Maier, Alwin
    Rieck, Konrad
    [J]. PROCEEDINGS OF THE 28TH USENIX SECURITY SYMPOSIUM, 2019, : 479 - 496
  • [3] Source code authorship attribution using n-grams
    Burrows, Steven
    Tahaghoghi, S.M.M.
    [J]. ADCS 2007 - Proceedings of the Twelfth Australasian Document Computing Symposium, 2007, : 32 - 39
  • [4] Comparing techniques for authorship attribution of source code
    Burrows, Steven
    Uitdenbogerd, Alexandra L.
    Turpin, Andrew
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 2014, 44 (01): 1 - 32
  • [5] Code Authorship Attribution using content-based and non-content-based features
    Bayrami, Parinaz
    Rice, Jacqueline E.
    [J]. 2021 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2021,
  • [6] Analysis of Source Code Authorship Attribution Problem
    Bogdanova, Alina
    Farina, Mirko
    Kholmatova, Zamira
    Kruglov, Artem
    Romanov, Vitaly
    Succi, Giancarlo
    [J]. 2022 INTERNATIONAL CONFERENCE ON COMPUTERS AND ARTIFICIAL INTELLIGENCE TECHNOLOGIES, CAIT, 2022, : 109 - 115
  • [7] Machine Learning Approaches for Authorship Attribution using Source Code Stylometry
    Frankel, Sophia F.
    Ghosh, Krishnendu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3298 - 3304
  • [8] Source Code Authorship Attribution Using Long Short-Term Memory Based Networks
    Alsulami, Bander
    Dauber, Edwin
    Harang, Richard
    Mancoridis, Spiros
    Greenstadt, Rachel
    [J]. COMPUTER SECURITY - ESORICS 2017, PT I, 2018, 10492 : 65 - 82
  • [9] Language and Obfuscation Oblivious Source Code Authorship Attribution
    Zafar, Sarim
    Sarwar, Muhammad Usman
    Salem, Saeed
    Malik, Muhammad Zubair
    [J]. IEEE ACCESS, 2020, 8 (08): 197581 - 197596
  • [10] Towards Improving Multiple Authorship Attribution of Source Code
    Hao, Pengnan
    Li, Zhen
    Liu, Cui
    Wen, Yu
    Liu, Fanming
    [J]. 2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2022, : 516 - 526