Decompiled APK based malicious code classification

Cited by: 13
Authors
Mateless, Roni [1]
Rejabek, Daniel [2]
Margalit, Oded [2]
Moskovitch, Robert [1]
Affiliations
[1] Ben Gurion Univ Negev, Beer Sheva, Israel
[2] IBM Corp, Cyber Secur Ctr Excellence, Petah Tiqwa, Israel
Keywords
Android malware; Malicious code; Source code analysis; Android malware detection
DOI
10.1016/j.future.2020.03.052
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
As the variety of Android malware continues to grow, it is important to distinguish between its different types. In this paper, we introduce the use of decompiled source code for malicious code classification. Decompiled source code provides opportunities for deeper analysis and a better understanding of the nature of malware. Malicious code differs from natural-language text because of compiler syntax rules and attackers' efforts to evade detection. Hence, we adapt Natural Language Processing-based techniques, under some constraints, for malicious code classification. First, the proposed methodology decompiles the Android Package Kit (APK) files; then API calls, keywords, and non-obfuscated tokens are extracted from the source code and categorized into stop-tokens, feature-tokens, and long-tail-tokens. We also introduce the use of generalized N-tokens to represent tokens that are typically less frequent. We evaluated our approach against a baseline that uses API calls and permissions as features, against the combination of both, and against neural network architectures based on decompiled Android Package Kits. A rigorous evaluation was performed on comprehensive, public, real-world Android malware datasets, including 24,553 apps categorized into 71 families for malicious family classification and 60,000 apps for malicious code detection. Our approach outperformed the baselines in both tasks. (C) 2020 Elsevier B.V. All rights reserved.
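The token categorization step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the three categories are delimited, so the document-frequency rule, the `stop_ratio` and `min_docs` thresholds, and the function names `tokenize` and `categorize_tokens` are all assumptions made for illustration. Tokens that appear in nearly every decompiled file are treated as stop-tokens, tokens that appear in very few files as long-tail-tokens, and the rest as feature-tokens.

```python
import re
from collections import Counter

# Assumption: a token is any Java-style identifier or keyword.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source: str) -> list[str]:
    """Split decompiled source text into identifier-like tokens."""
    return TOKEN_RE.findall(source)


def categorize_tokens(documents, stop_ratio=0.9, min_docs=2):
    """Group tokens by document frequency (hypothetical thresholds).

    stop      : token occurs in at least `stop_ratio` of the documents
    long_tail : token occurs in fewer than `min_docs` documents
    feature   : everything in between
    """
    doc_freq = Counter()
    for source in documents:
        # Count each token once per document.
        doc_freq.update(set(tokenize(source)))

    n_docs = len(documents)
    categories = {"stop": set(), "feature": set(), "long_tail": set()}
    for token, df in doc_freq.items():
        if df / n_docs >= stop_ratio:
            categories["stop"].add(token)
        elif df < min_docs:
            categories["long_tail"].add(token)
        else:
            categories["feature"].add(token)
    return categories
```

Under these assumptions, only the feature-token set would be passed on as classifier input. How the paper's generalized N-tokens represent the less frequent tokens is not specified in the abstract, so that step is left out of the sketch.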
Pages: 135-147
Page count: 13