Towards a fair comparison and realistic evaluation framework of android malware detectors based on static analysis and machine learning

Cited by: 15
Authors
Molina-Coronado, Borja [1 ]
Mori, Usue [2 ]
Mendiburu, Alexander [1 ]
Miguel-Alonso, Jose [1 ]
Affiliations
[1] Univ Basque Country UPV EHU, Dept Comp Architecture & Technol, Ps Manuel Lardizabal 1, Donostia San Sebastian 20018, Gipuzkoa, Spain
[2] Univ Basque Country UPV EHU, Dept Comp Sci & Artificial Intelligence, Ps Manuel Lardizabal 1, Donostia San Sebastian 20018, Gipuzkoa, Spain
Keywords
Android malware detection; Machine learning; Mobile security; Experimental analysis; Static analysis; OBFUSCATION; DISCOVERY; KNOWLEDGE; MODEL;
DOI
10.1016/j.cose.2022.102996
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
As in other cybersecurity areas, machine learning (ML) techniques have emerged as a promising solution to detect Android malware. In this sense, many proposals employing a variety of algorithms and feature sets have been presented to date, often reporting impressive detection performances. However, the lack of reproducibility and the absence of a standard evaluation framework make these proposals difficult to compare. In this paper, we perform an analysis of 10 influential research works on Android malware detection using a common evaluation framework. We have identified five factors that, if not taken into account when creating datasets and designing detectors, significantly affect the trained ML models and their performances. In particular, we analyze the effect of (1) the presence of duplicated samples, (2) label (goodware/greyware/malware) attribution, (3) class imbalance, (4) the presence of apps that use evasion techniques, and (5) the evolution of apps. Based on this extensive experimentation, we conclude that the studied ML-based detectors have been evaluated optimistically, which justifies the good published results. Our findings also highlight that it is imperative to generate realistic experimental scenarios, taking into account the aforementioned factors, to foster the rise of better ML-based Android malware detection solutions. © 2022 Elsevier Ltd. All rights reserved.
Pages: 16
Related papers
50 in total
  • [31] MPass: Bypassing Learning-based Static Malware Detectors
    Wang, Jialai
    Qu, Wenjie
    Rong, Yi
    Qiu, Han
    Li, Qi
    Li, Zongpeng
    Zhang, Chao
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [32] Hybrid machine learning model for malware analysis in android apps
    Bashir, Saba
    Maqbool, Farwa
    Khan, Farhan Hassan
    Abid, Asif Sohail
    PERVASIVE AND MOBILE COMPUTING, 2024, 97
  • [33] Analysis and Classification of Android Malware using Machine Learning Algorithms
    Tarar, Neha
    Sharma, Shweta
    Krishna, C. Rama
    PROCEEDINGS OF THE 2018 3RD INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTATION TECHNOLOGIES (ICICT 2018), 2018, : 738 - 743
  • [34] Android malware analysis using multiple machine learning algorithms
    Sahani, Rahul Kumar
    Anand, Madhusudan
    Tagore, Arhit Bose
    Mehrotra, Shreyash
    Tabassum, Ruksana
    Raja, S. P.
    INTERNATIONAL JOURNAL OF ELECTRONIC SECURITY AND DIGITAL FORENSICS, 2024, 16 (06) : 752 - 774
  • [35] Study on Android Hybrid Malware Detection Based on Machine Learning
    Kuo, Wen-Chung
    Liu, Tsung-Ping
    Wang, Chun-Cheng
    2019 IEEE 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION SYSTEMS (ICCCS 2019), 2019, : 31 - 35
  • [36] A Review of Android Malware Detection Approaches Based on Machine Learning
    Liu, Kaijun
    Xu, Shengwei
    Xu, Guoai
    Zhang, Miao
    Sun, Dawei
    Liu, Haifeng
    IEEE ACCESS, 2020, 8 (08): : 124579 - 124607
  • [37] Evaluating Machine Learning Models for Android Malware Detection - A Comparison Study
    Rana, Md. Shohel
    Gudla, Charan
    Sung, Andrew H.
    PROCEEDINGS OF 2018 VII INTERNATIONAL CONFERENCE ON NETWORK, COMMUNICATION AND COMPUTING (ICNCC 2018), 2018, : 17 - 21
  • [38] Static and Dynamic Malware Analysis Using Machine Learning
    Raghuraman, Chandni
    Suresh, Sandhya
    Shivshankar, Suraj
    Chapaneri, Radhika
    FIRST INTERNATIONAL CONFERENCE ON SUSTAINABLE TECHNOLOGIES FOR COMPUTATIONAL INTELLIGENCE, 2020, 1045 : 793 - 806
  • [39] Static and Dynamic Malware Analysis Using Machine Learning
    Ijaz, Muhammad
    Durad, Muhammad Hanif
    Ismail, Maliha
    PROCEEDINGS OF 2019 16TH INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGY (IBCAST), 2019, : 687 - 691
  • [40] GAResNet: A Transfer Learning based Framework for Android Malware Detection
    Shen, Rui
    Zhu, Hui-juan
    Li, Chang
    Wei, Hua-hui
    2023 IEEE INTERNATIONAL CONFERENCE ON KNOWLEDGE GRAPH, ICKG, 2023, : 263 - 268