Impact of datasets on machine learning based methods in Android malware detection: an empirical study

Cited by: 5
Authors
Ge, Xiuting [1]
Huang, Yifan [1]
Hui, Zhanwei [2]
Wang, Xiaojuan [2]
Cao, Xu [2]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[2] Mil Acad Sci, Beijing, Peoples R China
Keywords
Class imbalance; Dataset quality; Concept drift; Machine learning; Android malware detection
DOI
10.1109/QRS54544.2021.00019
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
For Android malware detection, machine learning-based (ML-based) methods show promising performance. However, few studies have investigated how dataset-related factors affect ML-based methods, even though the performance of these methods depends heavily on the datasets used. To partially bridge this gap, we conduct an empirical study of the impact of dataset-related factors on ML-based Android malware detection methods. By examining the differences between datasets in real-world scenarios and those in experimental settings, we identify three dataset factors (i.e., class imbalance, quality, and timeliness) and assess their impact on ML-based Android malware detection methods. We conduct experiments on more than 11K benign and 17K malicious applications. The results show that these three dataset factors introduce significant biases into existing ML-based Android malware detection methods. Based on these results, we draw lessons on how to assess ML-based Android malware detection methods when dataset factors are taken into account.
Pages: 81-92
Number of pages: 12