Predicting the Impact of Android Malicious Samples via Machine Learning

Cited by: 12
Authors
Qiu, Junyang [1]
Luo, Wei [1]
Pan, Lei [1]
Tai, Yonghang [2]
Zhang, Jun [3]
Xiang, Yang [3]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
[2] Yunnan Normal Univ, Sch Phys & Elect Informat, Kunming 650500, Yunnan, Peoples R China
[3] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia
Keywords
Android malware; deep neural network; high impact malicious samples; low impact malicious samples; static analysis; SVM; NEURAL-NETWORKS;
DOI
10.1109/ACCESS.2019.2914311
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Android malware now threatens the security and privacy of billions of mobile end users. Researchers have designed many methods to identify Android malware samples automatically and accurately. However, the rapid growth in the number of malicious samples outpaces the capacity of traditional Android malware detectors and classifiers to meet cyber security risk management needs. It is therefore important to identify the small proportion of malicious samples that may have a high security or privacy impact. In this paper, we propose a lightweight solution that automatically identifies such high-impact samples. We manually examine a number of Android malware families and the corresponding security incidents, and define two impact metrics for Android malicious samples. This investigation yields a new Android malware dataset with impact ground truth (low impact or high impact), which we use to empirically study the intrinsic characteristics of low-impact and high-impact malicious samples. To capture the patterns of malicious samples, we reverse engineer each sample and extract semantic features from both the AndroidManifest.xml file and the disassembled classes.dex code. The extracted features are then embedded into numerical vectors. Finally, we train highly accurate support vector machine and deep neural network classifiers to categorize candidate samples as low impact or high impact. The empirical results validate the effectiveness of the proposed lightweight solution, which can be further used to identify high-impact Android malicious samples in the wild.
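The feature-embedding step described in the abstract can be illustrated with a minimal sketch. It assumes the AndroidManifest.xml has already been decoded to plain-text XML (inside an APK it is stored in a binary format), and it uses a simple binary presence vector over permissions and component names. The abstract does not specify the paper's actual feature set or embedding, so the function names, the chosen manifest tags, and the two sample manifests below are hypothetical and for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace that Android uses for attributes such as android:name.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Manifest tags treated as features here (an illustrative choice, not the paper's list).
FEATURE_TAGS = ("uses-permission", "activity", "service", "receiver", "provider")


def manifest_features(manifest_xml):
    """Return a set of 'tag::name' strings found in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    features = set()
    for elem in root.iter():
        if elem.tag in FEATURE_TAGS:
            name = elem.get(ANDROID_NS + "name")
            if name:
                features.add(f"{elem.tag}::{name}")
    return features


def build_vocabulary(feature_sets):
    """Map every feature seen in the corpus to a fixed vector index."""
    all_features = set().union(*feature_sets)
    return {feat: i for i, feat in enumerate(sorted(all_features))}


def embed(features, vocab):
    """Binary vector: 1 if the sample has the feature, else 0."""
    vec = [0] * len(vocab)
    for feat in features:
        idx = vocab.get(feat)
        if idx is not None:  # features unseen when building the vocabulary are ignored
            vec[idx] = 1
    return vec


# Two made-up manifests standing in for decoded samples.
SAMPLE_A = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.a">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.INTERNET"/>
  <application><service android:name=".SmsService"/></application>
</manifest>"""

SAMPLE_B = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.b">
  <uses-permission android:name="android.permission.INTERNET"/>
  <application><activity android:name=".Main"/></application>
</manifest>"""

feature_sets = [manifest_features(SAMPLE_A), manifest_features(SAMPLE_B)]
vocab = build_vocabulary(feature_sets)
vectors = [embed(fs, vocab) for fs in feature_sets]

for vec in vectors:
    print(vec)
```

Vectors of this kind, paired with low-impact or high-impact labels, are the sort of input a support vector machine or neural network classifier could be trained on. Features from the disassembled classes.dex code would need a separate extraction step that is not shown here.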
Pages: 66304 - 66316
Number of pages: 13
Related papers
50 in total
  • [31] Machine Learning Algorithm to Detect Malicious Codes
    Khan, Simon
    Majumder, Uttam
    CYBER SENSING 2017, 2017, 10185
  • [32] Malicious web content detection by machine learning
    Hou, Yung-Tsung
    Chang, Yimeng
    Chen, Tsuhan
    Laih, Chi-Sung
    Chen, Chia-Mei
    EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (01) : 55 - 60
  • [33] Malicious URL Detection based on Machine Learning
    Cho Do Xuan
    Hoa Dinh Nguyen
    Nikolaevich, Tisenko Victor
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 148 - 153
  • [34] Malicious URL Detection Using Machine Learning
    Hani, Dr Raed Bani
    Amoura, Motasem
    Ammourah, Mohammad
    Abu Khalil, Yazeed
    2024 15TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS, ICICS 2024, 2024,
  • [35] On the Impact of Sample Duplication in Machine-Learning-Based Android Malware Detection
    Zhao, Yanjie
    Li, Li
    Wang, Haoyu
    Cai, Haipeng
    Bissyande, Tegawende F.
    Klein, Jacques
    Grundy, John
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2021, 30 (03)
  • [36] Predicting parking occupancy via machine learning in the web of things
    Provoost, Jesper C.
    Kamilaris, Andreas
    Wismans, Luc J. J.
    Van der Drift, Sander J.
    Van Keulen, Maurice
    INTERNET OF THINGS, 2020, 12
  • [37] Predicting Adverse Childhood Experiences via Machine Learning Ensembles
    Rao, Akash K.
    Trivedi, Gunjan Y.
    Bajpai, Anshika
    Chouhan, Gajraj Singh
    Trivedi, Riri G.
    Kumar, Anita
    Soundappan, Kathirvel
    Dutt, Varun
    Ramani, Hemalatha
    PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2023, 2023, : 773 - 779
  • [38] Predicting parallel application performance via machine learning approaches
    Singh, Karan
    Ipek, Engin
    McKee, Sally A.
    de Supinski, Bronis R.
    Schulz, Martin
    Caruana, Rich
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2007, 19 (17) : 2219 - 2235
  • [39] Predicting special forces dropout via explainable machine learning
    Huijzer, Rik
    de Jonge, Peter
    Blaauw, Frank J.
    de Jong, Maurits Baatenburg
    de Wit, Age
    Den Hartigh, Ruud J. R.
    EUROPEAN JOURNAL OF SPORT SCIENCE, 2024, 24 (11) : 1564 - 1572
  • [40] Predicting a water infrastructure leakage index via machine learning
    Kiziloz, Burak
    Sisman, Eyup
    Oruc, Halil Nurullah
    UTILITIES POLICY, 2022, 75