Predicting the Impact of Android Malicious Samples via Machine Learning

Cited by: 12

Authors:

Qiu, Junyang [1]

Luo, Wei [1]

Pan, Lei [1]

Tai, Yonghang [2]

Zhang, Jun [3]

Xiang, Yang [3]

Affiliations:

[1] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia

[2] Yunnan Normal Univ, Sch Phys & Elect Informat, Kunming 650500, Yunnan, Peoples R China

[3] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia

Source:

IEEE ACCESS | 2019 / Vol. 7

Keywords:

Android malware; deep neural network; high impact malicious samples; low impact malicious samples; static analysis; SVM; NEURAL-NETWORKS;

DOI:

10.1109/ACCESS.2019.2914311

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Android malicious samples now threaten the security or privacy of billions of mobile end users. Researchers have designed many methods to automatically and accurately identify Android malware samples. However, the rapid increase of Android malicious samples outpaces the capabilities of traditional Android malware detectors and classifiers with respect to cyber security risk management needs. It is important to identify the small proportion of Android malicious samples that may produce high cyber-security or privacy impact. In this paper, we propose a light-weight solution to automatically identify the Android malicious samples with high security and privacy impact. We manually check a number of Android malware families and corresponding security incidents and define two impact metrics for Android malicious samples. Our investigation results in a new Android malware dataset with impact ground truth (low impact or high impact). This new dataset is employed to empirically investigate the intrinsic characteristics of low-impact as well as high-impact malicious samples. To characterize and capture the patterns of Android malicious samples, reverse engineering is performed to extract semantic features that represent them. The features are parsed from both the AndroidManifest.xml files and the disassembled classes.dex binary code. Then, the extracted features are embedded into numerical vectors. Furthermore, we train highly accurate support vector machine and deep neural network classifiers to categorize the candidate Android malicious samples into low impact or high impact. The empirical results validate the effectiveness of our light-weight solution. This method can be further utilized for identifying high-impact Android malicious samples in the wild.

Pages: 66304 - 66316

Number of pages: 13

50 records in total

[31] Machine Learning Algorithm to Detect Malicious Codes
Khan, Simon
Majumder, Uttam
CYBER SENSING 2017, 2017, 10185
[32] Malicious web content detection by machine learning
Hou, Yung-Tsung
Chang, Yimeng
Chen, Tsuhan
Laih, Chi-Sung
Chen, Chia-Mei
EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (01) : 55 - 60
[33] Malicious URL Detection based on Machine Learning
Cho Do Xuan
Hoa Dinh Nguyen
Nikolaevich, Tisenko Victor
INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 148 - 153
[34] Malicious URL Detection Using Machine Learning
Hani, Dr Raed Bani
Amoura, Motasem
Ammourah, Mohammad
Abu Khalil, Yazeed
2024 15TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS, ICICS 2024, 2024,
[35] On the Impact of Sample Duplication in Machine-Learning-Based Android Malware Detection
Zhao, Yanjie
Li, Li
Wang, Haoyu
Cai, Haipeng
Bissyande, Tegawende F.
Klein, Jacques
Grundy, John
ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2021, 30 (03)
[36] Predicting parking occupancy via machine learning in the web of things
Provoost, Jesper C.
Kamilaris, Andreas
Wismans, Luc J. J.
Van der Drift, Sander J.
Van Keulen, Maurice
INTERNET OF THINGS, 2020, 12
[37] Predicting Adverse Childhood Experiences via Machine Learning Ensembles
Rao, Akash K.
Trivedi, Gunjan Y.
Bajpai, Anshika
Chouhan, Gajraj Singh
Trivedi, Riri G.
Kumar, Anita
Soundappan, Kathirvel
Dutt, Varun
Ramani, Hemalatha
PROCEEDINGS OF THE 16TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2023, 2023, : 773 - 779
[38] Predicting parallel application performance via machine learning approaches
Singh, Karan
Ipek, Engin
McKee, Sally A.
de Supinski, Bronis R.
Schulz, Martin
Caruana, Rich
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2007, 19 (17) : 2219 - 2235
[39] Predicting special forces dropout via explainable machine learning
Huijzer, Rik
de Jonge, Peter
Blaauw, Frank J.
de Jong, Maurits Baatenburg
de Wit, Age
Den Hartigh, Ruud J. R.
EUROPEAN JOURNAL OF SPORT SCIENCE, 2024, 24 (11) : 1564 - 1572
[40] Predicting a water infrastructure leakage index via machine learning
Kiziloz, Burak
Sisman, Eyup
Oruc, Halil Nurullah
UTILITIES POLICY, 2022, 75