Dark Hazard: Learning-based, Large-scale Discovery of Hidden Sensitive Operations in Android Apps

Cited by: 28

Authors:

Pan, Xiaorui [1]

Wang, Xueqiang [1]

Duan, Yue [2]

Wang, XiaoFeng [1]

Yin, Heng [2]

Affiliations:

[1] Indiana Univ, Bloomington, IN 47405 USA

[2] Univ Calif Riverside, Riverside, CA 92521 USA

Source:

24TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2017) | 2017

Funding:

U.S. National Science Foundation

Keywords:

DOI:

10.14722/ndss.2017.23265

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Hidden sensitive operations (HSO), such as stealing private user data upon receiving an SMS message, are increasingly utilized by mobile malware and other potentially-harmful apps (PHAs) to evade detection. Identification of such behaviors is hard, due to the challenge in triggering them during an app's runtime. Current static approaches rely on the trigger conditions or hidden behaviors known beforehand and therefore cannot capture previously unknown HSO activities. Also, these techniques tend to be computationally intensive and therefore less suitable for analyzing a large number of apps. As a result, our understanding of real-world HSO today is still limited, not to mention effective means to mitigate this threat. In this paper, we present HSOMINER, an innovative machine-learning-based program analysis technique that enables a large-scale discovery of unknown HSO activities. Our approach leverages a set of program features that characterize an HSO branch and can be relatively easy to extract from an app. These features summarize a set of unique observations about an HSO condition, its paths and the relations between them, and are designed to be general for finding hidden suspicious behaviors. Particularly, we found that a trigger condition is less likely to relate to the path of its branch through data flows or shared resources, compared with a legitimate branch. Also, the behaviors exhibited by the two paths of an HSO branch tend to be conspicuously different (innocent on one side and sinister on the other). Most importantly, even though these individual features are not sufficiently accurate for capturing HSO on their own, collectively they are shown to be highly effective in identifying such behaviors. This differentiating power is harnessed by HSOMINER to classify Android apps, which achieves a high precision (>98%) and coverage (>94%), and is also efficient as discovered in our experiments.
The new tool was further used in a measurement study involving 338,354 real-world apps, the largest one ever conducted on suspicious hidden operations. Our research brought to light the pervasiveness of HSO activities, which are present in 18.7% of the apps we analyzed, surprising trigger conditions (e.g., click on a certain region of a view) and behaviors (e.g., hiding operations in a dynamically generated receiver), which help better understand the problem and contribute to more effective defense against this new threat to the mobile platform.

Pages: 15