Dark Hazard: Learning-based, Large-scale Discovery of Hidden Sensitive Operations in Android Apps

Cited by: 28

Authors:

Pan, Xiaorui [1]

Wang, Xueqiang [1]

Duan, Yue [2]

Wang, XiaoFeng [1]

Yin, Heng [2]

Affiliations:

[1] Indiana Univ, Bloomington, IN 47405 USA

[2] Univ Calif Riverside, Riverside, CA 92521 USA

Source:

24TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2017) | 2017

Funding:

U.S. National Science Foundation (NSF)

DOI:

10.14722/ndss.2017.23265

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Hidden sensitive operations (HSO), such as stealing private user data upon receiving an SMS message, are increasingly utilized by mobile malware and other potentially harmful apps (PHAs) to evade detection. Identification of such behaviors is hard, due to the challenge in triggering them during an app's runtime. Current static approaches rely on trigger conditions or hidden behaviors known beforehand and therefore cannot capture previously unknown HSO activities. Also, these techniques tend to be computationally intensive and therefore less suitable for analyzing a large number of apps. As a result, our understanding of real-world HSO today is still limited, not to mention effective means to mitigate this threat. In this paper, we present HSOMINER, an innovative machine-learning-based program analysis technique that enables a large-scale discovery of unknown HSO activities. Our approach leverages a set of program features that characterize an HSO branch and can be relatively easy to extract from an app. These features summarize a set of unique observations about an HSO condition, its paths and the relations between them, and are designed to be general for finding hidden suspicious behaviors. Particularly, we found that a trigger condition is less likely to relate to the paths of its branch through data flows or shared resources, compared with the condition of a legitimate branch. Also, the behaviors exhibited by the two paths of an HSO branch tend to be conspicuously different (innocent on one side and sinister on the other). Most importantly, even though these individual features are not sufficiently accurate for capturing HSO on their own, collectively they are shown to be highly effective in identifying such behaviors. This differentiating power is harnessed by HSOMINER to classify Android apps, which achieves a high precision (>98%) and coverage (>94%), and is also efficient as discovered in our experiments. The new tool was further used in a measurement study involving 338,354 real-world apps, the largest one ever conducted on suspicious hidden operations. Our research brought to light the pervasiveness of HSO activities, which are present in 18.7% of the apps we analyzed, as well as surprising trigger conditions (e.g., a click on a certain region of a view) and behaviors (e.g., hiding operations in a dynamically generated receiver), which help better understand the problem and contribute to more effective defense against this new threat to the mobile platform.
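The two observations in the abstract (a trigger condition that is unrelated to its paths, and two paths whose behaviors differ sharply) can be illustrated with a toy sketch. This is not HSOMINER's implementation: the `Branch` record, the `SENSITIVE_APIS` set, the three features, and the unweighted average used as a score are all assumptions made here for illustration, whereas the paper trains a classifier over its own feature set.

```python
from dataclasses import dataclass

# Hypothetical set of sensitive API names; not taken from the paper.
SENSITIVE_APIS = frozenset({"sendTextMessage", "getDeviceId", "getLastKnownLocation"})


@dataclass(frozen=True)
class Branch:
    """Toy summary of one conditional branch (assumed representation)."""
    condition_vars: frozenset   # data/resources read by the condition
    true_path_vars: frozenset   # data/resources used on the 'true' path
    false_path_vars: frozenset  # data/resources used on the 'false' path
    true_path_apis: frozenset   # API calls made on the 'true' path
    false_path_apis: frozenset  # API calls made on the 'false' path


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def extract_features(branch):
    """Return three illustrative features, each in [0, 1]."""
    path_vars = branch.true_path_vars | branch.false_path_vars
    # 1.0 if the condition shares no data or resources with either path.
    condition_unrelated = 0.0 if branch.condition_vars & path_vars else 1.0
    # How different the API calls on the two paths are.
    behavior_gap = jaccard_distance(branch.true_path_apis, branch.false_path_apis)
    # 1.0 if exactly one of the two paths calls a sensitive API.
    sensitive_true = bool(branch.true_path_apis & SENSITIVE_APIS)
    sensitive_false = bool(branch.false_path_apis & SENSITIVE_APIS)
    one_sided_sensitive = 1.0 if sensitive_true != sensitive_false else 0.0
    return [condition_unrelated, behavior_gap, one_sided_sensitive]


def hso_score(features):
    """Unweighted average of the features; a stand-in for a trained classifier."""
    return sum(features) / len(features)


# A branch that checks an SMS body, then reads the device ID on one path only.
suspicious = Branch(
    condition_vars=frozenset({"sms_body"}),
    true_path_vars=frozenset({"device_id"}),
    false_path_vars=frozenset(),
    true_path_apis=frozenset({"getDeviceId", "openConnection"}),
    false_path_apis=frozenset({"finish"}),
)

# A branch whose condition and both paths all operate on the same file handle.
benign = Branch(
    condition_vars=frozenset({"file"}),
    true_path_vars=frozenset({"file"}),
    false_path_vars=frozenset({"file"}),
    true_path_apis=frozenset({"read", "close"}),
    false_path_apis=frozenset({"close", "log"}),
)

suspicious_score = hso_score(extract_features(suspicious))
benign_score = hso_score(extract_features(benign))
```

In this sketch no single feature decides the outcome (the benign branch still has a nonzero behavior gap), which mirrors the abstract's point that the features are weak individually but informative in combination.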

Citations

Pages: 15

50 items in total

[41] A Large-Scale Empirical Study on the Effects of Code Obfuscations on Android Apps and Anti-Malware Products
Hammad, Mahmoud
Garcia, Joshua
Malek, Sam
PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2018, : 421 - 431
[42] Large-Scale Pretraining Improves Sample Efficiency of Active Learning-Based Virtual Screening
Cao, Zhonglin
Sciabola, Simone
Wang, Ye
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (06) : 1882 - 1891
[43] Learning-Based Pareto Optimal Control of Large-Scale Systems With Unknown Slow Dynamics
Hesarkuchak, Saeed Tajik
Boker, Almuatazbellah
Reddy, Vasanth
Mili, Lamine
Eldardiry, Hoda
IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 838 - 843
[44] Enhanced distributed learning-based coordination of multiple approximate MPC for large-scale systems
Ren, Rui
Li, Shaoyuan
CHEMICAL ENGINEERING RESEARCH & DESIGN, 2025, 214 : 114 - 124
[45] On the machine learning-based smart beamforming for wireless virtualization with large-scale MIMO system
Sapavath, Naveen Naik
Safavat, Sunitha
Rawat, Danda B.
TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2019, 30 (09):
[46] Deep Learning-Based Symbol-Level Precoding for Large-Scale Antenna System
Xie, Changxu
Du, Huiqin
Liu, Xialing
WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021
[47] A large-scale study on the adoption of anti-debugging and anti-tampering protections in android apps
Berlato, Stefano
Ceccato, Mariano
JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2020, 52
[48] Toward Large-Scale Vulnerability Discovery using Machine Learning
Grieco, Gustavo
Grinblat, Guillermo Luis
Uzal, Lucas
Rawat, Sanjay
Feist, Josselin
Mounier, Laurent
CODASPY'16: PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY, 2016, : 85 - 96
[49] Cost-sensitive Learning for Large-scale Hierarchical Classification
Chen, Jianfu
Warren, David
PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 1351 - 1360
[50] CAS Landslide Dataset: A Large-Scale and Multisensor Dataset for Deep Learning-Based Landslide Detection
Xu, Yulin
Ouyang, Chaojun
Xu, Qingsong
Wang, Dongpo
Zhao, Bo
Luo, Yutao
SCIENTIFIC DATA, 2024, 11 (01)
