Dark Hazard: Learning-based, Large-scale Discovery of Hidden Sensitive Operations in Android Apps

Cited by: 28
Authors
Pan, Xiaorui [1 ]
Wang, Xueqiang [1 ]
Duan, Yue [2 ]
Wang, XiaoFeng [1 ]
Yin, Heng [2 ]
Affiliations
[1] Indiana Univ, Bloomington, IN 47405 USA
[2] Univ Calif Riverside, Riverside, CA 92521 USA
Funding
U.S. National Science Foundation
DOI
10.14722/ndss.2017.23265
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hidden sensitive operations (HSO), such as stealing private user data upon receiving an SMS message, are increasingly used by mobile malware and other potentially harmful apps (PHAs) to evade detection. Identifying such behaviors is hard because they are difficult to trigger during an app's runtime. Current static approaches rely on trigger conditions or hidden behaviors known beforehand and therefore cannot capture previously unknown HSO activities. These techniques also tend to be computationally intensive and are therefore less suitable for analyzing a large number of apps. As a result, our understanding of real-world HSO is still limited, not to mention effective means to mitigate this threat. In this paper, we present HSOMINER, an innovative machine-learning-based program analysis technique that enables large-scale discovery of unknown HSO activities. Our approach leverages a set of program features that characterize an HSO branch and are relatively easy to extract from an app. These features summarize a set of unique observations about an HSO condition, its paths and the relations between them, and are designed to be general for finding hidden suspicious behaviors. In particular, we found that a trigger condition is less likely to relate to the path of its branch through data flows or shared resources, compared with a legitimate branch. Also, the behaviors exhibited by the two paths of an HSO branch tend to be conspicuously different (innocent on one side and sinister on the other). Most importantly, even though these individual features are not sufficiently accurate for capturing HSO on their own, collectively they are shown to be highly effective in identifying such behaviors. This differentiating power is harnessed by HSOMINER to classify Android apps; it achieves high precision (>98%) and coverage (>94%), and is also efficient, as shown in our experiments.
The new tool was further used in a measurement study involving 338,354 real-world apps, the largest one ever conducted on suspicious hidden operations. Our research brought to light the pervasiveness of HSO activities, which are present in 18.7% of the apps we analyzed, as well as surprising trigger conditions (e.g., a click on a certain region of a view) and behaviors (e.g., hiding operations in a dynamically generated receiver). These findings help better understand the problem and contribute to more effective defense against this new threat to the mobile platform.
Pages: 15