Dark Hazard: Learning-based, Large-scale Discovery of Hidden Sensitive Operations in Android Apps

Cited by: 28
Authors
Pan, Xiaorui [1]
Wang, Xueqiang [1]
Duan, Yue [2]
Wang, XiaoFeng [1]
Yin, Heng [2]
Affiliations
[1] Indiana Univ, Bloomington, IN 47405 USA
[2] Univ Calif Riverside, Riverside, CA 92521 USA
Funding
U.S. National Science Foundation
DOI
10.14722/ndss.2017.23265
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hidden sensitive operations (HSO), such as stealing private user data upon receiving an SMS message, are increasingly utilized by mobile malware and other potentially harmful apps (PHAs) to evade detection. Identifying such behaviors is hard because they are difficult to trigger during an app's runtime. Current static approaches rely on trigger conditions or hidden behaviors known beforehand and therefore cannot capture previously unknown HSO activities. These techniques also tend to be computationally intensive and are therefore less suitable for analyzing a large number of apps. As a result, our understanding of real-world HSO today is still limited, to say nothing of effective means to mitigate this threat. In this paper, we present HSOMINER, an innovative machine-learning-based program analysis technique that enables large-scale discovery of unknown HSO activities. Our approach leverages a set of program features that characterize an HSO branch and are relatively easy to extract from an app. These features summarize a set of unique observations about an HSO condition, its paths, and the relations between them, and are designed to be general enough for finding hidden suspicious behaviors. In particular, we found that, compared with a legitimate branch, an HSO trigger condition is less likely to be related to the paths of its branch through data flows or shared resources. Also, the behaviors exhibited by the two paths of an HSO branch tend to be conspicuously different (innocent on one side and sinister on the other). Most importantly, even though these individual features are not sufficiently accurate for capturing HSO on their own, collectively they are shown to be highly effective in identifying such behaviors. HSOMINER harnesses this differentiating power to classify Android apps; in our experiments it achieves high precision (>98%) and coverage (>94%) and is also efficient.
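The abstract's central idea, that several weak per-branch signals become discriminative once combined, can be sketched in Python. This is a minimal illustration under stated assumptions, not HSOMINER's implementation: the `Branch` summary, the four feature definitions, the use of Jaccard distance to measure how different the two paths behave, the `SENSITIVE_APIS` list, and the hand-picked `WEIGHTS`/`BIAS` are all hypothetical, and a fixed logistic score stands in for whatever trained classifier the paper actually uses.

```python
import math
from dataclasses import dataclass

# Hypothetical list of sensitive Android API names, for illustration only.
SENSITIVE_APIS = {"getDeviceId", "sendTextMessage", "getLastKnownLocation"}

# Hypothetical hand-picked weights standing in for a trained model.
WEIGHTS = [1.5, 1.0, 1.5, 2.0]
BIAS = -3.0


@dataclass
class Branch:
    """Hypothetical summary of one conditional branch and its two paths."""
    condition_vars: set[str]       # variables the condition reads
    path_vars: set[str]            # variables read or written on either path
    condition_resources: set[str]  # files, preferences, intents the condition touches
    path_resources: set[str]       # resources touched on either path
    path_a_apis: set[str]          # API calls on one path
    path_b_apis: set[str]          # API calls on the other path


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def branch_features(br: Branch) -> list[float]:
    # 1.0 if no variable links the condition to its paths (no data flow).
    no_data_flow = 0.0 if br.condition_vars & br.path_vars else 1.0
    # 1.0 if the condition and its paths share no resource.
    no_shared_resource = 0.0 if br.condition_resources & br.path_resources else 1.0
    # How different the API behavior of the two paths is, in [0, 1].
    behavior_gap = jaccard_distance(br.path_a_apis, br.path_b_apis)
    # 1.0 if exactly one of the two paths calls a sensitive API.
    sensitive_a = bool(br.path_a_apis & SENSITIVE_APIS)
    sensitive_b = bool(br.path_b_apis & SENSITIVE_APIS)
    asymmetry = 1.0 if sensitive_a != sensitive_b else 0.0
    return [no_data_flow, no_shared_resource, behavior_gap, asymmetry]


def hso_score(br: Branch) -> float:
    """Logistic combination of the features: a score in (0, 1)."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, branch_features(br)))
    return 1.0 / (1.0 + math.exp(-z))


# A branch triggered by an incoming SMS that leaks the device ID on one path.
suspicious = Branch(
    condition_vars={"msg_body"},
    path_vars={"imei", "server_url"},
    condition_resources={"sms_intent"},
    path_resources={"network"},
    path_a_apis={"getDeviceId", "sendTextMessage"},
    path_b_apis={"Log.d"},
)
# An ordinary login check whose condition feeds both paths.
benign = Branch(
    condition_vars={"is_logged_in"},
    path_vars={"is_logged_in", "user"},
    condition_resources={"shared_prefs"},
    path_resources={"shared_prefs"},
    path_a_apis={"startActivity"},
    path_b_apis={"startActivity", "Toast.show"},
)

print(f"{hso_score(suspicious):.3f}")  # 0.953
print(f"{hso_score(benign):.3f}")      # 0.095
```

Because `BIAS` outweighs every individual weight, no single feature can lift a branch above 0.5 on its own, loosely mirroring the abstract's point that the features are weak individually but effective together. A real system would learn the weights from labeled branches rather than fixing them by hand.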
The new tool was further used in a measurement study involving 338,354 real-world apps, the largest ever conducted on suspicious hidden operations. Our research brought to light the pervasiveness of HSO activities, which are present in 18.7% of the apps we analyzed, as well as surprising trigger conditions (e.g., a click on a certain region of a view) and behaviors (e.g., hiding operations in a dynamically generated receiver). These findings help us better understand the problem and contribute to more effective defense against this new threat to the mobile platform.
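For a sense of scale, the reported prevalence can be turned into an approximate app count. This is simple arithmetic on the two figures in the abstract, not a number reported by the authors; $N$ here is a hypothetical symbol for the number of analyzed apps containing HSO, and since 18.7% is assumed to be rounded to one decimal place, $N$ falls in a narrow range:

```latex
0.187 \times 338{,}354 \approx 63{,}272,
\qquad
0.1865 \times 338{,}354 \approx 63{,}103.02 \;\le\; N \;<\; 0.1875 \times 338{,}354 = 63{,}441.375
\;\Longrightarrow\; 63{,}104 \le N \le 63{,}441
```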
Pages: 15