PermPress: Machine Learning-Based Pipeline to Evaluate Permissions in App Privacy Policies

Cited by: 3
Authors
Rahman, Muhammad Sajidur [1]
Naghavi, Pirouz [1]
Kojusner, Blas [1]
Afroz, Sadia [2]
Williams, Byron [1]
Rampazzi, Sara [1]
Bindschaedler, Vincent [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32603 USA
[2] Avast Software, Emeryville, CA 94608 USA
Source
IEEE Access | 2022 / Vol. 10
Keywords
Privacy; Data privacy; Smart phones; Machine learning; Data collection; Predictive models; Mobile applications; Androids; Computer applications; Annotations; Privacy policy; android apps; data privacy; NLP; machine learning; annotated dataset; PERSONAL INFORMATION; RISK-ASSESSMENT; ANDROID APPS;
DOI
10.1109/ACCESS.2022.3199882
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Privacy laws and app stores (e.g., Google Play Store) require mobile apps to have transparent privacy policies to disclose sensitive actions and data collection, such as accessing the phonebook, camera, storage, GPS, and microphone. However, many mobile apps do not accurately disclose their sensitive data access that requires sensitive ('dangerous') permissions. Thus, analyzing discrepancies between apps' permissions and privacy policies facilitates the identification of compliance issues upon which privacy regulators and marketplace operators can act. In this paper, we propose PermPress - an automated machine-learning system to evaluate an Android app's permission-completeness, i.e., whether its privacy policy matches its dangerous permissions. PermPress combines machine learning techniques with human annotation of privacy policies to establish whether app policies contain permission-relevant information. PermPress leverages MPP-270, an annotated policy corpus, for establishing a gold standard dataset of permission completeness. This corpus shows that only 31% of apps disclose all dangerous permissions in privacy policies. By leveraging the annotated dataset and machine learning techniques, PermPress achieves an AUC score of 0.92 in predicting the permission-completeness of apps. A large-scale evaluation of 164,156 Android apps shows that, on average, 7% of apps do not disclose more than half of their declared dangerous permissions in privacy policies, whereas 60% of apps omit to disclose at least one dangerous permission-related data collection in privacy policies. Our investigation uncovers the non-transparent state of app privacy policies and highlights the need to standardize app privacy policies' compliance and completeness checking process.
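The abstract defines permission-completeness as whether a privacy policy matches the app's dangerous permissions. The following is a minimal sketch of that bookkeeping only, not the PermPress pipeline itself: it assumes the set of permissions a policy discloses is already known (in PermPress this would come from the machine-learning classifiers and annotations). The names `CompletenessReport` and `evaluate` are hypothetical and do not appear in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessReport:
    """Hypothetical container comparing declared vs. disclosed permissions."""
    declared: frozenset   # dangerous permissions requested by the app
    disclosed: frozenset  # subset of `declared` that the policy mentions

    @property
    def undisclosed(self) -> frozenset:
        # Permissions the app declares but the policy never mentions.
        return self.declared - self.disclosed

    @property
    def is_complete(self) -> bool:
        # Permission-complete: every declared permission is disclosed.
        return not self.undisclosed

    @property
    def undisclosed_fraction(self) -> float:
        # Share of declared permissions missing from the policy.
        if not self.declared:
            return 0.0
        return len(self.undisclosed) / len(self.declared)


def evaluate(declared, disclosed) -> CompletenessReport:
    """Build a report; disclosures of undeclared permissions are ignored."""
    declared = frozenset(declared)
    return CompletenessReport(declared, frozenset(disclosed) & declared)


if __name__ == "__main__":
    report = evaluate(
        declared={"CAMERA", "RECORD_AUDIO", "ACCESS_FINE_LOCATION", "READ_CONTACTS"},
        disclosed={"CAMERA"},
    )
    print(report.is_complete, report.undisclosed_fraction, sorted(report.undisclosed))
```

Under this sketch, an app whose `undisclosed_fraction` exceeds 0.5 would fall into the "do not disclose more than half" group the abstract reports, and any app with a non-empty `undisclosed` set would count as omitting at least one permission.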
Pages: 89248-89269
Page count: 22