PermPress: Machine Learning-Based Pipeline to Evaluate Permissions in App Privacy Policies

Cited by: 3
Authors
Rahman, Muhammad Sajidur [1]
Naghavi, Pirouz [1]
Kojusner, Blas [1]
Afroz, Sadia [2]
Williams, Byron [1]
Rampazzi, Sara [1]
Bindschaedler, Vincent [1]
Affiliations
[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32603 USA
[2] Avast Software, Emeryville, CA 94608 USA
Keywords
Privacy; Data privacy; Smart phones; Machine learning; Data collection; Predictive models; Mobile applications; Androids; Computer applications; Annotations; Privacy policy; Android apps; NLP; Annotated dataset; Personal information; Risk assessment
DOI
10.1109/ACCESS.2022.3199882
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Privacy laws and app stores (e.g., Google Play Store) require mobile apps to have transparent privacy policies that disclose sensitive actions and data collection, such as accessing the phonebook, camera, storage, GPS, and microphone. However, many mobile apps do not accurately disclose data access that requires sensitive ('dangerous') permissions. Analyzing discrepancies between apps' permissions and privacy policies therefore helps identify compliance issues on which privacy regulators and marketplace operators can act. In this paper, we propose PermPress, an automated machine-learning system to evaluate an Android app's permission-completeness, i.e., whether its privacy policy matches its dangerous permissions. PermPress combines machine learning techniques with human annotation of privacy policies to establish whether app policies contain permission-relevant information. PermPress leverages MPP-270, an annotated policy corpus, to establish a gold standard dataset of permission-completeness. This corpus shows that only 31% of apps disclose all dangerous permissions in their privacy policies. By leveraging the annotated dataset and machine learning techniques, PermPress achieves an AUC score of 0.92 in predicting the permission-completeness of apps. A large-scale evaluation of 164,156 Android apps shows that, on average, 7% of apps do not disclose more than half of their declared dangerous permissions in privacy policies, whereas 60% of apps fail to disclose at least one dangerous permission-related data collection practice in privacy policies. Our investigation uncovers the non-transparent state of app privacy policies and highlights the need to standardize the process of checking app privacy policies for compliance and completeness.
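The notion of permission-completeness described in the abstract can be illustrated with a minimal Python sketch. It is not the PermPress implementation: the names `CompletenessReport` and `evaluate_completeness` are hypothetical, and the set of permissions disclosed in the policy is passed in directly, whereas in the paper that information comes from human annotation and machine learning models. The sketch only shows the final comparison between declared dangerous permissions and those the policy covers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletenessReport:
    """Hypothetical container comparing declared and disclosed permissions."""

    declared: frozenset   # dangerous permissions the app declares
    disclosed: frozenset  # declared permissions that the policy covers

    @property
    def undisclosed(self) -> frozenset:
        # Declared dangerous permissions with no matching policy disclosure.
        return self.declared - self.disclosed

    @property
    def is_complete(self) -> bool:
        # Permission-complete: every declared dangerous permission is disclosed.
        return not self.undisclosed

    @property
    def undisclosed_fraction(self) -> float:
        # Share of declared dangerous permissions the policy leaves out.
        if not self.declared:
            return 0.0
        return len(self.undisclosed) / len(self.declared)


def evaluate_completeness(declared, policy_mentions) -> CompletenessReport:
    """Compare declared permissions with those the policy is assumed to mention.

    `policy_mentions` stands in for the output of an annotation or
    classification step; it is an assumption of this sketch.
    """
    declared_set = frozenset(declared)
    # Ignore policy mentions of permissions the app never declares.
    disclosed_set = declared_set & frozenset(policy_mentions)
    return CompletenessReport(declared_set, disclosed_set)


report = evaluate_completeness(
    declared={"CAMERA", "RECORD_AUDIO", "ACCESS_FINE_LOCATION"},
    policy_mentions={"CAMERA", "READ_CONTACTS"},
)
print(report.is_complete, sorted(report.undisclosed))
```

In this example the policy covers only one of three declared dangerous permissions, so the app counts as not permission-complete and falls into the group that leaves more than half of its declared dangerous permissions undisclosed.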
Pages: 89248-89269
Page count: 22