PermPress: Machine Learning-Based Pipeline to Evaluate Permissions in App Privacy Policies

Cited by: 3

Authors:

Rahman, Muhammad Sajidur [1]

Naghavi, Pirouz [1]

Kojusner, Blas [1]

Afroz, Sadia [2]

Williams, Byron [1]

Rampazzi, Sara [1]

Bindschaedler, Vincent [1]

Affiliations:

[1] Univ Florida, Dept Comp & Informat Sci & Engn, Gainesville, FL 32603 USA

[2] Avast Software, Emeryville, CA 94608 USA

Source:

IEEE ACCESS | 2022 / Vol. 10

Keywords:

Privacy; Data privacy; Smart phones; Machine learning; Data collection; Predictive models; Mobile applications; Androids; Computer applications; Annotations; Privacy policy; android apps; data privacy; NLP; machine learning; annotated dataset; PERSONAL INFORMATION; RISK-ASSESSMENT; ANDROID APPS

DOI:

10.1109/ACCESS.2022.3199882

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Privacy laws and app stores (e.g., Google Play Store) require mobile apps to have transparent privacy policies to disclose sensitive actions and data collection, such as accessing the phonebook, camera, storage, GPS, and microphone. However, many mobile apps do not accurately disclose their sensitive data access that requires sensitive ('dangerous') permissions. Thus, analyzing discrepancies between apps' permissions and privacy policies facilitates the identification of compliance issues upon which privacy regulators and marketplace operators can act. In this paper, we propose PermPress - an automated machine-learning system to evaluate an Android app's permission-completeness, i.e., whether its privacy policy matches its dangerous permissions. PermPress combines machine learning techniques with human annotation of privacy policies to establish whether app policies contain permission-relevant information. PermPress leverages MPP-270, an annotated policy corpus, for establishing a gold standard dataset of permission completeness. This corpus shows that only 31% of apps disclose all dangerous permissions in privacy policies. By leveraging the annotated dataset and machine learning techniques, PermPress achieves an AUC score of 0.92 in predicting the permission-completeness of apps. A large-scale evaluation of 164,156 Android apps shows that, on average, 7% of apps do not disclose more than half of their declared dangerous permissions in privacy policies, whereas 60% of apps omit to disclose at least one dangerous permission-related data collection in privacy policies. Our investigation uncovers the non-transparent state of app privacy policies and highlights the need to standardize app privacy policies' compliance and completeness checking process.

Pages: 89248-89269

Page count: 22
