Exploring the Utility of Anonymized EHR Datasets in Machine Learning Experiments in the Context of the MODELHealth Project

Authors
Pitoglou, Stavros [1, 2]
Filntisi, Arianna [2]
Anastasiou, Athanasios [1]
Matsopoulos, George K. [1]
Koutsouris, Dimitrios [1]
Affiliations
[1] National Technical University of Athens, School of Electrical and Computer Engineering, Athens 15780, Greece
[2] Computer Solutions SA, Athens 11527, Greece
Source
APPLIED SCIENCES-BASEL | 2022 | Vol. 12, Issue 12
Keywords
machine learning; anonymization; Mondrian; health care; big data; privacy; algorithms; security; threats
DOI
10.3390/app12125942
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The objective of this paper was to apply machine learning to a clinical dataset that had been anonymized using the Mondrian algorithm. (1) Background: Preserving patient privacy is a necessity arising from the increasing digitization of health data; however, the effect of data anonymization on the performance of machine learning models remains to be explored. (2) Methods: The original EHR-derived dataset was anonymized with the Mondrian algorithm for various k values and quasi-identifier (QI) attribute sets. Logistic regression, decision tree, k-nearest neighbors (KNN), Gaussian naive Bayes and support vector machine (SVM) models were applied to the different versions of the dataset. (3) Results: The classifiers showed different degrees of resilience to anonymization: the decision tree and KNN models performed remarkably stably, whereas the Gaussian naive Bayes model did not. The choice of QI attributes and the generalized information loss value played a more important role than the size of the QI set or the k value. (4) Conclusions: Data anonymization can reduce the performance of certain machine learning models, although an appropriate choice of classifier and parameter values can mitigate this effect.
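The abstract does not spell out the anonymization procedure, so the following is only a rough illustration of the two ideas it relies on: Mondrian-style k-anonymization (recursive median cuts on the quasi-identifier with the widest normalized range, stopping when no cut leaves at least k records on both sides) and a generalized information loss score. It assumes that all QI attributes are numeric, that each value is generalized to the range of its equivalence class, and that the loss is the class range width divided by the global range width, averaged over records and attributes. The function names `mondrian` and `gen_info_loss` and the toy data are hypothetical; this is not the authors' code, and their exact metric definition may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical record type: only the numeric quasi-identifier (QI) columns.
Record = Tuple[float, ...]


def mondrian(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Split records into equivalence classes of size >= k using Mondrian-style
    median cuts on numeric QIs (a sketch, not the authors' implementation)."""
    if len(records) < k:
        raise ValueError("need at least k records")
    n_attrs = len(records[0])
    lo = [min(r[j] for r in records) for j in range(n_attrs)]
    hi = [max(r[j] for r in records) for j in range(n_attrs)]

    def norm_width(part: List[Record], j: int) -> float:
        # Range of attribute j inside the partition, relative to its global range.
        span = hi[j] - lo[j]
        if span == 0:
            return 0.0
        return (max(r[j] for r in part) - min(r[j] for r in part)) / span

    def split(part: List[Record]) -> List[List[Record]]:
        # Try attributes from the widest to the narrowest normalized range.
        order = sorted(range(n_attrs), key=lambda j: norm_width(part, j), reverse=True)
        for j in order:
            values = sorted(r[j] for r in part)
            median = values[len(values) // 2]
            left = [r for r in part if r[j] < median]
            right = [r for r in part if r[j] >= median]
            # A cut is allowable only if both halves still hold >= k records.
            if len(left) >= k and len(right) >= k:
                return split(left) + split(right)
        # No allowable cut on any attribute: the partition is one equivalence class.
        return [part]

    return split(list(records))


def gen_info_loss(partitions: List[List[Record]]) -> float:
    """Generalized information loss for numeric QIs (assumed definition): the mean,
    over records and attributes, of class range width / global range width."""
    records = [r for part in partitions for r in part]
    n_attrs = len(records[0])
    total = 0.0
    for j in range(n_attrs):
        span = max(r[j] for r in records) - min(r[j] for r in records)
        if span == 0:
            continue  # a constant attribute loses no information
        for part in partitions:
            width = max(r[j] for r in part) - min(r[j] for r in part)
            total += len(part) * width / span  # every record in the class shares this range
    return total / (len(records) * n_attrs)


if __name__ == "__main__":
    # Toy QI columns (e.g. age and a coded attribute); illustrative values only.
    data = [(25, 1), (30, 2), (35, 3), (40, 4), (45, 5), (50, 6), (55, 7), (60, 8)]
    for k in (2, 4, 8):
        parts = mondrian(data, k)
        print(k, len(parts), round(gen_info_loss(parts), 3))
    # prints:
    # 2 4 0.143
    # 4 2 0.429
    # 8 1 1.0
```

On this toy data, raising k from 2 to 8 merges the records into fewer, wider equivalence classes, and the loss rises from about 0.143 to 1.0 (every value generalized to the full range). Anonymized versions produced this way would then be fed to the classifiers; a different QI set changes which ranges get widened, which is one reason the loss value and k need not move together.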
Pages: 20