Fold-stratified cross-validation for unbiased and privacy-preserving federated learning

Cited by: 26
Authors
Bey, Romain [1, 2]
Goussault, Romain [3]
Grolleau, Francois [1, 2]
Benchoufi, Mehdi [1, 2]
Porcher, Raphael [1, 2]
Affiliations
[1] Univ Paris, Ctr Res Epidemiol & Stat CRESS, French Inst Hlth & Med Res, Natl Inst Agr Res INRA,INSERM, Paris, France
[2] Nantes Univ, Ctr Hosp Univ Nantes, CIC 1413, Ctr Res Cancerol & Immunol Nantes Angers CRCINA,D, Nantes, France
[3] Nantes Univ, Ctr Hosp Univ Nantes, Ctr Res Cancerol & Immunol Nantes Angers CRCINA, Dermatol Dept, Nantes CIC 1413, France
Keywords
federated learning; privacy; validation; duplicated electronic health records; data leakage; ELECTRONIC HEALTH RECORDS;
DOI
10.1093/jamia/ocaa096
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: We introduce fold-stratified cross-validation, a validation methodology that is compatible with privacy-preserving federated learning and that prevents data leakage caused by duplicates of electronic health records (EHRs).
Materials and Methods: Fold-stratified cross-validation complements cross-validation with an initial stratification of EHRs in folds containing patients with similar characteristics, thus ensuring that duplicates of a record are jointly present either in training or in validation folds. Monte Carlo simulations are performed to investigate the properties of fold-stratified cross-validation in the case of a model data analysis using both synthetic data and MIMIC-III (Medical Information Mart for Intensive Care-III) medical records.
Results: In situations in which duplicated EHRs could induce overoptimistic estimations of accuracy, applying fold-stratified cross-validation prevented this bias, while not requiring full deduplication. However, a pessimistic bias might appear if the covariate used for the stratification was strongly associated with the outcome.
Discussion: Although fold-stratified cross-validation presents low computational overhead, to be efficient it requires the preliminary identification of a covariate that is both shared by duplicated records and weakly associated with the outcome. When available, the hash of a personal identifier or a patient's date of birth provides such a covariate. On the contrary, pseudonymization interferes with fold-stratified cross-validation, as it may break the equality of the stratifying covariate among duplicates.
Conclusion: Fold-stratified cross-validation is an easy-to-implement methodology that prevents data leakage when a model is trained on distributed EHRs that contain duplicates, while preserving privacy.
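The stratification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `assign_fold` and `fold_stratified_splits`, the choice of SHA-256 followed by a remainder to pick a fold, the use of five folds, and the toy records are all assumptions made for illustration. The only idea taken from the abstract is that the fold is derived from a covariate shared by duplicates (here, date of birth), so that duplicated records fall on the same side of every split.

```python
import hashlib
from collections import defaultdict


def assign_fold(stratifying_value, n_folds=5):
    """Deterministically map a covariate value to a fold index.

    Hypothetical helper: hashing and taking the remainder is an assumption
    of this sketch, not a rule stated in the abstract.
    """
    digest = hashlib.sha256(stratifying_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def fold_stratified_splits(records, key, n_folds=5):
    """Yield one (train, validation) pair of record lists per fold."""
    folds = defaultdict(list)
    for record in records:
        folds[assign_fold(record[key], n_folds)].append(record)
    for k in range(n_folds):
        validation = list(folds[k])  # may be empty on tiny toy data
        train = [r for j in range(n_folds) if j != k for r in folds[j]]
        yield train, validation


# Toy data (invented): the patient born on 1950-03-14 appears at both sites.
site_a = [
    {"site": "A", "birth_date": "1950-03-14", "outcome": 1},
    {"site": "A", "birth_date": "1987-11-02", "outcome": 0},
    {"site": "A", "birth_date": "1971-06-30", "outcome": 0},
]
site_b = [
    {"site": "B", "birth_date": "1950-03-14", "outcome": 1},
    {"site": "B", "birth_date": "1962-01-19", "outcome": 1},
]

# Each site can compute fold indices locally, without exchanging records.
folds_a = [assign_fold(r["birth_date"]) for r in site_a]
folds_b = [assign_fold(r["birth_date"]) for r in site_b]

splits = list(fold_stratified_splits(site_a + site_b, key="birth_date"))
for train, validation in splits:
    train_keys = {r["birth_date"] for r in train}
    validation_keys = {r["birth_date"] for r in validation}
    # No stratifying value, hence no duplicate pair, straddles the split.
    assert train_keys.isdisjoint(validation_keys)
```

Because the fold index depends only on the stratifying covariate, two sites holding copies of the same record assign it to the same fold without comparing or deduplicating their data. As the abstract notes, the covariate should be only weakly associated with the outcome, and pseudonymization that alters it differently across copies would defeat the scheme.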
Pages: 1244 - 1251
Page count: 8