Synergy conformal prediction applied to large-scale bioactivity datasets and in federated learning

Authors
Norinder, Ulf [1, 2, 3, 4]
Spjuth, Ola [1, 2]
Svensson, Fredrik [5]
Affiliations
[1] Uppsala Univ, Dept Pharmaceut Biosci, Box 591, SE-75124 Uppsala, Sweden
[2] Uppsala Univ, Sci Life Lab, Box 591, SE-75124 Uppsala, Sweden
[3] Stockholm Univ, Dept Comp & Syst Sci, Box 7003, S-16407 Kista, Sweden
[4] Orebro Univ, MTM Res Ctr, Sch Sci & Technol, S-70182 Orebro, Sweden
[5] UCL, Alzheimers Res UK UCL Drug Discovery Inst, Cruciform Bldg,Gower St, London WC1E 6BT, England
Keywords
Conformal prediction; Federated learning; Confidence; Machine learning
DOI
10.1186/s13321-021-00555-7
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Confidence predictors can deliver predictions with the associated confidence required for decision making and can play an important role in drug discovery and toxicity predictions. In this work we investigate a recently introduced version of conformal prediction, synergy conformal prediction, focusing on the predictive performance when applied to bioactivity data. We compare the performance to other variants of conformal predictors for multiple partitioned datasets and demonstrate the utility of synergy conformal predictors for federated learning where data cannot be pooled in one location. Our results show that synergy conformal predictors based on training data randomly sampled with replacement can compete with other conformal setups, while using completely separate training sets often results in worse performance. However, in a federated setup where no method has access to all the data, synergy conformal prediction is shown to give promising results. Based on our study, we conclude that synergy conformal predictors are a valuable addition to the conformal prediction toolbox.
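The abstract does not spell out the mechanics, so the following is only a minimal sketch under stated assumptions: it assumes that a synergy conformal predictor averages the nonconformity scores of several models (each trained on its own partition of the training data) over a shared calibration set, and then computes an ordinary conformal p-value from the averaged scores. The function names, the labels "active" and "inactive", and all score values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def synergy_scores(per_model_scores):
    """Average nonconformity scores across models, one value per example."""
    return [mean(column) for column in zip(*per_model_scores)]


def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration scores >= the test score, with the usual +1 correction."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Hypothetical nonconformity scores (for example, 1 - predicted probability
# of the true label) produced by two models on the same four calibration
# compounds. Each model is assumed to be trained on a separate partition.
calibration = synergy_scores([
    [0.1, 0.4, 0.3, 0.8],  # model trained on partition A
    [0.2, 0.2, 0.5, 0.6],  # model trained on partition B
])

# Averaged scores of one test compound under each candidate label.
test_scores = {
    "active": mean([0.2, 0.3]),
    "inactive": mean([0.9, 0.7]),
}

p_values = {
    label: conformal_p_value(calibration, score)
    for label, score in test_scores.items()
}
print(p_values)
print(prediction_set(p_values, significance=0.2))
```

With these made-up numbers, the "active" label receives a p-value of 0.8 and "inactive" a p-value of 0.2, so at a significance level of 0.2 only "active" remains in the prediction set. In a federated reading of this sketch, only the per-model scores would need to be shared, not the training data behind them.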
Pages: 11