Cross-protein transfer learning substantially improves disease variant prediction

Cited by: 18
Authors
Jagota, Milind [1]
Ye, Chengzhong [2]
Albors, Carlos [1]
Rastogi, Ruchir [1]
Koehl, Antoine [2]
Ioannidis, Nilah [1, 3, 4]
Song, Yun S. [1, 2, 4]
Affiliations
[1] Univ Calif Berkeley, Comp Sci Div, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[3] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
[4] Univ Calif Berkeley, Ctr Computat Biol, Berkeley, CA 94720 USA
Keywords
DESCRIPTORS; SEQUENCE; PEPTIDES; DESIGN; IMPACT; SCALE; SET; MAP
DOI
10.1186/s13059-023-03024-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity.
Results: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes.
Conclusions: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.
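The abstract reports specificity at a fixed 95% sensitivity. As an illustration of how such a figure can be computed from per-variant scores, here is a minimal Python sketch. It is not the authors' evaluation code: the function name `specificity_at_sensitivity`, the convention that higher scores mean "more likely pathogenic", and the toy data are all assumptions made for this example.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Return (threshold, specificity) at the strictest threshold whose
    sensitivity is at least target_sensitivity.

    Assumed conventions (not taken from the paper):
      scores: higher means more likely pathogenic
      labels: 1 = pathogenic, 0 = benign
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")

    # Fewest pathogenic variants that must be detected to reach the target.
    k = max(math.ceil(target_sensitivity * len(pos)), 1)

    # A variant is called pathogenic when its score is >= threshold.
    threshold = pos[k - 1]

    # Specificity: fraction of benign variants scored below the threshold.
    true_negatives = sum(1 for s in neg if s < threshold)
    return threshold, true_negatives / len(neg)


if __name__ == "__main__":
    # Hypothetical toy scores, for illustration only.
    toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.1, 0.05]
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(specificity_at_sensitivity(toy_scores, toy_labels, 1.0))
```

On the toy data, detecting every pathogenic variant forces the threshold down to 0.2, where only two of the four benign variants are still scored below it, giving a specificity of 0.5. This is the trade-off the abstract's 68% vs. 27% vs. 55% comparison quantifies at a fixed sensitivity.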
Number of pages: 19