Disease variant prediction with deep generative models of evolutionary data

Cited by: 269
Authors
Frazer, Jonathan [1]
Notin, Pascal [2]
Dias, Mafalda [1]
Gomez, Aidan [2]
Min, Joseph K. [1]
Brock, Kelly [1]
Gal, Yarin [2]
Marks, Debora S. [1, 3]
Affiliations
[1] Harvard Med Sch, Dept Syst Biol, Marks Grp, Boston, MA 02115 USA
[2] Univ Oxford, Dept Comp Sci, OATML Grp, Oxford, England
[3] Broad Inst Harvard & MIT, Cambridge, MA 02142 USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); US National Institutes of Health (NIH)
Keywords
MISSENSE VARIANTS; MEDICAL GENETICS; AMERICAN-COLLEGE; ASSOCIATION; MUTATION; IMPACT; MSH2
DOI
10.1038/s41586-021-04043-8
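Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09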
Abstract
Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences (1-3). In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods (4-10) have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable (11). Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification (12-16). We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.
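The abstract's central idea, scoring a variant from the distribution of sequence variation across organisms without any disease labels, can be illustrated with a much simpler stand-in. The sketch below is not EVE: EVE is a deep generative model whose details are not given in this record, whereas this example uses a site-independent frequency model. The toy alignment, the pseudocount, and the function names `site_frequencies` and `variant_score` are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(alignment, pseudocount=1.0):
    """Estimate smoothed amino acid frequencies for each alignment column.

    `alignment` is a list of equal-length sequences (hypothetical toy data).
    Gap or unknown characters are ignored when counting.
    """
    length = len(alignment[0])
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        freqs.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return freqs


def variant_score(freqs, pos, wild_type, mutant):
    """Log-ratio of mutant to wild-type frequency at a 0-based position.

    More negative values mean the substitution is rarer among the aligned
    sequences, which this toy model treats as evidence of constraint.
    """
    return math.log(freqs[pos][mutant] / freqs[pos][wild_type])


# Toy alignment: column 0 is fully conserved, column 2 tolerates V or I.
toy_alignment = ["MKVL", "MKIL", "MRVL", "MKVL", "MKVM"]
freqs = site_frequencies(toy_alignment)

conserved = variant_score(freqs, 0, "M", "W")  # unseen residue at a conserved site
tolerated = variant_score(freqs, 2, "V", "I")  # residue already seen at this site
print(f"M1W score: {conserved:.3f}")
print(f"V3I score: {tolerated:.3f}")
```

In this toy setting the substitution at the conserved column receives a lower score than the one at the variable column, and no pathogenicity labels are used at any point. A site-independent model ignores dependencies between positions, which is one reason a deep generative model such as the one described in the abstract may capture constraints that this sketch cannot.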
Pages: 91+
Page count: 18
Related papers
50 in total
  • [1] Disease variant prediction with deep generative models of evolutionary data
    Jonathan Frazer
    Pascal Notin
    Mafalda Dias
    Aidan Gomez
    Joseph K. Min
    Kelly Brock
    Yarin Gal
    Debora S. Marks
    [J]. Nature, 2021, 599 : 91 - 95
  • [2] Publisher Correction: Disease variant prediction with deep generative models of evolutionary data
    Jonathan Frazer
    Pascal Notin
    Mafalda Dias
    Aidan Gomez
    Joseph K. Min
    Kelly Brock
    Yarin Gal
    Debora S. Marks
    [J]. Nature, 2022, 601 : E7 - E7
  • [3] Disease variant prediction with deep generative models of evolutionary data (vol 599, pg 91, 2021)
    Frazer, Jonathan
    Notin, Pascal
    Dias, Mafalda
    Gomez, Aidan
    Min, Joseph K.
    Brock, Kelly
    Gal, Yarin
    Marks, Debora S.
    [J]. NATURE, 2022, 601 (7892) : E7 - E7
  • [4] A Robust Framework for Data Generative and Heart Disease Prediction Based on Efficient Deep Learning Models
    Sarra, Raniya R. R.
    Dinar, Ahmed M. M.
    Mohammed, Mazin Abed
    Abd Ghani, Mohd Khanapi
    Albahar, Marwan Ali
    [J]. DIAGNOSTICS, 2022, 12 (12)
  • [5] An Overview of Deep Generative Models in Functional and Evolutionary Genomics
    Yelmen, Burak
    Jay, Flora
    [J]. ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, 2023, 6 : 173 - 189
  • [6] Protein design and variant prediction using autoregressive generative models
    Jung-Eun Shin
    Adam J. Riesselman
    Aaron W. Kollasch
    Conor McMahon
    Elana Simon
    Chris Sander
    Aashish Manglik
    Andrew C. Kruse
    Debora S. Marks
    [J]. Nature Communications, 2021, 12
  • [7] Protein design and variant prediction using autoregressive generative models
    Shin, Jung-Eun
    Riesselman, Adam J.
    Kollasch, Aaron W.
    McMahon, Conor
    Simon, Elana
    Sander, Chris
    Manglik, Aashish
    Kruse, Andrew C.
    Marks, Debora S.
    [J]. NATURE COMMUNICATIONS, 2021, 12 (01)
  • [8] Deep Generative Models for Synthetic Data: A Survey
    Eigenschink, Peter
    Reutterer, Thomas
    Vamosi, Stefan
    Vamosi, Ralf
    Sun, Chang
    Kalcher, Klaudius
    [J]. IEEE ACCESS, 2023, 11 : 47304 - 47320
  • [9] Implications of data topology for deep generative models
    Jin, Yinzhu
    Mcdaniel, Rory
    Tatro, N. Joseph
    Catanzaro, Michael J.
    Smith, Abraham D.
    Bendich, Paul
    Dwyer, Matthew B.
    Fletcher, P. Thomas
    [J]. FRONTIERS IN COMPUTER SCIENCE, 2024, 6
  • [10] Deep generative models of LDLR protein structure to predict variant pathogenicity
    James, Jose K.
    Norland, Kristjan
    Johar, Angad S.
    Kullo, Iftikhar J.
    [J]. JOURNAL OF LIPID RESEARCH, 2023, 64 (12)