Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets

Times cited: 15
Authors
Evans, Perry [1]
Wu, Chao [2]
Lindy, Amanda [3]
McKnight, Dianalee A. [3]
Lebo, Matthew [4,5]
Sarmady, Mahdi [2,6]
Abou Tayoun, Ahmad N. [2,6,7]
Affiliations
[1] Children's Hospital of Philadelphia, Department of Biomedical and Health Informatics, Philadelphia, PA 19104 USA
[2] Children's Hospital of Philadelphia, Division of Genomic Diagnostics, Philadelphia, PA 19104 USA
[3] GeneDx, Gaithersburg, MD 20877 USA
[4] Partners HealthCare Personalized Medicine, Laboratory for Molecular Medicine, Cambridge, MA 02139 USA
[5] Harvard Medical School, Brigham and Women's Hospital, Department of Pathology, Boston, MA 02115 USA
[6] University of Pennsylvania, Perelman School of Medicine, Department of Pathology and Laboratory Medicine, Philadelphia, PA 19104 USA
[7] Al Jalila Children's Specialty Hospital, Dubai, United Arab Emirates
Keywords
GENOME; SUBSTITUTIONS; MUTATION
DOI
10.1101/gr.240994.118
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Recent advances in DNA sequencing have expanded our understanding of the molecular basis of genetic disorders and increased the utilization of clinical genomic tests. Given the paucity of evidence to accurately classify each variant and the difficulty of experimentally evaluating its clinical significance, a large number of variants generated by clinical tests are reported as variants of unknown clinical significance. Population-scale variant databases can improve clinical interpretation. Specifically, pathogenicity prediction for novel missense variants can use features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant data set. Here, we introduce a variant data set derived from clinical sequencing panels and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This data set is compiled from 17,071 patients surveyed with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further use this data set to demonstrate the necessity of disease-specific classifiers and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision >90% for variants from all 99 tested disease genes while approaching 100% accuracy for some genes. The accumulation of larger clinical variant training data sets can significantly enhance the performance of such classifiers in a disease- and gene-specific manner.
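The abstract reports classifier performance as average precision (AP), which summarizes the precision-recall curve for a ranked list of variant scores. The following minimal Python sketch shows how non-interpolated AP is commonly computed, AP = sum over thresholds of (R_n - R_(n-1)) * P_n, which is the definition used by scikit-learn's `average_precision_score`. It is an illustration only, not PathoPredictor's evaluation code. The function name `average_precision` and the label encoding (1 = pathogenic, 0 = benign) are hypothetical choices.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Non-interpolated average precision: sum of (R_n - R_(n-1)) * P_n.

    labels: 1 = pathogenic, 0 = benign (hypothetical encoding).
    scores: classifier outputs; higher means more likely pathogenic.
    Variants with tied scores are handled together as a single threshold.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank variants from most to least likely pathogenic.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    ap = 0.0
    tp = fp = 0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Consume every variant sharing the current score (one threshold).
        threshold = scores[order[i]]
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        # Weight precision at this threshold by the gain in recall.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For example, `average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns about 0.833. A benign variant ranked between the two pathogenic ones lowers precision at the second recall step. Because each precision value is weighted by the recall it adds, AP approaches 1 only when pathogenic variants are ranked above nearly all benign ones.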
Pages: 1144-1151
Page count: 8