Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets

Cited by: 15
Authors
Evans, Perry [1 ]
Wu, Chao [2 ]
Lindy, Amanda [3 ]
McKnight, Dianalee A. [3 ]
Lebo, Matthew [4 ,5 ]
Sarmady, Mahdi [2 ,6 ]
Abou Tayoun, Ahmad N. [2 ,6 ,7 ]
Affiliations
[1] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
[2] Childrens Hosp Philadelphia, Div Genom Diagnost, Philadelphia, PA 19104 USA
[3] GeneDx, Gaithersburg, MD 20877 USA
[4] Partners HealthCare Personalized Med, Lab Mol Med, Cambridge, MA 02139 USA
[5] Harvard Med Sch, Brigham & Womens Hosp, Dept Pathol, Boston, MA 02115 USA
[6] Univ Penn, Dept Pathol & Lab Med, Perelman Sch Med, Philadelphia, PA 19104 USA
[7] Al Jalila Childrens Specialty Hosp, Dubai, U Arab Emirates
Keywords
GENOME; SUBSTITUTIONS; MUTATION
DOI
10.1101/gr.240994.118
CLC classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Recent advances in DNA sequencing have expanded our understanding of the molecular basis of genetic disorders and increased the utilization of clinical genomic tests. Given the paucity of evidence to accurately classify each variant and the difficulty of experimentally evaluating its clinical significance, a large number of variants generated by clinical tests are reported as variants of unknown clinical significance. Population-scale variant databases can improve clinical interpretation. Specifically, pathogenicity prediction for novel missense variants can use features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant data set. Here, we introduce a variant data set derived from clinical sequencing panels and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This data set is compiled from 17,071 patients surveyed with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further use this data set to demonstrate the necessity of disease-specific classifiers and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision of >90% for variants from all 99 tested disease genes while approaching 100% accuracy for some genes. The accumulation of larger clinical variant training data sets can significantly enhance the performance of such classifiers in a disease- and gene-specific manner.
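The headline result is reported as average precision, a ranking metric. As a hedged illustration (not the authors' evaluation code), average precision can be computed step-wise as the sum over score thresholds of precision weighted by the gain in recall, AP = Σ (R_n − R_{n−1}) P_n, which is the definition used by, for example, scikit-learn's `average_precision_score`. The function name `average_precision` and the toy inputs are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: sum over thresholds of (R_n - R_{n-1}) * P_n.

    labels: 1 for pathogenic, 0 for benign (hypothetical encoding).
    scores: predicted pathogenicity; higher means more likely pathogenic.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positive examples")

    # Rank variants by predicted score, most pathogenic first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    ap = 0.0
    tp = fp = 0
    prev_recall = 0.0
    i = 0
    while i < len(order):
        # Variants with tied scores share one threshold.
        current = scores[order[i]]
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Precision only contributes where recall increases.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy example: three pathogenic and two benign variants.
print(average_precision([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2]))
```

Grouping tied scores into a single threshold makes the result independent of the input order of tied variants. Computing this separately for the variants of each gene would give per-gene figures of the kind summarized in the abstract.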
Pages: 1144-1151
Number of pages: 8