Predicting target genes of non-coding regulatory variants with IRT

Cited by: 4
Authors
Wu, Zhenqin [1 ,2 ]
Ioannidis, Nilah M. [2 ]
Zou, James [2 ,3 ]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Sch Med, Stanford, CA 94305 USA
[3] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
GENOME-WIDE ASSOCIATION; HUMAN PIGMENTATION; EXPRESSION; ANNOTATION; IRF4; MC1R; IDENTIFICATION; FRAMEWORK; IMPACT; LOCI;
DOI
10.1093/bioinformatics/btaa254
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, even though they make up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in eQTL studies. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 under a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes relative to negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
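The evaluation quantities named in the abstract (ROC-AUC over variant-gene pairs, top-k gene-ranking accuracy, and a position-based split) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout, and the contiguous-block fold assignment are assumptions made for illustration, and the toy scores are invented and unrelated to the reported results.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive pair outscores
    a random negative pair (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_accuracy(records, k):
    """Fraction of variants whose true target gene is among the k
    highest-scoring candidate genes.

    records: iterable of (variant_id, gene_id, score, is_true_target).
    This layout is hypothetical, chosen only for the example.
    """
    by_variant = defaultdict(list)
    for variant, _gene, score, is_target in records:
        by_variant[variant].append((score, is_target))
    hits = 0
    for candidates in by_variant.values():
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        hits += any(is_target for _, is_target in ranked[:k])
    return hits / len(by_variant)


def position_folds(loci, n_folds):
    """Assign each (chromosome, position) locus to a fold so that
    neighbouring loci share a fold. Assumption: contiguous blocks of
    roughly equal size; the paper's exact scheme may differ."""
    order = sorted(range(len(loci)), key=lambda i: loci[i])
    folds = [0] * len(loci)
    for rank, i in enumerate(order):
        folds[i] = rank * n_folds // len(loci)
    return folds


# Toy data: two variants, three candidate genes each (invented scores).
records = [
    ("v1", "gA", 0.9, True), ("v1", "gB", 0.4, False), ("v1", "gC", 0.1, False),
    ("v2", "gD", 0.3, True), ("v2", "gE", 0.7, False), ("v2", "gF", 0.2, False),
]
labels = [int(r[3]) for r in records]
scores = [r[2] for r in records]

auc = roc_auc(labels, scores)
top1 = top_k_accuracy(records, k=1)
top2 = top_k_accuracy(records, k=2)
folds = position_folds([("chr1", 500), ("chr1", 100), ("chr2", 50), ("chr1", 300)], 2)
print(auc, top1, top2, folds)
```

Splitting by genomic position rather than at random keeps nearby, correlated variants out of both the training and the test fold at once, which is one plausible reason the position-based figure is the more stringent of the two.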
Pages: 4440-4448
Page count: 9