Predicting target genes of non-coding regulatory variants with IRT

Cited by: 4
Authors
Wu, Zhenqin [1, 2]
Ioannidis, Nilah M. [2]
Zou, James [2, 3]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Sch Med, Stanford, CA 94305 USA
[3] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
GENOME-WIDE ASSOCIATION; HUMAN PIGMENTATION; EXPRESSION; ANNOTATION; IRF4; MC1R; IDENTIFICATION; FRAMEWORK; IMPACT; LOCI;
DOI
10.1093/bioinformatics/btaa254
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 using a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT identifies the true target genes significantly more often than negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
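The two evaluation ideas mentioned in the abstract, position-based cross-validation and top-k gene ranking, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 1 Mb block size, the round-robin fold assignment and the toy scores are all assumptions made purely for illustration.

```python
def position_based_folds(variants, n_folds=5, block_size=1_000_000):
    """Assign each (chrom, pos) variant to a fold by genomic block.

    Assumption: variants falling in the same fixed-size block always share
    a fold, so nearby variants never straddle the train/test boundary.
    """
    blocks = sorted({(chrom, pos // block_size) for chrom, pos in variants})
    block_to_fold = {block: i % n_folds for i, block in enumerate(blocks)}
    return [block_to_fold[(chrom, pos // block_size)] for chrom, pos in variants]


def top_k_accuracy(scores, true_gene, k):
    """Fraction of variants whose true target gene is among the k best-scored genes.

    scores:    {variant_id: {gene_id: score}}  (hypothetical model outputs)
    true_gene: {variant_id: gene_id}
    """
    hits = 0
    for variant, gene_scores in scores.items():
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_gene[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data (made up, not from the paper).
variants = [("chr1", 100), ("chr1", 900_000), ("chr1", 1_500_000), ("chr2", 50)]
folds = position_based_folds(variants, n_folds=3)

scores = {
    "var1": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    "var2": {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.6},
}
true_gene = {"var1": "GENE_A", "var2": "GENE_C"}
top1 = top_k_accuracy(scores, true_gene, k=1)
top2 = top_k_accuracy(scores, true_gene, k=2)
```

Compared with a random split, grouping by position keeps neighbouring variants, which tend to share annotations, out of both the training and test sets at once; this is one plausible reason the position-based score reported above is lower than the random one.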
Pages: 4440-4448
Number of pages: 9