Predicting target genes of non-coding regulatory variants with IRT

Cited by: 4
Authors
Wu, Zhenqin [1, 2]
Ioannidis, Nilah M. [2]
Zou, James [2, 3]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Sch Med, Stanford, CA 94305 USA
[3] Chan Zuckerberg Biohub, San Francisco, CA 94158 USA
Funding
U.S. National Institutes of Health; U.S. National Science Foundation
Keywords
GENOME-WIDE ASSOCIATION; HUMAN PIGMENTATION; EXPRESSION; ANNOTATION; IRF4; MC1R; IDENTIFICATION; FRAMEWORK; IMPACT; LOCI;
DOI
10.1093/bioinformatics/btaa254
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of the trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and train the model to separate positive and negative variant-gene pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 under a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows that IRT is significantly enriched for identifying the true target genes relative to negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood that the variant has a regulatory effect on the gene's expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies.
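The two evaluation ideas mentioned in the abstract, position-based cross-validation and top-k gene ranking, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how positions are grouped into folds, so the block size, the fold-assignment rule, the function names `position_fold` and `top_k_accuracy`, and the toy variant and gene identifiers are all assumptions made for illustration.

```python
import zlib
from typing import Dict


def position_fold(chrom: str, pos: int, block_size: int = 1_000_000,
                  n_folds: int = 5) -> int:
    """Assign a variant to a fold by genomic block (hypothetical scheme).

    Variants in the same block of one chromosome always share a fold,
    so nearby variants never end up split across train and test.
    """
    block_key = f"{chrom}:{pos // block_size}".encode()
    # crc32 is deterministic across runs, unlike Python's built-in hash().
    return zlib.crc32(block_key) % n_folds


def top_k_accuracy(scores: Dict[str, Dict[str, float]],
                   true_target: Dict[str, str], k: int) -> float:
    """Fraction of variants whose true target gene is among the k
    highest-scoring candidate genes."""
    hits = 0
    for variant, gene_scores in scores.items():
        ranked = sorted(gene_scores, key=gene_scores.get, reverse=True)
        if true_target[variant] in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy, made-up scores for two hypothetical variants and three candidate genes.
toy_scores = {
    "var1": {"geneA": 0.9, "geneB": 0.2, "geneC": 0.1},
    "var2": {"geneA": 0.3, "geneB": 0.6, "geneC": 0.5},
}
toy_truth = {"var1": "geneA", "var2": "geneC"}

top1 = top_k_accuracy(toy_scores, toy_truth, k=1)
top2 = top_k_accuracy(toy_scores, toy_truth, k=2)
```

Grouping by position rather than splitting at random is what makes the second cross-validation setting stricter: a model cannot score well simply because a neighboring, correlated variant appeared in its training folds.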
Pages: 4440-4448
Page count: 9