Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci

Cited by: 71
Authors
Nicholls, Hannah L. [1, 2]
John, Christopher R. [2, 3]
Watson, David S. [2, 4]
Munroe, Patricia B. [1, 5]
Barnes, Michael R. [1, 2, 5, 6]
Cabrera, Claudia P. [1, 2, 5]
Affiliations
[1] Queen Mary Univ London, Barts & London Sch Med & Dent, William Harvey Res Inst, Clin Pharmacol, London, England
[2] Queen Mary Univ London, Barts & London Sch Med & Dent, William Harvey Res Inst, Ctr Translat Bioinformat, London, England
[3] Queen Mary Univ London, Barts & London Sch Med & Dent, William Harvey Res Inst, Ctr Expt Med & Rheumatol, London, England
[4] Univ Oxford, Oxford Internet Inst, Oxford, England
[5] Queen Mary Univ London, Barts & London Sch Med & Dent, NIHR Barts Biomed Res Ctr, London, England
[6] Alan Turing Inst, British Lib, London, England
Keywords
machine learning; artificial intelligence; genome-wide association study; genomics; candidate gene; clinical translation; deep learning; data science; GENOME-WIDE ASSOCIATION; VARIABLE SELECTION; GENE; RISK; SCHIZOPHRENIA; IMPACT;
DOI
10.3389/fgene.2020.00350
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS - the ability to detect genetic association by linkage disequilibrium (LD) - is also its limitation. Whilst the ever-increasing study size and improved design have augmented the power of GWAS to detect effects, differentiation of causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models, i.e., neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on prioritizations of complex disease associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game with consequent wide-ranging translational impact.
Pages: 15
Related papers
50 in total
  • [1] Importance of GWAS Risk Loci and Clinical Data in Predicting Asthma Using Machine-learning Approaches
    Qin, Zan-Mei
    Liang, Si-Qiao
    Long, Jian-Xiong
    Deng, Jing-Min
    Wei, Xuan
    Yang, Mei-Ling
    Tang, Shao-Jie
    Li, Hai-Li
    [J]. COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2024, 27 (03) : 400 - 407
  • [2] MendelVar: gene prioritization at GWAS loci using phenotypic enrichment of Mendelian disease genes
    Sobczyk, M. K.
    Gaunt, T. R.
    Paternoster, L.
    [J]. BIOINFORMATICS, 2021, 37 (01) : 1 - 8
  • [3] A machine learning approach for gene prioritization in Parkinson's disease
    Lanore, Aymeric
    Basset, Aymeric
    Lesage, Suzanne
    [J]. BRAIN, 2024, 147 (03) : 743 - 745
  • [4] Machine Learning Approaches in Parkinson's Disease
    Landolfi, Annamaria
    Ricciardi, Carlo
    Donisi, Leandro
    Cesarelli, Giuseppe
    Troisi, Jacopo
    Vitale, Carmine
    Barone, Paolo
    Amboni, Marianna
    [J]. CURRENT MEDICINAL CHEMISTRY, 2021, 28 (32) : 6548 - 6568
  • [5] Machine Learning Approaches for the Prioritization of Genomic Variants Impacting Pre-mRNA Splicing
    Rowlands, Charlie F.
    Baralle, Diana
    Ellingford, Jamie M.
    [J]. CELLS, 2019, 8 (12)
  • [6] Candidate gene prioritization by network analysis of differential expression using machine learning approaches
    Daniela Nitsch
    Joana P Gonçalves
    Fabian Ojeda
    Bart de Moor
    Yves Moreau
    [J]. BMC Bioinformatics, 11
  • [7] Machine Learning Approaches in Inflammatory Bowel Disease
    Scarpino, Ileana
    Vallelunga, Rosarina
    Luzza, Francesco
    Cannataro, Mario
    [J]. COMPUTATIONAL SCIENCE, ICCS 2022, PT II, 2022, : 539 - 545
  • [8] Candidate gene prioritization by network analysis of differential expression using machine learning approaches
    Nitsch, Daniela
    Goncalves, Joana P.
    Ojeda, Fabian
    de Moor, Bart
    Moreau, Yves
    [J]. BMC BIOINFORMATICS, 2010, 11
  • [9] Machine Learning Approaches for Predicting Protein Complex Similarity
    Farhoodi, Roshanak
    Akbal-Delibas, Bahar
    Haspel, Nurit
    [J]. JOURNAL OF COMPUTATIONAL BIOLOGY, 2017, 24 (01) : 40 - 51
  • [10] Prioritization of disease genes from GWAS using ensemble-based positive-unlabeled learning
    Nikita Kolosov
    Mark J. Daly
    Mykyta Artomov
    [J]. European Journal of Human Genetics, 2021, 29 : 1527 - 1535