Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

Cited by: 24
Authors
Wang, Yue [1, 4]
Goh, Wilson [2, 3]
Wong, Limsoon [2]
Montana, Giovanni [4]
Affiliations
[1] Natl Univ Singapore, Grad Sch Integrat Sci & Engn, Singapore 117548, Singapore
[2] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
[3] Imperial Coll London, Dept Comp, London, England
[4] Imperial Coll London, Dept Math, London, England
Source
BMC BIOINFORMATICS | 2013 / Volume 14
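Funding
Engineering and Physical Sciences Research Council (UK); National Institutes of Health (USA); Canadian Institutes of Health Research;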
Keywords
GENETIC ASSOCIATIONS; DISEASE;
DOI
10.1186/1471-2105-14-S16-S6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forest (RF) classifiers, which are ensembles of decision trees, are among the best-performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive.
Results: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in human brain structure. PaRFR provides a ranking of SNPs associated with this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity.
Availability: The Java code is freely available at http://www2.imperial.ac.uk/~gmontana.
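The abstract does not spell out the distance-based splitting criterion, so the following is only an illustrative sketch under stated assumptions, not the PaRFR implementation (which is written in Java and runs on Hadoop). It assumes Euclidean distances between trait vectors and relies on the identity that the within-node sum of squared deviations from the mean equals the sum of all pairwise squared distances divided by 2n. Under that assumption, split quality can be evaluated from a precomputed distance matrix without revisiting the high-dimensional trait vectors. All function names are hypothetical.

```python
import numpy as np


def pairwise_sq_dist(traits):
    """Squared Euclidean distance between every pair of trait vectors."""
    diff = traits[:, None, :] - traits[None, :, :]
    return (diff ** 2).sum(axis=-1)


def node_impurity(sq_dist, idx):
    """Within-node sum of squares, computed from pairwise distances only.

    Uses sum_i ||y_i - mean||^2 = (1 / (2n)) * sum_i sum_j ||y_i - y_j||^2.
    """
    n = len(idx)
    if n == 0:
        return 0.0
    return sq_dist[np.ix_(idx, idx)].sum() / (2.0 * n)


def best_split(genotypes, sq_dist, idx):
    """Return (snp, threshold, gain) for the best split of the samples in idx.

    genotypes holds minor-allele counts (0, 1 or 2); a split sends samples
    with count <= threshold to the left child and the rest to the right.
    Returns (None, None, 0.0) if no split reduces the impurity.
    """
    parent = node_impurity(sq_dist, idx)
    best = (None, None, 0.0)
    for snp in range(genotypes.shape[1]):
        g = genotypes[idx, snp]
        for thr in (0, 1):
            left, right = idx[g <= thr], idx[g > thr]
            if len(left) == 0 or len(right) == 0:
                continue  # degenerate split, nothing to compare
            gain = parent - node_impurity(sq_dist, left) - node_impurity(sq_dist, right)
            if gain > best[2]:
                best = (snp, thr, gain)
    return best
```

In a full random forest, `best_split` would be called on a random subset of SNPs at each node of each tree, and growing the trees independently is what makes the ensemble a natural fit for a map step in MapReduce.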
Pages: 15
Related papers
50 records in total
  • [1] Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes
    Yue Wang
    Wilson Goh
    Limsoon Wong
    Giovanni Montana
    [J]. BMC Bioinformatics, 14
  • [2] Genome-Wide Association of Neuroimaging Phenotypes in PTSD at Multiple Sites
    Morey, Rajendra A.
    Logue, Mark
    Ashley-Koch, Allison
    Garrett, Melanie
    Lancaster, Sarah
    Hauser, Mike
    McLaughlin, Kate
    Peverill, Matthew
    Sheridan, Margaret
    Harpaz-Rotem, Ilan
    Levy, Ifat
    Wrocklage, Kristen
    Krystal, John
    Abdallah, Chadi
    Thompson, Paul
    Dennis, Emily
    Baboyan, Vatche
    Harrison, Marc
    Thomaes, Kathleen
    Veltman, Dick
    Koch, Saskia
    Geuze, Elbert
    Stein, Dan
    Ipser, Jonathan
    Ressler, Kerry
    Stevens, Jennifer
    Miller, Mark
    van Rooij, Sanne
    [J]. BIOLOGICAL PSYCHIATRY, 2016, 79 (09) : 165S - 165S
  • [3] Imputing Phenotypes for Genome-wide Association Studies
    Hormozdiari, Farhad
    Kang, Eun Yong
    Bilow, Michael
    Ben-David, Eyal
    Vulpe, Chris
    McLachlan, Stela
    Lusis, Aldons J.
    Han, Buhm
    Eskin, Eleazar
    [J]. AMERICAN JOURNAL OF HUMAN GENETICS, 2016, 99 (01) : 89 - 103
  • [4] Discovery of Multivariate Phenotypes using Association Rule Mining and their Application to Genome-wide Association Studies
    Park, Sung Hee
    Kim, Sangsoo
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2010, : 324 - 329
  • [5] ABUNDANT PLEIOTROPY ACROSS NEUROIMAGING MODALITIES IDENTIFIED THROUGH MULTIVARIATE GENOME-WIDE ASSOCIATION STUDIES
    Tissink, Elleke
    Shadrin, Alexey
    van der Meer, Dennis
    Roelfs, Daniel
    Fan, Chun
    Parker, Nadine
    Hindley, Guy
    Frei, Oleksandr
    Kaufmann, Tobias
    Dale, Anders
    Andreassen, Ole
    [J]. EUROPEAN NEUROPSYCHOPHARMACOLOGY, 2022, 63 : E56 - E57
  • [6] Genome-wide association studies of cardiac electrical phenotypes
    Glinge, Charlotte
    Lahrouchi, Najim
    Jabbari, Reza
    Tfelt-Hansen, Jacob
    Bezzina, Connie R.
    [J]. CARDIOVASCULAR RESEARCH, 2020, 116 (09) : 1620 - 1634
  • [7] Pattern discovery of multivariate phenotypes by Association Rule Mining and its scheme for Genome-Wide Association Studies
    Park, Sung Hee
    Kim, Sangsoo
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2012, 6 (05) : 505 - 520
  • [8] Evaluation of random forests performance for genome-wide association studies in the presence of interaction effects
    Yoonhee Kim
    Robert Wojciechowski
    Heejong Sung
    Rasika A Mathias
    Li Wang
    Alison P Klein
    Rhoshel K Lenroot
    James Malley
    Joan E Bailey-Wilson
    [J]. BMC Proceedings, 3 (Suppl 7)
  • [9] Multivariate SNP screen for genome-wide association studies
    Lubke, Gitta
    [J]. BEHAVIOR GENETICS, 2010, 40 (06) : 803 - 803
  • [10] Identifying Pleiotropic Genes in Genome-Wide Association Studies for Multivariate Phenotypes with Mixed Measurement Scales
    Yang, James J.
    Williams, L. Keoki
    Buu, Anne
    [J]. PLOS ONE, 2017, 12 (01):