Random forests for genomic data analysis

Cited by: 573
Authors
Chen, Xi [1]
Ishwaran, Hemant
Affiliations
[1] Vanderbilt Univ, Dept Biostat, Nashville, TN 37232 USA
Funding
U.S. National Science Foundation
Keywords
Random forests; Random survival forests; Classification; Prediction; Variable selection; Genomic data analysis; Variable importance measures; Machine learning algorithms; Gene-expression data; Pathway analysis; Performance; Model; SNPs; Association
DOI
10.1016/j.ygeno.2012.04.003
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning. © 2012 Elsevier Inc. All rights reserved.
Pages: 323-329
Page count: 7