Graph Random Forest: A Graph Embedded Algorithm for Identifying Highly Connected Important Features

Cited by: 11
Authors
Tian, Leqi [1 ,2 ]
Wu, Wenbin [1 ]
Yu, Tianwei [1 ,2 ,3 ]
Affiliations
[1] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[2] Shenzhen Res Inst Big Data, Shenzhen 518172, Peoples R China
[3] Guangdong Prov Key Lab Big Data Comp, Shenzhen 518172, Peoples R China
Keywords
feature selection; random forest; gene network; CANCER;
DOI
10.3390/biom13071153
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Random Forest (RF) is a widely used machine learning method that performs well on classification and regression tasks. It also works well when sample sizes are small, which benefits applications in biology: gene expression data, for example, often involve far more features (p) than samples (n). Although the predictive accuracy of RF is often high, selecting important genes with RF is problematic. The important genes it selects are usually scattered across the gene network, which conflicts with the biological assumption that effective features are functionally consistent. To improve feature selection by incorporating external topological information between genes, we propose the Graph Random Forest (GRF), which identifies highly connected important features by using the known biological network when constructing the forest. The algorithm can identify effective features that form highly connected sub-graphs while achieving classification accuracy equivalent to that of RF. To evaluate the proposed method, we conducted simulation experiments and applied it to two real datasets: non-small cell lung cancer RNA-seq data from The Cancer Genome Atlas, and a human embryonic stem cell RNA-seq dataset (GSE93593). The resulting high classification accuracy, connectivity of the selected sub-graphs, and interpretable feature selection results suggest that the method is a helpful addition to graph-based classification models and feature selection procedures.
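The abstract contrasts features that are "scattered on the gene network" with features that form "highly connected sub-graphs", but it does not say how connectivity is measured. As a rough illustration only, one simple proxy is the fraction of selected features that fall in the largest connected component of the subgraph they induce. The function name `largest_component_fraction`, the edge-list representation, and the toy network below are all assumptions made for this sketch; they are not taken from the paper.

```python
from collections import deque


def largest_component_fraction(edges, selected):
    """Fraction of `selected` nodes lying in the largest connected component
    of the subgraph induced by `selected` (hypothetical connectivity proxy)."""
    selected = set(selected)
    if not selected:
        return 0.0

    # Adjacency lists restricted to edges whose endpoints are both selected.
    adj = {node: set() for node in selected}
    for u, v in edges:
        if u in selected and v in selected:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    best = 0
    for start in selected:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)

    return best / len(selected)


# Toy network (made up for illustration): a path A-B-C-D and a path E-F-G.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G")]

connected_selection = ["A", "B", "C"]       # forms one connected sub-graph
scattered_selection = ["A", "C", "E", "G"]  # no two selected nodes are adjacent

connected_score = largest_component_fraction(toy_edges, connected_selection)
scattered_score = largest_component_fraction(toy_edges, scattered_selection)
```

On this toy network the connected selection scores 1.0 and the scattered selection scores 0.25, which is the kind of gap the abstract describes between a graph-aware selection and a scattered one.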
Pages: 14