Network-Constrained Forest for Regularized Omics Data Classification

Times cited: 0
Authors
Andel, Michael [1]
Klema, Jiri [1]
Krejcik, Zdenek [2]
Affiliations
[1] Czech Tech Univ, Dept Comp Sci, Tech 2, CR-16635 Prague, Czech Republic
[2] Univ Nemocnice, Inst Hematol & Blood Transfus, Dept Mol Genet, Prague, Czech Republic
Keywords
GENE SELECTION; CBL; EXPRESSION; CANCER; MUTATIONS; DISCOVERY; KNOWLEDGE; PROTEINS; KINASE; NPM1;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Contemporary molecular biology deals with a wide and heterogeneous set of measurements to model and understand underlying biological processes, including complex diseases. Machine learning is a common approach to building such models. However, models built solely from measured data often overfit, because the sample size is typically much smaller than the number of measured features. In this paper, we propose a random forest-based classifier that reduces this overfitting with the aid of prior knowledge in the form of a feature interaction network. We illustrate the proposed method on the task of disease classification from measured mRNA and miRNA profiles, complemented by an interaction network composed of miRNA-mRNA target relations and mRNA-mRNA interactions corresponding to the interactions between their encoded proteins. We demonstrate that the proposed network-constrained forest employs prior knowledge to increase learning bias and consequently improves the classification accuracy, stability and comprehensibility of the resulting model. The experiments are carried out in the domain of myelodysplastic syndrome, which we have studied over the long term. We validate our approach in the public domain of ovarian carcinoma, using the same data types. We believe that the idea of a network-constrained forest can be straightforwardly generalized to arbitrary omics data with an available and non-trivial feature interaction network.
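The abstract does not spell out exactly how the network constrains the forest. As a minimal sketch, assuming one plausible reading (each tree is grown on a bootstrap sample but may only split on a connected subset of features drawn from the interaction network), the idea can be illustrated in plain Python. All function names (`sample_connected_features`, `build_tree`, `fit_forest`, `predict_forest`), parameters (`k`, `max_depth`, `n_trees`), the toy data and the four-node network are hypothetical; this is not the authors' implementation or data.

```python
import random
from collections import Counter

def sample_connected_features(network, k, rng):
    """Grow a connected subset of at most k features from a random seed feature.

    Hypothetical: `network` maps every feature index to the set of its neighbours,
    standing in for the miRNA-mRNA / mRNA-mRNA interaction network.
    """
    seed = rng.choice(sorted(network))
    chosen = {seed}
    frontier = set(network[seed])
    while len(chosen) < k and frontier:
        nxt = rng.choice(sorted(frontier))  # extend the subset along a network edge
        chosen.add(nxt)
        frontier = (frontier | network[nxt]) - chosen
    return sorted(chosen)

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, features, depth, max_depth):
    """Greedy Gini tree that may only split on `features`; leaves are (non-tuple) labels."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted Gini, feature, threshold)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:  # no threshold separates the samples
        return majority
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], features, depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], features, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, network, n_trees=25, k=3, max_depth=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = sample_connected_features(network, k, rng)     # network-constrained features
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], feats, 0, max_depth)
        forest.append((feats, tree))
    return forest

def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for _, tree in forest)  # majority vote
    return votes.most_common(1)[0][0]

# Toy data: features 0-2 separate the two classes, feature 3 is uniform noise.
data_rng = random.Random(1)
X, y = [], []
for label, base in ((0, 0.2), (1, 0.8)):
    for _ in range(10):
        X.append([base + data_rng.uniform(-0.1, 0.1) for _ in range(3)] + [data_rng.random()])
        y.append(label)

# Hypothetical interaction network: a chain 0-1-2-3.
network = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
forest = fit_forest(X, y, network)
print(predict_forest(forest, [0.0, 0.0, 0.0, 0.5]))
print(predict_forest(forest, [1.0, 1.0, 1.0, 0.5]))
```

In this toy chain the noise feature 3 is reachable only through feature 2, so with `k=3` every tree's feature subset is either {0, 1, 2} or {1, 2, 3} and no tree can rely on the noise feature alone. This illustrates how a network can bias which feature combinations a tree may use; the authors' actual constraint may differ.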
Pages: 8