Impute Gene Expression Missing Values via Biological Networks: Optimal Fusion of Data and Knowledge

Cited by: 2
Authors
Xiang, Mingrong [1]
Hou, Jingyu [1]
Luo, Wei [1]
Tao, Wenjing [2]
Wang, Deshou [2]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Melbourne, Vic, Australia
[2] Southwest Univ, Key Lab Freshwater Fish Reprod & Dev, Key Lab Aquat Sci Chongqing, Sch Life Sci,Minist Educ, Chongqing 400715, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Missing Data Imputation; Biological Network; Gene Expression Data; Graph Neural Network; CHAINED EQUATIONS; IMPUTATION;
DOI
10.1109/IJCNN52387.2021.9533355
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Gene expression data often contain missing values that, if not handled properly, may mislead or invalidate the downstream analyses. With the emergence of graph neural networks (GNN), domain knowledge about gene regulation can be leveraged to guide the missing data imputation. We show in this paper, however, that naive application of GNN on the raw gene-expression data can actually lead to worse imputation. We analyse this problem considering both the intrinsic property of GNN message passing and potential data-knowledge inconsistency. We propose two measures towards optimal integration of biological networks in the gene-expression missing data imputation. These include expression data normalisation and a weighting scheme for GNN message passing. Experiments on two different biological networks and gene expression datasets show that our method outperforms state-of-the-art generic imputation algorithms and alternative GNN models, obtaining lower mean absolute error (MAE) consistently.
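The two measures named in the abstract, expression normalisation and weighted message passing, can be illustrated with a toy sketch. This is not the authors' model: it replaces the GNN with a single weighted neighbour-averaging step, and the function names (`gene_stats`, `impute`, `mae`), the edge weights, and the data are all hypothetical. It only shows why averaging neighbours in z-score space can be safer than averaging raw values when connected genes sit on different scales.

```python
from statistics import mean, pstdev

def gene_stats(expr):
    """Per-gene (mean, std) from observed entries; assumes each gene has one or more."""
    stats = []
    for g in range(len(expr[0])):
        observed = [row[g] for row in expr if row[g] is not None]
        sigma = pstdev(observed) or 1.0  # guard against constant genes
        stats.append((mean(observed), sigma))
    return stats

def impute(expr, edges):
    """Fill None entries from network neighbours, working in z-score space.

    expr:  samples x genes, with None marking a missing value.
    edges: {gene: [(neighbour, weight), ...]}, a hypothetical weighted network.
    """
    stats = gene_stats(expr)
    filled = [list(row) for row in expr]
    for s, row in enumerate(expr):
        for g, value in enumerate(row):
            if value is not None:
                continue
            num = den = 0.0
            for nb, w in edges.get(g, []):
                if row[nb] is not None:
                    mu_nb, sd_nb = stats[nb]
                    num += w * (row[nb] - mu_nb) / sd_nb  # normalised neighbour message
                    den += abs(w)
            z = num / den if den else 0.0  # no observed neighbour: fall back to gene mean
            mu, sd = stats[g]
            filled[s][g] = mu + sd * z  # map back to the gene's own scale
    return filled

def mae(pred, truth, positions):
    """Mean absolute error over the held-out (sample, gene) positions."""
    return mean(abs(pred[s][g] - truth[s][g]) for s, g in positions)

# Toy data: gene 1 tracks gene 0 but on a very different scale.
truth = [[1.0, 110.0], [2.0, 120.0], [3.0, 130.0], [4.0, 140.0]]
masked = [list(row) for row in truth]
masked[1][1] = None                      # hide one value to score the imputation
edges = {0: [(1, 1.0)], 1: [(0, 1.0)]}   # assumed regulatory link, weight 1.0

filled = impute(masked, edges)
error_normalised = mae(filled, truth, [(1, 1)])
error_raw = abs(masked[1][0] - truth[1][1])  # copying the raw neighbour value instead
```

In this toy case the raw neighbour value is far from the hidden entry simply because the two genes are measured on different scales, while the normalised estimate lands close to it. The edge weights here are fixed by hand; the paper's weighting scheme for GNN message passing is not specified in this record.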
Pages: 8
Related papers
50 in total
  • [31] Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data
    Yousef, Malik
    Kumar, Abhishek
    Bakir-Gungor, Burcu
    ENTROPY, 2021, 23 (01) : 1 - 15
  • [32] Classification by integrating plant stress response gene expression data with biological knowledge
    Meng, Jun
    Li, Rui
    Luan, Yushi
    MATHEMATICAL BIOSCIENCES, 2015, 266 : 65 - 72
  • [33] Integrating biological knowledge based on functional annotations for biclustering of gene expression data
    Nepomuceno, Juan A.
    Troncoso, Alicia
    Nepomuceno-Chamorro, Isabel A.
    Aguilar-Ruiz, Jesus S.
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2015, 119 (03) : 163 - 180
  • [35] Predicting Missing Values in Medical Data Via XGBoost Regression
    Zhang, Xinmeng
    Yan, Chao
    Gao, Cheng
    Malin, Bradley A.
    Chen, You
    JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2020, 4 (04) : 383 - 394
  • [36] Optimal data encoding and fusion in sensor networks
    Zherlitsyn, Gleb
    Matveev, Alexey
    2009 IEEE CONTROL APPLICATIONS CCA & INTELLIGENT CONTROL (ISIC), VOLS 1-3, 2009, : 666 - 670
  • [37] Effectiveness of Different Partition Based Clustering Algorithms for Estimation of Missing Values in Microarray Gene Expression Data
    Bose, Shilpi
    Das, Chandra
    Chakraborty, Abirlal
    Chattopadhyay, Samiran
    ADVANCES IN COMPUTING AND INFORMATION TECHNOLOGY, VOL 2, 2013, 177 : 37 - +
  • [38] Inferring gene expression networks via static and dynamic data integration
    Ferrazzi, Fulvia
    Magni, Paolo
    Sacchi, Lucia
    Bellazzi, Riccardo
    UBIQUITY: TECHNOLOGIES FOR BETTER HEALTH IN AGING SOCIETIES, 2006, 124 : 119 - 124
  • [39] Gene expression microarrays and the integration of biological knowledge
    Noordewier, MO
    Warren, PV
    TRENDS IN BIOTECHNOLOGY, 2001, 19 (10) : 412 - 415
  • [40] A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data
    Verbanck, Marie
    Le, Sebastien
    Pages, Jerome
    BMC BIOINFORMATICS, 2013, 14