Impute Gene Expression Missing Values via Biological Networks: Optimal Fusion of Data and Knowledge

Cited by: 2
Authors
Xiang, Mingrong [1]
Hou, Jingyu [1]
Luo, Wei [1]
Tao, Wenjing [2]
Wang, Deshou [2]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Melbourne, Vic, Australia
[2] Southwest Univ, Key Lab Freshwater Fish Reprod & Dev, Key Lab Aquat Sci Chongqing, Sch Life Sci, Minist Educ, Chongqing 400715, Peoples R China
Keywords
Missing Data Imputation; Biological Network; Gene Expression Data; Graph Neural Network; Chained Equations; Imputation
DOI
10.1109/IJCNN52387.2021.9533355
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gene expression data often contain missing values that, if not handled properly, can mislead or invalidate downstream analyses. With the emergence of graph neural networks (GNNs), domain knowledge about gene regulation can be leveraged to guide missing data imputation. We show in this paper, however, that naively applying a GNN to raw gene expression data can actually worsen imputation. We analyse this problem by considering both the intrinsic properties of GNN message passing and potential inconsistency between the data and the knowledge. We propose two measures towards optimal integration of biological networks into gene expression missing data imputation: expression data normalisation and a weighting scheme for GNN message passing. Experiments on two different biological networks and gene expression datasets show that our method outperforms state-of-the-art generic imputation algorithms and alternative GNN models, consistently obtaining lower mean absolute error (MAE).
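The ingredients named in the abstract (per-gene normalisation, weighted aggregation over a gene network, and MAE on held-out entries) can be illustrated with a small NumPy sketch. This is not the authors' model: it replaces the GNN with a single round of weighted neighbour averaging, and every name in it (`zscore_per_gene`, `weighted_neighbour_impute`, `masked_mae`, the adjacency matrix `A`, the edge-weight matrix `W`) is a hypothetical chosen for illustration.

```python
import numpy as np

def zscore_per_gene(X):
    """Standardise each gene (column) using its observed (non-NaN) values only."""
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std = np.where(std > 0, std, 1.0)  # avoid division by zero for constant genes
    return (X - mean) / std, mean, std

def weighted_neighbour_impute(X, A, W):
    """Fill NaNs in X (samples x genes) with a weighted mean of neighbour genes.

    A[j, i] = 1 if gene j is a neighbour of gene i in the (assumed) network.
    W[j, i] >= 0 is a hypothetical per-edge weight; it stands in for a
    message-passing weighting scheme and is not the scheme from the paper.
    """
    observed = ~np.isnan(X)
    X0 = np.where(observed, X, 0.0)           # zero out missing entries
    E = A * W                                 # effective edge weights
    num = X0 @ E                              # weighted sum over observed neighbours
    den = observed.astype(float) @ E          # total weight of observed neighbours
    est = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return np.where(observed, X, est)         # keep observed values untouched

def masked_mae(X_true, X_imputed, mask):
    """Mean absolute error restricted to the entries selected by a boolean mask."""
    return float(np.mean(np.abs(X_true[mask] - X_imputed[mask])))

if __name__ == "__main__":
    # Toy data: 3 samples x 3 genes, one entry of gene 2 hidden.
    X_true = np.array([[1.0, 5.0, 1.0],
                       [2.0, 6.0, 2.0],
                       [3.0, 7.0, 3.0]])
    X_missing = X_true.copy()
    X_missing[0, 2] = np.nan
    A = np.zeros((3, 3))
    A[0, 2] = 1.0                             # gene 0 is a neighbour of gene 2
    W = np.ones((3, 3))
    X_hat = weighted_neighbour_impute(X_missing, A, W)
    print(masked_mae(X_true, X_hat, np.isnan(X_missing)))
```

In practice one would normalise first (so that genes on different scales do not distort the neighbour average), impute in the normalised space, and map the result back with the stored per-gene mean and standard deviation; the toy demo skips that step to keep the arithmetic easy to follow.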
Pages: 8