Model, properties and imputation method of missing SNP genotype data utilizing mutual information

Cited by: 3
Authors
Wang, Ying [1]
Wan, Weiming [1]
Wang, Rui-Sheng [2]
Feng, Enmin [3]
Affiliations
[1] Dalian Jiaotong Univ, Sch Sci, Dalian 116028, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[3] Dalian Univ Technol, Dept Appl Math, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Mutual information; Imputation method; Missing genotype data; Missing SNP site; Extension method;
DOI
10.1016/j.cam.2008.10.020
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Mutual information can be used to measure the association of a genetic marker, or a combination of markers, with the phenotype. In this paper, we study the imputation of missing genotype data. We first use joint mutual information to compute the dependence between SNP sites, then construct a mathematical model to find the two SNP sites that have maximal dependence with the missing SNP sites, and further study the properties of this model. Finally, an extension method for haplotype-based imputation is proposed to impute the missing values in genotype data. To verify our method, extensive experiments were performed, and the numerical results show that it is superior to haplotype-based imputation methods. The numerical results also indicate that joint mutual information can better measure the dependence between SNP sites. From the experimental results, we also conclude that the dependence between adjacent SNP sites is not necessarily the strongest. © 2008 Elsevier B.V. All rights reserved.
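The site-selection idea in the abstract can be illustrated with a small Python sketch. It is not the paper's model or its imputation procedure: the genotype coding (0/1/2, with `None` for a missing call), the plug-in entropy estimate, the exhaustive search over pairs, and all function names are assumptions made here for illustration only. It scores each pair of candidate sites by the joint mutual information I((X, Y); T) with a target site T and returns the highest-scoring pair.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(values):
    """Plug-in (empirical) Shannon entropy in bits of a sequence of symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def joint_mutual_information(xs, ys, target):
    """I((X, Y); T): dependence between the site pair (X, Y) and site T."""
    return mutual_information(list(zip(xs, ys)), target)


def best_site_pair(genotypes, target):
    """Return ((a, b), score) for the site pair most dependent on `target`.

    `genotypes` is a list of rows (one per individual); each row lists
    genotype codes per SNP site, with None marking a missing call
    (a hypothetical encoding chosen for this sketch).
    """
    n_sites = len(genotypes[0])
    candidates = [j for j in range(n_sites) if j != target]
    best_pair, best_score = None, float("-inf")
    for a, b in combinations(candidates, 2):
        # Use only individuals observed at all three sites involved.
        complete = [r for r in genotypes if None not in (r[a], r[b], r[target])]
        if not complete:
            continue
        score = joint_mutual_information(
            [r[a] for r in complete],
            [r[b] for r in complete],
            [r[target] for r in complete],
        )
        if score > best_score:
            best_pair, best_score = (a, b), score
    return best_pair, best_score


# Toy data: site 0 is fully determined by sites 2 and 3 together,
# while the adjacent site 1 carries less information about it.
genotypes = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [None, 1, 0, 1],  # individual with a missing call at site 0
]
pair, score = best_site_pair(genotypes, target=0)
print(pair, score)
```

In this toy data the selected pair does not include the site adjacent to the target, which mirrors the abstract's remark that adjacent sites are not necessarily the most strongly dependent. The exhaustive pair search is quadratic in the number of sites and is used here only for clarity.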
Pages: 168-174
Page count: 7