Model, properties and imputation method of missing SNP genotype data utilizing mutual information

Cited by: 3
Authors
Wang, Ying [1]
Wan, Weiming [1]
Wang, Rui-Sheng [2]
Feng, Enmin [3]
Affiliations
[1] Dalian Jiaotong Univ, Sch Sci, Dalian 116028, Peoples R China
[2] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[3] Dalian Univ Technol, Dept Appl Math, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Mutual information; Imputation method; Missing genotype data; Missing SNP site; Extension method;
DOI
10.1016/j.cam.2008.10.020
Chinese Library Classification number
O29 [Applied Mathematics];
Subject classification code
070104;
Abstract
Mutual information can measure the association between a genetic marker, or a combination of markers, and a phenotype. This paper studies the imputation of missing genotype data. We first use joint mutual information to quantify the dependence between SNP sites, then construct a mathematical model that finds the two SNP sites with maximal dependence on a missing SNP site, and study the properties of this model. Finally, we propose an extension of haplotype-based imputation to impute the missing values in genotype data. Extensive experiments were performed to verify the method. The numerical results show that it outperforms haplotype-based imputation methods and indicate that joint mutual information can better measure the dependence between SNP sites. The experiments also show that adjacent SNP sites do not necessarily have the strongest dependence. (C) 2008 Elsevier B.V. All rights reserved.
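The pair-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model, which this record does not reproduce. It assumes genotypes are coded 0/1/2 with `None` for missing values, uses plain empirical frequencies, and searches all pairs exhaustively. The names `mutual_information` and `best_pair` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits from paired samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def best_pair(genotypes, target):
    """Find the two sites (i, j) maximising I((X_i, X_j); X_target).

    genotypes: list of rows (one per individual), one genotype per site.
    Rows with a missing value at i, j, or target are skipped for that pair.
    """
    others = [s for s in range(len(genotypes[0])) if s != target]
    best, best_mi = None, -1.0
    for i, j in combinations(others, 2):
        rows = [r for r in genotypes if None not in (r[i], r[j], r[target])]
        if not rows:
            continue
        # Treat the pair (X_i, X_j) as one joint variable.
        mi = mutual_information([(r[i], r[j]) for r in rows],
                                [r[target] for r in rows])
        if mi > best_mi:
            best, best_mi = (i, j), mi
    return best, best_mi


# Toy data (assumed): site 3 equals site 0 + site 2, site 1 is uninformative.
toy = [[0, 0, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 2]]
pair, mi = best_pair(toy, target=3)
print(pair, mi)  # prints (0, 2) 1.5
```

In the toy data the selected pair is not the pair of sites adjacent to the target, which mirrors the abstract's remark that adjacent SNP sites need not be the most strongly dependent.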
Pages: 168-174
Page count: 7
Related papers
50 in total
  • [1] Fast accurate missing SNP genotype local imputation
    Wang Y.
    Cai Z.
    Stothard P.
    Moore S.
    Goebel R.
    Wang L.
    Lin G.
    BMC Research Notes, 5 (1)
  • [2] Missing data imputation by utilizing information within incomplete instances
    Zhang, Shichao
    Jin, Zhi
    Zhu, Xiaofeng
    JOURNAL OF SYSTEMS AND SOFTWARE, 2011, 84 (03) : 452 - 459
  • [3] K nearest neighbours with mutual information for simultaneous classification and missing data imputation
    Garcia-Laencina, Pedro J.
    Sancho-Gomez, Jose-Luis
    Figueiras-Vidal, Anibal R.
    Verleysen, Michel
    NEUROCOMPUTING, 2009, 72 (7-9) : 1483 - 1493
  • [4] Utilizing Genotype Imputation for the Augmentation of Sequence Data
    Fridley, Brooke L.
    Jenkins, Gregory
    Deyo-Svendsen, Matthew E.
    Hebbring, Scott
    Freimuth, Robert
    PLOS ONE, 2010, 5 (06):
  • [5] Multiple imputation of missing genotype data for unrelated individuals
    Souverein, OW
    Zwinderman, AH
    Tanck, MWT
    ANNALS OF HUMAN GENETICS, 2006, 70 : 372 - 381
  • [6] Assessing SNP-SNP interactions in the presence of missing genotype data
    Ruczinski, I.
    GENETIC EPIDEMIOLOGY, 2007, 31 (05) : 495 - 496
  • [7] Missing data imputation by K nearest neighbours based on grey relational structure and mutual information
    Ruilin Pan
    Tingsheng Yang
    Jianhua Cao
    Ke Lu
    Zhanchao Zhang
    Applied Intelligence, 2015, 43 : 614 - 632
  • [8] Missing data imputation by K nearest neighbours based on grey relational structure and mutual information
    Pan, Ruilin
    Yang, Tingsheng
    Cao, Jianhua
    Lu, Ke
    Zhang, Zhanchao
    APPLIED INTELLIGENCE, 2015, 43 (03) : 614 - 632
  • [9] MissII: Missing Information Imputation for Traffic Data
    Hou, Mingliang
    Tang, Tao
    Xia, Feng
    Sultan, Ibrahim
    Kaur, Roopdeep
    Kong, Xiangjie
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING, 2024, 12 (03) : 752 - 765
  • [10] Imputation of missing information in worldwide patent data
    de Rassenfosse, Gaetan
    Seliger, Florian
    DATA IN BRIEF, 2021, 34