Cluster-based KNN Missing Value Imputation for DNA Microarray Data

Cited by: 0
Authors
Keerin, Phimmarin [1]
Kurutach, Werasak [1]
Boongoen, Tossapon [2]
Affiliations
[1] Mahanakorn Univ Technol, Fac Informat Sci & Technol, Bangkok, Thailand
[2] Royal Thai Air Force Acad, Dept Math & Comp Sci, Bangkok, Thailand
Keywords
missing value; imputation; microarray data; clustering; expression data; classification; cancer
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Gene expression data measured with microarrays frequently contain missing values. Leaving them unresolved may critically degrade the reliability of subsequent downstream analyses and medical applications. Moreover, further study of the data may be impossible, because many analysis methods require a complete data set. This paper introduces a new method for imputing missing values in microarray data. The proposed algorithm, CKNN impute, extends k-nearest neighbor (KNN) imputation by incorporating local data clustering to improve imputation quality and efficiency. Gene expression data are typically represented as a matrix whose rows and columns correspond to genes and experiments, respectively. CKNN first obtains a complete data set by removing the rows that contain missing values. Then, k clusters and their centroids are obtained by applying a clustering technique to this complete data set. The genes similar to a target gene (a gene with missing values) are taken to be the members of the cluster whose centroid is closest to the target. The target gene is then imputed by applying the KNN method to this set of similar genes. Empirical evaluation on published gene expression data sets suggests that the proposed technique outperforms both the classical KNN method and an existing extension of it reported in the literature.
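The four steps summarized in the abstract (drop incomplete rows, cluster the complete rows, pick the cluster with the nearest centroid, run KNN inside it) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the clustering technique, so plain k-means (Lloyd's algorithm) is assumed here. Distances are assumed to be Euclidean over the target gene's observed columns, and neighbor values are assumed to be combined with inverse-distance weights, a common choice in KNN imputation. Because the abstract uses "k" for both the number of clusters and the number of neighbors, the sketch keeps them as two separate parameters. All function and parameter names (`cknn_impute`, `knn_fill`, `kmeans`, `sq_dists`, `n_clusters`, `k`) are hypothetical.

```python
import numpy as np


def sq_dists(rows, centres):
    """Squared Euclidean distance between every row and every centre."""
    return ((rows[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)


def kmeans(data, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from n_clusters distinct rows picked at random.
    centroids = data[rng.choice(len(data), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dists(data, centroids).argmin(axis=1)
        new = centroids.copy()
        for c in range(n_clusters):
            members = data[labels == c]
            if len(members):  # an emptied cluster keeps its old centroid
                new[c] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, sq_dists(data, centroids).argmin(axis=1)


def knn_fill(target, candidates, k):
    """Fill NaNs in `target` from its k nearest rows in `candidates`."""
    obs = ~np.isnan(target)
    filled = target.copy()
    if not obs.any():  # nothing to compare on: use the candidates' mean
        filled[:] = candidates.mean(axis=0)
        return filled
    # Euclidean distance measured on the target's observed columns only.
    d = np.sqrt(((candidates[:, obs] - target[obs]) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    # Inverse-distance weights (an assumption); epsilon avoids division by 0.
    w = 1.0 / (d[nearest] + 1e-12)
    filled[~obs] = w @ candidates[nearest][:, ~obs] / w.sum()
    return filled


def cknn_impute(X, n_clusters, k):
    """Cluster-restricted KNN imputation of a genes x experiments matrix."""
    X = np.asarray(X, dtype=float)
    has_nan = np.isnan(X).any(axis=1)
    complete = X[~has_nan]  # step 1: drop rows with any missing value
    centroids, labels = kmeans(complete, n_clusters)  # step 2: cluster them
    used = np.unique(labels)  # clusters that actually own genes
    out = X.copy()
    for i in np.flatnonzero(has_nan):
        obs = ~np.isnan(X[i])
        if obs.any():
            # Step 3: nearest centroid, compared on observed columns only.
            d = ((centroids[used][:, obs] - X[i, obs]) ** 2).sum(axis=1)
            candidates = complete[labels == used[d.argmin()]]
        else:
            candidates = complete  # no observed values: search all genes
        out[i] = knn_fill(X[i], candidates, k)  # step 4: KNN in that cluster
    return out


if __name__ == "__main__":
    # Toy data: two well-separated groups of genes plus two incomplete genes.
    genes = np.array([
        [1.0, 2.0, 3.0, 4.0],
        [1.1, 2.1, 3.1, 4.1],
        [0.9, 1.9, 2.9, 3.9],
        [10.0, 20.0, 30.0, 40.0],
        [10.5, 20.5, 30.5, 40.5],
        [9.5, 19.5, 29.5, 39.5],
        [1.05, 2.05, np.nan, 4.05],
        [10.2, 20.2, np.nan, 40.2],
    ])
    print(cknn_impute(genes, n_clusters=2, k=2)[-2:, 2])  # roughly 3.05 and 30.2
```

Restricting the neighbor search to a single cluster is what distinguishes this sketch from classical KNN imputation. Each incomplete gene is compared only against the members of one cluster rather than the whole complete data set. This matches the efficiency motivation given in the abstract, although the quality gains reported by the authors depend on their own clustering and weighting choices.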
Pages: 445-450
Number of pages: 6