ESTIMATION OF MISSING VALUES USING OPTIMISED HYBRID FUZZY C-MEANS AND MAJORITY VOTE FOR MICROARRAY DATA

Cited by: 0

Authors:

Kumaran, Shamini Raja [1]

Othman, Mohd Shahizan [1]

Yusuf, Lizawati Mi [1]

Affiliations:

[1] Univ Teknol Malaysia, Sch Comp, Skudai, Johor, Malaysia

Source:

JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA | 2020 / Vol. 19 / Issue 04

Keywords:

Fuzzy C-means; majority vote; missing values; microarray data; data optimisation; IMPUTATION; ALGORITHM

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Missing values are a major constraint on the use of microarray technologies to identify disease-causing genes, and estimating them is an unavoidable task for experts in the field. Imputation is an effective way to fill in suitable values so that the subsequent steps of microarray analysis can proceed. Missing value imputation methods may increase classification accuracy. Although these methods can predict values, it is the resulting classification accuracy that indicates how well they estimate the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to estimate the missing values in the data. Using Majority Vote (MV) and optimisation through Particle Swarm Optimisation (PSO), this study predicted missing values to produce more informative and reliable data. To verify the effectiveness of opt-FCMMV, several experiments were carried out in the biomedical domain on two publicly available microarray datasets (Ovary and Lung Cancer) under three missing value mechanisms and five different missing-value percentages, using a Support Vector Machine (SVM) classifier. The experimental results showed that the proposed method achieved the highest accuracy rate compared with no imputation, imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for Ovary Cancer data with 5% missing values were 64.0% (no imputation), 81.8% (FCM), 90.0% (FCMMV), and 93.7% (opt-FCMMV). This outcome indicates that opt-FCMMV may also be applied in other domains to prepare datasets for various data mining tasks.
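The abstract names Fuzzy C-Means (FCM) imputation as one of the baselines but does not spell out the procedure. The following is a minimal sketch of one common way to do FCM-based imputation, not the authors' implementation: missing entries are first filled with column means, plain FCM is run on the filled matrix, and each missing entry is then estimated as the membership-weighted mix of the cluster centroids. The function names `fcm` and `fcm_impute`, the fuzzifier `m = 2.0`, the number of clusters, and the mean-based initialisation are all assumptions made for illustration. The Majority Vote and PSO stages of opt-FCMMV are not covered because the abstract gives no details on them.

```python
import numpy as np


def fcm(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a complete matrix X (samples x features)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the samples.
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every centroid.
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        D = np.maximum(D, 1e-12)  # guard against division by zero
        # Membership update: u_ik proportional to d_ik^(-2 / (m - 1)).
        inv = D ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centroids so they match the final memberships.
    W = U ** m
    C = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, C


def fcm_impute(X, n_clusters=2, m=2.0):
    """Fill NaN entries of X using FCM memberships and centroids.

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: initialise missing entries with observed column means.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    U, C = fcm(filled, n_clusters=n_clusters, m=m)
    # Estimate for entry (i, j) is sum_k u_ik * c_kj.
    estimates = U @ C
    # Observed entries are left untouched.
    return np.where(missing, estimates, X)


if __name__ == "__main__":
    # Hypothetical toy data: two groups of samples, one missing entry.
    demo = np.array([
        [1.0, 1.1],
        [0.9, 1.0],
        [1.1, np.nan],
        [5.0, 5.1],
        [5.1, 4.9],
        [4.9, 5.0],
    ])
    print(fcm_impute(demo, n_clusters=2))
```

In this sketch the estimate for the incomplete sample is pulled towards the cluster its observed feature resembles, rather than staying at the global column mean. An evaluation like the one described in the abstract would then train a classifier on the imputed matrix and compare accuracy against the unimputed data.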


Pages: 459-482

Page count: 24
