Gene selection from microarray gene expression data for classification of cancer subgroups employing PSO and adaptive K-nearest neighborhood technique

Cited by: 128
Authors
Kar, Subhajit [1]
Das Sharma, Kaushik [2]
Maitra, Madhubanti [3]
Affiliations
[1] Future Inst Engn & Management, Dept Elect Engn, Kolkata, India
[2] Univ Calcutta, Dept Appl Phys, Kolkata, India
[3] Jadavpur Univ, Dept Elect Engn, Kolkata, India
Keywords
Microarray data; SRBCT data; ALL_AML data; MLL data; Particle swarm optimization (PSO); Adaptive K-nearest neighborhood (KNN); Support vector machine (SVM); PARTICLE SWARM OPTIMIZATION; TYPE-2; FUZZY-LOGIC; NEURAL-NETWORKS; IDENTIFICATION; ALGORITHM; VALIDATION; DESIGN; SYSTEM;
DOI
10.1016/j.eswa.2014.08.014
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Microarray gene expression data now play an essential role in cancer classification. However, because only a small number of effective samples is available compared with the large number of genes in microarray data, many computational methods have failed to identify a small subset of important genes. It is therefore challenging to identify a small number of disease-specific, significant genes relevant to the precise diagnosis of cancer subclasses. In this paper, a gene selection technique based on particle swarm optimization (PSO) together with adaptive K-nearest neighborhood (KNN) is proposed to identify a small subset of useful genes that is sufficient for the desired classification. A proper value of K helps to form an appropriate number of neighborhoods to explore and hence to classify the dataset accurately. Thus, a heuristic that efficiently selects the optimal value of K, guided by the classification accuracy, is also proposed. The proposed technique for finding the smallest possible meaningful set of genes is applied to three benchmark microarray datasets: the small round blue cell tumor (SRBCT) data, the acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) data, and the mixed-lineage leukemia (MLL) data. Results demonstrate the usefulness of the proposed method in terms of classification accuracy on blind test samples, number of informative genes, and computing time. Further, the usefulness and universal characteristics of the identified genes are reconfirmed by using different classifiers, such as the support vector machine (SVM). © 2014 Elsevier Ltd. All rights reserved.
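The abstract gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary PSO with a sigmoid transfer function over gene masks, leave-one-out KNN accuracy as the fitness, a small per-gene penalty to favor compact subsets, and a fixed candidate list for K from which the most accurate value is chosen. All function names, parameter values, and the toy data are hypothetical.

```python
import math
import random

def knn_loo_accuracy(X, y, genes, k):
    """Leave-one-out accuracy of a k-NN majority vote on the given gene columns."""
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample, restricted to `genes`.
        dists = sorted(
            (sum((xi[g] - xj[g]) ** 2 for g in genes), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = {}
        for _, label in dists[:k]:
            votes[label] = votes.get(label, 0) + 1
        # Majority vote; ties go to the smallest label so results are deterministic.
        predicted = max(sorted(votes), key=lambda c: votes[c])
        correct += predicted == y[i]
    return correct / len(X)

def best_k(X, y, genes, k_values=(1, 3, 5)):
    """Assumed heuristic: keep the K with the highest accuracy (smallest K on ties)."""
    acc, neg_k = max((knn_loo_accuracy(X, y, genes, k), -k) for k in k_values)
    return -neg_k, acc

def fitness(X, y, mask, penalty=0.01):
    """Accuracy at the best K minus a per-gene penalty (assumed trade-off)."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return -1.0, None  # an empty subset cannot classify anything
    k, acc = best_k(X, y, genes)
    return acc - penalty * len(genes), k

def binary_pso(X, y, n_particles=8, n_iter=15, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over 0/1 gene masks; returns (selected genes, chosen K, fitness)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p)[0] for p in pos]
    lead = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[lead][:], pbest_fit[lead]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_genes):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, vel[i][d]))  # clamp the velocity
                # Sigmoid transfer: velocity becomes the probability of a 1 bit.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(X, y, pos[i])[0]
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    genes = [g for g, bit in enumerate(gbest) if bit]
    return genes, fitness(X, y, gbest)[1], gbest_fit

def make_toy_data(n_per_class=10, n_genes=8, seed=1):
    """Synthetic two-class data in which only gene 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += 6.0 * label
            X.append(row)
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    genes, k, fit = binary_pso(X, y)
    print("selected genes:", genes, "K:", k, "fitness:", round(fit, 3))
```

The per-gene penalty is one simple way to trade accuracy against subset size; the abstract does not say how that trade-off is actually handled, and real microarray data would also need a held-out blind test set, as the abstract describes.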
Pages: 612 - 627
Page count: 16
Related papers
50 in total
  • [1] Gene selection and sample classification on microarray data based on adaptive genetic algorithm/k-nearest neighbor method
    Lee, Chien-Pang
    Lin, Wen-Shin
    Chen, Yuh-Min
    Kuo, Bo-Jein
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 4661 - 4667
  • [2] Gene expression cancer classification using modified K-Nearest Neighbors technique
    Ayyad, Sarah M.
    Saleh, Ahmed, I
    Labib, Labib M.
    [J]. BIOSYSTEMS, 2019, 176 : 41 - 51
  • [3] Extreme Learning Machine and Fuzzy K-Nearest Neighbour Based Hybrid Gene Selection Technique for Cancer Classification
    Sungheetha, Akey
    Sharma, R. Rajesh
    [J]. JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2016, 6 (07) : 1652 - 1656
  • [4] Optimal gene selection for cancer classification with partial correlation and k-nearest neighbor classifier
    Yoo, SH
    Cho, SB
    [J]. PRICAI 2004: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3157 : 713 - 722
  • [5] Gene selection for cancer classification in microarray data
    Zhang, Lijuan
    Li, Zhoujun
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2009, 46 (05) : 794 - 802
  • [6] Optimized gene selection and classification of cancer from microarray gene expression data using deep learning
    Shah, Shamveel Hussain
    Iqbal, Muhammad Javed
    Ahmad, Iftikhar
    Khan, Suleman
    Rodrigues, Joel J. P. C.
    [J]. NEURAL COMPUTING & APPLICATIONS, 2020,
  • [7] Gene assessment and sample classification for gene expression data using a genetic algorithm/k-nearest neighbor method
    Li, LP
    Darden, TA
    Weinberg, CR
    Levine, AJ
    Pedersen, LG
    [J]. COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2001, 4 (08) : 727 - 739
  • [8] A STUDY ON GENE SELECTION AND CLASSIFICATION ALGORITHMS FOR CLASSIFICATION OF MICROARRAY GENE EXPRESSION DATA
    Chin, Yeo Lee
    Deris, Safaai
    [J]. JURNAL TEKNOLOGI, 2005, 43
  • [9] A Comparison of PSO and GA Approaches for Gene Selection and Classification of Microarray Data
    Garcia-Nieto, Jose
    Alba, Enrique
    Jourdan, Laetitia
    Talbi, El-Ghazali
    [J]. GECCO 2007: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2, 2007, : 427 - 427
  • [10] Dataset complexity in gene expression based cancer classification using ensembles of k-nearest neighbors
    Okun, Oleg
    Priisalu, Helen
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2009, 45 (2-3) : 151 - 162