An efficient statistical feature selection approach for classification of gene expression data

Cited by: 110
Authors
Chandra, B. [1 ]
Gupta, Manish [1 ]
Affiliations
[1] Indian Inst Technol Delhi, Dept Math, New Delhi 110016, India
Keywords
Cancer diagnosis and prediction; Gene selection; Classification; Feature selection; CANCER CLASSIFICATION; T-TEST; PREDICTION; TUMOR;
DOI
10.1016/j.jbi.2011.01.001
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Classification of gene expression data plays a significant role in the prediction and diagnosis of diseases. Gene expression data have a distinctive characteristic: a mismatch between the gene dimension and the sample dimension. Not all genes contribute to efficient classification of samples. A robust feature selection algorithm is required to identify the important genes that help classify the samples efficiently. Many feature selection algorithms have been introduced in the past to select informative genes (features) based on relevance and redundancy characteristics. Most of the earlier algorithms require a computationally expensive search strategy to find an optimal feature subset. Existing feature selection methods are also sensitive to the evaluation measures. The paper introduces a novel and efficient feature selection approach, termed ERGS (Effective Range based Gene Selection), based on a statistically defined effective range of each feature for every class. The basic principle behind ERGS is that a higher weight is given to a feature that discriminates the classes clearly. Experimental results on well-known gene expression datasets illustrate the effectiveness of the proposed approach. Two popular classifiers, the Naive Bayes Classifier (NBC) and the Support Vector Machine (SVM), have been used for classification. The proposed feature selection algorithm can help rank the genes and is capable of identifying the most relevant genes responsible for diseases such as leukemia, colon tumor, lung cancer, diffuse large B-cell lymphoma (DLBCL), and prostate cancer. © 2011 Elsevier Inc. All rights reserved.
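The abstract states only the principle behind ERGS (features whose per-class ranges separate the classes clearly receive higher weight) and does not give the exact formulas. The following is a simplified sketch of that idea, not the paper's definition: it assumes the effective range of a feature in a class is the class mean plus or minus `k` class standard deviations, and scores a feature as one minus the overlapped fraction of the total range. The function name `effective_range_scores` and the parameter `k` are hypothetical.

```python
import numpy as np


def effective_range_scores(X, y, k=1.0):
    """Score each feature by how little its per-class ranges overlap.

    Assumption (not from the paper): the effective range of a feature
    in a class is [mean - k * std, mean + k * std] over that class.
    Returns one score per feature in [0, 1]; higher means the classes
    are better separated on that feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)

    # Per-class lower and upper range bounds, shape (n_classes, n_features).
    lows = np.array([X[y == c].mean(axis=0) - k * X[y == c].std(axis=0)
                     for c in classes])
    highs = np.array([X[y == c].mean(axis=0) + k * X[y == c].std(axis=0)
                      for c in classes])

    # Sum the overlap length over every pair of classes.
    n_features = X.shape[1]
    overlap = np.zeros(n_features)
    for a in range(len(classes)):
        for b in range(a + 1, len(classes)):
            inter = np.minimum(highs[a], highs[b]) - np.maximum(lows[a], lows[b])
            overlap += np.clip(inter, 0.0, None)

    # Normalise by the total span covered by all class ranges.
    span = highs.max(axis=0) - lows.min(axis=0)
    # A zero span (constant feature) is treated as fully overlapping.
    ratio = np.divide(overlap, span, out=np.ones(n_features), where=span > 0)
    return 1.0 - np.clip(ratio, 0.0, 1.0)


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = np.array([[0.0, 5.0], [1.0, 6.0], [10.0, 5.0], [11.0, 6.0]])
    y = np.array([0, 0, 1, 1])
    scores = effective_range_scores(X, y)
    ranking = np.argsort(-scores)  # feature indices, best first
    print(scores, ranking)
```

A ranking like this only needs per-class means and standard deviations, so it avoids the subset search that the abstract describes as computationally expensive. The top-ranked features could then be passed to a classifier such as NBC or SVM.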
Pages: 529-535
Number of pages: 7
Related papers
50 in total
  • [1] GENE EXPRESSION DATA CLASSIFICATION COMBINING HIERARCHICAL REPRESENTATION AND EFFICIENT FEATURE SELECTION
    Bosio, Mattia
    Bellot, Pau
    Salembier, Philippe
    Oliveras-Verges, Albert
    JOURNAL OF BIOLOGICAL SYSTEMS, 2012, 20 (04) : 349 - 375
  • [2] Feature Selection and Classification in gene expression cancer data
    Pavithra, D.
    Lakshmanan, B.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017,
  • [3] Efficient Feature Selection Model for Gene Expression Data
    Saengsiri, Patharawut
    Wichian, Sageemas Na
    Meesad, Phayung
    MECHANICAL AND AEROSPACE ENGINEERING, PTS 1-7, 2012, 110-116 : 1948 - +
  • [4] An Efficient Feature Selection Technique for Gene Expression Data
    Chandra, B.
    2018 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2018, : 132 - 137
  • [5] An Efficient Statistical Feature Selection Based Classification
    Narayanamma, K. Laxmi
    Krishnaiah, R. V.
    Sammulal, P.
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (04) : 27 - 40
  • [6] A hybrid feature selection algorithm for gene expression data classification
    Lu, Huijuan
    Chen, Junying
    Yan, Ke
    Jin, Qun
    Xue, Yu
    Gao, Zhigang
    NEUROCOMPUTING, 2017, 256 : 56 - 62
  • [7] Feature selection as a preprocessing step for classification in gene expression data
    Borges, Helyane Bronoski
    Nievola, Julio Cesar
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2007, : 157 - +
  • [8] Review on Feature Selection Methods for Gene Expression Data Classification
    Almutiri, Talal
    Saeed, Faisal
    EMERGING TRENDS IN INTELLIGENT COMPUTING AND INFORMATICS: DATA SCIENCE, INTELLIGENT INFORMATION SYSTEMS AND SMART COMPUTING, 2020, 1073 : 24 - 34
  • [9] Feature Selection of Gene Expression Data for Cancer Classification: A Review
    Singh, Rabindra Kumar
    Sivabalakrishnan, M.
    BIG DATA, CLOUD AND COMPUTING CHALLENGES, 2015, 50 : 52 - 57
  • [10] Classification of Gene Expression Data Using Feature Selection Based on Type Combination Approach Model With Advanced Feature Selection Technology
    Siddesh, G. M.
    Gururaj, T.
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)