Feature Selection for Gene Expression Using Model-Based Entropy

Times cited: 62
Authors
Zhu, Shenghuo [1]
Wang, Dingding [2]
Yu, Kai [1]
Li, Tao [2]
Gong, Yihong [1]
Affiliations
[1] NEC Labs Amer, Cupertino, CA 95014 USA
[2] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
Funding
National Science Foundation (USA)
Keywords
Feature selection; multivariate Gaussian generative model; entropy; CLASSIFICATION; INFORMATION; PREDICTION
DOI
10.1109/TCBB.2008.35
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Gene expression data usually contain a large number of genes but only a small number of samples. Feature selection for gene expression data aims to find a set of genes that best discriminate biological samples of different types. In machine learning, traditional gene selection based on empirical mutual information suffers from data sparseness because of the small number of samples. To overcome this sparseness, we propose a model-based approach that estimates the entropy of the class variables on a fitted model instead of directly on the data. We fit the data with multivariate normal distributions, because a multivariate normal distribution has maximum entropy among all real-valued distributions with a specified mean and covariance, and such distributions are widely used to approximate various distributions. If the data follow a multivariate normal distribution, the conditional distribution of the class variables given the selected features is also normal, so its entropy can be computed from the log-determinant of its covariance matrix. Because the number of genes is large, computing the log-determinants for all candidate gene subsets is inefficient. We propose several algorithms that substantially reduce this computational cost. Experiments on seven gene data sets, and comparisons with five other approaches, demonstrate the accuracy of the multivariate Gaussian generative model for feature selection and the efficiency of our algorithms.
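The central computation in the abstract (entropy of the class variables given selected genes, via a log-determinant) can be illustrated with a short NumPy sketch. It assumes the class label is coded as a real-valued indicator that is jointly Gaussian with the genes (0/1 for two classes; for m classes, m-1 indicator columns, so the covariance is not singular), and it adds a small ridge term to the sample covariance because samples are far fewer than genes. The names `gaussian_conditional_entropy`, `_ResidualTracker`, and `greedy_select` are hypothetical. The greedy forward search with pivoted-Cholesky updates is a standard technique swapped in for illustration, not necessarily one of the authors' algorithms. It uses the identity that adding gene j to the selected set S changes log det of the class covariance by log Var(x_j | S, C) - log Var(x_j | S), so only conditional variances must be updated instead of recomputing every log-determinant.

```python
import numpy as np


def gaussian_conditional_entropy(cov, target, given):
    """H(target | given) in nats for a jointly Gaussian vector with covariance `cov`.

    Schur complement: Sigma_{T|S} = Sigma_TT - Sigma_TS Sigma_SS^{-1} Sigma_ST,
    then H = 0.5 * (|T| * log(2*pi*e) + log det Sigma_{T|S}).
    """
    t, s = list(target), list(given)
    cond = cov[np.ix_(t, t)]
    if s:
        cov_ts = cov[np.ix_(t, s)]
        cond = cond - cov_ts @ np.linalg.solve(cov[np.ix_(s, s)], cov_ts.T)
    sign, logdet = np.linalg.slogdet(cond)
    if sign <= 0:
        raise ValueError("conditional covariance is not positive definite")
    return 0.5 * (len(t) * np.log(2 * np.pi * np.e) + logdet)


class _ResidualTracker:
    """Conditional variances of all variables, updated one conditioning variable
    at a time with a pivoted-Cholesky step (no log-determinants recomputed)."""

    def __init__(self, cov):
        self.cov = cov
        self.cols = []                    # scaled residual covariance columns
        self.resid = np.diag(cov).copy()  # Var(z_j | current conditioning set)

    def condition_on(self, k):
        r = self.cov[:, k].copy()
        for c in self.cols:
            r -= c * c[k]                 # Cov(z_j, z_k | current set)
        c = r / np.sqrt(r[k])
        self.cols.append(c)
        self.resid -= c ** 2              # Var(z_j | current set + z_k)


def greedy_select(cov, n_features, class_idx, n_select):
    """Forward selection minimizing H(C | X_S).

    Features are indices 0..n_features-1 of `cov`; `class_idx` lists the class columns.
    """
    given_s = _ResidualTracker(cov)       # conditions on S only
    given_sc = _ResidualTracker(cov)      # conditions on C, then on S
    for c in class_idx:
        given_sc.condition_on(c)
    available = np.ones(n_features, dtype=bool)
    selected = []
    for _ in range(n_select):
        cand = np.flatnonzero(available)
        # Change in log det Sigma_{C|S} if each candidate were added (<= 0 in exact arithmetic).
        delta = np.log(given_sc.resid[cand]) - np.log(given_s.resid[cand])
        k = int(cand[np.argmin(delta)])
        selected.append(k)
        available[k] = False
        given_s.condition_on(k)
        given_sc.condition_on(k)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 60, 200                                   # few samples, many genes
    y = rng.integers(0, 2, size=n).astype(float)     # binary class indicator
    X = rng.normal(size=(n, d))
    X[:, 3] += 3.0 * y                               # make gene 3 informative
    cov = np.cov(np.column_stack([X, y]), rowvar=False) + 1e-3 * np.eye(d + 1)
    print(greedy_select(cov, d, class_idx=[d], n_select=5))
```

Each greedy step costs time proportional to the number of variables times the number already conditioned on, whereas recomputing H(C | X_S) for every candidate would need one matrix solve per gene per step; the selections are the same, since both rank candidates by the same log-determinant.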
Pages: 25-36
Page count: 12
Related papers
50 in total
  • [1] Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification
    Sun, Lin
    Zhang, Xiaoyu
    Qian, Yuhua
    Xu, Jiucheng
    Zhang, Shiguang
    INFORMATION SCIENCES, 2019, 502 : 18 - 41
  • [2] Feature Selection for Surrogate Model-Based Optimization
    Rehbach, Frederik
    Gentile, Lorenzo
    Bartz-Beielstein, Thomas
    PROCEEDINGS OF THE 2019 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION (GECCO'19 COMPANION), 2019 : 399 - 400
  • [3] Classification of Gene Expression Data Using Feature Selection Based on Type Combination Approach Model With Advanced Feature Selection Technology
    Siddesh, G. M.
    Gururaj, T.
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2021, 15 (04)
  • [4] A model selection criterion for model-based clustering of annotated gene expression data
    Gallopin, Melina
    Celeux, Gilles
    Jaffrezic, Florence
    Rau, Andrea
    STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2015, 14 (05) : 413 - 428
  • [5] Feature Selection Using Neighborhood based Entropy
    Farnaghi-Zadeh, Fatemeh
    Rahmani, Mohsen
    Amiri, Maryam
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2022, 28 (11) : 1169 - 1192
  • [6] Feature Selection Using Approximate Conditional Entropy Based on Fuzzy Information Granule for Gene Expression Data Classification
    Zhang, Hengyi
    FRONTIERS IN GENETICS, 2021, 12
  • [7] A DCA Based Algorithm for Feature Selection in Model-Based Clustering
    Viet Anh Nguyen
    Hoai An Le Thi
    Hoai Minh Le
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT I, 2020, 12033 : 404 - 415
  • [8] Feature gene selection based on fuzzy neighborhood joint entropy
    Wang, Yan
    Sun, Minjie
    Long, Linbo
    Liu, Jinhui
    Ren, Yifan
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (01) : 129 - 144
  • [9] Efficient Feature Selection Model for Gene Expression Data
    Saengsiri, Patharawut
    Wichian, Sageemas Na
    Meesad, Phayung
    MECHANICAL AND AEROSPACE ENGINEERING, PTS 1-7, 2012, 110-116 : 1948 - +