Attribute clustering for grouping, selection, and classification of gene expression data

Times cited: 106
Authors
Au, WH [1]
Chan, KCC
Wong, AKC
Wang, Y
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Kowloon, Hong Kong, Peoples R China
[2] Univ Waterloo, Dept Syst Design Engn, Waterloo, ON N2L 3G1, Canada
[3] Pattern Discovery Software Syst Ltd, Waterloo, ON N2L 5Z4, Canada
Keywords
data mining; attribute clustering; gene selection; gene expression classification; microarray analysis
DOI
10.1109/TCBB.2005.17
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
This paper presents an attribute clustering method that groups genes according to their interdependence so as to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis, and clustering attributes in this way reduces the search dimension of a data mining algorithm. This reduction is especially important for gene expression data, which typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale with the number of tuples rather than the number of attributes. The situation becomes even worse when the number of attributes greatly exceeds the number of tuples, because the likelihood of reporting patterns that arise by chance and are actually irrelevant becomes rather high. For these reasons, gene grouping and selection are important preprocessing steps if many data mining algorithms are to be effective on gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. Applying our algorithm to gene expression data discovers meaningful clusters of genes. Grouping genes by their within-group interdependence helps to capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
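The abstract does not spell out the criterion function or the information measure, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the expression values have already been discretized, and it uses normalized mutual information R(a:b) = I(a;b) / H(a,b) as the interdependence measure (our assumption). Clustering is done k-modes style: each cluster is represented by a "mode" attribute, every attribute joins the mode it is most interdependent with, and each mode is then replaced by the member with the highest summed R within its cluster. The helper names (`entropy`, `redundancy`, `cluster_attributes`) and the farthest-first initialization are hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def redundancy(a, b):
    """Normalized interdependence R(a:b) = I(a;b) / H(a,b), in [0, 1].

    ASSUMPTION: this normalized mutual information stands in for the
    information measure mentioned in the abstract; it is not taken from it.
    """
    h_joint = entropy(list(zip(a, b)))
    if h_joint == 0:
        return 0.0  # both attributes constant: nothing to share
    return (entropy(a) + entropy(b) - h_joint) / h_joint


def cluster_attributes(columns, k, max_iter=100):
    """Group attributes (genes) around k 'mode' attributes, k-modes style.

    columns: dict of attribute name -> list of discretized values (one per tuple).
    Returns a dict of mode name -> list of member attribute names.
    """
    names = list(columns)
    r = {(x, y): redundancy(columns[x], columns[y]) for x in names for y in names}

    # Farthest-first initialization (hypothetical, not from the source):
    # each new mode is the attribute least tied to the modes chosen so far.
    modes = [names[0]]
    while len(modes) < k:
        rest = [x for x in names if x not in modes]
        modes.append(min(rest, key=lambda x: max(r[(x, m)] for m in modes)))

    for _ in range(max_iter):
        # Assignment step: each attribute joins its most interdependent mode;
        # modes stay in their own cluster so no cluster is ever empty.
        clusters = {m: [] for m in modes}
        for x in names:
            best = x if x in clusters else max(modes, key=lambda m: r[(x, m)])
            clusters[best].append(x)
        # Update step: the new mode maximizes summed R with its cluster.
        new_modes = [
            max(members, key=lambda x: sum(r[(x, y)] for y in members))
            for members in clusters.values()
        ]
        if set(new_modes) == set(modes):
            break
        modes = new_modes
    return clusters


# Toy data: g1, g2, g3 carry the same signal (g3 inverted); g4, g5 another.
toy = {
    "g1": [0, 0, 1, 1, 0, 1, 0, 1],
    "g2": [0, 0, 1, 1, 0, 1, 0, 1],
    "g3": [1, 1, 0, 0, 1, 0, 1, 0],
    "g4": [0, 1, 0, 1, 0, 1, 1, 0],
    "g5": [0, 1, 0, 1, 0, 1, 1, 0],
}
groups = cluster_attributes(toy, k=2)
print(groups)  # {'g1': ['g1', 'g2', 'g3'], 'g4': ['g4', 'g5']}
```

Because R is based on mutual information rather than correlation, the inverted gene g3 lands in the same cluster as g1: it is fully predictable from g1 even though the two move in opposite directions.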
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method finds meaningful clusters of genes. By selecting a subset of genes that have high multiple interdependence with the other genes in their clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with a very high classification rate, and from this pool, gene expressions of different categories can be identified.
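The selection step described above, picking genes with high multiple interdependence within each cluster, can be sketched as follows. This is an illustration under assumptions, not the authors' code: it assumes discretized values, uses normalized mutual information R(a:b) = I(a;b) / H(a,b) as a stand-in interdependence measure, and scores each gene by the sum of its R values with the other members of its cluster. `select_genes` and `per_cluster` are hypothetical names.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def redundancy(a, b):
    """Assumed interdependence measure R(a:b) = I(a;b) / H(a,b)."""
    h_joint = entropy(list(zip(a, b)))
    if h_joint == 0:
        return 0.0  # both attributes constant
    return (entropy(a) + entropy(b) - h_joint) / h_joint


def select_genes(columns, clusters, per_cluster=1):
    """Return the top `per_cluster` genes of each cluster, ranked by their
    multiple interdependence (sum of R with the other cluster members)."""
    pool = []
    for members in clusters:
        score = {
            x: sum(redundancy(columns[x], columns[y]) for y in members if y != x)
            for x in members
        }
        # Highest score first; ties broken by name so the result is deterministic.
        ranked = sorted(members, key=lambda x: (-score[x], x))
        pool.extend(ranked[:per_cluster])
    return pool


toy = {
    "a": [0, 0, 1, 1, 0, 1, 0, 1],
    "b": [0, 0, 1, 1, 0, 1, 0, 1],
    "c": [0, 0, 1, 1, 0, 1, 1, 0],
    "d": [0, 1, 0, 1, 0, 1, 1, 0],
    "e": [0, 1, 0, 1, 0, 1, 0, 0],
}
pool = select_genes(toy, [["a", "b", "c"], ["d", "e"]], per_cluster=1)
print(pool)  # ['a', 'd']
```

The resulting pool would then be passed to whatever classifier is being evaluated; which classifiers the paper uses is not stated in this abstract. Note that with `per_cluster` greater than 1 this simple ranking can return near-duplicate genes (here `a` and `b`), so a real pipeline may need an extra redundancy check.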
Pages: 83-101
Page count: 19
Correction
Attribute clustering for grouping, selection, and classification of gene expression data (vol 2, pg 83, 2005). Au, W. H.; Chan, K. C. C.; Wong, A. K. C.; Wang, Y. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2007, 4(1): 157.