NIFTI: An evolutionary approach for finding number of clusters in microarray data

Times cited: 4

Authors:

Jonnalagadda, Sudhakar [1]

Srinivasan, Rajagopalan [1]

Affiliation:

[1] Natl Univ Singapore, Dept Chem & Biomol Engn, Singapore 119260, Singapore

Source:

BMC BIOINFORMATICS | 2009 / Vol. 10

Keywords:

GENE-EXPRESSION DATA; VALIDATION TECHNIQUES; DATA SET; PATTERNS; VALIDITY; MODEL;

DOI:

10.1186/1471-2105-10-40

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Subject classification codes:

071010; 081704

Abstract:

Background: Clustering techniques are routinely used in gene expression data analysis to organize massive datasets. They arrange a large number of genes or assays into a few clusters while maximizing intra-cluster similarity and inter-cluster separation. Clustering genes helps infer the functions of uncharacterized genes from their association with known genes, whereas clustering assays reveals disease stages and subtypes. Many clustering algorithms require the user to specify the number of clusters a priori. A wrong choice of the number of clusters generally leads either to failure to detect novel clusters (disease subtypes) or to unnecessary splitting of natural clusters.

Results: We have developed a novel method to find the number of clusters in gene expression data. Our procedure evaluates different partitions (each with a different number of clusters) produced by the clustering algorithm and finds the partition that best describes the data. In contrast to existing methods that evaluate the partitions independently, our procedure considers the dynamic rearrangement of cluster members when a new cluster is added. Partition quality is measured by a new index, the Net InFormation Transfer Index (NIFTI), which measures the change in information when an additional cluster is introduced. The information content of a partition increases when clusters do not intersect and decreases when they are not clearly separated. The partition with the highest Total Information Content (TIC) is selected as the optimal one. We illustrate our method using four publicly available microarray datasets.

Conclusion: In all four case studies, the proposed method correctly identified the number of clusters and performed better than other well-known methods. The results of our method were also invariant to the choice of clustering technique.
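The procedure summarized in the abstract (score each step from k to k+1 clusters by how cluster members are rearranged, then keep the partition with the highest total) can be sketched as follows. This is a minimal, hypothetical sketch, not the authors' NIFTI implementation: the abstract does not give the NIFTI formula, so the per-step score is a pluggable function, and `clean_split_score` is a toy stand-in that rewards a step only when every new cluster draws its members from a single previous cluster. Treating TIC as a running sum of step scores is also an assumption. The names `transfer_table`, `clean_split_score`, and `select_partition` are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, Sequence, Tuple

# (previous cluster label, new cluster label) -> number of items that moved
Table = Dict[Tuple[int, int], int]


def transfer_table(prev: Sequence[int], curr: Sequence[int]) -> Table:
    """Count how items are rearranged between a k-cluster and a (k+1)-cluster partition."""
    if len(prev) != len(curr):
        raise ValueError("both partitions must label the same items")
    return dict(Counter(zip(prev, curr)))


def clean_split_score(table: Table) -> float:
    """Toy stand-in score (NOT the NIFTI formula).

    Returns +1.0 if every new cluster draws its members from exactly one
    previous cluster (clusters do not intersect), otherwise -1.0.
    """
    sources: Dict[int, set] = {}
    for (old, new), _count in table.items():
        sources.setdefault(new, set()).add(old)
    return 1.0 if all(len(s) == 1 for s in sources.values()) else -1.0


def select_partition(
    partitions: Dict[int, Sequence[int]],
    step_score: Callable[[Table], float],
) -> int:
    """Return the number of clusters whose cumulative score is highest.

    `partitions` maps k to a label list for the same items. The smallest k is
    treated as a baseline with a cumulative score of 0 (an assumption here).
    """
    ks = sorted(partitions)
    best_k, best_total, total = ks[0], 0.0, 0.0
    for k_prev, k_next in zip(ks, ks[1:]):
        total += step_score(transfer_table(partitions[k_prev], partitions[k_next]))
        if total > best_total:  # strict '>' keeps the smaller k on ties
            best_k, best_total = k_next, total
    return best_k


if __name__ == "__main__":
    parts = {1: [0] * 6, 2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
    print(select_partition(parts, clean_split_score))
```

In this toy run, moving from two to three clusters puts items from both previous clusters into new cluster 1, so the step score is negative and `select_partition` keeps k = 2. Any real index, such as NIFTI as defined in the paper, would replace `clean_split_score` without changing the outer loop.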


Pages: 13
