A strategy for finding relevant clusters; with an application to microarray data

Cited by: 8
Authors
Berget, I
Mevik, BH
Vebo, H
Næs, T
Affiliations
[1] Norwegian Univ Life Sci, Dept Anim & Aquacultural Sci, N-1432 As, Norway
[2] Norwegian Univ Life Sci, Dept Chem Biotechnol & Food Sci, N-1432 As, Norway
[3] Norwegian Food Res Inst, MATFORSK, As, Norway
[4] Univ Oslo, Dept Math, Oslo, Norway
Keywords
fuzzy clustering; noise cluster; relevant clusters; gene expression; sequential noise clustering;
DOI
10.1002/cem.954
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cluster analysis is a helpful tool for explorative analysis of large and complex data. Most clustering methods will, however, find clusters also in random data. An important aspect of cluster analysis is therefore to distinguish real and artificial clusters, as this will make interpretation of the clusters easier. In some cases, certain types of clusters are more interesting than others. When working with gene expression data, examples of such clusters are gene clusters with high between-sample variability or clusters with a certain expression profile. Here we present a strategy with the ability to search for such clusters. The clustering is done sequentially. For each sequence, the data is separated into 'interesting' and 'rest' using the fuzzy c-means algorithm with noise clustering. The interesting cluster is defined by adding a penalty function to the usual clustering criterion. The penalty function is constructed in such a way that clusters without the interesting properties are given a high penalty. The strategy is presented in a general frame, and can be adjusted by defining different criteria for each type of cluster that is of interest. The methodology is presented and demonstrated in the context of microarray gene expression analysis, using real and simulated data, but can be used for any type of data where cluster analysis may be a helpful tool. Copyright (c) 2006 John Wiley & Sons, Ltd.
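The abstract gives no formulas, so the following is only a minimal sketch of what one sequential step might look like, not the authors' implementation. It assumes a single fuzzy prototype competing with a noise cluster at constant distance `delta` (the standard noise-clustering membership formula), and it assumes the penalty is simply added to the squared distance. The function names, the `flat_profile_penalty` function, the initialisation rule, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def memberships(X, v, delta, penalty, m=2.0):
    """Fuzzy membership of each row of X in the single 'interesting' cluster."""
    # Squared Euclidean distance to the prototype, inflated by the penalty
    # of the prototype itself (assumed form of the penalised criterion).
    d2 = ((X - v) ** 2).sum(axis=1) + penalty(v)
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    # Noise clustering: the noise cluster lies at constant distance delta
    # from every point, so membership falls as d2 grows relative to delta**2.
    return 1.0 / (1.0 + (d2 / delta**2) ** (1.0 / (m - 1.0)))

def split_interesting(X, delta, penalty, m=2.0, n_iter=50):
    """One sequential step: separate X into 'interesting' and 'rest'."""
    # Assumed initialisation: start from the row with the lowest penalty.
    v = X[np.argmin([penalty(x) for x in X])].astype(float)
    for _ in range(n_iter):
        w = memberships(X, v, delta, penalty, m) ** m
        # Weighted-mean prototype update; a simplification that ignores
        # the gradient of the penalty term.
        v = (w[:, None] * X).sum(axis=0) / w.sum()
    return v, memberships(X, v, delta, penalty, m)

def flat_profile_penalty(v):
    """Hypothetical penalty: large when a profile varies little across samples."""
    return 1.0 / (np.var(v) + 1e-6)

# Simulated data: 30 genes with high between-sample variability, 70 flat genes.
rng = np.random.default_rng(0)
pattern = np.array([2.0, -2.0, 2.0, -2.0])
X = np.vstack([pattern + 0.1 * rng.standard_normal((30, 4)),
               0.3 * rng.standard_normal((70, 4))])
v, u = split_interesting(X, delta=1.0, penalty=flat_profile_penalty)
interesting = u > 0.5  # rows assigned to the 'interesting' cluster
```

In a sequential scheme of the kind the abstract describes, the rows flagged as interesting would presumably be set aside and the step repeated on the rest. The 0.5 membership cut-off used here is likewise an assumption.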
Pages: 482-491
Number of pages: 10