Estimating the number of clusters in microarray data sets based on an information theoretic criterion

Times cited: 0
Authors
Nicorici, Daniel [1 ]
Astola, Jaakko [1 ]
Yli-Harja, Olli [1 ]
Affiliations
[1] Tampere Univ Technol, Inst Signal Proc, FIN-33101 Tampere, Finland
Keywords
number of clusters; microarray data; minimum description length; normalized maximum likelihood
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This study focuses on an information theoretic approach for estimating the number of clusters, K, in microarray data sets. We present an automatic method for estimating K based on a particular version of the Normalized Maximum Likelihood (NML) model. The strength of Minimum Description Length (MDL) methods, such as the NML model, in statistical inference lies in finding the model structure, which in this clustering problem amounts to finding the best number of clusters and the best cluster structure for the data. The models are compared using the NML code length. The study introduces a new method for computing the code length of the encoded clustering vector for the data samples, based on the NML model. Experiments with publicly available microarray data sets demonstrate the ability of the new method to find biologically meaningful clusters.
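As a rough illustration of what an NML code length for a clustering vector can look like, the sketch below treats the cluster labels as a sequence drawn from a K-category multinomial and computes its standard NML code length in bits. This is an assumption made for illustration: the record does not describe the encoding the paper actually introduces, and it may differ. The function names are hypothetical. The normalizer uses the well-known recurrence C(k+2, n) = C(k+1, n) + (n/k) C(k, n) for the multinomial NML.

```python
import math
from collections import Counter


def multinomial_nml_normalizer(n, k):
    """Sum of maximized multinomial likelihoods over all length-n label sequences
    with k categories (the NML normalizing constant)."""
    if n == 0 or k == 1:
        return 1.0
    c_prev = 1.0  # C(1, n): a single category always has likelihood 1
    # C(2, n) computed directly from its definition.
    # Note: math.comb grows quickly, so this direct sum suits moderate n only.
    c_curr = 0.0
    for h in range(n + 1):
        c_curr += math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
    # Recurrence C(j, n) = C(j-1, n) + n / (j-2) * C(j-2, n) for j = 3..k.
    for j in range(3, k + 1):
        c_prev, c_curr = c_curr, c_curr + (n / (j - 2)) * c_prev
    return c_curr


def nml_label_code_length(labels, k):
    """NML code length, in bits, of a cluster-label sequence under a
    k-category multinomial model (illustrative assumption, see lead-in)."""
    n = len(labels)
    counts = Counter(labels)
    # Log of the maximized likelihood: product of (n_j / n) ** n_j.
    log_ml = sum(c * math.log2(c / n) for c in counts.values())
    return -log_ml + math.log2(multinomial_nml_normalizer(n, k))


if __name__ == "__main__":
    labels = [0, 0, 1, 2, 1, 0, 2, 2]
    for k in (3, 4, 5):
        print(k, round(nml_label_code_length(labels, k), 3))
```

In a model-selection loop of the kind the abstract describes, a term like this would be added to the code length of the data given the clustering, and the K with the smallest total would be kept. The data term is omitted here because the record gives no detail about it.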
Pages: 936-940
Number of pages: 5