A Clustering Method Based on the Maximum Entropy Principle

Cited by: 36
Authors
Aldana-Bobadilla, Edwin [1 ]
Kuri-Morales, Angel [2 ]
Affiliations
[1] Univ Nacl Autonoma Mexico, Inst Invest Matemat Aplicadas & Sistemas, Mexico City 04510, DF, Mexico
[2] Inst Tecnol Autonomo Mexico, Mexico City 01080, DF, Mexico
Keywords
clustering; Shannon's entropy; genetic algorithms; information; optimization; number; validation; algorithm
DOI
10.3390/e17010151
Chinese Library Classification (CLC)
O4 [Physics];
Discipline code
0702;
Abstract
Clustering is an unsupervised process to determine which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They are based on the assumption that a cluster is a subset with the minimal possible degree of "disorder", and they attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. The method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information is based on the assumption that the elements of a cluster are "similar" to each other in accordance with some statistical measure. As a consequence of this principle, those distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects over the clusters is a hard combinatorial problem, which disallows the use of traditional optimization techniques; genetic algorithms are a good alternative to solve it. We benchmark our method relative to the best theoretical performance, which is given by the Bayes classifier when data are normally distributed, and a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform a non-supervised one, since, in the first case, the elements of the classes are known a priori. In what follows, we show that our method's effectiveness is comparable to a supervised one. This clearly exhibits the superiority of our method.
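The approach in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: each candidate clustering is encoded as a label vector (one gene per object), its fitness is the Shannon entropy of the cluster-size distribution minus a penalty on within-cluster dispersion (a simple stand-in for the paper's similarity constraints), and a plain genetic algorithm searches the assignment space. All function names, the fitness form, and the GA parameters are assumptions made for illustration.

```python
import math
import random

def shannon_entropy(labels, k):
    """Entropy of the cluster-size distribution induced by an assignment."""
    n = len(labels)
    counts = [labels.count(c) for c in range(k)]
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def within_cluster_dispersion(points, labels, k):
    """Sum of squared distances of each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if len(members) < 2:
            continue
        mean = [sum(d) / len(members) for d in zip(*members)]
        total += sum(sum((x, m) != () and (x - m) ** 2 for x, m in zip(p, mean))
                     for p in members)
    return total

def fitness(points, labels, k, lam=0.1):
    # Maximize entropy, penalize dissimilar elements sharing a cluster
    # (an assumed surrogate for the paper's prior-information constraints).
    return shannon_entropy(labels, k) - lam * within_cluster_dispersion(points, labels, k)

def genetic_cluster(points, k, pop_size=40, generations=200, mut_rate=0.05, seed=0):
    """Search cluster assignments with elitist selection, one-point crossover
    and per-gene mutation; returns the best label vector found."""
    rng = random.Random(seed)
    n = len(points)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(points, ind, k), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [rng.randrange(k) if rng.random() < mut_rate else g
                     for g in child]
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda ind: fitness(points, ind, k))
```

On two well-separated groups of points this sketch favors balanced, spatially coherent clusters: maximal entropy pushes toward equal-sized clusters, while the dispersion penalty keeps each cluster internally similar, mirroring the constrained maximization described above.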
Pages: 151-180 (30 pages)
Related papers (50 in total)
  • [31] The latent maximum entropy principle
    Department of Computer Science and Engineering, Wright State University, Dayton, OH 45435, United States
    ACM Trans. Knowl. Discov. Data, 2
  • [32] THE MAXIMUM-ENTROPY PRINCIPLE
    FELLGETT, PB
    KYBERNETES, 1987, 16 (02) : 125 - 125
  • [33] MAXIMUM-ENTROPY PRINCIPLE
    BALASUBRAMANIAN, V
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1984, 35 (03): : 153 - 153
  • [34] MAXIMUM ENTROPY PRINCIPLE FOR TRANSPORTATION
    Bilich, F.
    DaSilva, R.
    BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING, 2008, 1073 : 252 - +
  • [35] The Latent Maximum Entropy Principle
    Wang, Shaojun
    Schuurmans, Dale
    Zhao, Yunxin
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2012, 6 (02)
  • [36] Generalized maximum entropy principle
    Kesavan, H.K.
    1600, (19)
  • [37] Metasystems and the maximum entropy principle
    Pittarelli, M
    INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 1996, 24 (1-2) : 191 - 206
  • [38] THE PRINCIPLE OF MAXIMUM-ENTROPY
    GUIASU, S
    SHENITZER, A
    MATHEMATICAL INTELLIGENCER, 1985, 7 (01): : 42 - 48
  • [39] The latent maximum entropy principle
    Wang, SJ
    Rosenfeld, R
    Zhao, YX
    Schuurmans, D
    ISIT: 2002 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, PROCEEDINGS, 2002, : 131 - 131
  • [40] Maximum entropy principle revisited
    Wolfgang Dreyer
    Matthias Kunik
    Continuum Mechanics and Thermodynamics, 1998, 10 : 331 - 347