The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models

Cited by: 14
Author
Girolami, M [1]
Affiliation
[1] Univ Paisley, Div Comp & Informat Syst, Appl Computat Intelligence Res Unit, Paisley PA1 2BE, Renfrew, Scotland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2001 / Vol. 12 / No. 06
Keywords
data clustering; data mining; data visualization; generative modeling; probabilistic modeling; self-organization; text document processing; unsupervised learning;
DOI
10.1109/72.963773
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data which assumes an isotropic Gaussian noise model and performs uniform sampling from a two-dimensional (2-D) latent space. Despite the success of the GTM when applied to continuous data, the development of a similar model for discrete binary data has been hindered due, in part, to the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. This paper presents an effective method for the parameter estimation of a binary latent variable model, a binary version of the GTM, by adopting a variational approximation to the binomial likelihood. This approximation provides a log-likelihood which is quadratic in the model parameters and so obviates the necessity of an iterative M-step in the expectation maximization (EM) algorithm. The power of this method is demonstrated on two significant application domains: handwritten digit recognition and the topographic organization of semantically similar text-based documents.
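The abstract does not say which variational approximation is used. As a minimal sketch, assuming a quadratic lower bound on the log-sigmoid of the Jaakkola-Jordan type (an assumption, not confirmed by this record), the following Python shows how a Bernoulli log-likelihood can be bounded by a function that is quadratic in the natural parameter `a`. The function names and the variational parameter `xi` are illustrative only.

```python
import math

def sigmoid(a):
    """Logistic function, the link function of the Bernoulli model."""
    return 1.0 / (1.0 + math.exp(-a))

def lam(xi):
    """Coefficient of the quadratic term; its limit as xi -> 0 is 1/8."""
    if abs(xi) < 1e-8:
        return 0.125
    return math.tanh(xi / 2.0) / (4.0 * xi)

def log_sigmoid_bound(a, xi):
    """Lower bound on log(sigmoid(a)), quadratic in a, tight at a = +/- xi."""
    return math.log(sigmoid(xi)) + (a - xi) / 2.0 - lam(xi) * (a * a - xi * xi)

def bernoulli_loglik(x, a):
    """Exact log-likelihood of binary x in {0, 1} with natural parameter a."""
    # x*log(sigmoid(a)) + (1-x)*log(sigmoid(-a)) simplifies to this form.
    return x * a + math.log(sigmoid(-a))

def bernoulli_loglik_bound(x, a, xi):
    """Assumed variational lower bound on the exact log-likelihood."""
    return x * a + log_sigmoid_bound(-a, xi)
```

Under this assumption, if `a` is linear in the model weights, the bound is quadratic in those weights, so maximizing it for fixed `xi` reduces to solving a linear system rather than running an inner iterative optimization.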
Pages: 1367-1374
Page count: 8