The topographic organization and visualization of binary data using multivariate-Bernoulli latent variable models

Times cited: 14
Author
Girolami, M [1]
Affiliation
[1] Univ Paisley, Div Comp & Informat Syst, Appl Computat Intelligence Res Unit, Paisley PA1 2BE, Renfrew, Scotland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2001 / Vol. 12 / No. 6
Keywords
data clustering; data mining; data visualization; generative modeling; probabilistic modeling; self-organization; text document processing; unsupervised learning;
DOI
10.1109/72.963773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A nonlinear latent variable model for the topographic organization and subsequent visualization of multivariate binary data is presented. The generative topographic mapping (GTM) is a nonlinear factor analysis model for continuous data that assumes an isotropic Gaussian noise model and samples uniformly from a two-dimensional (2-D) latent space. Despite the success of the GTM on continuous data, the development of a similar model for discrete binary data has been hindered, in part, by the nonlinear link function inherent in the binomial distribution, which yields a log-likelihood that is nonlinear in the model parameters. This paper presents an effective method for estimating the parameters of a binary latent variable model (a binary version of the GTM) by adopting a variational approximation to the binomial likelihood. This approximation yields a log-likelihood that is quadratic in the model parameters and so removes the need for an iterative M-step in the expectation-maximization (EM) algorithm. The power of the method is demonstrated in two significant application domains: handwritten digit recognition and the topographic organization of semantically similar text documents.
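The abstract's key step is a variational bound that turns the nonlinear Bernoulli log-likelihood into a function that is quadratic in the parameters. The abstract does not name the bound, so the sketch below assumes the Jaakkola-Jordan bound on the logistic sigmoid, a common choice for this purpose: log σ(x) ≥ log σ(ξ) + (x − ξ)/2 − λ(ξ)(x² − ξ²), with λ(ξ) = tanh(ξ/2)/(4ξ). The paper's exact formulation may differ. All function names, the weights `a`, and the `ridge` term are hypothetical illustrations, not the author's code.

```python
import numpy as np

# ASSUMPTION: Jaakkola-Jordan quadratic bound on log sigma(x); the paper's
# exact variational form is not given in the abstract and may differ.


def log_sigmoid(x):
    """Numerically stable log(sigma(x)) = -log(1 + exp(-x))."""
    return -np.logaddexp(0.0, -x)


def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), using its limit 1/8 at xi = 0."""
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) < 1e-8
    safe = np.where(small, 1.0, xi)  # avoid division by zero
    return np.where(small, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))


def bernoulli_loglik(t, x):
    """Exact log p(t | x) for t in {0, 1} with logit x: t*x - log(1 + exp(x))."""
    return t * x - np.logaddexp(0.0, x)


def bernoulli_bound(t, x, xi):
    """Lower bound on bernoulli_loglik that is quadratic in x; tight at |xi| == |x|."""
    return ((t - 0.5) * x + log_sigmoid(xi) - xi / 2.0
            - lam(xi) * (x ** 2 - xi ** 2))


def closed_form_weights(Phi, t, xi, a=None, ridge=1e-6):
    """Maximize sum_n a_n * bound(t_n, Phi[n] @ w, xi_n) - ridge/2 * |w|^2.

    The objective is quadratic in w, so its maximizer solves a single
    linear system: no inner iterative optimizer is needed.
    """
    N, M = Phi.shape
    a = np.ones(N) if a is None else a
    weights = a * lam(xi)                        # a_n * lambda(xi_n)
    A = 2.0 * (Phi.T * weights) @ Phi + ridge * np.eye(M)
    b = Phi.T @ (a * (t - 0.5))
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(scale=3.0, size=5)
    t = rng.integers(0, 2, size=5)
    print(bernoulli_loglik(t, x))
    # Same values as the line above: the bound is tight when xi = |x|.
    print(bernoulli_bound(t, x, np.abs(x)))
```

In a binary GTM one would expect the logit to be x = w_dᵀφ(u_k) for a latent grid point u_k, with `a` playing the role of the E-step responsibilities and ξ reset to |x| after each update; under those assumptions each M-step reduces to one `closed_form_weights` solve per data dimension, which is the sense in which a quadratic bound removes the iterative M-step.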
Pages: 1367-1374
Number of pages: 8
Related papers (50 in total)
  • [1] GEE-Assisted Variable Selection for Latent Variable Models with Multivariate Binary Data
    Hui, Francis K. C.
    Muller, Samuel
    Welsh, A. H.
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, 118 (542) : 1252 - 1263
  • [2] Latent variable models for the topographic organisation of discrete and strictly positive data
    Girolami, M
    NEUROCOMPUTING, 2002, 48 : 185 - 198
  • [3] Hidden Markov Latent Variable Models with Multivariate Longitudinal Data
    Song, Xinyuan
    Xia, Yemao
    Zhu, Hongtu
    BIOMETRICS, 2017, 73 (01) : 313 - 323
  • [4] Bayesian analysis of transformation latent variable models with multivariate censored data
    Song, Xin-Yuan
    Pan, Deng
    Liu, Peng-Fei
    Cai, Jing-Heng
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2016, 25 (05) : 2337 - 2358
  • [5] Latent variable models for teratogenesis using multiple binary outcomes
    Legler, JM
    Ryan, LM
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1997, 92 (437) : 13 - 20
  • [6] Small Sample Validity of Latent Variable Models for Correlated Binary Data
    Qu, YS
    Piedmonte, MR
    Williams, GW
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 1994, 23 (01) : 243 - 269
  • [7] Polya-gamma data augmentation and latent variable models for multivariate binomial data
    Holmes, John B.
    Schofield, Matthew R.
    Barker, Richard J.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2022, 71 (01) : 194 - 218
  • [8] Directed Clustering of Multivariate Data Based on Linear or Quadratic Latent Variable Models
    Zhang, Yingjuan
    Einbeck, Jochen
    ALGORITHMS, 2024, 17 (08)
  • [9] Generalized Linear Latent Variable Models for Multivariate Count and Biomass Data in Ecology
    Niku, Jenni
    Warton, David I.
    Hui, Francis K. C.
    Taskinen, Sara
    JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS, 2017, 22 (04) : 498 - 522