Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

Times cited: 0
Authors
Gal, Yarin [1]
Chen, Yutian [1]
Ghahramani, Zoubin [1]
Affiliation
[1] Univ Cambridge, Cambridge, England
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have achieved significant improvements in supervised tasks on such data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas, we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. Non-linearity captures multi-modality in the distribution. The continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling-based variational inference. We show empirically that the model outperforms its linear and discrete counterparts on imputation tasks with sparse data.
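The generative idea in the abstract (categorical vectors produced by a non-linear transformation of a continuous latent space) can be illustrated by forward sampling. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a standard-normal prior on the latent points, an RBF (squared-exponential) kernel, one Gaussian-process function per (variable, category) pair, and a softmax link from those function values to category probabilities. All function names (`generate`, `sample_gp_values`, and so on) are hypothetical. Only the prior over observations is shown; the sampling-based variational inference used in the paper is not reproduced.

```python
import math
import random


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two latent points (assumed kernel)."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return variance * math.exp(-0.5 * sq / lengthscale ** 2)


def cholesky(K, jitter=1e-6):
    """Lower-triangular L with L L^T = K + jitter * I (jitter keeps K positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] + jitter - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def sample_gp_values(L, rng):
    """One joint GP draw at all latent points: f = L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    return [sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(len(L))]


def softmax(v):
    """Numerically stable softmax (assumed link function)."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def sample_categorical(p, rng):
    """Draw an index k with probability p[k]."""
    u = rng.random()
    acc = 0.0
    for k, pk in enumerate(p):
        acc += pk
        if u < acc:
            return k
    return len(p) - 1


def generate(n_points, num_classes, latent_dim=2, seed=0):
    """Sample latent points X and categorical vectors Y.

    num_classes[d] is the number of categories of variable d (hypothetical setup).
    """
    rng = random.Random(seed)
    # Continuous latent representation: x_n ~ N(0, I).
    X = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n_points)]
    L = cholesky([[rbf_kernel(a, b) for b in X] for a in X])
    # F[d][k][n]: value at x_n of the GP function for category k of variable d.
    F = [[sample_gp_values(L, rng) for _ in range(k_d)] for k_d in num_classes]
    Y = []
    for n in range(n_points):
        row = []
        for d, k_d in enumerate(num_classes):
            probs = softmax([F[d][k][n] for k in range(k_d)])
            row.append(sample_categorical(probs, rng))
        Y.append(row)
    return X, Y
```

For example, `generate(6, [3, 4, 2], seed=1)` returns six 2-D latent points and six categorical vectors whose entries lie in {0, 1, 2}, {0, ..., 3} and {0, 1}. Even this toy setup has `math.prod([3, 4, 2]) == 24` possible observations, which illustrates how the observation space grows multiplicatively with vector length while a dataset may cover only a few of them. Nearby latent points receive correlated GP values and hence similar category probabilities, which is how the continuous representation shares information between distinct but similar observations.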
Pages: 645-654
Number of pages: 10