Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

Authors
Gal, Yarin [1]
Chen, Yutian [1]
Ghahramani, Zoubin [1]
Affiliations
[1] Univ Cambridge, Cambridge, England
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity: the number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have achieved significant improvements in supervised tasks on such data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas, we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. The non-linearity captures multi-modality in the distribution, and the continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling-based variational inference. We show empirically that the model outperforms its linear and discrete counterparts in imputation tasks on sparse data.
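The generative idea in the abstract (continuous latent points, a non-linear map, then categorical observations) can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel, the standard normal prior on the latent points, the plain softmax link, and every name below (`rbf_kernel`, `sample_categorical_data`, and their parameters) are assumptions chosen for illustration. The sketch only draws samples from such a model and does not perform the variational inference the abstract mentions.

```python
import numpy as np


def rbf_kernel(X, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix for the rows of X (assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)


def sample_categorical_data(n_points=20, n_vars=3, n_cats=4, latent_dim=2, seed=0):
    """Draw categorical vectors from a GP-transformed latent space (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Continuous latent representation: one point per observation.
    X = rng.standard_normal((n_points, latent_dim))

    # GP prior over the latent points; a small jitter keeps the Cholesky stable.
    K = rbf_kernel(X) + 1e-6 * np.eye(n_points)
    L = np.linalg.cholesky(K)

    # One GP draw per (variable, category) pair: F[d, n, k] is the weight of
    # category k for variable d at latent point n.
    Z = rng.standard_normal((n_vars, n_points, n_cats))
    F = np.einsum("ij,djk->dik", L, Z)

    # Softmax over categories turns function values into probabilities.
    F = F - F.max(axis=2, keepdims=True)
    P = np.exp(F)
    P = P / P.sum(axis=2, keepdims=True)

    # Sample each categorical variable from its probability vector.
    Y = np.empty((n_points, n_vars), dtype=int)
    for d in range(n_vars):
        for n in range(n_points):
            Y[n, d] = rng.choice(n_cats, p=P[d, n])
    return X, P, Y
```

Calling `sample_categorical_data()` returns the latent points, the per-variable category probabilities, and the sampled categorical vectors. Because the probabilities depend on the latent points through a smooth function, observations that are close in the latent space tend to receive similar category probabilities, which is the sense in which a continuous representation can help with sparse data.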
Pages: 645-654
Number of pages: 10