Latent-Variable Generative Models for Data-Efficient Text Classification

Times cited: 0
Authors
Ding, Xiaoan [1]
Gimpel, Kevin [2]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Generative classifiers offer potential advantages over their discriminative counterparts, namely in the areas of data efficiency, robustness to data shift and adversarial examples, and zero-shot learning (Ng and Jordan, 2002; Yogatama et al., 2017; Lewis and Fan, 2019). In this paper, we improve generative text classifiers by introducing discrete latent variables into the generative story, and explore several graphical model configurations. We parameterize the distributions using standard neural architectures used in conditional language modeling and perform learning by directly maximizing the log marginal likelihood via gradient-based optimization, which avoids the need to do expectation-maximization. We empirically characterize the performance of our models on six text classification datasets. The choice of where to include the latent variable has a significant impact on performance, with the strongest results obtained when using the latent variable as an auxiliary conditioning variable in the generation of the textual input. This model consistently outperforms both the generative and discriminative classifiers in small-data settings. We analyze our model by using it for controlled generation, finding that the latent variable captures interpretable properties of the data, even with very small training sets.
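The following is a minimal sketch of the idea described in the abstract. It is not the authors' implementation.

It rests on the following assumptions:
- It uses one plausible factorization in which a discrete latent variable c conditions text generation: p(x, y, c) = p(y) p(c | y) p(x | y, c).
- It replaces the neural conditional language model with a hypothetical bag-of-words softmax model, so the gradients can be written by hand.

Training takes gradient-ascent steps directly on the log marginal likelihood, log p(x, y) = log sum_c p(x, y, c), rather than running EM. Classification picks argmax_y p(x, y). The class name, method names, and toy data are illustrative only.

```python
import math
import random
from collections import Counter


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(logits):
    z = logsumexp(logits)
    return [v - z for v in logits]


class LatentGenerativeClassifier:
    """Toy p(x, y, c) = p(y) p(c | y) p(x | y, c) with a discrete latent c.

    p(x | y, c) is a bag-of-words (unigram) softmax model, a hypothetical
    stand-in for a neural conditional language model.
    """

    def __init__(self, num_labels, num_latent, vocab_size, seed=0):
        rng = random.Random(seed)
        self.Y, self.K, self.V = num_labels, num_latent, vocab_size
        self.label_logits = [0.0] * num_labels
        # Small random values break the symmetry between latent values.
        self.latent_logits = [[rng.uniform(-0.1, 0.1) for _ in range(num_latent)]
                              for _ in range(num_labels)]
        self.word_logits = [[[rng.uniform(-0.1, 0.1) for _ in range(vocab_size)]
                             for _ in range(num_latent)]
                            for _ in range(num_labels)]

    def joint_terms(self, x, y):
        """Return [log p(x, y, c) for c in 0..K-1]; x is a list of word ids."""
        log_py = log_softmax(self.label_logits)[y]
        log_pc = log_softmax(self.latent_logits[y])
        terms = []
        for c in range(self.K):
            log_pw = log_softmax(self.word_logits[y][c])
            terms.append(log_py + log_pc[c] + sum(log_pw[w] for w in x))
        return terms

    def log_marginal(self, x, y):
        """log p(x, y): the discrete latent variable is summed out exactly."""
        return logsumexp(self.joint_terms(x, y))

    def predict(self, x):
        """Bayes rule: argmax_y p(x, y) is also argmax_y p(y | x)."""
        return max(range(self.Y), key=lambda y: self.log_marginal(x, y))

    def gradient_step(self, x, y, lr=0.1):
        """One gradient-ascent step on log p(x, y); no EM M-step is performed."""
        terms = self.joint_terms(x, y)
        z = logsumexp(terms)
        post = [math.exp(t - z) for t in terms]  # posterior p(c | x, y)
        counts, n = Counter(x), len(x)

        # d log p(y) / d label_logits[k] = 1[k == y] - p(k)
        p_y = [math.exp(v) for v in log_softmax(self.label_logits)]
        for k in range(self.Y):
            self.label_logits[k] += lr * (float(k == y) - p_y[k])

        # d log p(x, y) / d latent_logits[y][c] = p(c | x, y) - p(c | y)
        p_c = [math.exp(v) for v in log_softmax(self.latent_logits[y])]
        for c in range(self.K):
            self.latent_logits[y][c] += lr * (post[c] - p_c[c])

        # d log p(x, y) / d word_logits[y][c][w]
        #   = p(c | x, y) * (count(w) - n * p(w | y, c))
        for c in range(self.K):
            p_w = [math.exp(v) for v in log_softmax(self.word_logits[y][c])]
            for w in range(self.V):
                self.word_logits[y][c][w] += lr * post[c] * (counts[w] - n * p_w[w])


if __name__ == "__main__":
    # Hypothetical toy data: label 0 uses words {0, 1}, label 1 uses {2, 3}.
    docs = [([0, 1, 0], 0), ([1, 0, 1], 0), ([2, 3, 3], 1), ([3, 2, 2], 1)]
    model = LatentGenerativeClassifier(num_labels=2, num_latent=2, vocab_size=4)
    before = sum(model.log_marginal(x, y) for x, y in docs)
    for _ in range(200):
        for x, y in docs:
            model.gradient_step(x, y)
    after = sum(model.log_marginal(x, y) for x, y in docs)
    print(after > before, [model.predict(x) for x, _ in docs])
```

Because c is discrete and takes only a few values, the sum over c is computed exactly and no variational bound is needed. The posterior p(c | x, y) enters only as a weight inside the gradient of the marginal likelihood. This is why plain gradient ascent can replace the E and M steps of EM.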
Pages: 507-517
Number of pages: 11
Related papers (50 in total)
  • [1] Maximum Reconstruction Estimation for Generative Latent-Variable Models
    Cheng, Yong
    Liu, Yang
    Xu, Wei
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017 : 3173 - 3179
  • [2] INFORMATION MATRICES IN LATENT-VARIABLE MODELS
    MISLEVY, RJ
    SHEEHAN, KM
    [J]. JOURNAL OF EDUCATIONAL STATISTICS, 1989, 14 (04) : 335 - 350
  • [3] REVERSE REGRESSIONS FOR LATENT-VARIABLE MODELS
    LEVINE, DK
    [J]. JOURNAL OF ECONOMETRICS, 1986, 32 (02) : 291 - 292
  • [4] Latent-variable models for longitudinal data with bivariate ordinal outcomes
    Todem, David
    Kim, KyungMann
    Lesaffre, Emmanuel
    [J]. STATISTICS IN MEDICINE, 2007, 26 (05) : 1034 - 1054
  • [5] LATENT-VARIABLE MODELS OF ATTRIBUTIONAL MEASUREMENT
    SMITH, ER
    MILLER, FD
    [J]. PERSONALITY AND SOCIAL PSYCHOLOGY BULLETIN, 1982, 8 (02) : 221 - 225
  • [6] Deconvolutional Latent-Variable Model for Text Sequence Matching
    Shen, Dinghan
    Zhang, Yizhe
    Henao, Ricardo
    Su, Qinliang
    Carin, Lawrence
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018 : 5438 - 5445
  • [7] DICE: Data-Efficient Clinical Event Extraction with Generative Models
    Ma, Mingyu Derek
    Taylor, Alexander K.
    Wang, Wei
    Peng, Nanyun
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023 : 15898 - 15917
  • [8] INTERPRETATION OF LATENT-VARIABLE REGRESSION-MODELS
    KVALHEIM, OM
    KARSTANG, TV
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 1989, 7 (1-2) : 39 - 51
  • [9] A Data-Efficient Method for One-Shot Text Classification
    Wang, Hsin-Yang
    Liu, Mu
    Yamashita, Katsushi
    Okamoto, Yasuhiro
    Yamada, Satoshi
    [J]. 2022 IEEE 2ND INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND ARTIFICIAL INTELLIGENCE (CCAI 2022), 2022 : 76 - 80
  • [10] Exact Inference for Integer Latent-Variable Models
    Winner, Kevin
    Sujono, Debora
    Sheldon, Dan
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017