Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory

Cited by: 0
Authors
Zheng, Yanan [1 ]
Wen, Lijie [1 ]
Wang, Jianmin [1 ]
Yan, Jun [2 ]
Ji, Lei [2 ]
Affiliations
[1] Tsinghua University, Beijing 100084, People's Republic of China
[2] Microsoft Research Asia, Dan Ling Street, Beijing 100080, People's Republic of China
Keywords
Sequence Modeling; Hierarchical Deep Generative Models; Dual Memory Mechanism; Inference and Learning
DOI
10.1145/3132847.3132952
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline code
0812
Abstract
Deep Generative Models (DGMs) can extract high-level representations from massive unlabeled data and are explainable from a probabilistic perspective, characteristics that suit sequence modeling tasks. However, modeling sequences with DGMs remains a substantial challenge. Unlike real-valued data, which can be fed into models directly, sequence data consist of discrete elements and must first be transformed into suitable representations. This raises two challenges. First, high-level features are sensitive to small variations of the input as well as to the way the data are represented. Second, the models are more likely to lose long-term information over multiple transformations. In this paper, we propose a Hierarchical Deep Generative Model with Dual Memory to address these two challenges, together with a method to perform inference and learning on the model efficiently. The proposed model extends basic DGMs with an improved, hierarchically organized multi-layer architecture. In addition, the model incorporates memory along two directions, denoted broad memory and deep memory respectively. The model is trained end-to-end by optimizing a variational lower bound on the data log-likelihood using an improved stochastic variational method. We perform experiments on several tasks with various datasets and obtain strong results. On language modeling, our method significantly outperforms state-of-the-art results in generative performance. Extended experiments on document modeling and sentiment analysis demonstrate the effectiveness of the dual memory mechanism and the latent representations. Random text generation gives a direct, qualitative view of the model's advantages.
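For context, the "variational lower bound on the data log-likelihood" mentioned in the abstract has, in its generic hierarchical form, the shape sketched below. This is a minimal sketch assuming L stochastic latent layers z_1, ..., z_L with a top-down generative chain and a bottom-up inference network q_phi; the paper's exact factorization and its dual memory conditioning are not reproduced here.

\log p_\theta(x) \;\geq\; \mathcal{L}(\theta,\phi;x)
  \;=\; \mathbb{E}_{q_\phi(z_{1:L}\mid x)}\big[\, \log p_\theta(x, z_{1:L}) - \log q_\phi(z_{1:L}\mid x) \,\big],
\qquad
p_\theta(x, z_{1:L}) \;=\; p_\theta(x\mid z_1)\,\Big(\textstyle\prod_{l=1}^{L-1} p_\theta(z_l\mid z_{l+1})\Big)\, p(z_L).

Maximizing this bound with reparameterized Monte Carlo gradient estimates is the standard way such models are trained end-to-end, and it is the general setting that the abstract's "improved stochastic variational method" refines.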
Pages: 1369 - 1378
Page count: 10