Recurrent Topic-Transition GAN for Visual Paragraph Generation

Cited by: 110
Authors
Liang, Xiaodan [1]
Hu, Zhiting [1, 2]
Zhang, Hao [1, 2]
Gan, Chuang [3]
Xing, Eric P. [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Petuum Inc, Pittsburgh, PA USA
[3] Tsinghua Univ, Beijing, Peoples R China
DOI
10.1109/ICCV.2017.364
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A natural image usually conveys rich semantic content and can be viewed from different angles. Existing image description methods are largely restricted by small sets of biased visual paragraph annotations, and fail to cover rich underlying semantics. In this paper, we investigate a semi-supervised paragraph generative framework that is able to synthesize diverse and semantically coherent paragraph descriptions by reasoning over local semantic regions and exploiting linguistic knowledge. The proposed Recurrent Topic-Transition Generative Adversarial Network (RTT-GAN) builds an adversarial framework between a structured paragraph generator and multi-level paragraph discriminators. The paragraph generator generates sentences recurrently by incorporating region-based visual and language attention mechanisms at each step. The quality of generated paragraph sentences is assessed by multi-level adversarial discriminators from two aspects, namely, plausibility at sentence level and topic-transition coherence at paragraph level. The joint adversarial training of RTT-GAN drives the model to generate realistic paragraphs with smooth logical transition between sentence topics. Extensive quantitative experiments on image and video paragraph datasets demonstrate the effectiveness of our RTT-GAN in both supervised and semi-supervised settings. Qualitative results on telling diverse stories for an image verify the interpretability of RTT-GAN.
Pages: 3382-3391
Page count: 10
Related papers
22 in total
  • [1] Paragraph Generation Network with Visual Relationship Detection
    Che, Wenbin
    Fan, Xiaopeng
    Xiong, Ruiqin
    Zhao, Debin
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 1435-1443
  • [2] Visual Relationship Embedding Network for Image Paragraph Generation
    Che, Wenbin
    Fan, Xiaopeng
    Xiong, Ruiqin
    Zhao, Debin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (09): 2307-2320
  • [3] Curiosity-driven Reinforcement Learning for Diverse Visual Paragraph Generation
    Luo, Yadan
    Huang, Zi
    Zhang, Zheng
    Wang, Ziwei
    Li, Jingjing
    Yang, Yang
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 2341-2350
  • [4] RV-GAN: Recurrent GAN for Unconditional Video Generation
    Gupta, Sonam
    Keshari, Arti
    Das, Sukhendu
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 2023-2032
  • [5] Recurrent Hierarchical Topic-Guided RNN for Language Generation
    Guo, Dandan
    Chen, Bo
    Lu, Ruiying
    Zhou, Mingyuan
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019
  • [6] Recurrent Hierarchical Topic-Guided RNN for Language Generation
    Guo, Dandan
    Chen, Bo
    Lu, Ruiying
    Zhou, Mingyuan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [7] IRC-GAN: Introspective Recurrent Convolutional GAN for Text-to-video Generation
    Deng, Kangle
    Fei, Tianyi
    Huang, Xin
    Peng, Yuxin
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 2216-2222
  • [8] Visual Analysis of Topic Transition among Different Sources of Text Corpora
    Zhang Y.
    Shao Y.
    Zhang J.
    1600, Institute of Computing Technology (29): 2265-2272
  • [9] Chinese Image Caption Generation via Visual Attention and Topic Modeling
    Liu, Maofu
    Hu, Huijun
    Li, Lingjun
    Yu, Yan
    Guan, Weili
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02): 1247-1257
  • [10] Retinal OCT Image Report Generation Based on Visual and Semantic Topic Attention Model
    Guo, Chao
    Zhu, Weifang
    Wang, Ting
    Lin, Tian
    Chen, Haoyu
    Chen, Xinjian
    MEDICAL IMAGING 2022: IMAGE PROCESSING, 2022, 12032