Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1 ,2 ]
Fan, Xiaopeng [1 ,2 ]
Xiong, Ruiqin [3 ]
Zhao, Debin [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover subjective or objective elements. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
Pages: 2307-2320
Number of pages: 14
Related papers
50 in total
  • [21] Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
    Xie, Yujia
    Zhou, Luowei
    Dai, Xiyang
    Yuan, Lu
    Bach, Nguyen
    Liu, Ce
    Zeng, Michael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [22] Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
    Wang, Jing
    Pan, Yingwei
    Yao, Ting
    Tang, Jinhui
    Mei, Tao
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 940 - 946
  • [23] An embedding method in image based on visual redundancy
    Xiaoyan, Qiao
    Ji, Guangong
    Liang, Hui
    2007 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2007, : 2969 - +
  • [24] Semantic and Visual Enrichment Hierarchical Network for Medical Image Report Generation
    Tang, Qian
    Yu, Yongbin
    Feng, Xiao
    Peng, Chenhui
    2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 738 - 743
  • [25] Network Embedding With Dual Generation Tasks
    Li, Na
    Liu, Jie
    He, Zhicheng
    Zhang, Chunhai
    Xie, Jiaying
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (07) : 7303 - 7315
  • [26] Network Embedding with Dual Generation Tasks
    Liu, Jie
    Li, Na
    He, Zhicheng
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 5102 - 5108
  • [27] Visual Relationship Detection Using Joint Visual-Semantic Embedding
    Li, Binglin
    Wang, Yang
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3291 - 3296
  • [28] Embedding Probability Guided Network for Image Steganalysis
    Li, Qiangjie
    Feng, Guorui
    Ren, Yanli
    Zhang, Xinpeng
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1095 - 1099
  • [29] Dense semantic embedding network for image captioning
    Xiao, Xinyu
    Wang, Lingfeng
    Ding, Kun
    Xiang, Shiming
    Pan, Chunhong
    PATTERN RECOGNITION, 2019, 90 : 285 - 296
  • [30] Modeling coverage with semantic embedding for image caption generation
    Jiang, Teng
    Zhang, Zehan
    Yang, Yupu
    VISUAL COMPUTER, 2019, 35 (11) : 1655 - 1665