Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1, 2]
Fan, Xiaopeng [1, 2]
Xiong, Ruiqin [3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover subject or object elements. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
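The abstract states that relationships between located regions are inferred through an attention-based network, but it gives no architectural detail. As a purely illustrative sketch, and not the paper's method, the general idea of attending from one region's feature vector over a set of candidate partner regions can be shown with plain dot-product attention. The function names, the two-dimensional toy features, and the dot-product scoring are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(subject_feat, object_feats):
    """Hypothetical helper: dot-product attention from one region over others.

    subject_feat: feature vector of the query region (assumed given).
    object_feats: feature vectors of candidate partner regions (assumed given).
    Returns the attention weights and the weighted-sum context vector.
    """
    # Score each candidate region by its dot product with the query region.
    scores = [sum(s * o for s, o in zip(subject_feat, obj)) for obj in object_feats]
    weights = softmax(scores)
    dim = len(subject_feat)
    # Context vector: attention-weighted average of the candidate features.
    context = [
        sum(w * obj[i] for w, obj in zip(weights, object_feats))
        for i in range(dim)
    ]
    return weights, context

# Toy features, invented for illustration only.
subject = [1.0, 0.0]
objects = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(subject, objects)
```

In a pipeline like the one the abstract outlines, a context vector of this kind would be one plausible input to a downstream sentence generator such as an LSTM; how the paper actually combines visual features and relationship semantics is not specified here.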
Pages: 2307-2320
Page count: 14