Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1 ,2 ]
Fan, Xiaopeng [1 ,2 ]
Xiong, Ruiqin [3 ]
Zhao, Debin [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which only generates one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover the subject or object elements of a relationship. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
Pages: 2307 - 2320
Page count: 14