Visual Relationship Embedding Network for Image Paragraph Generation

Cited by: 14
Authors
Che, Wenbin [1, 2]
Fan, Xiaopeng [1, 2]
Xiong, Ruiqin [3]
Zhao, Debin [1, 2]
Affiliations
[1] Harbin Inst Technol, Res Ctr Intelligent Interface & Human Comp Intera, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
[2] PengCheng Lab, Shenzhen 518055, Peoples R China
[3] Peking Univ, Inst Digital Media, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Visualization; Semantics; Task analysis; Proposals; Automobiles; Buildings; Gallium nitride; Paragraph generation; image caption; region localization; attention network; visual relationship; GAN; LSTM; LANGUAGE;
DOI
10.1109/TMM.2019.2954750
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Image paragraph generation aims to produce a complete description of a given image. This task is more challenging than image captioning, which generates only one sentence to describe the entire image. Traditional paragraph generation methods usually produce paragraph descriptions based on individual regions that are detected by a Region Proposal Network (RPN). However, relationships among visual objects are either ignored or utilized in an implicit manner in previous work. In this paper, we attempt to explore more visual information through a novel paragraph generation network that explicitly incorporates visual relationship semantics when producing descriptions. First, a novel Relation Pair Generative Adversarial Network (RP-GAN) is designed to locate regions that may cover the subject or object of a relationship. Then, their relationships are inferred through an attention-based network. Finally, the visual features and relationship semantics of valid relation pairs are taken as inputs by a Long Short-Term Memory (LSTM) network for generating sentences. The experimental results show that by explicitly utilizing the predicted relationship information, our proposed method obtains more accurate and informative paragraph descriptions than previous methods.
Pages: 2307-2320
Page count: 14