Attribute-driven image captioning via soft-switch pointer

Cited by: 8
Authors
Zhou, Yujie [1, 2]
Long, Jiefeng [1, 2]
Xu, Suping [1, 2]
Shang, Lin [1, 2]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210023, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Visual attributes detection; Attention; Pointing mechanism;
DOI
10.1016/j.patrec.2021.08.021
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual attributes detection provides rich semantic concepts for image captioning. Some previous methods directly encode the attributes into vectors and generate the corresponding captions, which ignores the correlations between image regions and attributes. In this paper, we aim to bridge the gap between visual features and detected attributes: first, look at a specific region of the image; second, decide which attribute to attend to. We propose an attribute-driven image captioning approach consisting of two parts: a visual positioning part and an attribute selection part. Specifically, we introduce the pointer-generator network into the second part of our model as a soft switch, which determines at each decoding step whether to generate a word from the hidden state or to point to a detected attribute. Qualitative and quantitative experiments show that our model can improve the coverage of key visual attributes and significantly boost the overall performance. (c) 2021 Published by Elsevier B.V.
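The soft-switch idea in the abstract can be illustrated with a small sketch. This record does not give the paper's equations, so the code below is not the authors' model: it assumes the commonly described pointer-generator mixture, in which a gate p_gen in (0, 1) blends a vocabulary (generation) distribution with a pointing distribution over detected attributes. All names (`soft_switch_step`, `switch_logit`, the toy vocabulary and scores) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_switch_step(vocab, vocab_logits, attributes, attr_scores, switch_logit):
    """One decoding step of an assumed pointer-generator style soft switch.

    p_gen weights the generation distribution over the vocabulary;
    (1 - p_gen) weights the pointing distribution over detected attributes.
    Attributes missing from the vocabulary still receive probability mass.
    """
    p_gen = sigmoid(switch_logit)      # soft switch in (0, 1)
    p_vocab = softmax(vocab_logits)    # "generate a word" branch
    p_point = softmax(attr_scores)     # "point to an attribute" branch

    final = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    for attr, p in zip(attributes, p_point):
        final[attr] = final.get(attr, 0.0) + (1.0 - p_gen) * p
    return p_gen, final


# Toy inputs (made up): "frisbee" is a detected attribute outside the vocabulary.
vocab = ["a", "dog", "on", "the", "grass"]
vocab_logits = [0.5, 1.0, 0.2, 0.3, 0.1]
attributes = ["dog", "frisbee", "grass"]
attr_scores = [0.4, 2.0, 0.3]

p_gen, dist = soft_switch_step(vocab, vocab_logits, attributes, attr_scores,
                               switch_logit=-1.0)
best = max(dist, key=dist.get)
print(f"p_gen={p_gen:.3f}, best word={best}")
```

With these toy numbers the gate leans toward pointing (p_gen is about 0.27), so the highest-scoring attribute, "frisbee", wins even though the generation branch cannot produce it. In a trained model the gate and both score vectors would be computed from the decoder state and attention context rather than fixed by hand.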
Pages: 34-41
Number of pages: 8
Related papers
10 items in total
  • [1] Show, Observe and Tell: Attribute-driven Attention Model for Image Captioning
    Chen, Hui
    Ding, Guiguang
    Lin, Zijia
    Zhao, Sicheng
    Han, Jungong
    [J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 606 - 612
  • [2] Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning
    Hossen, Md. Bipul
    Ye, Zhongfu
    Abdussalam, Amr
    Ul Hassan, Shabih
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [3] Scalable attribute-driven face image retrieval
    An, Le
    Zou, Changjian
    Zhang, Liyan
    Denney, Bradley
    [J]. NEUROCOMPUTING, 2016, 172 : 215 - 224
  • [4] Attribute-Driven Spontaneous Motion in Unpaired Image Translation
    Wu, Ruizheng
    Tao, Xin
    Gu, Xiaodong
    Shen, Xiaoyong
    Jia, Jiaya
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5922 - 5931
  • [5] Modeling the Momentum Spillover Effect for Stock Prediction via Attribute-Driven Graph Attention Networks
    Cheng, Rui
    Li, Qing
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 55 - 62
  • [6] Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation.
    Wu, Xintian
    Zhao, Hanbin
    Zheng, Liangli
    Ding, Shouhong
    Li, Xi
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1593 - 1602
  • [7] Semi-supervised Credit Card Fraud Detection via Attribute-Driven Graph Representation
    Xiang, Sheng
    Zhu, Mingzhi
    Cheng, Dawei
    Li, Enxia
    Zhao, Ruihui
    Ouyang, Yi
    Chen, Ling
    Zheng, Yefeng
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 14557 - 14565
  • [8] Data-driven image captioning via salient region discovery
    Kilickaya, Mert
    Akkus, Burak Kerim
    Cakici, Ruket
    Erdem, Aykut
    Erdem, Erkut
    Ikizler-Cinbis, Nazli
    [J]. IET COMPUTER VISION, 2017, 11 (06) : 398 - 406
  • [9] KEYWORD-DRIVEN IMAGE CAPTIONING VIA CONTEXT-DEPENDENT BILATERAL LSTM
    Zhang, Xiaodan
    He, Shengfeng
    Song, Xinhang
    Wei, Pengxu
    Jiang, Shuqiang
    Ye, Qixiang
    Jiao, Jianbin
    Lau, Rynson W. H.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 781 - 786
  • [10] Removing mismatches for retinal image registration via multi-attribute-driven regularized mixture model
    Wang, Gang
    Wang, Zhicheng
    Chen, Yufei
    Zhou, Qiangqiang
    Zhao, Weidong
    [J]. INFORMATION SCIENCES, 2016, 372 : 492 - 504