Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning

Authors
Hossen, Md. Bipul [1]
Ye, Zhongfu [1]
Abdussalam, Amr [1]
Ul Hassan, Shabih [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Anhui, Peoples R China
Keywords
Fine-grained captioning; Fusion mechanism; Encoder-decoder architecture; Attribute predictor module; ATTENTION
DOI
10.1016/j.engappai.2024.109134
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Fine-grained image captioning with attribute information has garnered significant attention in the realms of computer vision and natural language processing, demanding precise and contextually relevant descriptions of visual content. While previous attribute-driven image captioning models have shown improvements, challenges remain, such as the independence of attribute predictors and caption generators and the semantic gap between images and attributes. Another common issue is the inclusion of all attributes at every time step, despite most attributes being irrelevant to the word currently being generated. This can divert the model's attention toward erroneous semantic details, resulting in a performance decline. To address these issues, we propose a novel Attribute-Driven Filtering (ADF) captioning network designed to provide rich and nuanced descriptions. This model incorporates a unique Attribute Predictor Module (APM) that dynamically predicts the most pertinent attributes in accordance with the textual context, utilizing different attributes at various time steps. The novelty of this approach lies in recognizing that not all attributes hold equal relevance at each time step, and the APM filters out irrelevant attributes to generate precise and contextually relevant captions. Furthermore, this model features a fusion mechanism that integrates visual information from a conventional attention module with attribute information predicted by the APM, aiming to reduce the visual semantic gap between images and attributes. Extensive experimentation demonstrates that the ADF model outperforms advanced models, achieving impressive CIDEr-D scores of 72.0 (Flickr30K) and 123.3 (MS-COCO) through reinforcement learning optimization. It consistently surpasses baseline models across diverse evaluation metrics, highlighting its effectiveness and robustness.
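The two ideas in the abstract, keeping only the attributes relevant to the current decoding step and fusing them with attended visual features, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`filter_attributes`, `fuse`), the dot-product relevance score, the top-k cut-off, and the scalar sigmoid gate are all assumptions chosen for illustration, and the vectors are made-up three-dimensional examples.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_attributes(hidden, attr_embeddings, top_k):
    """Hypothetical filter: score every attribute against the current
    textual context (decoder state) and keep only the top_k."""
    weights = softmax([dot(hidden, e) for e in attr_embeddings])
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = order[:top_k]
    kept_total = sum(weights[i] for i in keep)
    # Dropped attributes contribute nothing; kept ones are renormalized.
    return {i: weights[i] / kept_total for i in keep}


def fuse(hidden, visual, attr_embeddings, kept):
    """Hypothetical fusion: a scalar gate mixes the attended visual vector
    with the weighted sum of the kept attribute embeddings."""
    dim = len(visual)
    attr_vec = [
        sum(w * attr_embeddings[i][d] for i, w in kept.items())
        for d in range(dim)
    ]
    gate = 1.0 / (1.0 + math.exp(-(dot(hidden, visual) - dot(hidden, attr_vec))))
    return [gate * v + (1.0 - gate) * a for v, a in zip(visual, attr_vec)]


# Toy data (assumed): three attributes with one-hot embeddings.
attr_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
visual = [0.5, 0.5, 0.5]

hidden_t1 = [2.0, 0.1, -1.0]   # context at one time step
hidden_t2 = [-1.0, 0.5, 2.0]   # context at a later time step

kept_t1 = filter_attributes(hidden_t1, attr_embeddings, top_k=2)
kept_t2 = filter_attributes(hidden_t2, attr_embeddings, top_k=2)
fused_t1 = fuse(hidden_t1, visual, attr_embeddings, kept_t1)
```

With these toy inputs the two time steps retain different attribute subsets, which is the behaviour the abstract attributes to the APM; the actual scoring and fusion functions in the paper may differ.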
Pages: 17