Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning

Cited by: 0
Authors
Hossen, Md. Bipul [1]
Ye, Zhongfu [1]
Abdussalam, Amr [1]
Ul Hassan, Shabih [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Anhui, Peoples R China
Keywords
Fine-grained captioning; Fusion mechanism; Encoder-decoder architecture; Attribute predictor module; ATTENTION;
DOI
10.1016/j.engappai.2024.109134
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fine-grained image captioning with attribute information has garnered significant attention in the realms of computer vision and natural language processing, demanding precise and contextually relevant descriptions of visual content. While previous attribute-driven image captioning models have shown improvements, challenges remain, such as the independence of attribute predictors and caption generators and the semantic gap between images and attributes. Another common issue is the inclusion of all attributes at every time step, despite most attributes being irrelevant to the word currently being generated. This can divert the model's attention toward erroneous semantic details, resulting in a performance decline. To address these issues, we propose a novel Attribute-Driven Filtering (ADF) captioning network designed to provide rich and nuanced descriptions. This model incorporates a unique Attribute Predictor Module (APM) that dynamically predicts the most pertinent attributes in accordance with the textual context, utilizing different attributes at various time steps. The novelty of this approach lies in recognizing that not all attributes hold equal relevance at each time step, and the APM filters out irrelevant attributes to generate precise and contextually relevant captions. Furthermore, this model features a fusion mechanism that integrates visual information from a conventional attention module with attribute information predicted by the APM, aiming to reduce the visual semantic gap between images and attributes. Extensive experimentation demonstrates that the ADF model outperforms advanced models, achieving impressive CIDEr-D scores of 72.0 (Flickr30K) and 123.3 (MS-COCO) through reinforcement learning optimization. It consistently surpasses baseline models across diverse evaluation metrics, highlighting its effectiveness and robustness.
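The two ideas in the abstract, keeping only the attributes relevant to the current time step and fusing them with the attended visual features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dot-product relevance score, the hard top-k selection, the sigmoid gate, and every name here (`filter_attributes`, `fuse`, `w_gate`, `top_k`) are assumptions chosen for illustration, and the random inputs stand in for learned features.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def filter_attributes(hidden, attr_embeds, top_k):
    """Score each attribute against the decoder state and keep only top_k.

    Assumption: relevance is a plain dot product; the paper's APM may differ.
    Returns per-attribute weights (zero for filtered attributes) and the
    weighted attribute context vector.
    """
    scores = attr_embeds @ hidden                 # shape (K,)
    keep = np.argsort(scores)[-top_k:]            # indices of the top_k scores
    weights = np.zeros_like(scores)
    weights[keep] = softmax(scores[keep])         # renormalise over kept ones
    return weights, weights @ attr_embeds         # context has shape (d,)

def fuse(visual_ctx, attr_ctx, hidden, w_gate):
    """Gated fusion of visual and attribute context (hypothetical gate)."""
    gate_in = np.concatenate([visual_ctx, attr_ctx, hidden])
    gate = 1.0 / (1.0 + np.exp(-(w_gate @ gate_in)))   # values in (0, 1)
    return gate * visual_ctx + (1.0 - gate) * attr_ctx

# Toy inputs for one decoding time step (random stand-ins, not real features).
rng = np.random.default_rng(0)
d, K = 8, 6
hidden = rng.normal(size=d)             # decoder state at step t
attr_embeds = rng.normal(size=(K, d))   # embeddings of K candidate attributes
visual_ctx = rng.normal(size=d)         # output of a visual attention module
w_gate = rng.normal(size=(d, 3 * d)) * 0.1

weights, attr_ctx = filter_attributes(hidden, attr_embeds, top_k=2)
fused = fuse(visual_ctx, attr_ctx, hidden, w_gate)
```

Because the scores depend on `hidden`, a different decoder state selects a different subset of attributes, which is the time-step-dependent filtering the abstract describes. In a trained model the projection and gate weights would be learned jointly with the caption generator.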
Pages: 17