Fine-grained person-based image captioning via advanced spectrum parsing

Times cited: 0
Authors
Wu, Jianhui [1]
Ni, Fan [1]
Wang, Zijie [1]
Ju, Haoyu [1]
Zhang, Yue [1]
Hu, Fangqiang [1]
Li, Yifeng [1]
Affiliation
[1] Nanjing Tech Univ, Sch Comp Sci & Technol, Nanjing 211816, Peoples R China
Keywords
Image captioning; Person-based images; Fine graininess; Spectrum domain
DOI
10.1007/s11042-023-16893-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent image captioning models have demonstrated remarkable performance in capturing substantial global semantic information in coarse-grained images and in achieving high object coverage rates in the generated captions. When applied to fine-grained images that contain heterogeneous object attributes, however, these models often fail to maintain the desired granularity because they pay inadequate attention to local content. This paper investigates fine-grained caption generation for person-based images and heuristically proposes the Advanced Spectrum Parsing (ASP) model. Specifically, we design a novel spectrum branch that reveals latent contour features of detected objects in the spectrum domain. We also retain the spatial feature branch used in existing methods and leverage a multi-level feature extraction module to extract both spatial and spectrum features. Furthermore, we optimize these features to learn the spatial-spectrum correlation and concatenate them through a multi-scale feature fusion module. In the inference stage, the integrated features enable the model to focus more on the local semantic regions of the person in the image. Extensive experiments on person-based datasets demonstrate that the proposed ASP yields promising results in terms of both comprehensiveness and fine graininess.
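The abstract describes the ASP data flow only at a high level: a spectrum branch that exposes contour features in the frequency domain, a retained spatial branch, and a multi-scale module that fuses the two. The following is a minimal NumPy sketch of that flow under our own assumptions, not the authors' implementation. The spectrum branch is modeled as a radial high-pass filter applied via a 2D FFT (the abstract names no specific transform or filter); the multi-level extraction backbone is omitted, so the input is assumed to be an already-extracted (C, H, W) feature map; and the fusion is assumed to be average pooling at a few scales, nearest-neighbor upsampling, and channel concatenation. The names `spectrum_branch`, `multiscale_fuse`, `cutoff`, and `scales` are hypothetical.

```python
import numpy as np


def spectrum_branch(feature_map: np.ndarray, cutoff: float = 0.25) -> np.ndarray:
    """Hypothetical spectrum branch (an assumption, not the ASP paper's design).

    Applies a radial high-pass filter to each channel of a (C, H, W) feature
    map in the 2D Fourier domain and returns the magnitude of the filtered
    signal. It responds to edges/contours and is ~0 on flat regions.
    """
    _, h, w = feature_map.shape
    # 2D FFT over the two spatial axes of every channel.
    spectrum = np.fft.fft2(feature_map, axes=(-2, -1))
    # Normalized frequency coordinates in the (unshifted) FFT layout.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy**2 + fx**2)
    # Keep only frequencies above a fraction of the maximum radius;
    # this always drops the DC (radius 0) term.
    mask = radius >= cutoff * radius.max()
    filtered = np.fft.ifft2(spectrum * mask, axes=(-2, -1))
    return np.abs(filtered)


def avg_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))


def upsample(x: np.ndarray, k: int) -> np.ndarray:
    """Nearest-neighbor upsampling of the spatial axes by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)


def multiscale_fuse(spatial: np.ndarray, spectral: np.ndarray,
                    scales: tuple[int, ...] = (1, 2, 4)) -> np.ndarray:
    """Hypothetical multi-scale fusion: pool both branches at each scale,
    upsample back to (H, W), and concatenate along the channel axis."""
    fused = []
    for k in scales:
        for feat in (spatial, spectral):
            fused.append(upsample(avg_pool(feat, k), k))
    return np.concatenate(fused, axis=0)


# Usage on a random stand-in for backbone features (C=3, H=W=8).
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 8))
contours = spectrum_branch(features)
fused = multiscale_fuse(features, contours)
print(fused.shape)  # (18, 8, 8): 3 scales x 2 branches x 3 channels
```

A high-pass filter is used here because discarding low frequencies suppresses flat regions and keeps edges, which is one plausible reading of "contour features in the spectrum domain"; a learned spectral filter or a different transform would be an equally valid interpretation of the abstract.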
Source: Multimedia Tools and Applications, 2024, 83: 34015-34030
Page count: 16
Related papers
  • [1] Attention-Guided Hierarchical Parsing for Fine-Grained Person-Centric Image Captioning
    Gu, Zhengcheng
    Jin, Jing
    [J]. IEEE Access, 2024, 12: 86293-86301
  • [2] Fine-Grained Features for Image Captioning
    Shao, Mengyue
    Feng, Jie
    Wu, Jie
    Zhang, Haixiang
    Zheng, Yayu
    [J]. CMC-Computers Materials & Continua, 2023, 75(3): 4697-4712
  • [3] ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor
    Hossen, Md. Bipul
    Ye, Zhongfu
    Abdussalam, Amr
    Hossain, Mohammad Alamgir
    [J]. Displays, 2024, 84
  • [4] Fine-grained image emotion captioning based on Generative Adversarial Networks
    Yang, Chunmiao
    Wang, Yang
    Han, Liying
    Jia, Xiran
    Sun, Hebin
    [J]. Multimedia Tools and Applications, 2024, 83(34): 81857-81875
  • [5] FineFormer: Fine-Grained Adaptive Object Transformer for Image Captioning
    Wang, Bo
    Zhang, Zhao
    Fan, Jicong
    Zhao, Mingbo
    Zhan, Choujun
    Xu, Mingliang
    [J]. 2022 IEEE International Conference on Data Mining (ICDM), 2022: 508-517
  • [6] Fine-Grained Text Sentiment Transfer via Dependency Parsing
    Xiao, Lulu
    Qu, Xiaoye
    Li, Ruixuan
    Wang, Jun
    Zhou, Pan
    Li, Yuhua
    [J]. ECAI 2020: 24th European Conference on Artificial Intelligence, 2020, 325: 2228-2235
  • [7] c-RNN: A Fine-Grained Language Model for Image Captioning
    Huang, Gengshi
    Hu, Haifeng
    [J]. Neural Processing Letters, 2019, 49(2): 683-691
  • [8] Fine-grained and Semantic-guided Visual Attention for Image Captioning
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    [J]. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018: 1709-1717