Mask-guided network for image captioning

Cited by: 2
Authors
Lim, Jian Han [1]
Chan, Chee Seng [1]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, CISiP, Kuala Lumpur 50603, Malaysia
Keywords
Image captioning; Deep learning; Scene understanding; Mask RCNN; Transformer
DOI
10.1016/j.patrec.2023.07.013
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Attention mechanisms have been widely adopted for image captioning because of their powerful performance. In this paper, we propose a Mask Captioning Network (MaC) consisting of an object layer and a background layer that capture the objects and scenes of an image in order to generate a sentence. To this end, the object layer leverages Mask RCNN to detect salient regions at the pixel level rather than at the bounding-box level. Meanwhile, in the background layer, a CNN model is used to encode the scene features. In addition, MaC is implemented in both LSTM-based and Transformer-based image captioning architectures. We introduce a mask-guided transformer encoder with additional features to enhance the model. Experimental results show that our model significantly outperforms baseline models, producing much richer sentences, and achieves results comparable with state-of-the-art methods on the MSCOCO and Flickr30k datasets.
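The contrast the abstract draws between pixel-level masks and bounding boxes can be illustrated with a small sketch. This is not the authors' implementation: the function names `masked_average_pool` and `box_mask`, the use of plain average pooling, and the toy 3x3 feature map are all assumptions made for illustration. The sketch pools a feature vector over the pixels of an object mask, and compares it with pooling over the tightest box around the same object.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # H x W x C feature map (hypothetical toy input)
Mask = List[List[int]]                # H x W binary mask, 1 = selected pixel


def masked_average_pool(features: FeatureMap, mask: Mask) -> List[float]:
    """Average the feature vectors of the pixels selected by a binary mask."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, keep in zip(feat_row, mask_row):
            if keep:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # empty mask: return a zero vector
    return [t / count for t in total]


def box_mask(mask: Mask) -> Mask:
    """Binary mask of the smallest axis-aligned box that covers the input mask."""
    height, width = len(mask), len(mask[0])
    rows = [r for r in range(height) if any(mask[r])]
    cols = [c for c in range(width) if any(mask[r][c] for r in range(height))]
    if not rows:
        return [[0] * width for _ in range(height)]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [
        [1 if top <= r <= bottom and left <= c <= right else 0 for c in range(width)]
        for r in range(height)
    ]


# Toy map: channel 0 fires on the object (the diagonal), channel 1 on the background.
features = [
    [[1.0, 0.0] if r == c else [0.0, 1.0] for c in range(3)]
    for r in range(3)
]
object_mask = [[1 if r == c else 0 for c in range(3)] for r in range(3)]

object_feat = masked_average_pool(features, object_mask)        # pixel-level pooling
box_feat = masked_average_pool(features, box_mask(object_mask))  # box-level pooling
```

In this toy case the mask-pooled vector contains only the object channel, while the box-pooled vector is diluted by the background pixels inside the box, which is one plausible reading of why pixel-level regions might help.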
Pages: 79-86
Page count: 8