MODELING LOCAL AND GLOBAL CONTEXTS FOR IMAGE CAPTIONING

Cited: 0
Authors
Yao, Peng [1 ]
Li, Jiangyun [1 ]
Guo, Longteng [2 ]
Liu, Jing [2 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China
Funding
Beijing Natural Science Foundation;
Keywords
Image captioning; self-attention; 1-D group convolution; image refiner;
DOI
10.1109/icme46284.2020.9102935
CLC number
TP31 [Computer Software];
Discipline codes
081202; 0835;
Abstract
Image captioning aims to first observe an image, most notably the involved objects, which are highly context-dependent, and then depict it with a natural description. However, most current models use only isolated object vectors as image representations, ignoring the contexts among them. In this paper, we introduce a Local-Global Context (LGC) network, endowing the independent object features with short-range perception (local contexts) and long-range dependence (global contexts). The LGC network can be viewed as a feature refiner that helps the caption decoder reason about novel objects and words. The local contexts are modeled with 1-D group convolution over adjacent objects, strengthening local connections. A self-attention mechanism then models the global contexts by correlating all the local contexts. Extensive experiments on the MSCOCO dataset demonstrate that the LGC network can be plugged into almost any neural captioning model and significantly improves performance.
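The two refinement stages described in the abstract can be sketched as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: it assumes identity Q/K/V projections in the attention step and uses a fixed averaging kernel in place of the learned group-convolution weights; the function names and the depthwise (groups = channels) special case of group convolution are assumptions for brevity.

```python
import numpy as np

def local_contexts(feats, weights):
    """1-D grouped (depthwise) convolution over adjacent object features.

    feats:   (n_objects, d) region feature vectors
    weights: (k, d) kernel with one filter per channel, k odd
             (groups == d, a special case of group convolution)
    """
    k, d = weights.shape
    pad = k // 2
    padded = np.pad(feats, ((pad, pad), (0, 0)))  # zero-pad the object axis
    out = np.zeros_like(feats)
    for i in range(feats.shape[0]):
        # weighted sum over the k adjacent objects, channel by channel
        out[i] = (padded[i:i + k] * weights).sum(axis=0)
    return out

def self_attention(x):
    """Scaled dot-product self-attention correlating all local contexts.
    Identity Q/K/V projections are assumed for brevity."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ x

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 16))     # 10 detected objects, 16-d features
w = np.full((3, 16), 1.0 / 3.0)       # illustrative averaging kernel, k=3
refined = self_attention(local_contexts(feats, w))
print(refined.shape)                  # refined features, same shape as input
```

The refined features keep the shape of the input object features, which is what lets the refiner be plugged in front of an arbitrary caption decoder.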
Pages: 6