MC-Net: multi-scale contextual information aggregation network for image captioning on remote sensing images

Cited by: 2
Authors
Huang, Haiyan [1]
Shao, Zhenfeng [1, 7]
Cheng, Qimin [2]
Huang, Xiao [3]
Wu, Xiaoping [4]
Li, Guoming [5]
Tan, Li [6]
Affiliations
[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Elect Informat & Commun, Wuhan, Peoples R China
[3] Univ Arkansas, Dept Geosci, Fayetteville, AR USA
[4] Sichuan Normal Univ, Sch Geog & Resources Sci, Chengdu, Sichuan, Peoples R China
[5] Univ Elect Sci & Technol, Sch Resources & Environm, Chengdu, Sichuan, Peoples R China
[6] Chengdu Univ Technol, Sch Geophys, Chengdu, Sichuan, Peoples R China
[7] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; deep learning; semantic understanding; visual-text alignment; MODELS;
DOI
10.1080/17538947.2023.2283482
Chinese Library Classification number
P9 [Physical Geography]
Discipline classification code
0705; 070501
Abstract
Remote Sensing Image Captioning (RSIC) plays a crucial role in advancing semantic understanding and has increasingly become a focal point of research. Nevertheless, existing RSIC methods grapple with challenges due to the intricate multi-scale nature and multifaceted backgrounds inherent in Remote Sensing Images (RSIs). Compounding these challenges are the perceptible information disparities across diverse modalities. In response to these challenges, we propose a novel multi-scale contextual information aggregation image captioning network (MC-Net). This network incorporates an image encoder enhanced with a multi-scale feature extraction module, a feature fusion module, and a finely tuned adaptive decoder equipped with a visual-text alignment module. Notably, MC-Net possesses the capability to extract informative multi-scale features, facilitated by the multilayer perceptron and transformer. We also introduce an adaptive gating mechanism during the decoding phase to ensure precise alignment between visual regions and their corresponding text descriptions. Empirical studies conducted on four publicly recognized cross-modal datasets unequivocally demonstrate the superior robustness and efficacy of MC-Net in comparison to contemporaneous RSIC methods.
Pages: 4848-4866
Page count: 19