Exploring and Distilling Cross-Modal Information for Image Captioning

Cited by: 0
Authors
Liu, Fenglin [1]
Ren, Xuancheng [2]
Liu, Yuanxin [3]
Lei, Kai [1]
Sun, Xu [2]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn SECE, Shenzhen Key Lab Informat Centr Networking & Bloc, Beijing, Peoples R China
[2] Peking Univ, Sch EECS, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Beijing Univ Posts & Telecommun, Sch ICE, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet there is still great difficulty for the current methods to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. To perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach that explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our fully-attentive model achieves a CIDEr score of 129.3 in offline COCO evaluation with remarkable efficiency in terms of accuracy, speed, and parameter budget.
Pages: 5095-5101
Page count: 7