Improving Intra- and Inter-Modality Visual Relation for Image Captioning

Cited by: 14
Authors
Wang, Yong [1, 2, 4]
Zhang, WenKai [1, 3]
Liu, Qing [1, 3]
Zhang, Zhengyuan [1, 2, 4]
Gao, Xin [1, 3]
Sun, Xian [1, 3]
Affiliations
[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing, Peoples R China
[3] Chinese Acad Sci, Inst Elect, Key Lab Network Informat Syst Technol, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Beijing, Peoples R China
Keywords
Image Captioning; Intra- and Inter-Modality Visual Relation; Relation Enhanced Transformer Block; Visual Guided Alignment;
DOI
10.1145/3394171.3413877
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
It is widely shared that capturing relationships among multi-modality features would be helpful for representing and ultimately describing an image. In this paper, we present a novel Intra- and Inter-modality visual Relation Transformer to improve connections among visual features, termed I²RT. Firstly, we propose Relation Enhanced Transformer Block (RETB) for image feature learning, which strengthens intra-modality visual relations among objects. Moreover, to bridge the gap between inter-modality feature representations, we align them explicitly via Visual Guided Alignment (VGA) module. Finally, an end-to-end formulation is adopted to train the whole model jointly. Experiments on the MS-COCO dataset show the effectiveness of our model, leading to improvements on all commonly used metrics on the "Karpathy" test split. Extensive ablation experiments are conducted for the comprehensive analysis of the proposed method.
Pages: 4190 - 4198
Number of pages: 9
Related papers
50 items in total
  • [41] Inter-modality non-rigid breast image registration using finite-element method
    Krol, A
    Coman, IL
    Mandel, JA
    Baum, K
    Luo, M
    Feighn, DH
    Lipson, ED
    Beaumont, J
    2003 IEEE NUCLEAR SCIENCE SYMPOSIUM, CONFERENCE RECORD, VOLS 1-5, 2004, : 1958 - 1961
  • [42] Inter-Observes and Inter-Modality Variation Comparison Between Two Image Guided System for Renal Metastasis - a Pilot Study
    Leung, W.
    Wong, M.
    Cheung, S.
    Lee, W.
    Wong, Ray
    Luk, Hollis
    Fransica
    Chan, M.
    MEDICAL PHYSICS, 2017, 44 (06)
  • [43] Perfusion abnormalities in pulmonary embolism studied with perfusion MRI and ventilation-perfusion scintigraphy: An intra-modality and inter-modality agreement study
    Amundsen, T
    Torheim, G
    Kvistad, KA
    Waage, A
    Bjermer, L
    Nordlid, KK
    Johnsen, H
    Åsberg, A
    Haraldseth, O
    JOURNAL OF MAGNETIC RESONANCE IMAGING, 2002, 15 (04) : 386 - 394
  • [44] Relation-Aware Image Captioning for Explainable Visual Question Answering
    Tseng, Ching-Shan
    Lin, Ying-Jia
    Kao, Hung-Yu
    2022 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, TAAI, 2022, : 149 - 154
  • [45] Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
    Xue, Hongwei
    Huang, Yupan
    Liu, Bei
    Peng, Houwen
    Fu, Jianlong
    Li, Houqiang
    Luo, Jiebo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [46] Midwives' visual interpretation of intrapartum cardiotocographs: intra- and inter-observer agreement
    Devane, D
    Lalor, J
    JOURNAL OF ADVANCED NURSING, 2005, 52 (02) : 133 - 141
  • [47] Visual Question Answering With Dense Inter- and Intra-Modality Interactions
    Liu, Fei
    Liu, Jing
    Fang, Zhiwei
    Hong, Richang
    Lu, Hanqing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3518 - 3529
  • [48] Improving Textual Emotion Recognition Based on Intra- and Inter-Class Variations
    Alhuzali, Hassan
    Ananiadou, Sophia
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (02) : 1297 - 1307
  • [49] Cortical connections and early visual function: intra- and inter-columnar processing
    Ben-Shahar, O
    Huggins, PS
    Izo, T
    Zucker, SW
    JOURNAL OF PHYSIOLOGY-PARIS, 2003, 97 (2-3) : 191 - 208
  • [50] Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation
    Jin, Yueming
    Yu, Yang
    Chen, Cheng
    Zhao, Zixu
    Heng, Pheng-Ann
    Stoyanov, Danail
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (11) : 2991 - 3002