Improving Intra- and Inter-Modality Visual Relation for Image Captioning

Cited by: 14

Authors:

Wang, Yong [1,2,4]

Zhang, WenKai [1,3]

Liu, Qing [1,3]

Zhang, Zhengyuan [1,2,4]

Gao, Xin [1,3]

Sun, Xian [1,3]

Affiliations:

[1] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing, Peoples R China

[3] Chinese Acad Sci, Inst Elect, Key Lab Network Informat Syst Technol, Beijing, Peoples R China

[4] Univ Chinese Acad Sci, Beijing, Peoples R China

Source:

MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA | 2020

Keywords:

Image Captioning; Intra- and Inter-Modality Visual Relation; Relation Enhanced Transformer Block; Visual Guided Alignment;

DOI:

10.1145/3394171.3413877

Chinese Library Classification (CLC):

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

It is widely accepted that capturing relationships among multi-modality features helps in representing and ultimately describing an image. In this paper, we present a novel Intra- and Inter-modality visual Relation Transformer, termed I^2RT, to improve connections among visual features. First, we propose a Relation Enhanced Transformer Block (RETB) for image feature learning, which strengthens intra-modality visual relations among objects. Moreover, to bridge the gap between inter-modality feature representations, we align them explicitly via a Visual Guided Alignment (VGA) module. Finally, an end-to-end formulation is adopted to train the whole model jointly. Experiments on the MS-COCO dataset show the effectiveness of our model, leading to improvements on all commonly used metrics on the "Karpathy" test split. Extensive ablation experiments are conducted for a comprehensive analysis of the proposed method.


Pages: 4190-4198

Page count: 9

