Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Cited by: 14

Authors:

Mafla, Andres [1]

Dey, Sounak [1]

Biten, Ali Furkan [1]

Gomez, Lluis [1]

Karatzas, Dimosthenis [1]

Affiliations:

[1] UAB, Comp Vis Ctr, Barcelona, Spain

Source:

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021 | 2021

DOI:

10.1109/WACV48630.2021.00407

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Scene text instances found in natural images carry explicit semantic information that can provide important cues for solving a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image. By obtaining an enhanced set of visual and textual features, the proposed model greatly outperforms the previous state of the art on two different tasks, fine-grained classification and image retrieval, on the ConText [23] and Drink Bottle [4] datasets.
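The multi-modal reasoning step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the feature sizes, the random weights, the cosine-similarity adjacency, the single graph-convolution layer, and the mean pooling are all assumptions chosen only to show how region features and text features might be projected into a common space and refined over a graph. The names `gcn_layer` and `fuse` are hypothetical.

```python
import numpy as np


def gcn_layer(nodes, weight):
    """One graph-convolution step over a fully connected graph of nodes.

    Assumption: edge weights come from a softmax over cosine similarity;
    the paper may build its graph differently.
    """
    # Cosine similarity between every pair of nodes.
    unit = nodes / (np.linalg.norm(nodes, axis=1, keepdims=True) + 1e-8)
    affinity = unit @ unit.T
    # Row-wise softmax so each node's neighbour weights sum to 1.
    affinity = np.exp(affinity - affinity.max(axis=1, keepdims=True))
    adjacency = affinity / affinity.sum(axis=1, keepdims=True)
    # Aggregate neighbour features, apply a linear map, then ReLU.
    return np.maximum(adjacency @ nodes @ weight, 0.0)


def fuse(visual, textual, w_vis, w_txt, w_gcn):
    """Project both modalities to a common space and reason over the graph."""
    # Each modality gets its own projection into the shared dimension.
    nodes = np.vstack([visual @ w_vis, textual @ w_txt])
    enhanced = gcn_layer(nodes, w_gcn)
    # Mean-pool the relationship-enhanced nodes into one image descriptor.
    return enhanced.mean(axis=0)


rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 16))   # 5 hypothetical salient-region features
textual = rng.normal(size=(3, 8))   # 3 hypothetical scene-text embeddings
w_vis = rng.normal(size=(16, 12))   # random stand-ins for learned weights
w_txt = rng.normal(size=(8, 12))
w_gcn = rng.normal(size=(12, 12))

descriptor = fuse(visual, textual, w_vis, w_txt, w_gcn)
print(descriptor.shape)
```

In a trained system the pooled descriptor would feed a classifier for fine-grained classification or serve as the embedding ranked at retrieval time; here the weights are random, so only the shapes and data flow are meaningful.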

Pages: 4022-4032

Number of pages: 11

50 items in total

[21] Audio-Visual Scene Classification Based on Multi-modal Graph Fusion
Lei, Han
Chen, Ning
INTERSPEECH 2022, 2022: 4157 - 4161
[22] Fine-grained image classification method with noisy labels based on retrieval augmentation
Bao, Heng
Deng, Lirui
Zhang, Liang
Chen, Xunxun
Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (07): 2284 - 2292
[23] Cross-Media Fine-Grained Representation Learning Based on Multi-modal Graph and Adversarial Hash Attention Network
Liang M.
Wang X.
Du J.
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2022, 35 (03): 195 - 206
[24] A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews
Marrese-Taylor, Edison
Rodriguez-Opazo, Cristian
Balazs, Jorge A.
Gould, Stephen
Matsuo, Yutaka
PROCEEDINGS OF THE SECOND GRAND CHALLENGE AND WORKSHOP ON MULTIMODAL LANGUAGE (CHALLENGE-HML), VOL 1, 2020: 8 - 18
[25] Image and Encoded Text Fusion for Multi-Modal Classification
Gallo, I.
Calefati, A.
Nawaz, S.
Janjua, M. K.
2018 INTERNATIONAL CONFERENCE ON DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2018: 203 - 209
[26] Fine-grained Activities Recognition with Coarse-grained Labeled Multi-modal Data
Hu, Zhizhang
Yu, Tong
Zhang, Yue
Pan, Shijia
UBICOMP/ISWC '20 ADJUNCT: PROCEEDINGS OF THE 2020 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2020 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS, 2020: 644 - 649
[27] Fine-Grained Text Classification Based on Label Augmentation
Guo, Ruiqiang
Yang, Shilong
Jia, Xiaowen
Wei, Qianqiang
Computer Engineering and Applications, 60 (21): 134 - 141
[28] COREN: Multi-Modal Co-Occurrence Transformer Reasoning Network for Image-Text Retrieval
Wang, Yaodong
Ji, Zhong
Chen, Kexin
Pang, Yanwei
Zhang, Zhongfei
NEURAL PROCESSING LETTERS, 2023, 55 (05): 5959 - 5978
[29] COREN: Multi-Modal Co-Occurrence Transformer Reasoning Network for Image-Text Retrieval
Yaodong Wang
Zhong Ji
Kexin Chen
Yanwei Pang
Zhongfei Zhang
Neural Processing Letters, 2023, 55 : 5959 - 5978
[30] Cross-modal subspace learning for fine-grained sketch-based image retrieval
Xu, Peng
Yin, Qiyue
Huang, Yongye
Song, Yi-Zhe
Ma, Zhanyu
Wang, Liang
Xiang, Tao
Kleijn, W. Bastiaan
Guo, Jun
NEUROCOMPUTING, 2018, 278 : 75 - 86