Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Cited by: 14
Authors
Mafla, Andres [1]
Dey, Sounak [1]
Biten, Ali Furkan [1]
Gomez, Lluis [1]
Karatzas, Dimosthenis [1]
Affiliations
[1] UAB, Comp Vis Ctr, Barcelona, Spain
DOI
10.1109/WACV48630.2021.00407
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scene text instances found in natural images carry explicit semantic information that can provide important cues for solving a wide array of computer vision problems. In this paper, we leverage multi-modal content, in the form of visual and textual cues, to tackle fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between the salient objects and the text found in an image. With this enhanced set of visual and textual features, the proposed model greatly outperforms the previous state of the art on two tasks, fine-grained classification and image retrieval, on the ConText [23] and Drink Bottle [4] datasets.
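The reasoning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature dimensions, the softmax-affinity adjacency, the residual connection, and every name below (`multimodal_gcn_layer`, `w_vis`, `w_txt`, `w_gcn`) are assumptions chosen only to show how region features and text features might be projected into a common space and refined by one graph convolution.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_gcn_layer(visual, textual, w_vis, w_txt, w_gcn):
    """One hypothetical graph-convolution step over visual and text nodes."""
    # Project both modalities into a shared space and stack them as graph nodes.
    nodes = np.concatenate([visual @ w_vis, textual @ w_txt], axis=0)
    # Assumed adjacency: row-normalised pairwise affinity between all nodes.
    adj = softmax(nodes @ nodes.T, axis=1)
    # Graph convolution (ReLU(A H W)) plus a residual connection.
    return np.maximum(adj @ nodes @ w_gcn, 0.0) + nodes

# Random stand-ins for real features (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 16))    # 5 salient regions, 16-d features
textual = rng.normal(size=(3, 12))   # 3 scene-text instances, 12-d embeddings
w_vis = rng.normal(size=(16, 8)) * 0.1
w_txt = rng.normal(size=(12, 8)) * 0.1
w_gcn = rng.normal(size=(8, 8)) * 0.1

enhanced = multimodal_gcn_layer(visual, textual, w_vis, w_txt, w_gcn)
print(enhanced.shape)  # (8, 8): one enhanced 8-d feature per node
```

In a trained model the three weight matrices would be learned and the enhanced node features would feed a classifier or a retrieval ranking; here they are random, so only the shapes and data flow are meaningful.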
Pages: 4022-4032
Number of pages: 11