Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval

Cited by: 3
Authors
Hua, Yan [1]
Yang, Yingyun [1]
Du, Jianhe [1]
Affiliations
[1] Commun Univ China, Sch Informat & Commun Engn, Beijing 100024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image-text retrieval
DOI
10.3390/electronics9030466
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-modal retrieval is challenging because of the heterogeneity gap and the complex semantic relationships between data of different modalities. Typical approaches map the different modalities into a common subspace using one-to-one correspondences or similarity/dissimilarity relationships between inter-modal data. In that subspace, distances between heterogeneous data can be compared directly, so inter-modal retrieval can be performed by nearest-neighbor search. However, most of these approaches ignore intra-modal relations and the complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation for retrieval between the image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship with multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations constructed on multi-scale semantic similarity are incorporated to train the deep model in an end-to-end way. Experiments validate the effectiveness of our proposed method on multi-modal retrieval tasks, and our method outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets.
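The abstract does not give the exact form of the multi-scale similarity or of the training objective, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each image-text sample carries a set of labels, that graded similarity is the Jaccard overlap of two label sets, and that a contrastive-style loss is weighted by that graded similarity. All function names, the `max_margin` and `intra_weight` parameters, and the plain Python lists standing in for the outputs of the two network branches are hypothetical.

```python
import math


def label_similarity(labels_a, labels_b):
    """Graded similarity in [0, 1]: Jaccard overlap of two label sets.

    Assumption: this stands in for the paper's multi-scale similarity,
    which the abstract does not define.
    """
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def euclidean(u, v):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def graded_pair_loss(u, v, sim, max_margin=1.0):
    """Contrastive-style loss weighted by a graded similarity.

    Similar pairs (sim near 1) are pulled together; dissimilar pairs
    (sim near 0) are pushed apart until they are max_margin away.
    """
    d = euclidean(u, v)
    pull = sim * d ** 2
    push = (1.0 - sim) * max(0.0, max_margin - d) ** 2
    return pull + push


def total_loss(img_embs, txt_embs, labels, intra_weight=0.5):
    """Sum inter-modal (image-text) and intra-modal (image-image,
    text-text) pair losses over a small batch."""
    n = len(labels)
    inter = 0.0
    intra = 0.0
    for i in range(n):
        for j in range(n):
            s = label_similarity(labels[i], labels[j])
            inter += graded_pair_loss(img_embs[i], txt_embs[j], s)
            if i < j:
                intra += graded_pair_loss(img_embs[i], img_embs[j], s)
                intra += graded_pair_loss(txt_embs[i], txt_embs[j], s)
    return inter + intra_weight * intra


def retrieve(query, gallery, k=1):
    """Nearest-neighbor search: indices of the k gallery embeddings
    closest to the query in the common space."""
    order = sorted(range(len(gallery)),
                   key=lambda i: euclidean(query, gallery[i]))
    return order[:k]
```

In a trained system the embeddings would come from the image and text branches of the deep model, and `retrieve` would be called with an image embedding as the query and text embeddings as the gallery, or the reverse.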
Pages: 21