Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval

Cited by: 3
Authors
Hua, Yan [1]
Yang, Yingyun [1]
Du, Jianhe [1]
Affiliations
[1] Communication University of China, School of Information and Communication Engineering, Beijing 100024, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
deep learning; metric learning; multi-modal correlation; cross-modal retrieval; image-text retrieval
DOI
10.3390/electronics9030466
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-modal retrieval is challenging because of the heterogeneity gap and the complex semantic relationships between data of different modalities. Typical approaches map different modalities into a common subspace using one-to-one correspondences or similarity/dissimilarity relationships between inter-modal data, so that distances between heterogeneous data can be compared directly and inter-modal retrieval can be performed by nearest-neighbor search. However, most of these approaches ignore intra-modal relations and the complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation for retrieval between the image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship with multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations constructed on multi-scale semantic similarity are incorporated to train the deep model end to end. Experiments validate the effectiveness of the proposed method on multi-modal retrieval tasks, and the method outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets.
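The abstract contrasts binary similarity with a graded, multi-scale similarity and describes retrieval as nearest-neighbor search in a common space. The following is a minimal illustrative sketch of those two ideas, not the paper's actual formulation. It assumes that graded similarity can be approximated by the Jaccard overlap of two items' label sets and that the pairwise loss is a contrastive-style term weighted by that similarity. All function names (`label_similarity`, `graded_pair_loss`, `retrieve`) and the `margin` parameter are hypothetical.

```python
import math


def label_similarity(labels_a, labels_b):
    """Graded similarity in [0, 1]: Jaccard overlap of two label sets.

    Assumption for illustration only; the paper's multi-scale similarity
    may be defined differently.
    """
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def graded_pair_loss(dist, sim, margin=1.0):
    """Contrastive-style loss weighted by a graded similarity.

    Similar pairs (sim near 1) are penalised for being far apart;
    dissimilar pairs (sim near 0) are penalised for being closer
    than `margin`. A binary loss is the special case sim in {0, 1}.
    """
    pull = sim * dist ** 2
    push = (1.0 - sim) * max(0.0, margin - dist) ** 2
    return pull + push


def retrieve(query_vec, gallery, k=1):
    """Nearest-neighbour search over (item_id, embedding) pairs."""
    ranked = sorted(gallery, key=lambda item: euclidean(query_vec, item[1]))
    return [item_id for item_id, _ in ranked[:k]]


# Toy usage: an image embedding queried against two text embeddings.
image_embedding = [0.0, 1.0]
text_gallery = [("t1", [0.1, 0.9]), ("t2", [1.0, 0.0])]
nearest = retrieve(image_embedding, text_gallery, k=1)
```

In this sketch the same `graded_pair_loss` could be applied to image-text pairs (inter-modal) and to image-image or text-text pairs (intra-modal); how the paper combines these terms is not stated in the abstract.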
Pages: 21
Related papers
  • [1] Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
    Niu, Yulei
    Lu, Zhiwu
    Wen, Ji-Rong
    Xiang, Tao
    Chang, Shih-Fu
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (04) : 1720 - 1731
  • [2] MULTI-SCALE INTERACTIVE TRANSFORMER FOR REMOTE SENSING CROSS-MODAL IMAGE-TEXT RETRIEVAL
    Wang, Yijing
    Ma, Jingjing
    Li, Mingteng
    Tang, Xu
    Han, Xiao
    Jiao, Licheng
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 839 - 842
  • [3] Cross-Graph Attention Enhanced Multi-Modal Correlation Learning for Fine-Grained Image-Text Retrieval
    He, Yi
    Liu, Xin
    Cheung, Yiu-ming
    Peng, Shu-Juan
    Yi, Jinhan
    Fan, Wentao
    [J]. SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1865 - 1869
  • [4] Multi-scale Multi-modal Dictionary BERT For Effective Text-image Retrieval in Multimedia Advertising
    Yu, Tan
    Liu, Jie
    Jin, Zhipeng
    Yang, Yi
    Fei, Hongliang
    Li, Ping
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4655 - 4660
  • [5] Adversarial Attentive Multi-Modal Embedding Learning for Image-Text Matching
    Wei, Kaimin
    Zhou, Zhibo
    [J]. IEEE ACCESS, 2020, 8 : 96237 - 96248
  • [6] An Image-Text Matching Method for Multi-Modal Robots
    Zheng, Ke
    Li, Zhou
    [J]. JOURNAL OF ORGANIZATIONAL AND END USER COMPUTING, 2024, 36 (01)
  • [7] Online Multi-Modal Distance Metric Learning with Application to Image Retrieval
    Wu, Pengcheng
    Hoi, Steven C. H.
    Zhao, Peilin
    Miao, Chunyan
    Liu, Zhi-Yong
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (02) : 454 - 467
  • [8] Dissecting Deep Metric Learning Losses for Image-Text Retrieval
    Xuan, Hong
    Chen, Xi
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2163 - 2172
  • [9] Robust Multi-Scale Multi-modal Image Registration
    Holtzman-Gazit, Michal
    Yavneh, Irad
    [J]. SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION XIX, 2010, 7697
  • [10] Multi-Modal and Multi-Scale Oral Diadochokinesis Analysis using Deep Learning
    Wang, Yang Yang
    Gaol, Ke
    Hamad, Ali
    McCarthy, Brianna
    Kloepper, Ashley M.
    Lever, Teresa E.
    Bunyak, Filiz
    [J]. 2021 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2021,