Modality-Specific Cross-Modal Similarity Measurement With Recurrent Attention Network

Cited by: 90
Authors
Peng, Yuxin [1]
Qi, Jinwei [1]
Yuan, Yuxin [1]
Affiliations
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Modality-specific cross-modal similarity measurement; recurrent attention network; attention-based joint embedding; adaptive fusion; representation
DOI
10.1109/TIP.2018.2852503
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval plays an important role in flexibly finding useful information across different modalities of data, and effectively measuring the similarity between modalities is its key challenge. Different modalities, such as image and text, have an imbalanced and complementary relationship: they contain unequal amounts of information when describing the same semantics. For example, images often contain details that cannot be conveyed by textual descriptions, and vice versa. Existing works based on deep neural networks mostly construct one common space for the different modalities to find latent alignments between them, which sacrifices the exclusive modality-specific characteristics of each modality. We therefore propose a modality-specific cross-modal similarity measurement approach that constructs an independent semantic space for each modality and adopts an end-to-end framework to directly generate the modality-specific cross-modal similarity, without an explicit common representation. In each semantic space, the modality-specific characteristics of the native modality are fully exploited by a recurrent attention network, while data from the other modality are projected into the space by attention-based joint embedding, which uses the learned attention weights to guide fine-grained cross-modal correlation learning and captures the imbalanced and complementary relationship between the modalities. Finally, the complementarity between the semantic spaces of the different modalities is exploited by adaptively fusing the modality-specific cross-modal similarities to perform cross-modal retrieval. Experiments on the widely used Wikipedia, Pascal Sentence, and MS-COCO data sets, as well as our large-scale XMediaNet data set, verify the effectiveness of the proposed approach, which outperforms nine state-of-the-art methods.
Pages: 5585-5599
Number of pages: 15
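Illustrative sketch
For readers who want the shape of the method in code, the following is a minimal PyTorch sketch of the pipeline the abstract describes: one semantic space per modality, recurrent attention over the native modality's fine-grained units, projection of the other modality into that space, and adaptive fusion of the two modality-specific similarities. All module names, layer sizes, and the GRU-plus-soft-attention stand-in for the paper's recurrent attention network are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of modality-specific similarity with adaptive fusion
# (illustrative assumptions throughout; not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalitySpecificSpace(nn.Module):
    """One independent semantic space anchored on a single ("native") modality.

    A GRU with soft attention over the native modality's fine-grained units
    (image regions or words) stands in for the paper's recurrent attention
    network; the other modality is projected into this space by a linear
    joint embedding, approximating the attention-based joint embedding step.
    """
    def __init__(self, native_dim, other_dim, hidden_dim=512):
        super().__init__()
        self.rnn = nn.GRU(native_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)          # soft attention scores
        self.project_other = nn.Linear(other_dim, hidden_dim)

    def forward(self, native_seq, other_feat):
        # native_seq: (B, T, native_dim) fine-grained units of the native modality
        # other_feat: (B, other_dim) global feature of the other modality
        states, _ = self.rnn(native_seq)              # (B, T, hidden_dim)
        alpha = F.softmax(self.attn(states), dim=1)   # attention over the T units
        native_emb = (alpha * states).sum(dim=1)      # attended native embedding
        other_emb = self.project_other(other_feat)    # other modality, same space
        # Modality-specific cross-modal similarity measured inside this space.
        return F.cosine_similarity(native_emb, other_emb, dim=-1)

class AdaptiveFusion(nn.Module):
    """Adaptively weights the two modality-specific similarities."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(2))    # learned fusion weights

    def forward(self, sim_in_image_space, sim_in_text_space):
        w = F.softmax(self.logits, dim=0)
        return w[0] * sim_in_image_space + w[1] * sim_in_text_space

# Toy usage: 4 image-text pairs, 36 region features per image, 20 words per text.
img_space = ModalitySpecificSpace(native_dim=2048, other_dim=300)
txt_space = ModalitySpecificSpace(native_dim=300, other_dim=2048)
fuse = AdaptiveFusion()

img_regions = torch.randn(4, 36, 2048)               # e.g. CNN region features
txt_words = torch.randn(4, 20, 300)                  # e.g. word embeddings
sim_img = img_space(img_regions, txt_words.mean(dim=1))  # text into image space
sim_txt = txt_space(txt_words, img_regions.mean(dim=1))  # image into text space
score = fuse(sim_img, sim_txt)                       # final similarity, shape (4,)
```

Keeping the two spaces separate, rather than forcing one common space, is what lets each branch retain its own modality's fine-grained structure; the learned fusion weights then decide how much each space's judgment should count in the final retrieval score.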