Probabilistic Embeddings for Cross-Modal Retrieval

Cited by: 100
Authors
Chun, Sanghyuk [1 ]
Oh, Seong Joon [1 ]
de Rezende, Rafael Sampaio [2 ]
Kalantidis, Yannis [2 ]
Larlus, Diane [2 ]
Affiliations
[1] NAVER AI Lab, Seongnam, South Korea
[2] NAVER Labs Europe, Meylan, France
DOI
10.1109/CVPR46437.2021.00831
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Cross-modal retrieval methods build a common representation space for samples from multiple modalities, typically from the vision and the language domains. For images and their captions, the multiplicity of the correspondences makes the task particularly challenging. Given an image (respectively a caption), there are multiple captions (respectively images) that equally make sense. In this paper, we argue that deterministic functions are not sufficiently powerful to capture such one-to-many correspondences. Instead, we propose to use Probabilistic Cross-Modal Embedding (PCME), where samples from the different modalities are represented as probabilistic distributions in the common embedding space. Since common benchmarks such as COCO suffer from non-exhaustive annotations for cross-modal matches, we propose to additionally evaluate retrieval on the CUB dataset, a smaller yet clean database where all possible image-caption pairs are annotated. We extensively ablate PCME and demonstrate that it not only improves the retrieval performance over its deterministic counterpart but also provides uncertainty estimates that render the embeddings more interpretable.
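The core idea in the abstract can be made concrete with a small sketch: each image and caption is mapped not to a single point but to a diagonal Gaussian (a mean vector plus a log-variance vector), and the match between two samples is scored by averaging a sigmoid of the distances between Monte Carlo draws from the two distributions. This is a minimal illustration of that sampled matching probability, not the paper's implementation; the scale/shift parameters `a` and `b`, the sample count, and all toy vectors below are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_embeddings(mu, log_sigma, n_samples=8):
    """Draw n_samples vectors from a diagonal Gaussian N(mu, sigma^2)."""
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + eps * sigma

def match_probability(mu_a, ls_a, mu_b, ls_b, a=1.0, b=0.0, n_samples=8):
    """Monte Carlo estimate of a soft match probability:
    the mean of sigmoid(b - a * ||z_a - z_b||) over sampled pairs."""
    z_a = sample_embeddings(mu_a, ls_a, n_samples)
    z_b = sample_embeddings(mu_b, ls_b, n_samples)
    # all pairwise Euclidean distances between the two sample sets
    d = np.linalg.norm(z_a[:, None, :] - z_b[None, :, :], axis=-1)
    return float(np.mean(1.0 / (1.0 + np.exp(a * d - b))))

# toy 4-d example: nearby means should score higher than distant ones
mu_img = np.zeros(4)
mu_cap_near = np.full(4, 0.1)
mu_cap_far = np.full(4, 5.0)
log_sigma = np.full(4, -2.0)  # small per-dimension uncertainty

p_near = match_probability(mu_img, log_sigma, mu_cap_near, log_sigma)
p_far = match_probability(mu_img, log_sigma, mu_cap_far, log_sigma)
```

Because every sample is scored, a sample with large predicted variance spreads its draws widely and can partially match several counterparts, which is how one-to-many image-caption correspondences and uncertainty estimates arise in this formulation.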
Pages: 8411 - 8420
Page count: 10
Related papers
50 items in total
  • [41] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
    Coustaty, Mickaël
    Ogier, Jean-Marc
    [J]. 2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [42] Deep Semantic Mapping for Cross-Modal Retrieval
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    [J]. 2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 234 - 241
  • [43] Cross-modal retrieval based on shared proxies
    Wei, Yuxin
    Zheng, Ligang
    Qiu, Guoping
    Cai, Guocan
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2024, 13 (01)
  • [44] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627
  • [45] Cross-Modal Topic Correlations for Multimedia Retrieval
    Yu, Jing
    Cong, Yonghui
    Qin, Zengchang
    Wan, Tao
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 246 - 249
  • [46] Learning Cross-Modal Retrieval with Noisy Labels
    Hu, Peng
    Peng, Xi
    Zhu, Hongyuan
    Zhen, Liangli
    Lin, Jie
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 5399 - 5409
  • [47] Cross-Modal Retrieval with Correlation Feature Propagation
    Zhang, Lu
    Cao, Feng
    Liang, Xinyan
    Qian, Yuhua
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (09): : 1993 - 2002
  • [48] The State of the Art for Cross-Modal Retrieval: A Survey
    Zhou, Kun
    Hassan, Fadratul Hafinaz
    Hoon, Gan Keng
    [J]. IEEE ACCESS, 2023, 11 : 138568 - 138589
  • [49] Special issue on cross-modal retrieval and analysis
    Wu, Jianlong
    Hong, Richang
    Tian, Qi
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 523 - 524
  • [50] Semantic consistency hashing for cross-modal retrieval
    Yao, Tao
    Kong, Xiangwei
    Fu, Haiyan
    Tian, Qi
    [J]. NEUROCOMPUTING, 2016, 193 : 250 - 259