Cross-Modality Personalization for Retrieval

Cited by: 10
Authors
Murrugarra-Llerena, Nils [1]
Kovashka, Adriana [1]
Affiliation
[1] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260 USA
Funding
National Science Foundation (USA)
DOI
10.1109/CVPR.2019.00659
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Existing captioning and gaze prediction approaches do not consider the multiple facets of personality that affect how a viewer extracts meaning from an image. While there are methods that consider personalized captioning, they do not consider personalized perception across modalities, i.e., how a person's way of looking at an image (gaze) affects the way they describe it (captioning). In this work, we propose a model for cross-modality personalized retrieval. In addition to modeling gaze and captions, we also explicitly model the personality of the users providing these samples. We incorporate constraints that encourage gaze and caption samples on the same image to be close in a learned space; we refer to this as content modeling. We also model style: we encourage samples provided by the same user to be close in a separate embedding space, regardless of the image on which they were provided. To leverage the complementary information that content and style constraints provide, we combine the embeddings from both networks. We show that our combined embeddings achieve better performance than existing approaches for cross-modal retrieval.
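The content and style constraints described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the loss, the distance, or how the two embeddings are combined, so the triplet hinge loss, the Euclidean distance, and the concatenation used here are all assumptions, and every name (`triplet_loss`, `combine`, `rank_gallery`, the toy vectors) is hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss on squared distances (an assumed choice of loss).

    Content space: positive = sample on the same image as the anchor.
    Style space:   positive = sample from the same user as the anchor.
    """
    anchor, positive, negative = (
        np.asarray(a, dtype=float) for a in (anchor, positive, negative)
    )
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))


def combine(content_emb, style_emb):
    """Combine the two spaces by concatenation (an assumption)."""
    return np.concatenate(
        [l2_normalize(content_emb), l2_normalize(style_emb)], axis=-1
    )


def rank_gallery(query, gallery):
    """Return gallery indices ordered from nearest to farthest."""
    dists = np.linalg.norm(gallery - query, axis=-1)
    return np.argsort(dists)


# Toy stand-ins for learned embeddings: one axis per image / per user.
IMAGE_A, IMAGE_B = [1.0, 0.0], [0.0, 1.0]
USER_1, USER_2 = [1.0, 0.0], [0.0, 1.0]

# Query: a gaze sample from user 1 on image A.
query = combine([IMAGE_A], [USER_1])[0]

# Gallery of caption samples as (image, user) pairs:
# 0: (B, user 2), 1: (A, user 2), 2: (A, user 1), 3: (B, user 1)
gallery = combine(
    [IMAGE_B, IMAGE_A, IMAGE_A, IMAGE_B],
    [USER_2, USER_2, USER_1, USER_1],
)

ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the caption that shares both the image and the user with the query (index 2) ranks first, and the caption that shares neither (index 0) ranks last, which is the behavior the combined content and style embedding is meant to encourage.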
Pages: 6422-6431
Page count: 10