Multi-Modal Contrastive Pre-training for Recommendation

Cited by: 0
Authors
Liu, Zhuang [1 ]
Ma, Yunpu [2 ]
Schubert, Matthias [2 ]
Ouyang, Yuanxin [1 ]
Xiong, Zhang [3 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Ludwig Maximilians Univ Munchen, Lehrstuhl Datenbanksyst & Data Min, Munich, Germany
[3] Beihang Univ, Minist Educ, Engn Res Ctr Adv Comp Applicat Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Recommender system; Multi-modal side information; Contrastive learning; Pre-training model;
DOI
10.1145/3512527.3531378
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Personalized recommendation plays a central role in various online applications. To provide a quality recommendation service, it is crucial to consider the multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on co-interaction relationships. For users, we propose intra-modal and inter-modal aggregation to fuse review texts with the structural information of the user graph. For items, we consider three modalities: description text, images, and the item graph. Moreover, the description text and image of the same item complement each other, so each can serve as a promising supervision signal for the other. To capture this signal and better exploit the correlation between modalities, we propose a self-supervised contrastive inter-modal alignment task that makes the textual and visual representations of an item as similar as possible. We then apply inter-modal aggregation to obtain the multi-modal representation of items. Next, we employ a binary cross-entropy loss to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations with an existing recommendation model. Extensive experiments on three real-world datasets verify the soundness and effectiveness of the proposed method.
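The two pre-training objectives named in the abstract, contrastive inter-modal alignment between an item's description text and image, and binary cross-entropy over user-item interactions, can be illustrated in code. The following is a minimal sketch, not the authors' implementation: it assumes a symmetric InfoNCE-style contrastive loss with in-batch negatives and dot-product interaction scoring; the function names, temperature value, and scoring scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def inter_modal_alignment_loss(text_emb, image_emb, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss aligning the textual and
    visual embeddings of the same batch of items (hypothetical sketch;
    the paper's exact formulation and temperature are assumptions).

    text_emb, image_emb: (batch, dim) tensors for the SAME items, so the
    matched pair (text_i, image_i) is the positive and all other pairings
    in the batch act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds positive pairs.
    logits = t @ v.T / temperature
    targets = torch.arange(t.size(0), device=t.device)

    # Align text -> image and image -> text symmetrically.
    loss_t2v = F.cross_entropy(logits, targets)
    loss_v2t = F.cross_entropy(logits.T, targets)
    return (loss_t2v + loss_v2t) / 2

def interaction_loss(user_emb, item_emb, labels):
    """Binary cross-entropy over observed/negative user-item pairs, as the
    abstract describes; scoring pairs by dot product is an assumption."""
    scores = (user_emb * item_emb).sum(dim=-1)
    return F.binary_cross_entropy_with_logits(scores, labels.float())
```

In a setup like this, the two losses would typically be summed (possibly with a weighting coefficient) and minimized jointly during pre-training; that combination is likewise an assumption, not something the abstract specifies.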
Pages: 99-108
Number of pages: 10
Related Papers
50 records in total
  • [21] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training
    Chen, Zhihong
    Du, Yuhao
    Hu, Jinpeng
    Liu, Yang
    Li, Guanbin
    Wan, Xiang
    Chang, Tsung-Hui
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT V, 2022, 13435 : 679 - 689
  • [22] MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training
    Qian, Ying
    Li, Xinyi
    Wu, Jian
    Zhang, Qian
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2024, 112
  • [23] Collaborative denoised graph contrastive learning for multi-modal recommendation
    Xu, Fuyong
    Zhu, Zhenfang
    Fu, Yixin
    Wang, Ru
    Liu, Peiyu
    INFORMATION SCIENCES, 2024, 679
  • [24] Contrastive Adversarial Training for Multi-Modal Machine Translation
    Huang, Xin
    Zhang, Jiajun
    Zong, Chengqing
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [25] Multi-modal Pathological Pre-training via Masked Autoencoders for Breast Cancer Diagnosis
    Lu, Mengkang
    Wang, Tianyi
    Xia, Yong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 457 - 466
  • [26] A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks
    Yeonghun Kang
    Hyunsoo Park
    Berend Smit
    Jihan Kim
    Nature Machine Intelligence, 2023, 5 : 309 - 318
  • [27] StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis
    Meshry, Moustafa
    Ren, Yixuan
    Davis, Larry S.
    Shrivastava, Abhinav
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3711 - 3720
  • [28] Advancing Fundus-Based Retinal Representations Through Multi-Modal Contrastive Pre-training for Detection of Glaucoma-Related Diseases
    Guo, Yawen
    Ng, Michelle
    Yan, Xu
    Hung, Calvin
    Lam, Alexander
    Leung, Christopher Kai-Shun
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [29] MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition
    Zhou, Xiaohuan
    Wang, Jiaming
    Cui, Zeyu
    Zhang, Shiliang
    Yan, Zhijie
    Zhou, Jingren
    Zhou, Chang
    INTERSPEECH 2023, 2023, : 4943 - 4947
  • [30] Multi-modal Graph Contrastive Learning for Micro-video Recommendation
    Yi, Zixuan
    Wang, Xi
    Ounis, Iadh
    Macdonald, Craig
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 1807 - 1811