Multi-Modal Contrastive Pre-training for Recommendation

Cited by: 0
Authors
Liu, Zhuang [1 ]
Ma, Yunpu [2 ]
Schubert, Matthias [2 ]
Ouyang, Yuanxin [1 ]
Xiong, Zhang [3 ]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
[2] Ludwig Maximilians Univ Munchen, Lehrstuhl Datenbanksyst & Data Min, Munich, Germany
[3] Beihang Univ, Minist Educ, Engn Res Ctr Adv Comp Applicat Technol, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022 | 2022
Funding
National Natural Science Foundation of China
Keywords
Recommender system; Multi-modal side information; Contrastive learning; Pre-training model;
DOI
10.1145/3512527.3531378
CLC Classification Number
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Personalized recommendation plays a central role in various online applications. To provide a quality recommendation service, it is crucial to consider the multi-modal information associated with users and items, e.g., review text, description text, and images. However, many existing approaches do not fully explore and fuse multiple modalities. To address this problem, we propose a multi-modal contrastive pre-training model for recommendation. We first construct a homogeneous item graph and a user graph based on co-interaction relationships. For users, we propose intra-modal and inter-modal aggregation to fuse review texts with the structural information of the user graph. For items, we consider three modalities: description text, images, and the item graph. Moreover, the description text and image of the same item complement each other, so each can serve as a promising supervision signal for the other. To capture this signal and better exploit the correlation between modalities, we propose a self-supervised contrastive inter-modal alignment task that makes the textual and visual representations of an item as similar as possible. We then apply inter-modal aggregation to obtain the multi-modal representation of items. Next, we employ a binary cross-entropy loss to capture the potential correlation between users and items. Finally, we fine-tune the pre-trained multi-modal representations with an existing recommendation model. Extensive experiments on three real-world datasets verify the soundness and effectiveness of the proposed method.
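The two pre-training objectives named in the abstract, contrastive inter-modal alignment between an item's description text and image, and binary cross-entropy over user-item interactions, can be illustrated in code. The following is a minimal sketch, not the authors' implementation: it assumes a symmetric InfoNCE-style contrastive loss with in-batch negatives and dot-product interaction scoring; the function names, temperature value, and scoring scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def inter_modal_alignment_loss(text_emb, image_emb, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss aligning the textual and
    visual embeddings of the same batch of items (hypothetical sketch;
    the paper's exact formulation and temperature are assumptions).

    text_emb, image_emb: (batch, dim) tensors for the SAME items, so the
    matched pair (text_i, image_i) is the positive and all other pairings
    in the batch act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    t = F.normalize(text_emb, dim=-1)
    v = F.normalize(image_emb, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds positive pairs.
    logits = t @ v.T / temperature
    targets = torch.arange(t.size(0), device=t.device)

    # Align text -> image and image -> text symmetrically.
    loss_t2v = F.cross_entropy(logits, targets)
    loss_v2t = F.cross_entropy(logits.T, targets)
    return (loss_t2v + loss_v2t) / 2

def interaction_loss(user_emb, item_emb, labels):
    """Binary cross-entropy over observed/negative user-item pairs, as the
    abstract describes; scoring pairs by dot product is an assumption."""
    scores = (user_emb * item_emb).sum(dim=-1)
    return F.binary_cross_entropy_with_logits(scores, labels.float())
```

In a setup like this, the two losses would typically be summed (possibly with a weighting coefficient) and minimized jointly during pre-training; that combination is likewise an assumption, not something the abstract specifies.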
Pages: 99-108
Number of pages: 10
Related Papers
50 records in total
  • [21] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training
    Chen, Zhihong
    Du, Yuhao
    Hu, Jinpeng
    Liu, Yang
    Li, Guanbin
    Wan, Xiang
    Chang, Tsung-Hui
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT V, 2022, 13435 : 679 - 689
  • [22] MMCL-CPI: A multi-modal compound-protein interaction prediction model incorporating contrastive learning pre-training
    Qian, Ying
    Li, Xinyi
    Wu, Jian
    Zhang, Qian
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2024, 112
  • [23] Collaborative denoised graph contrastive learning for multi-modal recommendation
    Xu, Fuyong
    Zhu, Zhenfang
    Fu, Yixin
    Wang, Ru
    Liu, Peiyu
    INFORMATION SCIENCES, 2024, 679
  • [24] Contrastive Adversarial Training for Multi-Modal Machine Translation
    Huang, Xin
    Zhang, Jiajun
    Zong, Chengqing
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [25] Multi-modal Pathological Pre-training via Masked Autoencoders for Breast Cancer Diagnosis
    Lu, Mengkang
    Wang, Tianyi
    Xia, Yong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 457 - 466
  • [26] A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks
    Yeonghun Kang
    Hyunsoo Park
    Berend Smit
    Jihan Kim
    Nature Machine Intelligence, 2023, 5 : 309 - 318
  • [27] StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis
    Meshry, Moustafa
    Ren, Yixuan
    Davis, Larry S.
    Shrivastava, Abhinav
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3711 - 3720
  • [28] Advancing Fundus-Based Retinal Representations Through Multi-Modal Contrastive Pre-training for Detection of Glaucoma-Related Diseases
    Guo, Yawen
    Ng, Michelle
    Yan, Xu
    Hung, Calvin
    Lam, Alexander
    Leung, Christopher Kai-Shun
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2024, 65 (07)
  • [29] MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition
    Zhou, Xiaohuan
    Wang, Jiaming
    Cui, Zeyu
    Zhang, Shiliang
    Yan, Zhijie
    Zhou, Jingren
    Zhou, Chang
    INTERSPEECH 2023, 2023, : 4943 - 4947
  • [30] Multi-modal Graph Contrastive Learning for Micro-video Recommendation
    Yi, Zixuan
    Wang, Xi
    Ounis, Iadh
    Macdonald, Craig
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 1807 - 1811