Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training

Cited by: 0

Authors:

Ju, Xincheng [1]

Zhang, Dong [1]

Zhu, Suyang [1]

Li, Junhui [1]

Li, Shoushan [1]

Zhou, Guodong [1]

Affiliations:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023 | 2023

Keywords:

multi-modal; emotion pre-recognition; contrastive learning; conversations;

DOI:

10.1145/3583780.3615024

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents our pioneering effort in addressing a new and realistic scenario in multi-modal dialogue systems called Multi-modal Real-time Emotion Pre-recognition in Conversations (MREPC). The objective is to predict the emotion of a forthcoming target utterance that is highly likely to occur. We believe that this task can enhance the dialogue system's understanding of the interlocutor's state of mind, enabling it to prepare an appropriate response in advance. However, addressing MREPC poses the following challenges: 1) Previous studies on emotion elicitation typically focus on the textual modality and perform sentiment forecasting within a fixed contextual scenario. 2) Previous studies on multi-modal emotion recognition aim to predict the emotion of existing utterances, making it difficult to extend these approaches to MREPC due to the absence of the target utterance. To tackle these challenges, we construct two benchmark multi-modal datasets for MREPC and propose a task-specific multi-modal contrastive pre-training approach. This approach leverages large-scale unlabeled multi-modal dialogues to facilitate emotion pre-recognition for potential utterances of specific target speakers. Through detailed experiments and extensive analysis, we demonstrate that our proposed multi-modal contrastive pre-training architecture effectively enhances the performance of multi-modal real-time emotion pre-recognition in conversations.

Pages: 1045-1055

Number of pages: 11
