Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training

Cited: 0
Authors:
Ju, Xincheng [1 ]
Zhang, Dong [1 ]
Zhu, Suyang [1 ]
Li, Junhui [1 ]
Li, Shoushan [1 ]
Zhou, Guodong [1 ]
Affiliations:
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China
Keywords:
multi-modal; emotion pre-recognition; contrastive learning; conversations;
DOI: 10.1145/3583780.3615024
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
This paper presents our pioneering effort in addressing a new and realistic scenario in multi-modal dialogue systems called Multi-modal Real-time Emotion Pre-recognition in Conversations (MREPC). The objective is to predict the emotion of a forthcoming target utterance that is highly likely to occur. We believe that this task can enhance the dialogue system's understanding of the interlocutor's state of mind, enabling it to prepare an appropriate response in advance. However, addressing MREPC poses the following challenges: 1) Previous studies on emotion elicitation typically focus on textual modality and perform sentiment forecasting within a fixed contextual scenario. 2) Previous studies on multi-modal emotion recognition aim to predict the emotion of existing utterances, making it difficult to extend these approaches to MREPC due to the absence of the target utterance. To tackle these challenges, we construct two benchmark multi-modal datasets for MREPC and propose a task-specific multi-modal contrastive pre-training approach. This approach leverages large-scale unlabeled multi-modal dialogues to facilitate emotion pre-recognition for potential utterances of specific target speakers. Through detailed experiments and extensive analysis, we demonstrate that our proposed multi-modal contrastive pre-training architecture effectively enhances the performance of multi-modal real-time emotion pre-recognition in conversations.
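The abstract describes the pre-training only at a high level. As an illustration of how contrastive pre-training over paired dialogue context and target representations is commonly set up, the sketch below implements a generic InfoNCE-style objective with in-batch negatives; the function name, temperature value, and NumPy formulation are assumptions for illustration, not the paper's exact loss or architecture.

```python
import numpy as np

def info_nce_loss(context_emb, target_emb, temperature=0.1):
    """Generic InfoNCE contrastive loss with in-batch negatives.

    context_emb, target_emb: (batch, dim) arrays; row i of each forms a
    positive pair, and all other rows in the batch act as negatives.
    """
    # L2-normalize so dot products become cosine similarities.
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    logits = (c @ t.T) / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal (matched pairs) as the correct class.
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Under this objective, matched context/target pairs are pulled together in the shared embedding space while mismatched pairs are pushed apart, so a correctly aligned batch yields a lower loss than a randomly paired one.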
Pages: 1045-1055
Number of pages: 11