Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training

Cited: 0
Authors
Ju, Xincheng [1 ]
Zhang, Dong [1 ]
Zhu, Suyang [1 ]
Li, Junhui [1 ]
Li, Shoushan [1 ]
Zhou, Guodong [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China
Keywords
multi-modal; emotion pre-recognition; contrastive learning; conversations;
DOI
10.1145/3583780.3615024
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper presents our pioneering effort in addressing a new and realistic scenario in multi-modal dialogue systems called Multi-modal Real-time Emotion Pre-recognition in Conversations (MREPC). The objective is to predict the emotion of a forthcoming target utterance that is highly likely to occur. We believe that this task can enhance the dialogue system's understanding of the interlocutor's state of mind, enabling it to prepare an appropriate response in advance. However, addressing MREPC poses the following challenges: 1) Previous studies on emotion elicitation typically focus on textual modality and perform sentiment forecasting within a fixed contextual scenario. 2) Previous studies on multi-modal emotion recognition aim to predict the emotion of existing utterances, making it difficult to extend these approaches to MREPC due to the absence of the target utterance. To tackle these challenges, we construct two benchmark multi-modal datasets for MREPC and propose a task-specific multi-modal contrastive pre-training approach. This approach leverages large-scale unlabeled multi-modal dialogues to facilitate emotion pre-recognition for potential utterances of specific target speakers. Through detailed experiments and extensive analysis, we demonstrate that our proposed multi-modal contrastive pre-training architecture effectively enhances the performance of multi-modal real-time emotion pre-recognition in conversations.
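The record does not detail the paper's task-specific contrastive objective. As a rough illustration only, contrastive pre-training over dialogue pairs is commonly built on a symmetric InfoNCE loss, where a dialogue-context embedding and the embedding of its (here, forthcoming) target utterance form a positive pair and other examples in the batch serve as negatives. The sketch below is a generic formulation, not necessarily the authors' exact loss; all names and shapes are illustrative assumptions.

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def info_nce(context_emb, target_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of `context_emb` (e.g. an encoding of the dialogue history) and
    row i of `target_emb` (e.g. an encoding of the target utterance) form a
    positive pair; all other rows in the batch act as in-batch negatives.
    """
    # L2-normalise so the dot products are cosine similarities
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    logits = c @ t.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal as the correct class, in both
    # retrieval directions (context -> target and target -> context)
    loss_c2t = -np.diag(log_softmax(logits, axis=1)).mean()
    loss_t2c = -np.diag(log_softmax(logits, axis=0)).mean()
    return 0.5 * (loss_c2t + loss_t2c)
```

Minimising this loss pulls each context embedding toward its own target-utterance embedding and pushes it away from the other utterances in the batch, which is the general mechanism contrastive pre-training relies on.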
Pages: 1045-1055 (11 pages)
Related Papers
50 records
  • [1] Multi-Modal Contrastive Pre-training for Recommendation
    Liu, Zhuang
    Ma, Yunpu
    Schubert, Matthias
    Ouyang, Yuanxin
    Xiong, Zhang
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 99 - 108
  • [2] MULTI-MODAL PRE-TRAINING FOR AUTOMATED SPEECH RECOGNITION
    Chan, David M.
    Ghosh, Shalini
    Chakrabarty, Debmalya
    Hoffmeister, Bjorn
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 246 - 250
  • [3] TableVLM: Multi-modal Pre-training for Table Structure Recognition
    Chen, Leiyuan
    Huang, Chengsong
    Zheng, Xiaoqing
    Lin, Jinshu
    Huang, Xuanjing
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 2437 - 2449
  • [4] PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
    Li, Yunshui
    Hui, Binyuan
    Yin, Zhichao
    Yang, Min
    Huang, Fei
    Li, Yongbin
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 13402 - 13416
  • [5] CLAP: Contrastive Language-Audio Pre-training Model for Multi-modal Sentiment Analysis
    Zhao, Tianqi
    Kong, Ming
    Liang, Tian
    Zhu, Qiang
    Kuang, Kun
    Wu, Fei
    [J]. PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 622 - 626
  • [6] MGeo: Multi-Modal Geographic Language Model Pre-Training
    Ding, Ruixue
    Chen, Boli
    Xie, Pengjun
    Huang, Fei
    Li, Xin
    Zhang, Qiang
    Xu, Yao
    [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 185 - 194
  • [7] Dynamic facial expression recognition with pseudo-label guided multi-modal pre-training
    Yin, Bing
    Yin, Shi
    Liu, Cong
    Zhang, Yanyong
    Xi, Changfeng
    Yin, Baocai
    Ling, Zhenhua
    [J]. IET COMPUTER VISION, 2024, 18 (01) : 33 - 45
  • [8] Multi-modal Masked Pre-training for Monocular Panoramic Depth Completion
    Yan, Zhiqiang
    Li, Xiang
    Wang, Kun
    Zhang, Zhenyu
    Li, Jun
    Yang, Jian
    [J]. COMPUTER VISION - ECCV 2022, PT I, 2022, 13661 : 378 - 395
  • [9] Versatile Multi-Modal Pre-Training for Human-Centric Perception
    Hong, Fangzhou
    Pan, Liang
    Cai, Zhongang
    Liu, Ziwei
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 16135 - 16145
  • [10] WenLan: Efficient Large-Scale Multi-Modal Pre-Training on Real World Data
    Song, Ruihua
    [J]. MMPT '21: PROCEEDINGS OF THE 2021 WORKSHOP ON MULTI-MODAL PRE-TRAINING FOR MULTIMEDIA UNDERSTANDING, 2021, : 3 - 3