Real-time Emotion Pre-Recognition in Conversations with Contrastive Multi-modal Dialogue Pre-training

Cited by: 0

Authors:

Ju, Xincheng [1]
Zhang, Dong [1]
Zhu, Suyang [1]
Li, Junhui [1]
Li, Shoushan [1]
Zhou, Guodong [1]

Affiliation:

[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Jiangsu, Peoples R China

Source:

PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023 | 2023

Keywords:

multi-modal; emotion pre-recognition; contrastive learning; conversations

DOI:

10.1145/3583780.3615024

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents our pioneering effort in addressing a new and realistic scenario in multi-modal dialogue systems called Multi-modal Real-time Emotion Pre-recognition in Conversations (MREPC). The objective is to predict the emotion of a forthcoming target utterance that is highly likely to occur. We believe that this task can enhance the dialogue system's understanding of the interlocutor's state of mind, enabling it to prepare an appropriate response in advance. However, addressing MREPC poses the following challenges: 1) Previous studies on emotion elicitation typically focus on textual modality and perform sentiment forecasting within a fixed contextual scenario. 2) Previous studies on multi-modal emotion recognition aim to predict the emotion of existing utterances, making it difficult to extend these approaches to MREPC due to the absence of the target utterance. To tackle these challenges, we construct two benchmark multi-modal datasets for MREPC and propose a task-specific multi-modal contrastive pre-training approach. This approach leverages large-scale unlabeled multi-modal dialogues to facilitate emotion pre-recognition for potential utterances of specific target speakers. Through detailed experiments and extensive analysis, we demonstrate that our proposed multi-modal contrastive pre-training architecture effectively enhances the performance of multi-modal real-time emotion pre-recognition in conversations.


Pages: 1045-1055

Page count: 11
