CTNet: Conversational Transformer Network for Emotion Recognition

Cited by: 146
Authors
Lian, Zheng [1 ,2 ]
Liu, Bin [1 ,2 ]
Tao, Jianhua [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Context modeling; Feature extraction; Fuses; Speech processing; Data models; Bidirectional control; Context-sensitive modeling; conversational transformer network (CTNet); conversational emotion recognition; multimodal fusion; speaker-sensitive modeling;
DOI
10.1109/TASLP.2021.3049898
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Emotion recognition in conversation is a crucial topic because of its widespread applications in human-computer interaction. Unlike vanilla emotion recognition on individual utterances, conversational emotion recognition requires modeling both context-sensitive and speaker-sensitive dependencies. Despite the promising results of recent works, they generally do not leverage advanced fusion techniques to generate multimodal representations of an utterance. As a result, they are limited in modeling intra-modal and cross-modal interactions. To address these problems, we propose a multimodal learning framework for conversational emotion recognition, called the conversational transformer network (CTNet). Specifically, we use a transformer-based structure to model intra-modal and cross-modal interactions among multimodal features. Meanwhile, we use word-level lexical features and segment-level acoustic features as inputs, which enables us to capture temporal information within the utterance. Additionally, to model context-sensitive and speaker-sensitive dependencies, we use a bidirectional GRU component based on multi-head attention, together with speaker embeddings. Experimental results on the IEMOCAP and MELD datasets demonstrate the effectiveness of the proposed method, which achieves an absolute improvement of 2.1% to 6.2% in weighted average F1 over state-of-the-art strategies.
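As a rough illustration of the transformer-style interactions mentioned in the abstract, the following Python sketch implements single-head scaled dot-product attention. When word-level lexical vectors (queries) attend to segment-level acoustic vectors (keys and values), it models a cross-modal interaction; passing the same sequence as queries, keys and values gives intra-modal self-attention. This is a minimal sketch under stated assumptions, not the CTNet implementation: the helper names and toy vectors are hypothetical, and learned query/key/value projections, multiple heads, residual connections and feed-forward layers are all omitted.

```python
import math


def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_modal_attention(queries, keys, values):
    """Single-head scaled dot-product attention (hypothetical helper).

    queries: vectors from the target modality (e.g. word-level lexical features).
    keys, values: vectors from the source modality (e.g. segment-level acoustic
    features). Passing the same sequence three times gives intra-modal
    self-attention. No learned projections are applied (simplifying assumption).
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        alpha = softmax(scores)
        weights.append(alpha)
        # Weighted sum of value vectors, one output dimension at a time.
        outputs.append([sum(a * v[j] for a, v in zip(alpha, values))
                        for j in range(len(values[0]))])
    return outputs, weights


# Toy inputs (made-up values): 3 word vectors and 2 acoustic segment vectors.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
segments = [[1.0, 0.0], [0.0, 1.0]]

text_to_audio, attn = cross_modal_attention(words, segments, segments)
text_self, _ = cross_modal_attention(words, words, words)
print(text_to_audio[2])  # prints [0.5, 0.5]
```

Each word receives one attended vector summarizing the acoustic segments, and each row of attention weights sums to 1. The third word scores both segments equally, so it averages them. In a full model, these attended representations would be fused per utterance before the context- and speaker-level modeling described above.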
Pages: 985-1000
Number of pages: 16
Related papers (50 in total)
  • [31] SMIN: Semi-Supervised Multi-Modal Interaction Network for Conversational Emotion Recognition
    Lian, Zheng
    Liu, Bin
    Tao, Jianhua
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (03) : 2415 - 2429
  • [32] Fine-tuned RoBERTa Model with a CNN-LSTM Network for Conversational Emotion Recognition
    Luo, Jiachen
    Phan, Huy
    Reiss, Joshua
    INTERSPEECH 2023, 2023, : 2413 - 2417
  • [33] PGIF: A Personality-Guided Iterative Feedback Graph Network for Multimodal Conversational Emotion Recognition
    Xie, Yunhe
    Mao, Rui
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2025,
  • [34] A multiturn complementary generative framework for conversational emotion recognition
    Wang, Lifang
    Li, Ronghan
    Wu, Yuxin
    Jiang, Zejun
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (09) : 5643 - 5671
  • [35] A Unified Biosensor-Vision Multi-Modal Transformer network for emotion recognition
    Ali, Kamran
    Hughes, Charles E.
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 102
  • [36] TC-Net: A Transformer Capsule Network for EEG-based emotion recognition
    Wei, Yi
    Liu, Yu
    Li, Chang
    Cheng, Juan
    Song, Rencheng
    Chen, Xun
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 152
  • [37] A NOVEL END-TO-END SPEECH EMOTION RECOGNITION NETWORK WITH STACKED TRANSFORMER LAYERS
    Wang, Xianfeng
    Wang, Min
    Qi, Wenbo
    Su, Wanqi
    Wang, Xiangqian
    Zhou, Huan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6289 - 6293
  • [38] Transformer-Based Potential Emotional Relation Mining Network for Emotion Recognition in Conversation
    Shi, Yunwei
    Sun, Xiao
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 238 - 251
  • [39] A Transformer Convolutional Network With the Method of Image Segmentation for EEG-Based Emotion Recognition
    Zhang, Xinyiy
    Cheng, Xiankai
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 401 - 405
  • [40] EEG emotion recognition using attention-based convolutional transformer neural network
    Gong, Linlin
    Li, Mingyang
    Zhang, Tao
    Chen, Wanzhong
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 84