CTNet: Conversational Transformer Network for Emotion Recognition

Cited by: 146
Authors
Lian, Zheng [1, 2]
Liu, Bin [1, 2]
Tao, Jianhua [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Emotion recognition; Context modeling; Feature extraction; Fuses; Speech processing; Data models; Bidirectional control; Context-sensitive modeling; conversational transformer network (CTNet); conversational emotion recognition; multimodal fusion; speaker-sensitive modeling;
DOI
10.1109/TASLP.2021.3049898
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Emotion recognition in conversation is an important topic because of its widespread applications in human-computer interaction. Unlike vanilla emotion recognition of individual utterances, conversational emotion recognition requires modeling both context-sensitive and speaker-sensitive dependencies. Despite their promising results, recent works generally do not leverage advanced fusion techniques to generate the multimodal representations of an utterance. As a result, they are limited in modeling intra-modal and cross-modal interactions. To address these problems, we propose a multimodal learning framework for conversational emotion recognition, called the conversational transformer network (CTNet). Specifically, we use a transformer-based structure to model intra-modal and cross-modal interactions among multimodal features. Meanwhile, we use word-level lexical features and segment-level acoustic features as inputs, which enables us to capture temporal information within the utterance. Additionally, to model context-sensitive and speaker-sensitive dependencies, we use a multi-head attention based bi-directional GRU component and speaker embeddings. Experimental results on the IEMOCAP and MELD datasets demonstrate the effectiveness of the proposed method. Our method shows an absolute improvement of 2.1% to 6.2% in weighted average F1 over state-of-the-art strategies.
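For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of weighted average F1 in its standard definition: per-class F1 scores averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy emotion labels are illustrative assumptions; this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by class support in y_true.

    Hypothetical helper for illustration only, not taken from the paper.
    """
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # True positives and false positives for this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp  # true instances of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight by the class's share of true labels
    return score


# Toy example with made-up labels
y_true = ["happy", "happy", "sad", "angry"]
y_pred = ["happy", "sad", "sad", "angry"]
print(weighted_f1(y_true, y_pred))
```

Because each class is weighted by its support, frequent emotion classes influence the score more than rare ones, which matters on class-imbalanced conversational datasets.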
Pages: 985 - 1000
Number of pages: 16
Related papers
50 in total
  • [21] Speech emotion recognition using the novel SwinEmoNet (Shifted Window Transformer Emotion Network)
    Ramesh R.
    Prahaladhan V.B.
    Nithish P.
    Mohanaprasad K.
    International Journal of Speech Technology, 2024, 27 (03) : 551 - 568
  • [22] EmoCaps: Emotion Capsule based Model for Conversational Emotion Recognition
    Li, Zaijing
    Tang, Fengxiao
    Zhao, Ming
    Zhu, Yusen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 1610 - 1618
  • [23] Bi-Branch Vision Transformer Network for EEG Emotion Recognition
    Lu, Wei
    Tan, Tien-Ping
    Ma, Hua
    IEEE ACCESS, 2023, 11 : 36233 - 36243
  • [24] BEYOND ISOLATED UTTERANCES: CONVERSATIONAL EMOTION RECOGNITION
    Pappagari, Raghavendra
    Zelasko, Piotr
    Villalba, Jesus
    Moro-Velazquez, Laureano
    Dehak, Najim
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 39 - 46
  • [25] CTNet: a convolutional transformer network for EEG-based motor imagery classification
    Zhao, Wei
    Jiang, Xiaolu
    Zhang, Baocan
    Xiao, Shixiao
    Weng, Sujun
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [26] Topic-Enriched Variational Transformer for Conversational Emotion Detection
    Luo, Jiamin
    Wang, Jingjing
    Zhou, Guodong
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT V, NLPCC 2024, 2025, 15363 : 3 - 15
  • [27] TANTP: Conversational Emotion Recognition Using Tree-Based Attention Networks with Transformer Pre-training
    Liu, Haozhe
    Lin, Hongzhan
    Chen, Guang
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 730 - 742
  • [28] CIT-EmotionNet: convolution interactive transformer network for EEG emotion recognition
    Lu, Wei
    Xia, Lingnan
    Tan, Tien Ping
    Ma, Hua
    PeerJ Computer Science, 2024, 10
  • [29] A Transformer based neural network for emotion recognition and visualizations of crucial EEG channels
    Guo, Jia-Yi
    Cai, Qing
    An, Jian-Peng
    Chen, Pei-Yin
    Ma, Chao
    Wan, Jun-He
    Gao, Zhong-Ke
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2022, 603