MULTI-STAGE GRAPH REPRESENTATION LEARNING FOR DIALOGUE-LEVEL SPEECH EMOTION RECOGNITION

Cited: 2
Authors
Song, Yaodong [1 ]
Liu, Jiaxing [1 ]
Wang, Longbiao [1 ]
Yu, Ruiguo [1 ]
Dang, Jianwu [1 ,2 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Speech emotion recognition; dialogue-level contextual information; utterance-level representation; double-constrained; atmosphere;
DOI
10.1109/ICASSP43922.2022.9746237
CLC Classification
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
With the development of speech emotion recognition (SER), most current research remains at the utterance level and cannot meet the needs of real-world scenarios. In this paper, we propose a novel strategy that focuses on capturing dialogue-level contextual information. Building on utterance-level representations learned by a convolutional neural network (CNN) followed by a bidirectional long short-term memory network (BLSTM), the proposed dialogue-level method consists of two modules. The first is the Dialogue Multi-stage Graph Representation Learning algorithm (DialogMSG), in which multi-stage graphs modeling different dialogue scopes are introduced to capture more effective contextual information. The second is a double-constrained module, which includes not only an utterance-level classifier but also a dialogue-level graph classifier named Atmosphere. Extensive experiments show that the proposed method outperforms the current state of the art on the IEMOCAP benchmark dataset.
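The pipeline described in the abstract can be illustrated with a minimal NumPy sketch: per-utterance features (stand-ins for CNN-BLSTM outputs) are propagated through graph stages built over increasing dialogue scopes, then fed to both a per-utterance classifier head and a dialogue-level ("Atmosphere") head. All names, window sizes, dimensions, and random weights here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def window_adjacency(n, w):
    """Row-normalized adjacency over n utterances, linking each utterance
    to neighbors within a context window w (assumed dialogue scope)."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(max(0, i - w), min(n, i + w + 1)):
            A[i, j] = 1.0
    return A / A.sum(axis=1, keepdims=True)

def graph_stage(X, A, W):
    """One propagation stage: average neighbor features, project, ReLU."""
    return np.maximum(A @ X @ W, 0.0)

rng = np.random.default_rng(0)
n, d, c = 6, 8, 4      # 6 utterances, 8-dim features, 4 emotion classes
X = rng.standard_normal((n, d))  # stand-in CNN-BLSTM utterance features

H = X
for w in (1, 2, n):    # hypothetical multi-stage scopes: local, mid, whole dialogue
    W = rng.standard_normal((d, d)) * 0.1
    H = graph_stage(H, window_adjacency(n, w), W)

# double-constrained heads: per-utterance logits and one dialogue-level logit vector
utt_logits = H @ (rng.standard_normal((d, c)) * 0.1)            # shape (6, 4)
atmo_logits = H.mean(axis=0) @ (rng.standard_normal((d, c)) * 0.1)  # shape (4,)
print(utt_logits.shape, atmo_logits.shape)
```

In training, the two heads would each carry a loss term, so the graph representation is constrained by both utterance-level labels and a dialogue-level target.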
Pages: 6432-6436 (5 pages)
Related Papers (50 records)
[1] Liu, Jiaxing; Song, Yaodong; Wang, Longbiao; Dang, Jianwu; Yu, Ruiguo. Time-Frequency Representation Learning with Graph Convolutional Network for Dialogue-level Speech Emotion Recognition. INTERSPEECH 2021, 2021: 4523-4527.
[2] Ghosh, Sayan; Laksana, Eugene; Morency, Louis-Philippe; Scherer, Stefan. Representation Learning for Speech Emotion Recognition. INTERSPEECH 2016, 2016: 3603-3607.
[3] Liogiene, Tatjana; Tamulevicius, Gintautas. Multi-Stage Recognition of Speech Emotion Using Sequential Forward Feature Selection. Electrical, Control and Communication Engineering, 2016, 10(1): 35-41.
[4] Lee, Seungyeol; Lee, Youngwoo; Cho, Namgook. Multi-Stage Speech Enhancement for Automatic Speech Recognition. 2016 IEEE International Conference on Consumer Electronics (ICCE), 2016.
[5] Li, Runnan; Wu, Zhiyong; Jia, Jia; Bu, Yaohua; Zhao, Sheng; Meng, Helen. Towards Discriminative Representation Learning for Speech Emotion Recognition. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), 2019: 5060-5066.
[6] Singkul, Sattaya; Woraratpanya, Kuntpong. Vector learning representation for generalized speech emotion recognition. Heliyon, 2022, 8(3).
[7] Latif, Siddique; Rana, Rajib; Khalifa, Sara; Jurdak, Raja; Qadir, Junaid; Schuller, Bjorn. Survey of Deep Representation Learning for Speech Emotion Recognition. IEEE Transactions on Affective Computing, 2023, 14(2): 1634-1654.
[8] Al-Mamory, Safaa O. A graph based system for multi-stage attacks recognition. High Technology Letters, 2008, 14(2): 167-173.
[9] Tang, Hao; Xie, Songyun; Xie, Xinzhou; Cui, Yujie; Li, Bohan; Zheng, Dalu; Hao, Yu; Wang, Xiangming; Jiang, Yiye; Tian, Zhongyu. Multi-Domain Based Dynamic Graph Representation Learning for EEG Emotion Recognition. IEEE Journal of Biomedical and Health Informatics, 2024, 28(9): 5227-5238.
[10] Neumann, Michael; Vu, Ngoc Thang. Improving Speech Emotion Recognition with Unsupervised Representation Learning on Unlabeled Speech. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 7390-7394.