Survey of Deep Representation Learning for Speech Emotion Recognition

Cited by: 45
Authors
Latif, Siddique [1, 2]
Rana, Rajib [1]
Khalifa, Sara [3, 4, 5]
Jurdak, Raja
Qadir, Junaid [6]
Schuller, Björn [7, 8]
Affiliations
[1] Univ Southern Queensland USQ, Springfield, Qld 4300, Australia
[2] Data61 CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[3] Data61 CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[4] Univ New South Wales, Sydney, NSW 2052, Australia
[5] Univ Queensland, St Lucia, Qld 4072, Australia
[6] Qatar Univ, Coll Engn, Dept Comp Sci & Engn, Doha, Qatar
[7] Imperial Coll London, Grp Language Audio & Mus, London SW7 2BX, England
[8] Univ Augsburg, Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
Keywords
Speech emotion recognition; multi-task learning; representation learning; domain adaptation; unsupervised learning; component analysis; ladder networks; features; corpus; adversarial; dimensionality; architectures; classifiers; algorithm; databases
DOI
10.1109/TAFFC.2021.3114365
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditionally, speech emotion recognition (SER) research has relied on handcrafted acoustic features obtained through feature engineering. However, designing handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, where hierarchical representations are learned automatically in a data-driven manner. This article presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges a gap in the literature, since existing surveys focus either on SER with hand-engineered features or on representation learning in the general setting without focusing on SER.
Pages: 1634-1654
Page count: 21