Survey of Deep Representation Learning for Speech Emotion Recognition

Cited by: 45
Authors
Latif, Siddique [1 ,2 ]
Rana, Rajib [1 ]
Khalifa, Sara [3 ,4 ,5 ]
Jurdak, Raja
Qadir, Junaid [6 ]
Schuller, Bjorn [7 ,8 ]
Affiliations
[1] Univ Southern Queensland USQ, Springfield, Qld 4300, Australia
[2] Data61 CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[3] Data61 CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[4] Univ New South Wales, Sydney, NSW 2052, Australia
[5] Univ Queensland, St Lucia, Qld 4072, Australia
[6] Qatar Univ, Coll Engn, Dept Comp Sci & Engn, Doha, Qatar
[7] Imperial Coll London, Grp Language Audio & Mus, London SW7 2BX, England
[8] Univ Augsburg, Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
Keywords
Speech emotion recognition; multi task learning; representation learning; domain adaptation; unsupervised learning; COMPONENT ANALYSIS; LADDER NETWORKS; FEATURES; CORPUS; ADVERSARIAL; DIMENSIONALITY; ARCHITECTURES; CLASSIFIERS; ALGORITHM; DATABASES;
DOI
10.1109/TAFFC.2021.3114365
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditionally, speech emotion recognition (SER) research has relied on acoustic features handcrafted through feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning, in which hierarchical representations are automatically learned in a data-driven manner. This article presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges the gap in the literature, since existing surveys focus either on SER with hand-engineered features or on representation learning in the general setting without focusing on SER.
Pages: 1634-1654
Page count: 21
Related papers
50 in total
  • [21] Speech Emotion Recognition Using Deep Learning Techniques: A Review
    Khalil, Ruhul Amin
    Jones, Edward
    Babar, Mohammad Inayatullah
    Jan, Tariqullah
    Zafar, Mohammad Haseeb
    Alhussain, Thamer
    [J]. IEEE ACCESS, 2019, 7 : 117327 - 117345
  • [22] Deep Learning Based Emotion Recognition from Chinese Speech
    Zhang, Weishan
    Zhao, Dehai
    Chen, Xiufeng
    Zhang, Yuanjie
    [J]. INCLUSIVE SMART CITIES AND DIGITAL HEALTH, 2016, 9677 : 49 - 58
  • [23] Data Augmentation Techniques for Speech Emotion Recognition and Deep Learning
    Antonio Nicolas, Jose
    de Lope, Javier
    Grana, Manuel
    [J]. BIO-INSPIRED SYSTEMS AND APPLICATIONS: FROM ROBOTICS TO AMBIENT INTELLIGENCE, PT II, 2022, 13259 : 279 - 288
  • [24] Feature Fusion of Speech Emotion Recognition Based on Deep Learning
    Liu, Gang
    He, Wei
    Jin, Bicheng
    [J]. PROCEEDINGS OF 2018 INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2018, : 193 - 197
  • [25] Emotion recognition from speech using deep learning on spectrograms
    Li, Xingguang
    Song, Wenjun
    Liang, Zonglin
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (03) : 2791 - 2796
  • [26] Speech Emotion Recognition Using Deep Learning on audio recordings
    Suganya, S.
    Charles, E. Y. A.
    [J]. 2019 19TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER - 2019), 2019,
  • [27] Transfer Learning of Deep Neural Network for Speech Emotion Recognition
    Huang, Ying
    Hu, Mingqing
    Yu, Xianguo
    Wang, Tao
    Yang, Chen
    [J]. PATTERN RECOGNITION (CCPR 2016), PT II, 2016, 663 : 721 - 729
  • [28] Deep Learning Based Emotion Recognition and Visualization of Figural Representation
    Lu, Xiaofeng
    [J]. FRONTIERS IN PSYCHOLOGY, 2022, 12
  • [29] An Attention Pooling based Representation Learning Method for Speech Emotion Recognition
    Li, Pengcheng
    Song, Yan
    McLoughlin, Ian
    Guo, Wu
    Dai, Lirong
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3087 - 3091
  • [30] Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition
    Lian, Zheng
    Tao, Jianhua
    Liu, Bin
    Huang, Jian
    [J]. INTERSPEECH 2019, 2019, : 3840 - 3844