Survey of Deep Representation Learning for Speech Emotion Recognition

Cited by: 45

Authors:

Latif, Siddique [1,2]

Rana, Rajib [1]

Khalifa, Sara [3,4,5]

Jurdak, Raja

Qadir, Junaid [6]

Schuller, Bjorn [7,8]

Affiliations:

[1] Univ Southern Queensland USQ, Springfield, Qld 4300, Australia

[2] Data61 CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia

[3] Data61 CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia

[4] Univ New South Wales, Sydney, NSW 2052, Australia

[5] Univ Queensland, St Lucia, Qld 4072, Australia

[6] Qatar Univ, Coll Engn, Dept Comp Sci & Engn, Doha, Qatar

[7] Imperial Coll London, Grp Language Audio & Mus, London SW7 2BX, England

[8] Univ Augsburg, Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany

Source:

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING | 2023 / Vol. 14 / Issue 02

Keywords:

Speech emotion recognition; multi-task learning; representation learning; domain adaptation; unsupervised learning; COMPONENT ANALYSIS; LADDER NETWORKS; FEATURES; CORPUS; ADVERSARIAL; DIMENSIONALITY; ARCHITECTURES; CLASSIFIERS; ALGORITHM; DATABASES

DOI:

10.1109/TAFFC.2021.3114365

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditionally, speech emotion recognition (SER) research has relied on manually handcrafted acoustic features using feature engineering. However, the design of handcrafted features for complex SER tasks requires significant manual effort, which impedes generalisability and slows the pace of innovation. This has motivated the adoption of representation learning techniques that can automatically learn an intermediate representation of the input signal without any manual feature engineering. Representation learning has led to improved SER performance and enabled rapid innovation. Its effectiveness has further increased with advances in deep learning (DL), which has facilitated deep representation learning where hierarchical representations are automatically learned in a data-driven manner. This article presents the first comprehensive survey on the important topic of deep representation learning for SER. We highlight various techniques and related challenges, and identify important future areas of research. Our survey bridges the gap in the literature, since existing surveys either focus on SER with hand-engineered features or cover representation learning in the general setting without focusing on SER.


Pages: 1634-1654

Page count: 21

