On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition

Cited by: 19

Authors:

Fayek, Haytham M. ^{[1]}

Lech, Margaret ^{[1]}

Cavedon, Lawrence ^{[2]}

Affiliations:

[1] RMIT Univ, Sch Engn, Melbourne, Vic 3001, Australia

[2] RMIT Univ, Sch Sci, Melbourne, Vic 3001, Australia

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

deep learning; emotion recognition; neural networks; speech recognition; transfer learning; NEURAL-NETWORKS;

DOI:

10.21437/Interspeech.2016-868

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The correlation between Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) is poorly understood. Studying such correlation may pave the way for integrating both tasks into a single system or may provide insights that can aid in advancing both systems such as improving ASR in dealing with emotional speech or embedding linguistic input into SER. In this paper, we quantify the relation between ASR and SER by studying the relevance of features learned between both tasks in deep convolutional neural networks using transfer learning. Experiments are conducted using the TIMIT and IEMOCAP databases. Results reveal an intriguing correlation between both tasks, where features learned in some layers particularly towards initial layers of the network for either task were found to be applicable to the other task with varying degree.

Pages: 3618-3622

Number of pages: 5

50 records in total

[31] Speech Emotion Recognition
Lalitha, S.
Madhavan, Abhishek
Bhushan, Bharath
Saketh, Srinivas
2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ELECTRONICS, COMPUTERS AND COMMUNICATIONS (ICAECC), 2014
[32] Informative Speech Features based on Emotion Classes and Gender in Explainable Speech Emotion Recognition
Yildirim, Huseyin Ediz
Iren, Deniz
2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023
[33] ADAPTIVE BOOSTING FEATURES FOR AUTOMATIC SPEECH RECOGNITION
Kham Nguyen
Ng, Tim
Long Nguyen
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4733 - 4736
[35] Automatic Speech Recognition using Correlation Analysis
Pramanik, Arnab
Raha, Rajorshee
PROCEEDINGS OF THE 2012 WORLD CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGIES, 2012, : 670 - 674
[36] Efficient features in distributed automatic speech recognition
De Alencar, Vladimir F. S.
Alcaim, Abraham
Controle y Automacao, 2008, 19 (02): : 147 - 154
[37] Integrating Language and Emotion Features for Multilingual Speech Emotion Recognition
Heracleous, Panikos
Mohammad, Yasser
Yoneyama, Akio
HUMAN-COMPUTER INTERACTION. MULTIMODAL AND NATURAL INTERACTION, HCI 2020, PT II, 2020, 12182 : 187 - 196
[38] English speech emotion recognition method based on speech recognition
Liu, Man
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398
[40] Automatic Speech Emotion Recognition: a Systematic Literature Review
Mustafa H.H.
Darwish N.R.
Hefny H.A.
International Journal of Speech Technology, 2024, 27 (1) : 267 - 285
