On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition

Cited by: 19

Authors:

Fayek, Haytham M. ^{[1]}

Lech, Margaret ^{[1]}

Cavedon, Lawrence ^{[2]}

Affiliations:

[1] RMIT Univ, Sch Engn, Melbourne, Vic 3001, Australia

[2] RMIT Univ, Sch Sci, Melbourne, Vic 3001, Australia

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

deep learning; emotion recognition; neural networks; speech recognition; transfer learning; NEURAL-NETWORKS;

DOI:

10.21437/Interspeech.2016-868

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The correlation between Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) is poorly understood. Studying such correlation may pave the way for integrating both tasks into a single system or may provide insights that can aid in advancing both systems such as improving ASR in dealing with emotional speech or embedding linguistic input into SER. In this paper, we quantify the relation between ASR and SER by studying the relevance of features learned between both tasks in deep convolutional neural networks using transfer learning. Experiments are conducted using the TIMIT and IEMOCAP databases. Results reveal an intriguing correlation between both tasks, where features learned in some layers particularly towards initial layers of the network for either task were found to be applicable to the other task with varying degree.


Pages: 3618-3622

Page count: 5
