On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition

Cited by: 19

Authors:

Fayek, Haytham M. [1]

Lech, Margaret [1]

Cavedon, Lawrence [2]

Affiliations:

[1] RMIT Univ, Sch Engn, Melbourne, Vic 3001, Australia

[2] RMIT Univ, Sch Sci, Melbourne, Vic 3001, Australia

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

deep learning; emotion recognition; neural networks; speech recognition; transfer learning

DOI:

10.21437/Interspeech.2016-868

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The correlation between Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) is poorly understood. Studying such correlation may pave the way for integrating both tasks into a single system or may provide insights that can aid in advancing both systems such as improving ASR in dealing with emotional speech or embedding linguistic input into SER. In this paper, we quantify the relation between ASR and SER by studying the relevance of features learned between both tasks in deep convolutional neural networks using transfer learning. Experiments are conducted using the TIMIT and IEMOCAP databases. Results reveal an intriguing correlation between both tasks, where features learned in some layers particularly towards initial layers of the network for either task were found to be applicable to the other task with varying degree.

Pages: 3618-3622

Number of pages: 5

50 items in total

[21] Significance of Phonological Features in Speech Emotion Recognition
Wang, Wei
Watters, Paul A.
Cao, Xinyi
Shen, Lingjie
Li, Bo
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 633 - 642
[22] Adding dimensional features for emotion recognition on speech
Ben Letaifa, Leila
Ines Torres, Maria
Justo, Raquel
2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020,
[23] Speech emotion recognition: Features and classification models
Chen, Lijiang
Mao, Xia
Xue, Yuli
Cheng, Lee Lung
DIGITAL SIGNAL PROCESSING, 2012, 22 (06) : 1154 - 1160
[24] SPEECH EMOTION RECOGNITION WITH ACOUSTIC AND LEXICAL FEATURES
Jin, Qin
Li, Chengxin
Chen, Shizhe
Wu, Huimin
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4749 - 4753
[25] Statistical Evaluation of Speech Features for Emotion Recognition
Iliou, Theodoros
Anagnostopoulos, Christos-Nikolaos
ICDT: 2009 FOURTH INTERNATIONAL CONFERENCE ON DIGITAL TELECOMMUNICATIONS, 2009, : 121 - 126
[26] Hybrid Spectral Features for Speech Emotion Recognition
Shah, Firoz A.
Anto, Babu P.
2017 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2017,
[27] Novel acoustic features for speech emotion recognition
Roh Yong-Wan
Kim Dong-Ju
Lee Woo-Seok
Hong Kwang-Seok
SCIENCE IN CHINA SERIES E-TECHNOLOGICAL SCIENCES, 2009, 52 (07) : 1838 - 1848
[28] Voice Quality Features for Speech Emotion Recognition
Idris, Inshirah
Salam, Md Sah Hj
JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2015, 10 (04) : 183 - 191
[29] Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
Kadiri, Sudarsana Reddy
Gangamohan, P.
Gangashetty, Suryakanth V.
Alku, Paavo
Yegnanarayana, B.
CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (09) : 4459 - 4481
[30] Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
Sudarsana Reddy Kadiri
P. Gangamohan
Suryakanth V. Gangashetty
Paavo Alku
B. Yegnanarayana
Circuits, Systems, and Signal Processing, 2020, 39 : 4459 - 4481
