On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition

Cited by: 19

Authors:

Fayek, Haytham M. [1]

Lech, Margaret [1]

Cavedon, Lawrence [2]

Affiliations:

[1] RMIT Univ, Sch Engn, Melbourne, Vic 3001, Australia

[2] RMIT Univ, Sch Sci, Melbourne, Vic 3001, Australia

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

deep learning; emotion recognition; neural networks; speech recognition; transfer learning

DOI:

10.21437/Interspeech.2016-868

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

The correlation between Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) is poorly understood. Studying this correlation may pave the way for integrating both tasks into a single system, or may provide insights that help advance both systems, such as improving ASR on emotional speech or embedding linguistic input into SER. In this paper, we quantify the relation between ASR and SER by studying the relevance of features learned between both tasks in deep convolutional neural networks using transfer learning. Experiments are conducted using the TIMIT and IEMOCAP databases. Results reveal an intriguing correlation between both tasks: features learned in some layers, particularly towards the initial layers of the network, for either task were found to be applicable to the other task to varying degrees.


Pages: 3618-3622

Page count: 5

50 items in total

[1] Automatic speech based emotion recognition using paralinguistics features
Hook, J.
Noroozi, F.
Toygar, O.
Anbarjafari, G.
BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2019, 67 (03) : 479 - 488
[2] Automatic speech emotion recognition using modulation spectral features
Wu, Siqing
Falk, Tiago H.
Chan, Wai-Yip
SPEECH COMMUNICATION, 2011, 53 (05) : 768 - 785
[3] Automatic Speech Emotion Recognition: A Survey
Chandrasekar, Purnima
Chapaneri, Santosh
Jayaswal, Deepak
2014 INTERNATIONAL CONFERENCE ON CIRCUITS, SYSTEMS, COMMUNICATION AND INFORMATION TECHNOLOGY APPLICATIONS (CSCITA), 2014, : 341 - 346
[4] Automatic emotion recognition by the speech signal
Schuller, B
Lang, M
Rigoll, G
6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL IX, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING II, 2002, : 367 - 372
[5] Towards automatic recognition of emotion in speech
Razak, AA
Yusof, MHM
Komiya, R
PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, 2003, : 548 - 551
[6] The Impact of Face Mask and Emotion on Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER)
Oh, Qi Qi
Seow, Chee Kiat
Yusuff, Mulliana
Pranata, Sugiri
Cao, Qi
2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 523 - 531
[7] Speech Databases, Speech Features, and Classifiers in Speech Emotion Recognition: A Review
Dar, G. H. Mohmad
Delhibabu, Radhakrishnan
IEEE ACCESS, 2024, 12 : 151122 - 151152
[8] Topological invariants as speech features for automatic speech recognition
Kacur, Juraj
Chudy, Vladimir
INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2014, 7 (04) : 235 - 244
[9] SNR Features for Automatic Speech Recognition
Garner, Philip N.
2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 182 - 187
[10] Automatic Emotion Recognition of Speech Signal in Mandarin
Zhang, Sheng
Ching, P. C.
Kong, Fanrang
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1810 - +
