On the Correlation and Transferability of Features between Automatic Speech Recognition and Speech Emotion Recognition

Cited by: 19
Authors
Fayek, Haytham M. [1]
Lech, Margaret [1]
Cavedon, Lawrence [2]
Affiliations
[1] RMIT Univ, Sch Engn, Melbourne, Vic 3001, Australia
[2] RMIT Univ, Sch Sci, Melbourne, Vic 3001, Australia
Source
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016
Keywords
deep learning; emotion recognition; neural networks; speech recognition; transfer learning
DOI
10.21437/Interspeech.2016-868
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The correlation between Automatic Speech Recognition (ASR) and Speech Emotion Recognition (SER) is poorly understood. Studying this correlation may pave the way for integrating both tasks into a single system, or may provide insights that help advance both systems, such as improving how ASR deals with emotional speech or embedding linguistic input into SER. In this paper, we quantify the relation between ASR and SER by studying the relevance of features learned between both tasks in deep convolutional neural networks using transfer learning. Experiments are conducted using the TIMIT and IEMOCAP databases. Results reveal an intriguing correlation between both tasks: features learned in some layers, particularly towards the initial layers of the network, for either task were found to be applicable to the other task to varying degrees.
Pages: 3618 - 3622
Page count: 5
Related papers
50 in total
  • [21] Significance of Phonological Features in Speech Emotion Recognition
    Wang, Wei
    Watters, Paul A.
    Cao, Xinyi
    Shen, Lingjie
    Li, Bo
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2020, 23 (03) : 633 - 642
  • [22] Adding dimensional features for emotion recognition on speech
    Ben Letaifa, Leila
    Ines Torres, Maria
    Justo, Raquel
    2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020,
  • [23] Speech emotion recognition: Features and classification models
    Chen, Lijiang
    Mao, Xia
    Xue, Yuli
    Cheng, Lee Lung
    DIGITAL SIGNAL PROCESSING, 2012, 22 (06) : 1154 - 1160
  • [24] SPEECH EMOTION RECOGNITION WITH ACOUSTIC AND LEXICAL FEATURES
    Jin, Qin
    Li, Chengxin
    Chen, Shizhe
    Wu, Huimin
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4749 - 4753
  • [25] Statistical Evaluation of Speech Features for Emotion Recognition
    Iliou, Theodoros
    Anagnostopoulos, Christos-Nikolaos
    ICDT: 2009 FOURTH INTERNATIONAL CONFERENCE ON DIGITAL TELECOMMUNICATIONS, 2009, : 121 - 126
  • [26] Hybrid Spectral Features for Speech Emotion Recognition
    Shah, Firoz A.
    Anto, Babu P.
    2017 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2017,
  • [27] Novel acoustic features for speech emotion recognition
    Roh Yong-Wan
    Kim Dong-Ju
    Lee Woo-Seok
    Hong Kwang-Seok
    SCIENCE IN CHINA SERIES E-TECHNOLOGICAL SCIENCES, 2009, 52 (07) : 1838 - 1848
  • [28] Voice Quality Features for Speech Emotion Recognition
    Idris, Inshirah
    Salam, Md Sah Hj
    JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2015, 10 (04) : 183 - 191
  • [29] Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
    Kadiri, Sudarsana Reddy
    Gangamohan, P.
    Gangashetty, Suryakanth V.
    Alku, Paavo
    Yegnanarayana, B.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (09) : 4459 - 4481
  • [30] Excitation Features of Speech for Emotion Recognition Using Neutral Speech as Reference
    Sudarsana Reddy Kadiri
    P. Gangamohan
    Suryakanth V. Gangashetty
    Paavo Alku
    B. Yegnanarayana
    Circuits, Systems, and Signal Processing, 2020, 39 : 4459 - 4481