Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

Cited by: 8
Authors
Amjad, Ammar [1]
Khan, Lal [1]
Chang, Hsien-Tsung [1, 2, 3, 4]
Affiliations
[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan 33302, Taiwan
[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan 33302, Taiwan
[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan 33302, Taiwan
[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan 33302, Taiwan
Keywords
spontaneous database; semi-natural database; speech emotion recognition; multiple feature fusion; support vector machine; EMOTION RECOGNITION; INFORMATION; SPACE;
DOI
10.3390/pr9122286
Chinese Library Classification number
TQ [Chemical Industry];
Discipline classification code
0817;
Abstract
Identifying speech emotions in spontaneous databases remains a complex and demanding research area. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions using multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) algorithm is utilized to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed identification accuracies of 76% and 59%, respectively, in speaker-independent experiments with SVM. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the suggested framework outperformed current state-of-the-art techniques on the semi-natural and spontaneous datasets.
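The fusion-then-SVM idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes feature-level fusion means standardizing each feature set and concatenating per sample, it swaps the DNN-learned features for hand-written toy numbers, and it trains a plain linear SVM by sub-gradient descent on the hinge loss. All names (`standardize`, `fuse`, `train_linear_svm`, `spectral`, `prosodic`) and all data values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column of a list of feature vectors."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def fuse(*feature_sets):
    """Assumed fusion rule: standardize each set, then concatenate per sample."""
    normed = [standardize(fs) for fs in feature_sets]
    return [sum(parts, []) for parts in zip(*normed)]

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM: batch sub-gradient descent on the L2-regularized hinge loss.
    Labels in y must be -1 or +1."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the gradient
                for j, xj in enumerate(xi):
                    gw[j] -= yi * xj / n
                gb -= yi / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, made-up feature sets for six utterances (two hypothetical emotion classes).
spectral = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [3.0, 0.5], [3.1, 0.4], [2.9, 0.6]]
prosodic = [[100.0], [110.0], [105.0], [200.0], [210.0], [190.0]]
labels = [-1, -1, -1, 1, 1, 1]

fused = fuse(spectral, prosodic)      # each sample now has 2 + 1 = 3 features
w, b = train_linear_svm(fused, labels)
predictions = [predict(w, b, x) for x in fused]
```

Standardizing before concatenation keeps a feature set with large raw values (here the hypothetical `prosodic` set) from dominating the fused vector. A speaker-independent evaluation as described in the abstract would additionally require that no speaker appears in both the training and test splits, which this toy example does not model.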
Pages: 16
Related papers
50 in total
  • [31] Human Emotion Recognition with Electroencephalographic Multidimensional Features by Hybrid Deep Neural Networks
    Li, Youjun
    Huang, Jiajin
    Zhou, Haiyan
    Zhong, Ning
    APPLIED SCIENCES-BASEL, 2017, 7 (10):
  • [32] DEEP NEURAL NETWORK FEATURES AND SEMI-SUPERVISED TRAINING FOR LOW RESOURCE SPEECH RECOGNITION
    Thomas, Samuel
    Seltzer, Michael L.
    Church, Kenneth
    Hermansky, Hynek
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 6704 - 6708
  • [33] Speech recognition using neural networks
    Khan, SU
    Sharma, G
    Rao, PRK
    PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY 2000, VOLS 1 AND 2, 2000, : 432 - 437
  • [34] SPEECH RECOGNITION USING NEURAL NETWORKS
    Kumar, T. Lalith
    Kumar, T. Kishore
    Rajan, K. Soundar
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2009, : 248 - +
  • [35] Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks
    Lee, Wonkyum
    Hang, Kyu J.
    Lane, Ian
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3843 - 3847
  • [36] Learning Salient Features for Speech Emotion Recognition Using Convolutional Neural Networks
    Mao, Qirong
    Dong, Ming
    Huang, Zhengwei
    Zhan, Yongzhao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2014, 16 (08) : 2203 - 2213
  • [37] Speech emotion recognition with deep convolutional neural networks
    Issa, Dias
    Demirci, M. Fatih
    Yazici, Adnan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 59
  • [38] Noisy training for deep neural networks in speech recognition
    Yin, Shi
    Liu, Chao
    Zhang, Zhiyong
    Lin, Yiye
    Wang, Dong
    Tejedor, Javier
    Zheng, Thomas Fang
    Li, Yinguo
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015, : 1 - 14
  • [39] FAST TRAINING OF DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION
    Gong, Guojing
    Kingsbury, Brian
    Yang, Chih-Chieh
    Liu, Tianyi
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6884 - 6888
  • [40] RECURRENT DEEP NEURAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Weng, Chao
    Yu, Dong
    Watanabe, Shinji
    Juang, Biing-Hwang
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,