Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

Cited by: 8
Authors
Amjad, Ammar [1]
Khan, Lal [1]
Chang, Hsien-Tsung [1, 2, 3, 4]
Affiliations
[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan 33302, Taiwan
[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan 33302, Taiwan
[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan 33302, Taiwan
[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan 33302, Taiwan
Keywords
spontaneous database; semi-natural database; speech emotion recognition; multiple feature fusion; support vector machine; EMOTION RECOGNITION; INFORMATION; SPACE
DOI
10.3390/pr9122286
Chinese Library Classification (CLC) number
TQ [Chemical industry]
Discipline classification code
0817
Abstract
Recognizing emotions in spontaneous speech databases has recently become a complex and demanding area of study. This research presents a new approach for recognizing emotions in semi-natural and spontaneous speech using multiple feature fusion and deep neural networks (DNNs). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, which degrades emotion identification. Therefore, after the fusion approach has learned the relevant features, a support vector machine (SVM) is used to identify the most discriminative audio feature map. We evaluated the approach on the eNTERFACE05 and BAUM-1s benchmark databases and, in speaker-independent experiments with the SVM, obtained identification accuracies of 76% and 59%, respectively. These experiments indicate that the proposed framework outperforms current state-of-the-art techniques on the semi-natural and spontaneous datasets.
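The abstract describes the pipeline only at a high level: fused hybrid acoustic features, an SVM, and speaker-independent evaluation. It does not name the feature sets, the fusion details or the SVM settings. What follows is a minimal sketch that works around those gaps. It is not the authors' implementation. It makes four assumptions:

- Feature-level fusion is done by concatenating each utterance's vectors from every feature set.
- Z-score normalization is fitted on the training speakers only.
- The classifier is a linear SVM trained with the Pegasos subgradient method, used one-vs-rest. This is a stand-in solver, since the paper's kernel and solver are not stated in the record.
- Leave-one-speaker-out splitting is used as one common way to run a speaker-independent experiment.

All function names, hyperparameters (`lam`, `epochs`, `seed`) and the toy data are hypothetical.

```python
import random

# Hypothetical helpers: a sketch of fusion + SVM + speaker-independent
# evaluation under stated assumptions, not the paper's actual method.


def fuse_features(*feature_sets):
    """Feature-level (early) fusion: concatenate each utterance's vectors from every set."""
    n = len(feature_sets[0])
    return [[v for fs in feature_sets for v in fs[i]] for i in range(n)]


def fit_zscore(rows):
    """Fit per-dimension mean/std on training rows only, so test speakers never leak in."""
    dims, n = len(rows[0]), len(rows)
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    # A constant dimension (std 0) falls back to 1.0 to avoid division by zero.
    stds = [(sum((r[d] - means[d]) ** 2 for r in rows) / n) ** 0.5 or 1.0 for d in range(dims)]
    return lambda r: [(x - m) / s for x, m, s in zip(r, means, stds)]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    y holds +1/-1 targets. A constant 1.0 is appended to every row as a bias
    feature, so the bias is regularized too (a simplification).
    """
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step from the regularizer
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_score(w, row):
    return sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))


def train_one_vs_rest(X, labels, **svm_kwargs):
    """One binary SVM per emotion class (that class = +1, all others = -1)."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **svm_kwargs)
            for c in sorted(set(labels))}


def predict_one_vs_rest(models, row):
    return max(models, key=lambda c: svm_score(models[c], row))


def leave_one_speaker_out(speakers):
    """Yield (held-out speaker, train indices, test indices); no speaker is in both."""
    for spk in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != spk]
        test = [i for i, s in enumerate(speakers) if s == spk]
        yield spk, train, test


def speaker_independent_accuracy(fused, labels, speakers, **svm_kwargs):
    correct = 0
    for _, train, test in leave_one_speaker_out(speakers):
        normalize = fit_zscore([fused[i] for i in train])
        models = train_one_vs_rest([normalize(fused[i]) for i in train],
                                   [labels[i] for i in train], **svm_kwargs)
        correct += sum(predict_one_vs_rest(models, normalize(fused[i])) == labels[i] for i in test)
    return correct / len(labels)


if __name__ == "__main__":
    # Synthetic toy data (NOT from eNTERFACE05 or BAUM-1s): two hypothetical
    # feature sets per utterance, three speakers, two emotion labels.
    prosodic = [[3.0, 2.0], [-3.0, -2.0], [3.2, 2.1], [-2.8, -1.9], [2.9, 2.2], [-3.1, -2.2]]
    spectral = [[1.0], [-1.0], [1.1], [-0.9], [0.8], [-1.2]]
    labels = ["angry", "neutral"] * 3
    speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
    fused = fuse_features(prosodic, spectral)
    acc = speaker_independent_accuracy(fused, labels, speakers)
    print(f"speaker-independent accuracy: {acc:.2f}")  # prints 1.00 on this separable toy data
```

Holding out one speaker at a time is what makes the evaluation speaker-independent. The normalization statistics and SVM weights never see the held-out speaker's utterances. On real corpora the fused feature vectors would be much larger. A kernel SVM from an established library would more likely be used in place of this toy solver.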
Pages: 16
Related papers (50 in total)
  • [21] Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network
    Jiang, Wei
    Wang, Zheng
    Jin, Jesse S.
    Han, Xianfeng
    Li, Chunguang
    SENSORS, 2019, 19 (12)
  • [22] Audio Visual Speech Recognition Using Deep Recurrent Neural Networks
    Thanda, Abhinav
    Venkatesan, Shankar M.
    MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2016, 2017, 10183 : 98 - 109
  • [23] Large Vocabulary Speech Recognition Using Deep Tensor Neural Networks
    Yu, Dong
    Deng, Li
    Seide, Frank
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 6 - 9
  • [24] Speech Enhancement for Speaker Recognition Using Deep Recurrent Neural Networks
    Tkachenko, Maxim
    Yamshinin, Alexander
    Lyubimov, Nikolay
    Kotov, Mikhail
    Nastasenko, Marina
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 690 - 699
  • [25] Isolated Word Speech Recognition System Using Deep Neural Networks
    Dhanashri, Dhavale
    Dhonde, S. B.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON DATA ENGINEERING AND COMMUNICATION TECHNOLOGY, ICDECT 2016, VOL 1, 2017, 468 : 9 - 17
  • [26] Combining Speech Features for Aggression Detection Using Deep Neural Networks
    Jaafar, Noussaiba
    Lachiri, Zied
    2020 5TH INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP'2020), 2020,
  • [27] Decoding Imagined Speech using Wavelet Features and Deep Neural Networks
    Panachakel, Jerrin Thomas
    Ramakrishnan, A. G.
    Ananthapadmanabha, T. V.
    2019 IEEE 16TH INDIA COUNCIL INTERNATIONAL CONFERENCE (IEEE INDICON 2019), 2019,
  • [28] Efficient deep neural networks for speech synthesis using bottleneck features
    Joo, Young-Sun
    Jun, Won-Suk
    Kang, Hong-Goo
    2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [29] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [30] Robust Speech Recognition with Speech Enhanced Deep Neural Networks
    Du, Jun
    Wang, Qing
    Gao, Tian
    Xu, Yong
    Dai, Lirong
    Lee, Chin-Hui
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 616 - 620