Semi-Natural and Spontaneous Speech Recognition Using Deep Neural Networks with Hybrid Features Unification

Times cited: 8

Authors:

Amjad, Ammar [1]

Khan, Lal [1]

Chang, Hsien-Tsung [1,2,3,4]

Affiliations:

[1] Chang Gung Univ, Dept Comp Sci & Informat Engn, Taoyuan 33302, Taiwan

[2] Chang Gung Mem Hosp, Dept Phys Med & Rehabil, Taoyuan 33302, Taiwan

[3] Chang Gung Univ, Artificial Intelligence Res Ctr, Taoyuan 33302, Taiwan

[4] Chang Gung Univ, Bachelor Program Artificial Intelligence, Taoyuan 33302, Taiwan

Source:

PROCESSES | 2021 / Vol. 9 / Issue 12

Keywords:

spontaneous database; semi-natural database; speech emotion recognition; multiple feature fusion; support vector machine; EMOTION RECOGNITION; INFORMATION; SPACE;

DOI:

10.3390/pr9122286

Chinese Library Classification (CLC) number:

TQ [Chemical Industry]

Subject classification code:

0817

Abstract:

Recognizing emotions in spontaneous speech databases has recently been a complex and demanding area of study. This research presents a new approach for recognizing emotions in semi-natural and spontaneous speech using multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, which leads to inadequate emotion identification. Therefore, after the fusion approach has learned the relevant features, a support vector machine (SVM) algorithm is used to identify the most discriminative audio feature map. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed identification accuracies of 76% and 59%, respectively, in speaker-independent experiments with SVM. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the proposed framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
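The abstract describes the pipeline only at a high level. Below is a minimal pure-Python sketch of three generic ingredients it names: feature-level fusion of two acoustic feature sets, linear-SVM weights used to rank features by how discriminative they are, and a speaker-independent (leave-one-speaker-out) evaluation. This is not the authors' implementation, because the paper's actual features, DNN, and SVM configuration are not given here. The function names, the Pegasos-style SVM solver, the binary labels (real emotion classes would need one-vs-rest or a multi-class SVM), and the toy data are all illustrative assumptions. Pure Python is used only to keep the sketch dependency-free.

```python
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance.
    For brevity the statistics come from all rows; a strict speaker-independent
    run would compute them on the training speakers only."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def fuse(*feature_sets):
    """Feature-level fusion: standardize each set, then concatenate per utterance."""
    normed = [zscore_columns(fs) for fs in feature_sets]
    return [[v for part in parts for v in part] for parts in zip(*normed)]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (labels +1/-1) fitted with Pegasos sub-gradient steps on
    the L2-regularized hinge loss. A constant 1.0 is appended to every sample so
    the last weight acts as a (regularized) bias term."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w[:-1], w[-1]


def rank_features(w):
    """Feature indices ordered by |weight|, most discriminative first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1 for row in X]


def take(X, rows, cols):
    """Sub-matrix of X restricted to the given row and column indices."""
    return [[X[i][j] for j in cols] for i in rows]


def leave_one_speaker_out(speakers):
    """Yield (held_out, train_idx, test_idx); no speaker is in both index lists."""
    for spk in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != spk]
        test = [i for i, s in enumerate(speakers) if s == spk]
        yield spk, train, test


# Hypothetical toy data: set A has one label-correlated column, set B is pure noise.
rng = random.Random(1)
y = [1] * 20 + [-1] * 20
speakers = [f"spk{i % 4}" for i in range(40)]
set_a = [[2.0 * yi + rng.uniform(-0.3, 0.3), rng.uniform(-1, 1)] for yi in y]
set_b = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in y]
X = fuse(set_a, set_b)

w, b = train_linear_svm(X, y)
print("feature ranking:", rank_features(w))

accs = []
for spk, tr, te in leave_one_speaker_out(speakers):
    # Rank features on the training speakers only, keep the top 2, retrain, test.
    w_tr, _ = train_linear_svm(take(X, tr, range(len(X[0]))), [y[i] for i in tr])
    keep = rank_features(w_tr)[:2]
    w_k, b_k = train_linear_svm(take(X, tr, keep), [y[i] for i in tr])
    pred = predict(w_k, b_k, take(X, te, keep))
    accs.append(sum(p == y[i] for p, i in zip(pred, te)) / len(te))
print("per-speaker accuracy:", accs)
```

With these fixed seeds, column 0 (the only label-correlated feature) should rank first. Per-speaker accuracy should also be high. Selecting features inside each fold keeps the held-out speaker's labels out of the selection step, which is the point of a speaker-independent protocol. Real eNTERFACE05 or BAUM-1s features would replace the toy matrices.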


Pages: 16
