Automatic speech recognition of Portuguese phonemes using neural networks ensemble

Cited by: 3

Authors:

Nedjah, Nadia [1]

Bonilla, Alejandra D. [1]

Mourelle, Luiza de Macedo [2]

Affiliations:

[1] Univ Estado Rio De Janeiro, Engn Fac, Dept Elect Engn & Telecommun, Rio de Janeiro, RJ, Brazil

[2] Univ Estado Rio De Janeiro, Engn Fac, Dept Syst Engn & Computat, Rio De Janeiro, RJ, Brazil

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 229

Keywords:

Automatic speech recognition; Phonetic recognition; Artificial neural networks; Ensemble; EXPERTS;

DOI:

10.1016/j.eswa.2023.120378

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic speech recognition based on phoneme detection offers advantages for the online recognition of speech represented by a sound signal. Developing an automatic speech recognition system is a multidisciplinary task that spans several research areas, such as linguistics, signal processing and computational intelligence. In this work, the process starts with a pre-processing step that extracts the main features of the speech signal at a given instant of time. Inspired by the "divide and conquer" principle, we bridge the complexity gap of automatic speech recognition by devising models based on an ensemble of neural network experts. The ensemble divides the huge decision space of speech recognition so that each expert handles only a delimited area of it. This novel application of the strategy improves the precision, sensitivity and accuracy of the recognition process. Each expert in the ensemble issues a decision for each pre-processed input sample. The resulting set of decisions is weighted, and the expert with the highest weight for the output determines the final classification of the sample. A dynamic post-processing step, implemented as a recurrent neural network, is then executed to mitigate the oscillatory effect that occurs during the recognition of classes with similar characteristics. Two ensembles are investigated: the first is based on the clustering of similar phonetic classes, while the second addresses the imbalanced distribution of samples in the training set. The proposed model achieves a 7.63% improvement in accuracy with respect to the best related model reported so far for automatic speech recognition.
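
The decision rule outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example scores and weights, and the reading of "highest weight for the output" as "largest weight-times-score product" are all assumptions made for illustration. The paper's post-processing step is a recurrent neural network; the sliding-window majority vote below is a much simpler stand-in that only shows the idea of damping frame-to-frame oscillation.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ensemble_decide(
    expert_scores: Sequence[Sequence[float]],
    expert_weights: Sequence[float],
) -> Tuple[int, int]:
    """Return (expert index, class index) of the largest weighted score.

    Hypothetical helper: each expert provides one score per class, and
    each expert has a single scalar weight (an assumption, not taken
    from the paper).
    """
    best = None
    for e, (scores, weight) in enumerate(zip(expert_scores, expert_weights)):
        for c, score in enumerate(scores):
            weighted = weight * score
            if best is None or weighted > best[0]:
                best = (weighted, e, c)
    if best is None:
        raise ValueError("at least one expert with one score is required")
    return best[1], best[2]


def smooth_labels(labels: Sequence[int], window: int = 3) -> List[int]:
    """Majority vote over a centred sliding window.

    Stand-in for the recurrent post-processing step: it replaces each
    frame label with the most frequent label among its neighbours.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed


# Illustrative values only: three experts scoring three classes.
scores = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
weights = [0.5, 1.0, 0.8]
winner_expert, winner_class = ensemble_decide(scores, weights)

# A label sequence with one isolated flip at position 2.
frame_labels = [0, 0, 1, 0, 0, 2, 2, 2]
stable_labels = smooth_labels(frame_labels, window=3)
```

With these example values, the second expert wins because its weighted score for class 1 is the largest, and the isolated label at position 2 is absorbed by its neighbours during smoothing.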


Pages: 23

