Automatic speech recognition of Portuguese phonemes using neural networks ensemble

Cited by: 3
Authors
Nedjah, Nadia [1]
Bonilla, Alejandra D. [1]
Mourelle, Luiza de Macedo [2]
Affiliations
[1] Univ Estado Rio De Janeiro, Engn Fac, Dept Elect Engn & Telecommun, Rio de Janeiro, RJ, Brazil
[2] Univ Estado Rio De Janeiro, Engn Fac, Dept Syst Engn & Computat, Rio De Janeiro, RJ, Brazil
Keywords
Automatic speech recognition; Phonetic recognition; Artificial neural networks; Ensemble; Experts
DOI
10.1016/j.eswa.2023.120378
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic speech recognition based on phoneme detection offers advantages for the online recognition of speech represented as a sound signal. Developing an automatic speech recognition system is a multidisciplinary effort that spans several research areas, such as linguistics, signal processing and computational intelligence. In this work, the process starts with a pre-processing step that extracts the main features of the speech signal at a given instant of time. Inspired by the "divide and conquer" principle, we bridge the complexity gap of automatic speech recognition by devising models based on an ensemble of neural network experts. The ensemble divides the huge decision space of speech recognition so that each expert handles only a delimited area of it. This novel application of the strategy improves the precision, sensitivity and accuracy of the recognition process. Each expert in the ensemble makes a decision on each pre-processed input sample, and the resulting set of decisions is weighted, so that the expert with the highest weight for the output determines the final classification of the sample. A dynamic post-processing step, implemented as a recurrent neural network, is then executed to mitigate the oscillatory effect that occurs during the recognition of classes with similar characteristics. Two ensembles are investigated: the first is based on clustering similar phonetic classes, while the second addresses the imbalanced distribution of samples in the training set. The proposed model achieves a 7.63% improvement in accuracy over the best related model reported so far for automatic speech recognition.
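The weighted decision and the post-processing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ensemble_decide` and `smooth_labels`, the use of per-expert scalar weights, and the scoring rule (weight times the expert's top class probability) are all assumptions made for illustration. The paper's post-processing is a recurrent neural network; the sliding majority vote below is a much simpler stand-in that only shows the idea of damping label oscillation between neighbouring frames.

```python
from collections import Counter


def ensemble_decide(expert_probs, expert_weights):
    """Return one class index per sample.

    expert_probs[e][s] is the list of class probabilities that expert e
    assigns to sample s; expert_weights[e] is that expert's weight.
    Assumption: the winning expert is the one whose weighted top
    probability is highest, and its top class becomes the decision.
    """
    n_samples = len(expert_probs[0])
    decisions = []
    for s in range(n_samples):
        best_score, best_class = float("-inf"), None
        for probs, weight in zip(expert_probs, expert_weights):
            row = probs[s]
            # Class this expert prefers for sample s.
            cls = max(range(len(row)), key=row.__getitem__)
            score = weight * row[cls]
            if score > best_score:
                best_score, best_class = score, cls
        decisions.append(best_class)
    return decisions


def smooth_labels(labels, window=3):
    """Sliding majority vote over neighbouring frames.

    Stand-in for the recurrent post-processing network: it replaces each
    label with the most frequent label in a window centred on it.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(segment).most_common(1)[0][0])
    return smoothed


if __name__ == "__main__":
    # Two hypothetical experts, three samples, two classes.
    probs = [
        [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]],
        [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]],
    ]
    print(ensemble_decide(probs, [1.0, 1.0]))
    print(smooth_labels([0, 0, 1, 0, 0, 2, 2, 2]))
```

Raising one expert's weight makes its preferred class win more often, which is the lever a weighted ensemble of this kind relies on; how the paper actually derives its weights is not stated in the abstract.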
Pages: 23