Automatic speech recognition of Portuguese phonemes using neural networks ensemble

Cited by: 3
Authors
Nedjah, Nadia [1]
Bonilla, Alejandra D. [1]
Mourelle, Luiza de Macedo [2]
Affiliations
[1] Univ Estado Rio De Janeiro, Engn Fac, Dept Elect Engn & Telecommun, Rio de Janeiro, RJ, Brazil
[2] Univ Estado Rio De Janeiro, Engn Fac, Dept Syst Engn & Computat, Rio De Janeiro, RJ, Brazil
Keywords
Automatic speech recognition; Phonetic recognition; Artificial neural networks; Ensemble; EXPERTS;
DOI
10.1016/j.eswa.2023.120378
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic speech recognition based on phoneme detection offers advantages for online recognition of speech represented by a sound signal. Developing an automatic speech recognition system is a multidisciplinary task that spans several research areas, such as linguistics, signal processing and computational intelligence. In this work, the process starts with a pre-processing step that extracts the main features of the speech signal at a given instant of time. Inspired by the "divide and conquer" principle, we bridge the complexity gap of automatic speech recognition by devising models based on an ensemble of neural network experts. The ensemble divides the huge decision space of speech recognition so that each expert handles only a delimited area of it. This novel application of the strategy improves the precision, sensitivity and accuracy of the recognition process. Each expert in the ensemble makes a decision on every pre-processed input sample. The resulting set of decisions is weighted, and the expert with the highest weight for the output determines the final classification of the sample. A dynamic post-processing step, implemented as a recurrent neural network, is then executed to mitigate the oscillatory effect that occurs during the recognition of classes with similar characteristics. Two ensembles are investigated: the first is based on clustering similar phonetic classes, while the second addresses the imbalanced distribution of samples in the training set. The proposed model achieves a 7.63% improvement in accuracy over the best related model reported so far for automatic speech recognition.
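The decision rule described in the abstract (the expert with the highest weight determines the classification of each sample, followed by a step that damps oscillation between similar classes) can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the function names, the `(label, weight)` representation of an expert's output, and the toy data are all assumptions made for illustration. In particular, the paper's post-processing is a recurrent neural network; a sliding-window majority vote is swapped in here only to show the smoothing idea.

```python
from collections import Counter


def ensemble_decide(expert_outputs):
    """Return the label proposed by the expert with the highest weight.

    expert_outputs: list of (label, weight) pairs, one per expert
    (hypothetical representation, not taken from the paper).
    """
    label, _ = max(expert_outputs, key=lambda pair: pair[1])
    return label


def smooth_labels(labels, window=3):
    """Majority vote over a window centred on each frame.

    Stand-in for the recurrent post-processing network: it only
    illustrates how isolated label flips can be suppressed.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighbourhood = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


# Toy data: five frames, two experts voting for the phoneme labels "a" and "s".
frames = [
    [("a", 0.9), ("s", 0.2)],
    [("a", 0.7), ("s", 0.3)],
    [("a", 0.4), ("s", 0.5)],  # isolated flip towards "s"
    [("a", 0.8), ("s", 0.1)],
    [("a", 0.6), ("s", 0.2)],
]

raw = [ensemble_decide(frame) for frame in frames]
smoothed = smooth_labels(raw)
print(raw)
print(smoothed)
```

In this toy run the third frame is assigned a different label from its neighbours by the highest-weight rule, and the window vote restores the surrounding label. How the real system computes the expert weights and trains the recurrent post-processor is not specified in the abstract.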
Pages: 23