Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition

Cited by: 7
Authors
Fantaye, Tessfu Geteye [1 ]
Yu, Junqing [1 ,2 ]
Hailu, Tulu Tilahun [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] Huazhong Univ Sci & Technol, Ctr Network & Computat, Wuhan 430074, Peoples R China
Keywords
speech recognition; low-resource languages; acoustic models; neural network models; CONNECTIONS; LSTM;
DOI
10.3390/computers9020036
Chinese Library Classification
TP39 [Computer Applications]
Discipline Classification Codes
081203; 0835
Abstract
Deep neural networks (DNNs) have achieved great success in acoustic modeling for speech recognition tasks. Among these networks, the convolutional neural network (CNN) is effective at representing the local properties of speech formants. However, CNNs are not well suited to modeling the long-term context dependencies between speech signal frames. Recently, recurrent neural networks (RNNs) have shown great ability to model such long-term context dependencies. However, RNNs perform poorly on low-resource speech recognition tasks, sometimes even worse than conventional feed-forward neural networks, and they often overfit severely on the training corpus. This paper presents our contributions toward combining CNNs and conventional RNNs with gate, highway, and residual connections to mitigate these problems. The optimal neural network structures and training strategies for the proposed models are explored. Experiments were conducted on the Amharic and Chaha datasets, as well as on the limited language packages (10 h) of the benchmark datasets released under the Intelligence Advanced Research Projects Activity (IARPA) Babel Program. The proposed neural network models achieve 0.1-42.79% relative performance improvements over their corresponding feed-forward DNN, CNN, bidirectional RNN (BRNN), or bidirectional gated recurrent unit (BGRU) baselines across six language collections. These approaches are promising candidates for building better-performing acoustic models for low-resource speech recognition tasks.
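The gate, highway, and residual connections mentioned in the abstract can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the `highway_layer` helper, layer sizes, and random weights are all hypothetical stand-ins for trained parameters. A transform gate T(x) interpolates between a nonlinear transform H(x) and the untouched input x; a plain residual connection is the ungated special case y = H(x) + x.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """Highway connection: a transform gate T(x) in (0, 1) mixes a
    nonlinear transform H(x) with the identity path x elementwise.
    All weights here are hypothetical, untrained stand-ins."""
    h = np.tanh(x @ W_h + b_h)     # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)     # transform gate T(x)
    return t * h + (1.0 - t) * x   # gated mix of transform and identity

# Hypothetical small dimensions for illustration.
rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((3, d))          # 3 frames, d features each
W_h = rng.standard_normal((d, d)) * 0.1
W_t = rng.standard_normal((d, d)) * 0.1
b_h = np.zeros(d)
b_t = np.zeros(d)
y = highway_layer(x, W_h, b_h, W_t, b_t)
```

Because the gate is learned per dimension, the network can shut the transform off (T near 0) and pass features through unchanged, which is what makes such layers easier to train at depth on small corpora.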
Pages: 27
Related Papers (50 in total)
  • [1] Advanced recurrent network-based hybrid acoustic models for low resource speech recognition
    Kang, Jian
    Zhang, Wei-Qiang
    Liu, Wei-Wei
    Liu, Jia
    Johnson, Michael T.
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2018
  • [2] Gated convolutional networks based hybrid acoustic models for low resource speech recognition
    Kang, Jian
    Zhang, Wei-Qiang
    Liu, Jia
    [J]. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017: 157-164
  • [3] Convolutional maxout neural networks for low-resource speech recognition
    Cai, Meng
    Shi, Yongzhe
    Kang, Jian
    Liu, Jia
    Su, Tengrong
    [J]. 2014 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2014: 133+
  • [4] Multilingual acoustic models for speech recognition in low-resource devices
    Garcia, Enrique Gil
    Mengusoglu, Erhan
    Janke, Eric
    [J]. 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. IV, Pts. 1-3, 2007: 981+
  • [5] Residual convolutional neural network-based dysarthric speech recognition
    Kumar, Raj
    Tripathy, Manoj
    Anand, R. S.
    Kumar, Niraj
    [J]. Arabian Journal for Science and Engineering, 2024
  • [6] A hybrid acoustic model based on PDP coding for resolving articulation differences in low-resource speech recognition
    Zhu, Wenbo
    Jin, Hao
    Chen, Jianwen
    Luo, Lufeng
    Wang, Jinhai
    Lu, Qinghua
    Li, Aiyuan
    [J]. Applied Acoustics, 2022, 192
  • [7] Ensemble of jointly trained deep neural network-based acoustic models for reverberant speech recognition
    Lee, Moa
    Lee, Jeehye
    Chang, Joon-Hyuk
    [J]. Digital Signal Processing, 2019, 85: 1-9
  • [8] Acoustic modeling based on deep learning for low-resource speech recognition: An overview
    Yu, Chongchong
    Kang, Meng
    Chen, Yunbing
    Wu, Jiajia
    Zhao, Xia
    [J]. IEEE Access, 2020, 8: 163829-163843
  • [9] Acoustic modeling for Hindi speech recognition in low-resource settings
    Dey, Anik
    Zhang, Weibin
    Fung, Pascale
    [J]. 2014 International Conference on Audio, Language and Image Processing (ICALIP), Vols. 1-2, 2014: 891-894