Deep Learning in Acoustic Modeling for Automatic Speech Recognition and Understanding - An Overview -

Cited by: 0
Authors
Gavat, Inge [1]
Militaru, Diana [1]
Affiliations
[1] Univ POLITEHN, Dept Elect Telecommun & Informat Technol, Bucharest, Romania
Keywords
ASRU; LVCSR; deep learning; restricted Boltzmann machine; autoencoder; deep belief network; convolutional neural network; continuous speech recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper discusses the progress made in Automatic Speech Recognition and Understanding (ASRU) by applying Deep Learning (DL) to acoustic modeling. After the concept of DL is explained, specific algorithms such as the Restricted Boltzmann Machine (RBM), Convolutional Neural Network (CNN), Autoencoder (AE), and Deep Belief Network (DBN) are presented and evaluated. Experiments with DL structures in both academic research and industry, covering Phone Recognition and Large Vocabulary Continuous Speech Recognition (LVCSR), are highlighted; they confirm the usefulness of the DL framework in ASRU. The paper concludes with some considerations about the future of this new and effective machine learning paradigm.
Pages: 8