Deep Learning in Acoustic Modeling for Automatic Speech Recognition and Understanding - An Overview -

Cited by: 0

Authors:

Gavat, Inge [1]

Militaru, Diana [1]

Affiliation:

[1] Univ POLITEHN, Dept Elect Telecommun & Informat Technol, Bucharest, Romania

Source:

2015 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED) | 2015

Keywords:

ASRU; LVCSR; deep learning; restricted Boltzmann machine; autoencoder; deep belief network; convolutional neural network; continuous speech recognition

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper discusses the progress made in Automatic Speech Recognition and Understanding (ASRU) by applying Deep Learning (DL) to acoustic modeling. After explaining the concept of DL, it presents and evaluates specific algorithms such as the Restricted Boltzmann Machine (RBM), Convolutional Neural Network (CNN), Autoencoder (AE), and Deep Belief Network (DBN). Experiments with DL structures in both academic research and industry, covering Phone Recognition and Large Vocabulary Continuous Speech Recognition (LVCSR), are highlighted and confirm the usefulness of the DL framework in ASRU. The paper concludes with some considerations about the future of this new and effective machine learning paradigm.

Pages: 8