Automatic Speech Recognition Based on Neural Networks

Cited by: 1
Authors
Schlueter, Ralf [1]
Doetsch, Patrick [1]
Golik, Pavel [1]
Kitza, Markus [1]
Menne, Tobias [1]
Irie, Kazuki [1]
Tueske, Zoltan [1]
Zeyer, Albert [1]
Affiliation
[1] Rhein Westfal TH Aachen, Lehrstuhl Informat 6, D-52074 Aachen, Germany
Source
SPEECH AND COMPUTER | 2016 / Vol. 9811
Keywords
FEATURE COMBINATION; FEATURES; CLASSIFICATION; LSTM;
DOI
10.1007/978-3-319-43958-7_1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In automatic speech recognition, as in many areas of machine learning, stochastic modeling increasingly relies on neural networks. In both acoustic and language modeling, neural networks today mark the state of the art for large vocabulary continuous speech recognition, providing huge improvements over former approaches that were based solely on Gaussian mixture hidden Markov models and count-based language models. We give an overview of current activities in neural network based modeling for automatic speech recognition. This includes discussions of network topologies and cell types, training and optimization, choice of input features, adaptation and normalization, multitask training, as well as neural network based language modeling. Despite the clear progress obtained with neural network modeling in speech recognition, much remains to be done to obtain a consistent and self-contained neural network based modeling approach that ties in with the former state of the art. We conclude with a discussion of open problems as well as potential future directions with respect to neural network integration into automatic speech recognition systems.
Pages: 3-17
Page count: 15
Related papers
50 in total
  • [1] Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
    Espana-Bonet, Cristina
    Fonollosa, Jose A. R.
    [J]. ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 97 - 107
  • [2] DYNAMIC SPARSITY NEURAL NETWORKS FOR AUTOMATIC SPEECH RECOGNITION
    Wu, Zhaofeng
    Zhao, Ding
    Liang, Qiao
    Yu, Jiahui
    Gulati, Anmol
    Pang, Ruoming
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6014 - 6018
  • [3] DEEP NEURAL NETWORKS BASED AUTOMATIC SPEECH RECOGNITION FOR FOUR ETHIOPIAN LANGUAGES
    Abate, Solomon Teferra
    Tachbelie, Martha Ylfiru
    Schultz, Tanja
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8274 - 8278
  • [4] A comprehensive survey on automatic speech recognition using neural networks
    Amandeep Singh Dhanjal
    Williamjeet Singh
    [J]. Multimedia Tools and Applications, 2024, 83 : 23367 - 23412
  • [5] A comprehensive survey on automatic speech recognition using neural networks
    Dhanjal, Amandeep Singh
    Singh, Williamjeet
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (8) : 23367 - 23412
  • [6] Automatic Recognition of Kazakh Speech Using Deep Neural Networks
    Mamyrbayev, Orken
    Turdalyuly, Mussa
    Mekebayev, Nurbapa
    Alimhan, Keylan
    Kydyrbekova, Aizat
    Turdalykyzy, Tolganay
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2019, PT II, 2019, 11432 : 465 - 474
  • [7] Automatic Detection Technique for Speech Recognition based on Neural Networks Inter-Disciplinary
    Al-Rababah, Mohamad A. A.
    Al-Marghilani, Abdusamad
    Hamarshi, Akram Aref
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2018, 9 (03) : 179 - 184
  • [8] Speech Recognition Based on Convolutional Neural Networks
    Du Guiming
    Wang Xia
    Wang Guangyan
    Zhang Yan
    Li Dan
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP), 2016, : 708 - 711
  • [9] Automatic Image and Speech Recognition Based on Neural Network
    Krol, Dariusz
    Szlachetko, Boguslaw
    [J]. JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2010, 3 (02) : 1 - 17
  • [10] Fast speaker adaptation of artificial neural networks for automatic speech recognition
    Dupont, S
    Cheboub, L
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1795 - 1798