Towards End-to-End Speech Recognition with Deep Multipath Convolutional Neural Networks

Cited by: 9
Authors
Zhang, Wei [1, 3]
Zhai, Minghao [1, 3]
Huang, Zilong [1, 3]
Liu, Chen [1, 3]
Li, Wei [2]
Cao, Yi [1, 3]
Affiliations
[1] Jiangnan Univ, Sch Mech Engn, Wuxi 214122, Jiangsu, Peoples R China
[2] Suzhou Vocat Inst Ind Technol, Suzhou 215104, Jiangsu, Peoples R China
[3] Jiangsu Key Lab Adv Food Mfg Equipment & Technol, Wuxi 214122, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Automatic Speech Recognition (ASR); Acoustic Model (AM); MCNN-CTC; Connectionist Temporal Classification (CTC);
DOI
10.1007/978-3-030-27529-7_29
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning approaches have been widely applied to Automatic Speech Recognition (ASR), where they have achieved high accuracy. Convolutional Neural Networks (CNNs) in particular have recently been investigated for ASR. However, a conventional CNN increases network depth along a single branch and may not be wide enough to capture adequate features of human speech signals. We propose a CNN architecture that is both deep and wide, referred to as the Multipath Convolutional Neural Network (MCNN). MCNN-CTC combines three additional paths with the Connectionist Temporal Classification (CTC) objective function, forming an end-to-end system that can fully exploit the spectral and temporal structures of speech signals simultaneously. Experimental results show that the proposed MCNN-CTC structure reduces the error rate of the end-to-end acoustic model. Without a Language Model (LM), the proposed MCNN-CTC acoustic model achieves a relative reduction in error rate of 1.10%-12.08% compared with traditional HMM-based or DCNN-CTC-based models, with strong generalization performance.
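To illustrate what the CTC objective mentioned in the abstract implies at inference time, the following is a minimal sketch of standard best-path (greedy) CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. It is not taken from the paper. The blank index, the three-symbol vocabulary, and the frame scores are hypothetical values chosen for illustration, and the authors may use a different decoding procedure.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Pick the highest-scoring label index at each time frame.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Hypothetical 6-frame output over 3 symbols: 0 = blank, 1 = "a", 2 = "b".
scores = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.80, 0.10],  # a (kept, because a blank separates it)
    [0.10, 0.20, 0.70],  # b
    [0.80, 0.10, 0.10],  # blank
]

decoded = ctc_greedy_decode(scores)
print(decoded)  # [1, 1, 2]
```

Because the collapse rule maps many frame-level paths to one label sequence, a CTC-trained acoustic model needs no frame-level alignment, which is what allows the system to be trained end to end.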
Pages: 332-341
Number of pages: 10
Related papers
50 items in total
  • [41] An End-to-End Compression Framework Based on Convolutional Neural Networks
    Tao, Wen
    Jiang, Feng
    Zhang, Shengping
    Ren, Jie
    Shi, Wuzhen
    Zuo, Wangmeng
    Guo, Xun
    Zhao, Debin
    [J]. 2017 DATA COMPRESSION CONFERENCE (DCC), 2017, : 463 - 463
  • [42] FURCAX: END-TO-END MONAURAL SPEECH SEPARATION BASED ON DEEP GATED (DE)CONVOLUTIONAL NEURAL NETWORKS WITH ADVERSARIAL EXAMPLE TRAINING
    Shi, Ziqiang
    Lin, Huibin
    Liu, Liu
    Liu, Rujie
    Hayakawa, Shoji
    Han, Jiqing
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6985 - 6989
  • [43] An End-to-End Steel Strip Surface Defects Recognition System Based on Convolutional Neural Networks
    Yi, Li
    Li, Guangyao
    Jiang, Mingming
    [J]. STEEL RESEARCH INTERNATIONAL, 2017, 88 (02) : 176 - 187
  • [44] FACE DETECTION AND RECOGNITION FOR HOME SERVICE ROBOTS WITH END-TO-END DEEP NEURAL NETWORKS
    Jiang, Wei
    Wang, Wei
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2232 - 2236
  • [45] Toward End-to-End Car License Plate Detection and Recognition With Deep Neural Networks
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2019, 20 (03) : 1126 - 1136
  • [46] End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge
    Kimura, Naoki
    Su, Zixiong
    Saeki, Takaaki
    [J]. INTERSPEECH 2020, 2020, : 1025 - 1026
  • [47] Towards End-to-End ECG Classification With Raw Signal Extraction and Deep Neural Networks
    Xu, Sean Shensheng
    Mak, Man-Wai
    Cheung, Chi-Chung
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2019, 23 (04) : 1574 - 1584
  • [48] Remote Sensing Airport Detection Based on End-to-End Deep Transferable Convolutional Neural Networks
    Li, Shuai
    Xu, Yuelei
    Zhu, Mingming
    Ma, Shiping
    Tang, Hong
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2019, 16 (10) : 1640 - 1644
  • [49] Feature map size selection for fMRI classification on end-to-end deep convolutional neural networks
    Suhaimi, Farahana
    Htike, Zaw Zaw
    [J]. INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES, 2018, 5 (08) : 95 - 103
  • [50] End-to-end Speech Intelligibility Prediction Using Time-Domain Fully Convolutional Neural Networks
    Pedersen, Mathias B.
    Kolbaek, Morten
    Andersen, Asger H.
    Jensen, Soren H.
    Jensen, Jesper
    [J]. INTERSPEECH 2020, 2020, : 1151 - 1155