EXPLORING A ZERO-ORDER DIRECT HMM BASED ON LATENT ATTENTION FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Bahar, Parnia [1, 2]
Makarov, Nikita [1]
Zeyer, Albert [1, 2]
Schlueter, Ralf [1, 2]
Ney, Hermann [1, 2]
Affiliations
[1] Rhein Westfal TH Aachen, Human Language Technol & Pattern Recognit Grp, Comp Sci Dept, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Funding
European Research Council;
Keywords
End-to-end speech recognition; Latent models; direct HMM; Attention; Transformer; LSTM; MODELS;
DOI
10.1109/icassp40776.2020.9054545
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) that integrates attention-based sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM, we decompose the label sequence posterior into emission and transition probabilities under a zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of end-to-end training of the attention model with an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in WER, with more focused alignments than those of the attention models.
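The zero-order decomposition mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation; it only assumes one plausible reading of the abstract, in which each output position n has a transition distribution over input frames (for example, attention weights) and an emission distribution over labels given the aligned frame. Under a zero-order assumption the alignment at position n does not depend on the previous alignment, so the sum over all alignment sequences factorizes into a product of per-position sums. The function names, the toy probability tables, and the label indices below are all hypothetical.

```python
import itertools
import math


def zero_order_posterior(transition, emission, labels):
    """p(w_1..w_N | x) under a zero-order latent alignment (illustrative only).

    transition[n][t]  : assumed p(t_n = t | label history, x), e.g. attention weights
    emission[n][t][w] : assumed p(w_n = w | t_n = t, label history, x)
    labels[n]         : the label w_n being scored
    """
    prob = 1.0
    for n, w in enumerate(labels):
        # Marginalise the alignment t_n locally at each output position.
        prob *= sum(a * emission[n][t][w] for t, a in enumerate(transition[n]))
    return prob


def brute_force_posterior(transition, emission, labels):
    """Same quantity, summing explicitly over every alignment sequence."""
    num_frames = len(transition[0])
    total = 0.0
    for alignment in itertools.product(range(num_frames), repeat=len(labels)):
        path = 1.0
        for n, (t, w) in enumerate(zip(alignment, labels)):
            path *= transition[n][t] * emission[n][t][w]
        total += path
    return total


# Toy example (made-up numbers): N = 2 labels, T = 3 frames, vocabulary of size 2.
transition = [[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]]
emission = [[[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
            [[0.3, 0.7], [0.4, 0.6], [0.1, 0.9]]]
labels = [0, 1]

fast = zero_order_posterior(transition, emission, labels)
slow = brute_force_posterior(transition, emission, labels)
```

Here `fast` and `slow` agree up to floating-point rounding: the per-position marginalization costs on the order of N times T operations, while the explicit sum enumerates T to the power N alignment sequences. In a real model the two tables would come from a decoder conditioned on the label history rather than from fixed lists.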
Pages: 7854 - 7858
Number of pages: 5