EXPLORING A ZERO-ORDER DIRECT HMM BASED ON LATENT ATTENTION FOR AUTOMATIC SPEECH RECOGNITION

Authors
Bahar, Parnia [1, 2]
Makarov, Nikita [1]
Zeyer, Albert [1, 2]
Schlueter, Ralf [1, 2]
Ney, Hermann [1, 2]
Affiliations
[1] RWTH Aachen University, Human Language Technology and Pattern Recognition Group, Computer Science Department, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Funding
European Research Council
Keywords
End-to-end speech recognition; Latent models; direct HMM; Attention; Transformer; LSTM; MODELS;
DOI
10.1109/icassp40776.2020.9054545
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) that integrates attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM, we decompose the label sequence posterior into emission and transition probabilities under a zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of end-to-end training of the attention model with an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in word error rate (WER), with more focused alignments than those of the attention models.
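The zero-order decomposition mentioned in the abstract can be illustrated with a small sketch. It assumes (this notation is not given in this record) that at each output position n the model provides a transition distribution p(t_n | y_1^{n-1}, x_1^T) over input frames and an emission distribution p(y_n | t_n, y_1^{n-1}, x_1^T) over labels, and that the zero-order assumption lets the sum over the latent alignment t_n be taken independently at each position. The function name and the toy tables below are hypothetical; in a real model both tables would be produced by the network given the label history.

```python
import math


def zero_order_log_posterior(transition, emission, labels):
    """Log posterior of a label sequence under a zero-order latent alignment.

    transition[n][t]   : assumed p(t_n = t | label history, input)
    emission[n][t][y]  : assumed p(y_n = y | t_n = t, label history, input)
    labels[n]          : label index at output position n
    """
    log_prob = 0.0
    for n, label in enumerate(labels):
        # Marginalize the latent frame position t_n at this output step.
        step_prob = sum(
            transition[n][t] * emission[n][t][label]
            for t in range(len(transition[n]))
        )
        log_prob += math.log(step_prob)
    return log_prob


# Toy setup: 2 output positions, 3 input frames, 2 labels (all values made up).
transition = [
    [0.5, 0.25, 0.25],
    [0.0, 0.5, 0.5],
]
emission = [
    [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.25, 0.75], [0.75, 0.25]],
]
labels = [0, 1]

posterior = math.exp(zero_order_log_posterior(transition, emission, labels))
print(posterior)
```

Because the sum over frames sits inside the product over output positions, the cost per label is linear in the number of frames, which is consistent with the abstract's remark that beam search stays simple and efficient.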
Pages: 7854 - 7858
Number of pages: 5