EXPLORING A ZERO-ORDER DIRECT HMM BASED ON LATENT ATTENTION FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Bahar, Parnia [1 ,2 ]
Makarov, Nikita [1 ]
Zeyer, Albert [1 ,2 ]
Schlueter, Ralf [1 ,2 ]
Ney, Hermann [1 ,2 ]
Affiliations
[1] Rhein Westfal TH Aachen, Human Language Technol & Pattern Recognit Grp, Comp Sci Dept, D-52074 Aachen, Germany
[2] AppTek GmbH, D-52062 Aachen, Germany
Funding
European Research Council;
Keywords
End-to-end speech recognition; Latent models; direct HMM; Attention; Transformer; LSTM; MODELS;
DOI
10.1109/icassp40776.2020.9054545
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) that integrates attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM, we assume a decomposition of the label sequence posterior into emission and transition probabilities using a zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of end-to-end training of the attention model with an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in word error rate (WER), with more focused alignments than the attention models.
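The decomposition mentioned in the abstract can be sketched as follows. This is an illustrative reading under assumed notation, not a formula quoted from the paper: x_1^T denotes the input frames, w_1^N the output labels, and t_1^N the hidden alignment positions that map each label to an input frame. The exact conditioning of each factor is an assumption.

```latex
% Assumed notation: x_1^T input frames, w_1^N output labels,
% t_1^N latent alignment (t_i is the frame position assigned to label w_i).
\begin{align}
p(w_1^N \mid x_1^T)
  &= \sum_{t_1^N} \prod_{i=1}^{N}
     \underbrace{p(t_i \mid w_0^{i-1}, x_1^T)}_{\text{transition (zero-order)}}
     \cdot
     \underbrace{p(w_i \mid t_i, w_0^{i-1}, x_1^T)}_{\text{emission}} \\
  &= \prod_{i=1}^{N} \sum_{t_i=1}^{T}
     p(t_i \mid w_0^{i-1}, x_1^T) \cdot p(w_i \mid t_i, w_0^{i-1}, x_1^T)
\end{align}
```

The second line follows because, under a zero-order assumption, the transition term for position i does not depend on the previous alignment position t_{i-1}, so the sum over the full alignment sequence factorizes into independent sums per label. In this reading the transition term plays the role of the attention weights, which would explain why training and beam search stay as simple as in a standard attention model.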
Pages: 7854-7858
Page count: 5