EXPLORING A ZERO-ORDER DIRECT HMM BASED ON LATENT ATTENTION FOR AUTOMATIC SPEECH RECOGNITION

Citations: 0

Authors:

Bahar, Parnia [1,2]

Makarov, Nikita [1]

Zeyer, Albert [1,2]

Schlueter, Ralf [1,2]

Ney, Hermann [1,2]

Affiliations:

[1] Rhein Westfal TH Aachen, Human Language Technol & Pattern Recognit Grp, Comp Sci Dept, D-52074 Aachen, Germany

[2] AppTek GmbH, D-52062 Aachen, Germany

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

European Research Council;

Keywords:

End-to-end speech recognition; Latent models; direct HMM; Attention; Transformer; LSTM; MODELS;

DOI:

10.1109/icassp40776.2020.9054545

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) which enables an integration of attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM model, we assume a decomposition of the label sequence posterior into emission and transition probabilities using a zero-order assumption and incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of end-to-end training of the attention model with an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in word error rate (WER), with more focused alignments than the attention models.
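
The abstract does not spell out the decomposition, so the following is only a minimal sketch of how a zero-order marginalization over a latent alignment is commonly computed: for each output label, the emission probability is weighted by the transition (alignment) probability of each input frame and summed over frames, and the per-label terms are multiplied. The function name, argument names, and toy numbers are hypothetical and not taken from the paper.

```python
import math


def zero_order_log_posterior(transition, emission):
    """Log posterior of a label sequence under a zero-order latent alignment.

    Assumed (hypothetical) inputs, both lists of N rows with T entries each:
      transition[n][t]: probability that label n aligns to frame t
      emission[n][t]:   probability of label n given that it aligns to frame t
    Zero-order means the alignment of label n does not depend on the
    alignment of label n-1, so the sum over frames factorizes per label.
    """
    total = 0.0
    for trans_n, emit_n in zip(transition, emission):
        if len(trans_n) != len(emit_n):
            raise ValueError("transition and emission rows must have equal length")
        # Marginalize the latent frame position for this label.
        step = sum(a * e for a, e in zip(trans_n, emit_n))
        total += math.log(step)
    return total


# Toy example: N = 2 labels, T = 3 frames (made-up numbers).
transition = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
emission = [[0.9, 0.5, 0.1], [0.2, 0.4, 0.8]]
log_p = zero_order_log_posterior(transition, emission)
```

In a real system the two tables would come from neural network outputs (for example attention weights and a label softmax), and the sum would typically be done in log space for numerical stability; this sketch keeps plain probabilities for readability.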


Pages: 7854-7858

Number of pages: 5

50 items in total

[21] Incorporating the voicing information into HMM-based automatic speech recognition in noisy environments
Jancovic, Peter
Koekueer, Muenevver
SPEECH COMMUNICATION, 2009, 51 (05) : 438 - 451
[22] A Novel Model Characteristics for Noise-Robust Automatic Speech Recognition Based on HMM
Rafieee, M. Saadeq
Khazaei, Ali Akbar
2010 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND INFORMATION SECURITY (WCNIS), VOL 2, 2010, : 215 - 218
[23] Automatic speech segmentation for Chinese speech database based on HMM
Tao, JH
Hain, HU
2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 481 - 484
[24] Speech emotion recognition based on HMM and SVM
Lin, YL
Wei, G
PROCEEDINGS OF 2005 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-9, 2005, : 4898 - 4901
[25] MONOTONIC SEGMENTAL ATTENTION FOR AUTOMATIC SPEECH RECOGNITION
Zeyer, Albert
Schmitt, Robin
Zhou, Wei
Schlueter, Ralf
Ney, Hermann
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 229 - 236
[26] The realization of speech recognition system based on HMM
Yiao, Mingming
2008 PROCEEDINGS OF INFORMATION TECHNOLOGY AND ENVIRONMENTAL SYSTEM SCIENCES: ITESS 2008, VOL 4, 2008, : 24 - 29
[27] A Study on HMM based Speech Recognition System
Boruah, Saptarshi
Basishtha, Subhash
2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2013, : 153 - 157
[28] The Teaching Experiment of Speech Recognition based on HMM
An, Mingjia
Yu, Zhengtao
Guo, Jianyi
Gao, Shengxiang
Xian, Yantuan
26TH CHINESE CONTROL AND DECISION CONFERENCE (2014 CCDC), 2014, : 2416 - 2420
[29] The application of Speech Recognition Technology based on HMM
Yan, Guilin
PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON INFORMATION SCIENCES, MACHINERY, MATERIALS AND ENERGY (ICISMME 2015), 2015, 126 : 676 - 679
[30] A HMM speech recognition system based on FPGA
Ke, Sujuan
Hou, Yibin
Huang, Zhangqin
Li, Hui
CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 5, PROCEEDINGS, 2008, : 305 - 309
